Nalanda

October 30, 2008

P2P systems and the Chord protocol

Filed under: Networks — Tags: , , , — Ashwin @ 1:19 pm

Citations:

Balakrishnan, H., Kaashoek, M. F., Karger, D., Morris, R., and Stoica, I. 2003. Looking up data in P2P systems. Commun. ACM 46, 2 (Feb. 2003), 43-48. DOI=http://doi.acm.org/10.1145/606272.606299

Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., and Balakrishnan, H. 2001. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols For Computer Communications (San Diego, California, United States). SIGCOMM ‘01. ACM, New York, NY, 149-160. DOI=http://doi.acm.org/10.1145/383059.383071

These two papers deal with the problem of storing and locating data in P2P systems, the first surveying a number of different systems, and the second describing the Chord system in detail.

The principal question that the first paper addresses is that of lookup: how is data to be located in a P2P system efficiently, lacking any centralised index or hierarchy? The authors take us through a number of different approaches to lookup, mainly structured and symmetric lookup.

In a structured lookup system, data may be stored in a central server, or a hierarchy may be imposed to provide scalability. These systems can provide guarantees for location of data, but are vulnerable to the failure of the central database, or of higher-level nodes. In addition, a disproportionately high percentage of load on the system may have to be borne by higher level nodes.

In a symmetric lookup system, no node is more important than any other, and each node participates in handling a fraction of the total load on the system. These vary from systems where lookup requests are broadcast to all neighbours (introducing significant overhead, and scaling problems), to others where “superpeers” are used to introduce some hierarchy to the system to scale up more efficiently. However, more significant superpeers become points of failure, and such systems provide no guarantees for lookup.

Modern P2P algorithms, including CAN, Chord, Kademlian, Pastry, Tapestry and Viceroy, all implement the Distributed Hash Table (DHT) abstraction, with implementations driven by considerations of scaling well to large numbers of nodes, low latency lookup, efficient handling of dynamic node arrival and departure and an even distribution of keys amongst nodes.

A DHT identifies both keys and nodes using a common hash function. Keys are then stored on nodes “closer” to them; “closeness” is evaluated using a function that compares the key and node identifiers. When a node receives a lookup request, it needs sufficient information to forward the request on to a node which is closer to the requested key. Routing table to maintain this information may be one dimensional (implemented a skip list in Chord, and as a tree in Pastry, Tapestry and Kademlia), or multi-dimensional (as in CAN). The multidimensional approach is interesting: I could imagine this being used for a data retrieval system, where queries may be run to locate items based on arbitrary combinations of dimensions.

The discussion of the unidirectionality and symmetry properties of the different algorithms was interesting, but also a little confusing: why, for instance, do symmetric protocols not require stabilization routines? I particularly liked the way in which malicious activity can be dealt with: since all these algorithms can easily perform cross-checks to verify correct routing and so identify malicious nodes, the worst that can happen is that a malicious node denies the existence of available data.

The second paper describes an implementation of Chord, which the authors suggest has advantages of simplicity, provable correctness, and provable performance over other P2P lookup protocols. The Chord implementation uses SHA-1 hashing to generate identifiers for keys and nodes, proceeding on the assumption that given a sufficient number of nodes, there will be a roughly equal distribution of keys over nodes. Nodes form their routing (”finger”) tables by maintaining information about successor nodes: each successor node is at increasingly exponential distances from the source node. This creates the interesting property that a node “knows” more of its closer neighbours than its distant neighbours. Keys are assigned to the first nodes with an identifier which is equal to, or follows the key. By maintaining a relatively small amount of information (about O(log N) successors), nodes can efficiently route lookup requests to the node holding the key.

When a node joins or leaves Chord, two properties must be preserved: each node’s successor must be correctly maintained, and keys must be distributed properly amongst nodes. Chord cannot transfer keys itself; it only notifies application software that this transfer is necessary, so that the application layer can take care of also transferring associated data. Chord periodically runs a stabilization protocol to update the finger tables for all nodes.

Since keys are uniquely associated with nodes, applications are also responsible for maintaining replicas of data, by associating them with different keys. I’m not sure if this is a reasonable expectation, since this means that the application layer would have to be aware of the key distribution across nodes, to ensure that replicas are actually stored on different nodes.

In simulations, it was found that mean path length increases logarithmically with the number of nodes, as expected. Under simultaneous node failure conditions, the lookup failure rate is proportional to the fraction of failed nodes, again as expected, since these would map to the fraction of lost keys. Lookup failures during stabilization depend on the number of failures, the average path length and number of nodes. In experimental testing on the RON testbed, lookup latency was found to grow slowly with the total number of nodes, confirming simulation results, and demonstrating Chord’s scalability.

October 28, 2008

Active Network Vision and Reality: Lessons from a Capsule-based System

Filed under: Networks — Tags: , , — Ashwin @ 12:57 pm

Citation: Wetherall, D. 1999. Active network vision and reality: lessons from a capsule-based system. In Proceedings of the Seventeenth ACM Symposium on Operating Systems Principles (Charleston, South Carolina, United States, December 12 - 15, 1999). SOSP ‘99. ACM, New York, NY, 64-79. DOI=http://doi.acm.org/10.1145/319151.319156

This paper presents an interesting idea: rather than have routing algorithms statically built into routers, allow applications to bundle their own algorithms along with their data packets, and have routers execute this application-determined code to make routing decisions. The combination of code and data is termed a capsule. Evaluation of the system is presented based on the ANTS toolkit.

Capsules declare a type in their headers, which is used to determine the code to be executed at the router. If this code is not already in the router’s cache, it is obtained from the neighbour which dispatched the capsule. This is an interesting choice of code distribution mechanism; as long as trust is not violated amongst the active routers, more explicit security checks on the code are needed only at the edge of the network. Indeed, security is a recurring theme in the paper, as it is a primary goal that untrusted users be capable of deploying code to the network without compromising the capacity of individual nodes, or of the network as a whole.

The idea presented here is certainly compelling as a thought experiment, but I found it difficult to place in context without examples of real usage. The performance numbers presented, for instance, were confusing, as they relate to no real application; I would imagine that performance would vary wildly depending on the capsule code.

Resilient Overlay Networks

Filed under: Networks — Tags: , , , — Ashwin @ 11:11 am

Citation: Andersen, D., Balakrishnan, H., Kaashoek, F., and Morris, R. 2001. Resilient overlay networks. SIGOPS Oper. Syst. Rev. 35, 5 (Dec. 2001), 131-145. DOI=http://doi.acm.org/10.1145/502059.502048

In order to operate at the scale of the Internet, the Internet’s current interdomain routing protocol, BGP-4, aggregates routes, but at the same time hides details of topology and traffic conditions, which leads to time intervals of the order of tens of minutes (or even hours) for BGP’s recovery mechanisms to converge on a consistent routing table. This paper presents the Resilient Overlay Network (RON) as a remedy to BGP’s problems.

A RON consists of a set of nodes residing in a variety of routing domains, which maintain a full mesh of routes between themselves using a link state routing protocol. Link metrics are collected using a combination of active periodic probing, and passive listening. This allows RON to route around failures in the order of seconds, a significant improvement over BGP. For a RON of 50 nodes, an approximate overhead of 30 kbps will be required with the implementation described in the paper; this does not seem excessive considering the capacity of current Internet links.

An interesting feature of RON is an explicit interaction with applications for the purposes of deciding metrics for route selection. By default, RON offers latency, loss and throughput metrics, though this set may be extended to include application-specific metrics such as jitter. This allows applications to choose trade-offs between different metrics for route selection, and in effect enables application-specific routing. In this context, the paper introduces the useful notion of a performance failure, where a link does not provide adequate performance to the application using it.

RON also puts in place rich routing policies, which may restrict the flow of certain kinds of traffic over certain routes: for instance, in the tests the authors conducted, policies were put in place to prevent commercial traffic from flowing over the non-commercial Internet2 backbone.

Two sets of tests were run, one of 12 academic networks (several of which are on the Internet2 backbone), and another which added 4 commercial networks. In the former case, RON was able to overcome all observed outages, and in the latter, 60%, with an average of 18 seconds to recover from a fault. In addition, thanks to its explicit attention to application-dependent metrics, RON is able to improve loss rate, latency and TCP throughput.

As the authors themselves note, individual RONs will not scale to the level of the Internet: RONs operate best with relatively small numbers of nodes. This smaller scale is also important administratively, from the perspective of attempting to deal with malicious nodes, as such problems are obviously easier to deal with at an administrative, rather than a technical level. In my reading, RON is an important and useful adjunct to BGP routing, both for its rapid recovery from fault, but especially for its ability to route according to application-specified metrics.

October 21, 2008

On the (in)validity of Internet topology models

Filed under: Networks — Tags: , , , , — Ashwin @ 12:04 pm

Citations:

Faloutsos, M., Faloutsos, P., and Faloutsos, C. 1999. On power-law relationships of the Internet topology. In Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols For Computer Communication (Cambridge, Massachusetts, United States, August 30 - September 03, 1999). SIGCOMM ‘99. ACM, New York, NY, 251-262. DOI=http://doi.acm.org/10.1145/316188.316229

Li, L., Alderson, D., Willinger, W., and Doyle, J. 2004. A first-principles approach to understanding the internet’s router-level topology. In Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols For Computer Communications (Portland, Oregon, USA, August 30 - September 03, 2004). SIGCOMM ‘04. ACM, New York, NY, 3-14. DOI=http://doi.acm.org/10.1145/1015467.1015470

These are an interesting pair of papers, the first from what seems to be the beginning of attempts to extract “laws” of Internet topology, and the second more recent, offering constructive criticisms, and what seems to me a more realistic approach to modeling.

The Faloutsos’ paper studies the topology of the Internet circa 1997-1998, with three datasets of interdomain topology from BGP routing tables gathered at 6 month intervals, and one dataset of the topology of routers on the Internet. They analyse the data using a number of metrics: frequency of outdegree, node rank in order of decreasing outdegree, pairs of nodes and average number of nodes in a neighbourhood as a function of hop distance, eigenvalues of the graph, and order of eigenvalues. Using these metrics, they come to the conclusion that the Internet is subject to three power laws, in spite of a 45% growth over the period of their interdomain topology datasets.

First, the outdegree of a node is proportional to the rank of the node by outdegree raised to a constant rank exponent. They make the claim that this is indicative of the financial and functional considerations in adding a new edge to a node, providing a view into the dynamics of the ways in which domains interconnect. Second, the frequency of a given outdegree is proportional to the outdegree raised to the power of a constant outdegree exponent. From this, they propose the outdegree exponent as a useful metric for estimating the distribution of outdegree amongst Internet nodes. Finally, the eigenvalues of the graphs are proportional to the order raised to a constant eigen exponent. Finding that the eigen exponent is practically the same for the interdomain graphs, but significantly different for the router graph, they conclude that the eigen exponent can be used to distinguish amongst families of graphs.

In addition, they propose an approximation whereby the number of pairs of nodes within a hop distance is proportional to the hop distance raised to a hop-plot exponent. From this, they define the effective diameter of a graph as a function of the numbers of nodes and edges in the graph, and the hop-plot exponent. If accurate, this could have material implications for selection of TTL for broadcast, as the effective diameter provides a high probability estimate of the distance between any pair of nodes. It seems odd to me that they forged ahead with this particular analysis when their datasets lacked sufficient data to prove or disprove this hypothesis.

The Faloutsos’ metrics are interesting and valuable, but I find the failing of this paper in the assumption that certain observations about properties of the Internet amount to laws governing the growth of the Internet. As the second paper mentions, there have been arguments put forward that power law type distributions may arise for purely mathematical and statistical reasons. There is no more reason to believe that there are natural laws governing the global spread of the Internet than that there are natural laws governing the “free” market conditions under which interconnection takes place. The dynamics of differing regulatory regimes, the range of commercial agreements from peering to transit, varying physical geography, and overall economic development are just some of the factors that contribute to the manner in which the Internet spreads.

The second paper seems an explicit attempt to debunk some of the network models that have evolved from the Faloutsos’ work. As the authors point out, for a given distribution of node degree, there are a number of possible networks of distinctly different forms. This paper makes two broad contributions: an understanding of router-level constraints on network performance, and a definition of characteristics that form a heuristically optimal network.

The maximum number of packets that a router can process per unit time constrains the number of links it can handle, and the connection speed per link, which form what the paper calls a feasible region, and corresponding efficient frontier of possible bandwidth-degree combinations. Different kinds of demands may be placed on a router depending on where it is located in the network: core routers may handle fewer links, but larger numbers of flows, while edge routers must deal with the converse case, many links, and relatively fewer flows, aggregating traffic to the core. From this, a notion of a heuristically optimal network is put forward, with a loose mesh of high speed low degree routers at the core, and a hierarchical tree-like structure of slower high degree routers at the edge.

Performance comparisons are made for heuristically optimal, sub-optimal and Abilene-inspired networks against networks formed by two random generation mechanisms: preferential attachment and a general model of random graphs. The paper shows that performance in terms of overall network throughput as well as end-user bandwidth and router utilization is significantly higher for the heuristically optimal network, followed by the Abilene-inspired and sub-optimal networks, and finally the randomly generated networks.

A very useful concept introduced here is that of a likelihood-performance plane, plotting the performance of networks against the likelihood of a network being produced through a random generation process. As the paper demonstrates, heuristically optimal and Abilene-inspired networks have both a far lower likelihood, and a far greater performance than randomly generated networks of the same node degree distribution. As with the paper we read on modeling wireless networks, this paper draws our attention to the fact that models of the Internet topology must take into account specific engineering considerations.

October 16, 2008

Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks

Filed under: Networks — Tags: , , , — Ashwin @ 1:41 pm

Citation: Intanagonwiwat, C., Govindan, R., and Estrin, D. 2000. Directed diffusion: a scalable and robust communication paradigm for sensor networks. In Proceedings of the 6th Annual international Conference on Mobile Computing and Networking (Boston, Massachusetts, United States, August 06 - 11, 2000). MobiCom ‘00. ACM, New York, NY, 56-67. DOI= http://doi.acm.org/10.1145/345910.345920

This paper describes the general concept of directed diffusion for requesting, and retrieving, data from sensor networks. In such a system, users express an interest in certain kinds of data, and this interest then diffuses across the network towards the nodes most capable of responding, and is then relayed back along the same paths, a smaller set of which may be reinforced by the sensor network itself.

Tasks, or interests, are defined as a set of name/value pairs, along with a specification of the region of the network from which data is to be retrieved. Interests are periodically broadcast by the node to which a user connects, called the sink node, initially at a long interval, to probe the network for the existence of nodes which match the interest. Every nodes maintains an interest cache, with gradients on each interest pointing back to the node from which the interest was received and specifying a data rate.

Nodes, starting with the sink, initially broadcast interests to all their neighbours at a relatively low rate. As the sink starts receiving data, it selectively reinforces particualr neighbours, by resending the interest to them with a higher specified data rate. Nodes in turn perform the same process of reinforcement, selecting neighbours based on matches in their data cache to the interest. In addition, neighbour node selection is affected by the delay to respond to an interest: neighbours with lower delays are preferred. As the conditions along paths vary, data rates may also be lowered for negative reinforcement to remove a path from the preferred set.

The approach described here operates based only on local knowledge, which seems an important consideration for networks with power-constrained nodes. The authors suggest, as evaluated more completely in the other reading for today, that intermediate processing of data may yield even greater benefits.

TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks

Filed under: Networks — Tags: , , — Ashwin @ 11:58 am

Citation: S. Madden, M.F. Hellerstein, and W. Hong, “TAG: A Tiny AGgregation Service for Ad-Hoc Sensor Networks,” Proc. 5th Usenix Symp. Operating Systems Design and Implementation, Usenix Assoc., 2002, pp. 131–146.

Since the data extracted from sensor networks is often in the form of summaries, or aggregations, this paper makes the argument that aggregation of data should be considered, and made available, as a core service on sensor networks. TAG, an aggregation service for ad hoc networks based on the TinyOS platform is presented and evaluated.

The two essential features of TAG are a declarative data collection language, inspired by SQL, and the mechanisms for distribution and aggregation of queries across the sensor network, keeping in mind power, time and other constraints, as well as lossy links. The use of a high level language for data collection allows a number of end-to-end optimizations which result in lower data demands on the network, and easier recovery from data losses in the aggregation process. A classification is presented of different types of aggregate functions, with interesting implications for the processing requirements and sensitivity to network losses, for different kinds of aggregates.

One of the decisions taken for design of the routing protocol is to use only symmetric links; asymmetric links are blacklisted. The authors state that this is a common decision in wireless ad hoc networks, but I am not clear as to why this would be so. TAG uses a tree-based routing scheme, with periodic announcements to keep the tree up-to-date. The number of messages transmitted by a node obviously has significant effects on power consumption, which is why the authors make the argument that it therefore makes sense to process data within the network, a trade-off of power consumed due to aggregation-related processing against that consumed due to message transmission. To minimize transmissions, parents in the routing tree aggregate data from children, received over parent-specified time intervals, before passing it up the tree. In contrast, a centralized aggregation mechanism would require parents to forward every message received from children as-is.

In simulations, it is demonstrated that TAG, at worst, approximates the behaviour of centralized approaches in terms of the number of messages transmitted. At best, with aggregation functions which return single, or small sets, values, it results in an order of magnitude reduction in the number of messages transmitted.

I found this paper was useful in how it made me think about the ways in which application layer abstractions (the query interface) could be optimized in different ways (aggregation functions) over the functioning of a network.

October 8, 2008

XORs in the Air: Practical Wireless Network Coding

Filed under: Networks — Tags: , , — Ashwin @ 9:47 pm

Citation: Katti, S., Rahul, H., Hu, W., Katabi, D., Médard, M., and Crowcroft, J. 2006. XORs in the air: practical wireless network coding. SIGCOMM Comput. Commun. Rev. 36, 4 (Aug. 2006), 243-254. DOI=http://doi.acm.org/10.1145/1151659.1159942

In a similar vein to the other reading for today, the approach presented in this paper, COPE, attempts to take advantage of, rather than work against, the characteristics of the wireless medium. The COPE approach is, however, fundamentally different from the other papers that we’ve read: instead of trying to optimize the routes that a packet takes through a network, COPE attempts to optimize utilization. It does so by coding packets being sent to the same next hop node together into a single packet, increasing the effective bit rate of a network.

COPE relies on knowledge of packets that neighbours may have received in order to perform coding; receivers must have all “native” packets used to form an encoded packet other than the one meant for them. When choosing packets to code, an estimate is made of the probability that neighbours for whom these packets are destined have already heard them. While this would work well in a relatively dense network, a network topology where most nodes are not in range of one another would probably not be served well by COPE.

Packets are coded together only when the probability that they are known to the next hops remains sufficiently high. Note that this also means that COPE will not function well in a network with high degrees of mobility, where neighbours are constantly changing. It is best applied to cases such as that presented in this paper, of fixed nodes connected to form a wireless backhaul. This also means that gains from COPE become apparent only as the diversity of packets increases, which could happen both with increased traffic, but also, I imagine, as the number of source/destination pairs increases.

Various theoretically derived ideal coding gains are presented, showing that the maximum coding gain without opportunistic listening is 2, and with opportunistic listening tends to infinity. Interestingly, the authors found that coding gain increased beyond this value in their testbed, since the queues at bottleneck routers become smaller with the use of coding, fewer packets are dropped at the router’s queue.

The results of testing on a 20-node wireless testbed are presented, with a very useful discussion of how COPE interacts with both UDP and TCP flows. UDP flows see a large throughput gains, in the range of 3-4 times the same network without COPE. Initial tests for TCP showed only marginal gains (2-3%) due to hidden terminal problems causing sufficient collision-related losses to trigger timeout and backoff in the TCP flows. When the nodes were brought closer together to a range where carrier sense could function, an increase of 38% in throughput was observed.

Where most research in this space has continued to emphasize newer approaches to routing, this paper, as the authors put it, presents a new “orthogonal axis” to extracting more throughput. I really liked this paper, both for the compelling ideas it puts forward, and also for the extensive discussion of their implementation.

ExOR: Opportunistic Multi-Hop Routing for Wireless Networks

Filed under: Networks — Tags: , , — Ashwin @ 2:49 pm

Citation: S. Biswas and R. Morris, “ExOR: opportunistic multi-hop routing for wireless networks,” in SIGCOMM ‘05: Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications, vol. 35, no. 4. New York, NY, USA: ACM Press, October 2005, pp. 133-144. Available: http://dx.doi.org/10.1145/1080091.1080108

Biswas and Morris present draw from cooperative diversity schemes to present an alternative to traditional routing mechanisms. Rather than route a packet along a single best possible route, ExOR relays packets along multiple routes. This allows ExOR to take advantage of longer lossy links: even though the probability of packet delivery may be lower on such a link, even the few packets that manage to get through on many of these links can add significantly to overall throughput. Additionally, ExOR sends packets in batches, rather than one at a time, which can improve throughput even on short low loss links. Since probabilistic forwarding may not account for all packets in a batch, ExOR is used until the destination has at least 90% of the packets, and then the remaining are sent using conventional routing.

The sender chooses a subset of forwarders from all nodes in the network (to allow scaling up to larger network sizes) using a metric such as ETX. The forwarders coordinate amongst themselves to ensure that their transmissions do not interfere by setting timers which anticipate when other forwarders, ordered by priority, would probably have completed transmission. For their tests, the authors used a centralized server to maintain and distribute the routing metric. It is not clear from the paper how decentralized relay of the metric might affect ExOR performance. The requirement for ordering of transmissions amongst all forwarders surprised me; if only one node transmits at a time, this would seem to imply an overall loss of network capacity. While the results of the authors’ tests for single large files are compelling, I think it would have been very useful to get some sense of how ExOR would function with multiple sources.

The problem of coordination amongst forwarders aside, the authors also suggest that 802.11 fairness mechanisms would take care of coordination with competing transmissions. It would be useful to have some discussion in class of 802.11 fairness, and how ExOR might work with this.

The principal contribution of the paper, for me, is the way in which a property of the wireless medium which makes it problematic for traditional routing protocols - every terminal can hear every other terminal - is effectively leveraged to make routing more efficient. This is a compelling idea. However, I would really have liked to see some of the details of the protocol worked out a bit more.

October 7, 2008

A Performance Comparison of Multi-Hop Wireless Ad Hoc Network Routing Protocols

Filed under: Networks — Tags: , , — Ashwin @ 10:15 am

Citation: Broch, J., Maltz, D. A., Johnson, D. B., Hu, Y., and Jetcheva, J. 1998. A performance comparison of multi-hop wireless ad hoc network routing protocols. In Proceedings of the 4th Annual ACM/IEEE international Conference on Mobile Computing and Networking (Dallas, Texas, United States, October 25 - 30, 1998). W. P. Osborne and D. Moghe, Eds. MobiCom ‘98. ACM, New York, NY, 85-97. DOI= http://doi.acm.org/10.1145/288235.288256

This paper presents a comparison of four wireless routing protocols used in ad hoc networks - DSDV, TORA, DSR and AODV - with tweaks to DSDV and AODV. Tests are run in a simulator, with a number of scenarios involving nodes moving at varying speeds. The metrics used for comparison are the packet delivery ratio, routing overhead and path optimality.

The most surprising of the four, to me, was TORA, for its in-order delivery requirement, which seems an impractical choice, especially for a wireless environment. Unsurprisingly, TORA fares the worst in the performance tests, with wildly varying performance under similar conditions, and undergoing congestion collapse beyond a certain number of sources in the network.

DSR provides a consistently high packet delivery ratio, closely followed by AODV, albeit both with increasing routing overhead as the number of sources increase (since they are on-demand protocols), though the incremental cost of adding a source does decrease with the number of sources. Even so, the routing overhead for both these protocols is lower than either DSDV or TORA. DSDV also exhibits little variation in the packet delivery ratio, though it does function best under lower degrees of mobility; at higher mobility, it can completely fail to converge. As might be expected, DSDV’s routing overhead remains largely constant.

The discussion of the effect of packet size was interesting: DSR is more expensive than AODV when routing overhead is measured in bytes. However, the cost to acquire the medium for transmission of a packet is much higher than the incremental cost of adding these extra bytes to a packet.

A High-Throughput Path Metric for Multi-Hop Wireless Routing

Filed under: Networks — Tags: , , — Ashwin @ 9:23 am

Citation: De Couto, D. S., Aguayo, D., Bicket, J., and Morris, R. 2003. A high-throughput path metric for multi-hop wireless routing. In Proceedings of the 9th Annual international Conference on Mobile Computing and Networking (San Diego, CA, USA, September 14 - 19, 2003). MobiCom ‘03. ACM, New York, NY, 134-146. DOI= http://doi.acm.org/10.1145/938985.939000

This paper, by some of the same authors as the Roofnet paper, presents the idea of the Expected Transmission Count (ETX) metric that was developed further and used in Roofnet’s routing protocol. ETX is evaluated on a 29-node wireless testbed, as an extension to the DSDV and DSR protocols.

In presenting ETX, the authors contrast it to the commonly used metric used in ad hoc networks, minimum hop count. While useful as a metric in wired networks, minimum hop count suffers from a number of flaws when used in wireless networks. The most important of these, perhaps, is the assumption that links either work, or do not; wireless links typically have intermediate loss ratios. Lower hop counts may also maximize the distance traveled per hop, leading to increased signal attenuation and higher probabilities of loss. Finally, a random choice amongst routes with the same hop count is not likely to pick the best route, as the characteristics of the individual links in this route are not taken into account. This last problem worsens as a network grows, and the number of possible paths increases.

To remedy these problems, ETX attempts to find routes which would have the fewest transmissions, and retransmissions, of packets, by measuring packet loss ratios in both directions on each link. Measurement in both directions is important, to account for 802.11b ACKs being sent back along a link, as it is not unlikely that links may be asymmetric to some degree in an ad hoc network. Packet loss is measured as the delivery ratio for a link: the number of packets received, divided by the number of packets transmitted by the sender. Since ETX is designed to minimize the number of packets transmitted, this also results is lower spectrum use, thereby increasing overall network capacity, and a decrease in energy per packet, which is a particularly an important consideration for devices with limited battery life.

From measurements, it appears that packet size is an important consideration which ETX does not account for. ETX sends small probe packets of fixed size to determine link quality, but these are somewhat larger than 802.11b ACK packets, and inevitably much smaller than data packets. As a result, ETX underestimates the delivery ratio for ACK packets, and, more importantly, may overestimate the delivery ratio for large packets.

In order to be useful, ETX makes a few assumptions: that there are link layer retransmissions, and that radios operate on fixed, rather than variable, power levels. Additionally, the authors suggest that ETX may not be responsive to mobility, as link quality may change too quickly for propagation of accurate route metrics. I’m not so sure about this last point; it seems to me that ETX might be a useful addition to a protocol like DSR in a network with high mobility.

Older Posts »

Powered by WordPress