Nalanda

November 25, 2008

A Policy Aware Switching Layer for Data Centers

Filed under: Networks — Tags: , — Ashwin @ 12:36 pm

Joseph, A.J., Tavakoli, A, Stoica, I. 2008. A Policy Aware Switching Layer for Data Centers. UC Berkeley Technical Report No. UCB/EECS-2008-82.

The authors deal with the problem of configuring middleboxes in datacenters. Current architectures call for middleboxes to be placed on the physical network path, which leads to a number of sticky configuration problems. These include removal of physical connectivity paths which do not cross the middlebox, manipulation of link costs and separation into VLANs. All these approaches carry penalties with them: loss of fault tolerance, difficulty of predicting behaviour, fatesharing of flows with middleboxes, and the loss of ability to run clustering and virtual server mechanisms which require layer 2 connectivity.

The authors propose a new approach, PLayer, consisting of policy aware switched, pswitches, which allow middleboxes to be taken off the physical network path, and allows for the explicit specification of middlebox routing policy, rather than the implicit mechanisms currently in use. Though conceptually simple, this is a difficult problem in practice, since a principal, though unstated, design goal is to not require any changes of the middleboxes themselves. Even for simple middleboxes, Ethernet frames have to be encapsulated for delivery. More complex middleboxes that require layer 3 and layer 4 data need assurance that streams are always directed to the same middlebox instance; this is achieved using consistent hashing to choose instances.

An interesting problem that the authors deal with is the dissemination of policy updates. Each pswitch maintains a copy of all policy rules for the datacenter, to allow for continued correct function in the event of any failures elsewhere on the network. When policies are updated, these must be adopted concurrently by all pswitches. The authors propose a mechanism where policies are pushed out pswitches, but not immediately adopted; a separate small control packet is dispatched to signal the switch to a new policy. As the packet is small, there is a greater likelihood that it will reach all switches synchronously.

Even with this mechanism in place, there are several scenarios under which flows will be processed by different policies, which call for very specific approaches to policy configuration in order to enable reliable and consistent dissemination. This did make me wonder about how such a mechanism might be deployed to a real datacenter. Could it be that, depending on topology, different portions of a network could be taken offline and updated independent of one another? For smaller, controlled functionality (e.g., load balancers and firewalls dedicated to a web server farm), it may be that this could provide a more reliable, albeit also more manual, update mechanism.

Improving MapReduce Performance in Heterogeneous Environments

Filed under: Networks — Tags: , , — Ashwin @ 12:11 pm

Zaharia, M., Konwinski, A., Joseph, A. D., Katz, R. and Stoica, I. 2008. Improving MapReduce Performance in Heterogeneous Environments.

The authors examine the performance of the MapReduce implementation in Hadoop, and find several flaws in Hadoop’s scheduler which can cause severe degradation in performance.The principal problems that are analysed are those of the identification of laggard tasks for speculative execution, and the identification of nodes to which these tasks should be assigned. These problems arise in Hadoop due to assumptions of homogeneity in tasks, and in the network itself.

To remedy these problems, the authors propose a new scheduling algorithm, LATE, which is sensitive both to the variance in tasks and also to variance in node performance. LATE chooses tasks for speculative execution based on estimated time to completion, rather than the simple score metric that Hadoop uses. Only nodes with performance above a specified threshold are chosen for the execution of speculative tasks. In addition, a cap is maintained on the total number of speculative tasks that may be run at a time. Testing on various configurations of EC2, and also on a testbed with virtual machines, demonstrates that LATE provides significant performance benefits.

The one question I have is with regards to the choice of virtual machines as testbeds. While I undertstand that these could be useful to simulate a heterogeneous environment, it also seems like they are a worst case scenario. MapReduce is itself a virtualization scheme that makes certain assumptions about the locality of data; layering this on top of another virtualization scheme seems like overkill. It would be interesting to know how much of an advantage LATE delivers in a carefully planned data center, even one with a degree of heterogeneity.

November 18, 2008

Data-oriented networking and DTNs

Filed under: Networks — Tags: , , — Ashwin @ 1:27 pm

Tolia, N., Kaminsky, M., Andersen, D. G., and Patil, S. 2006. An architecture for internet data transfer. In Proceedings of the 3rd Conference on Networked Systems Design & Implementation - Volume 3 (San Jose, CA, May 08 - 10, 2006). USENIX Association, Berkeley, CA, 19-19.

This paper presents one very simple, and powerful idea: that data transfer is a common activity across many classes of protocols, and so may be offered as a common service. Protocols may perform content negotiation as necessary, and delegate data transfer to the DOT service. This allows for reuse and innovation at the data transfer layer, with no need for protocols to reinvent the wheel, as it were.

The measurements of performance for the different DOT plugins was interesting, but not compelling; especially that for the USB device. All that this really shows us is that DOT achieves its goal of supporting multiple transports. The most interesting metric, for me, was that a single graduate student was able to add DOT support to Postfix in less than a week, which speaks volumes for the simplicity of the protocol. I especially like the attention to adoption, that DOT functions as an optional protocol extension, rather than a requirement.

Fall, K. 2003. A delay-tolerant network architecture for challenged internets. In Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols For Computer Communications (Karlsruhe, Germany, August 25 - 29, 2003). SIGCOMM ‘03. ACM, New York, NY, 27-34. DOI=http://doi.acm.org/10.1145/863955.863960

The discussion of the assumptions that TCP/IP makes of the link layer (end-to-end paths, low RTT and drop rate) and of the characteristics of “challenged” networks, were alone worth the price of admission for this paper. To list a few of the latter: high latencies, predictable interruption, asymmetric data rates, non-existence of end-to-end routing paths, lack of continuous power or adequate memory.

I particularly liked the point that Fall makes in justifying this work, that there may exist conventional networks with Internet connectivity only available via a challenged network. The use of challenged networks for transit capabilities could provide significant benefits.

I thought the initial presentation was interesting, the use of an asynchronous transfer mechanism, but as the paper goes on, it seems like it’s trying to do too much. I can see that DTN could be useful to create guidelines for the design of protocol gateways. I got the most out of this paper in reading it as a thought piece on protocol design.

However, it sounds trying to get DTN functioning as a viable end-to-end overlay on the Internet sounds like it would be more trouble than it’s worth. As we’ve seen in our readings, it’s hard enough to get one set of protocols working, and harder yet to bridge disparate environments; I’m not convinced that the One True Protocol solution is workable.

November 13, 2008

Internet Measurement

Filed under: Networks — Tags: , , — Ashwin @ 2:12 pm

Paxson, V. 1997. End-to-end Internet packet dynamics. SIGCOMM Comput. Commun. Rev. 27, 4 (Oct. 1997), 139-152. DOI=http://doi.acm.org/10.1145/263109.263155

Paxson measures TCP transfers between 35 Internet sites to gather his data, with particular attention to dynamics of both sender and receiver, unlike many prior studies which only study sender behaviour. In doing so, he is able to study the end-to-end asymmetry in paths for individual transfers, and also deduce interesting characteristics of routers and links along these paths.

He also importantly draws our attention to several ways in which the effects of TCP may be disentangled from those of the network. One of the most interesting to me, for instance, was a reminder to consider that loss rate measurement for ACKs is very different from that for data packets; TCP adjusts data packet rates in response to loss, which illustrates how we could end up measuring an effect of TCP, rather than of the underlying network path.

Another interesting consideration was in the use of packet pairs for measurement. As the network evolves, packets may not traverse the same end-to-end link: they may be switched along different paths at a router. To deal with this problem, Paxson introduces the notion of “packet bunch modes”, where packets are sent in bunches of varying sizes to account for the different paths that they may traverse.

This was really deep and interesting paper that will take several more reads before I manage to absorb it completely.

R. Fonseca, G. Porter, R. H. Katz, S. Shenker, and I. Stoica. 2007. X-trace: A pervasive network tracing framework. In 4th USENIX Symposium on Networked Systems Design & Implementation, pp. 271-284. [Online]. Available: http://www.usenix.org/events/nsdi07/tech/fonseca.html

X-Trace provides the means to trace network requests across different services and layers of a network, by associating each request with a unique identifier. Protocols need to be modified, if required, to handle the extra X-Trace metadata.

Network elements need to support only two simple primitives - pushDown() and pushNext() - which are used to update request metadata to indicate it being pushed down a layer (e.g., HTTP to TCP), or across a layer (e.g., web browser to web server). Network elements also need to log reports based on this metadata to a persistent store.

Using the stored metadata, a tree may be reconstructed to show the sequence of requests, across, as well as up and down, network layers. I could imagine that this could potentially be extended to also capture logs within an application.

I like this approach for its simplicity, as it requires very little for instrumentation, and is easy to support in protocols that already allow for extension (e.g., HTTP headers). The challenge, of course, as the authors point out, would be to drive adoption to the point where it is sufficiently widespread that X-Trace reports could become useful across multiple administrative domains.

November 6, 2008

Middleboxes and Indirection

Filed under: Networks — Tags: , , — Ashwin @ 1:05 pm

Citations:

Walfish, M., Stribling, J., Krohn, M., Balakrishnan, H., Morris, R., and Shenker, S. 2004. Middleboxes no longer considered harmful. In Proceedings of the 6th Conference on Symposium on Opearting Systems Design & Implementation - Volume 6 (San Francisco, CA, December 06 - 08, 2004). Operating Systems Design and Implementation. USENIX Association, Berkeley, CA, 15-15

Stoica, I., Adkins, D., Zhuang, S., Shenker, S., and Surana, S. 2002. Internet indirection infrastructure. SIGCOMM Comput. Commun. Rev. 32, 4 (Oct. 2002), 73-86. DOI=http://doi.acm.org/10.1145/964725.63303

Both these papers build on our recent readings to explore new addressing and routing mechanisms that would allow easier deployment of existing network elements (such as middleboxes), and enable a wide variety of modern network services. The approaches outlined in these papers - Delegation Oriented Architecture (DOA) and Internet Indirection Infrastructure (i3) - have some features in common: a reliance on a new kind of host identifier to be embedded in packets, the use of DHTs as address lookup services, and implementation as an overlay network to provide the necessary function and at the same time allow them to coexist with the current Internet architecture.The principal difference between these approaches is perhaps in their use of identifiers: where DOA maps identifiers to individual hosts, i3 uses identifiers to map to potentially multiple hosts.

The authors of DOA make the point that although reviled, middleboxes provide essential network services; we should therefore ask how they might be better integrated into the Internet architecture without violating the fundamental tenets that every Internet entity has a unique network identifier and that network elements should not not process packets that are not addressed to them. DOA packets may specify a sequence of hosts to which the packet must be routed, allowing middleboxes to exist at locations other than the topological “middle”. With this abstraction in place, packets are source routed across the specified sequence of hosts before reaching the receiver. Users may adaptively select new services that they wish to process their packets, and outsource them accordingly.

i3 requires receivers to register “triggers” against common identifiers, indicating that packets to an identifiers should be forwarded to them. A mobile host may update its triggers to reflect its new network location, with no need for senders to be aware of this change. Multicast groups may be formed by multiple hosts registering triggers for the same identifier. The most interesting application of i3, to me, was for anycast, where the hosts in the anycast group maintain triggers with identical prefixes; senders form identifiers with a matching prefix, and the request is routed to the trigger with the longest matching prefix. Such services may provide load balancing by having senders generate random identifier suffixes, or location awareness by having receivers and senders encode their location into the identifier suffix. As with DOA, i3 provides a means by which packets may be source routed - an “identifier stack” - to enable service composition.

Each of these proposals is put forward to address fundamental mismatches between current network architecture, and the uses to which this architecture is being put. DOA deals with the problem of middleboxes, and how these might be better integrated into the network architecture, while i3 deals with the problem of network services which do not follow the point-to-point abstraction of Internet routing (multicast, anycast and mobility).

November 4, 2008

DNS Scalability

Filed under: Networks — Tags: , , — Ashwin @ 1:58 pm

Citations:

Mockapetris, P. and Dunlap, K. J. 1988. Development of the domain name system. SIGCOMM Comput. Commun. Rev. 18, 4 (Aug. 1988), 123-133. DOI=http://doi.acm.org/10.1145/52325.52338

Jung, J., Sit, E., Balakrishnan, H., and Morris, R. 2001. DNS performance and the effectiveness of caching. In Proceedings of the 1st ACM SIGCOMM Workshop on internet Measurement (San Francisco, California, USA, November 01 - 02, 2001). IMW ‘01. ACM, New York, NY, 153-167. DOI=http://doi.acm.org/10.1145/505202.505223

These two papers provide interesting views into the process of design of DNS, and of actual measurements of those design choices.

DNS was originally created to address problems with the HOSTS.TXT approach to maintaining host name to IP address mappings: the technical problem that the file was growing too large with increasing numbers of hosts, and the administrative problem that it was becoming difficult to maintain a single master file as the number of independent administrations on the network increased. To address these problems, a hierarchical system was created, with different servers accounting for different sections of the DNS tree. Administrations could maintain authoritative servers for their own domains, while higher level servers, leading up to the root servers, could provide pointers to tie the different administrations together.

It was interesting to see how many of the problems mentioned in the first paper were social, rather than technical: the difficulty of introducing new classes (which seems to me unrelated to DNS itself), the incentives for transitioning HOSTS.TXT-based to DNS, explicit requirements for redundancy in all DNS servers (which seems unachievable at Internet-scale), following sample settings word-for-word.

The first paper presents caching as one of the successes of DNS, in the context of the “unexpectedly bad performance” of the Internet of the time. Without caching, the DNS system may have been doomed to failure, making wider adoption more difficult. The second paper calls caching into question, specifically with reference to the predominant web traffic on the networks sampled.

As the second paper observes, most wide area traffic is web traffic, where a cluster of requests to a server are preceded by a single lookup. Since the distribution of names is Zipf-like, the usefulness of caching for A records is limited. The evaluation in this paper suggests that small groups of clients, of the order of 25 or so, provide the benefits of caching close to providing a common cache across all clients. However, since HTTP 1.1 is now fairly widely deployed, it seems likely that there the correspondence of TCP to DNS flows will be closer to 1:1, which begs the question whether caching of A records provides any utility today.

The authors also show that caching of NS records provides important performance benefits, as it reduces the number of referrals per lookup, and also reduces the load on the root and gTLD servers. This make intuitive sense; I would imagine the hierarchy of DNS servers to be relatively stable, which would lend itself well to caching with large TTL values.

Powered by WordPress