A Policy Aware Switching Layer for Data Centers
Joseph, A.J., Tavakoli, A., Stoica, I. 2008. A Policy Aware Switching Layer for Data Centers. UC Berkeley Technical Report No. UCB/EECS-2008-82.
The authors deal with the problem of configuring middleboxes in datacenters. Current architectures call for middleboxes to be placed on the physical network path, which leads to a number of sticky configuration problems. To force traffic through a middlebox, operators resort to removing physical connectivity paths that do not cross it, manipulating link costs, and separating the network into VLANs. All these approaches carry penalties: loss of fault tolerance, difficulty of predicting behaviour, fate-sharing of flows with middleboxes, and the loss of ability to run clustering and virtual server mechanisms which require layer 2 connectivity.
The authors propose a new approach, PLayer, consisting of policy-aware switches (pswitches), which allow middleboxes to be taken off the physical network path and allow middlebox routing policy to be specified explicitly, rather than through the implicit mechanisms currently in use. Though conceptually simple, this is a difficult problem in practice, since a principal, though unstated, design goal is to not require any changes to the middleboxes themselves. Even for simple middleboxes, Ethernet frames have to be encapsulated for delivery. More complex middleboxes that require layer 3 and layer 4 data need assurance that streams are always directed to the same middlebox instance; this is achieved using consistent hashing to choose instances.
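To make the instance selection concrete, here is a minimal sketch of consistent hashing over middlebox instances. It is my own illustration, not code from the report: the names `InstanceRing` and `flow_key`, the choice of SHA-256, the number of virtual points per instance, and the direction-independent flow key are all assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # 64-bit position on the ring (SHA-256 is an arbitrary choice here).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    # Assumption: sort the two endpoints so that both directions of a
    # flow produce the same key and therefore the same instance.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{proto}|{a[0]}:{a[1]}|{b[0]}:{b[1]}"


class InstanceRing:
    """Hypothetical ring of middlebox instances of one type."""

    def __init__(self, instances, replicas=64):
        self._replicas = replicas
        self._points = []  # sorted list of (ring position, instance)
        for inst in instances:
            self.add(inst)

    def add(self, instance):
        # Several virtual points per instance even out the load.
        for i in range(self._replicas):
            bisect.insort(self._points, (_hash(f"{instance}#{i}"), instance))

    def remove(self, instance):
        self._points = [p for p in self._points if p[1] != instance]

    def pick(self, key):
        if not self._points:
            raise LookupError("no middlebox instances available")
        # First point at or after the key's position, wrapping around.
        idx = bisect.bisect_left(self._points, (_hash(key), ""))
        return self._points[idx % len(self._points)][1]
```

The property that matters for stateful middleboxes is that when an instance is removed, only the flows that were mapped to it are reassigned; every other flow keeps its instance.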
An interesting problem that the authors deal with is the dissemination of policy updates. Each pswitch maintains a copy of all policy rules for the datacenter, to allow for continued correct function in the event of any failures elsewhere on the network. When policies are updated, the new rules must be adopted concurrently by all pswitches. The authors propose a mechanism where policies are pushed out to pswitches, but not immediately adopted; a separate small control packet is dispatched to signal the switch-over to the new policy. As the packet is small, there is a greater likelihood that it will reach all switches at nearly the same time.
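The stage-then-activate idea can be sketched as below. This is a toy, in-process model of my own, not the report's protocol: the `PSwitch` class, the `roll_out` function, the version numbers, the rule strings, and the choice to wait for every switch to acknowledge staging before signalling are all assumptions for illustration.

```python
class PSwitch:
    """Hypothetical pswitch holding one active and any staged policies."""

    def __init__(self, name):
        self.name = name
        self.active_version = 0
        self.active_rules = []
        self._staged = {}  # version -> rules received but not yet adopted

    def stage(self, version, rules):
        # Phase 1: store the (potentially large) policy without adopting it.
        self._staged[version] = list(rules)
        return True  # stands in for an acknowledgement

    def activate(self, version):
        # Phase 2: the small signal only names a version to switch to.
        if version not in self._staged:
            return False
        self.active_rules = self._staged.pop(version)
        self.active_version = version
        return True


def roll_out(switches, version, rules):
    # Push the policy everywhere first; abort if any switch fails to stage.
    if not all(sw.stage(version, rules) for sw in switches):
        return False
    # Then send the small activation signal to every switch.
    return all([sw.activate(version) for sw in switches])


switches = [PSwitch(f"ps{i}") for i in range(3)]
ok = roll_out(switches, 1, ["web traffic: firewall, then load balancer"])
```

In a real network the activation signals still arrive at slightly different times, so this narrows the window in which switches disagree rather than closing it.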
Even with this mechanism in place, there are several scenarios under which flows will be processed by different policies, which call for very specific approaches to policy configuration in order to enable reliable and consistent dissemination. This did make me wonder how such a mechanism might be deployed to a real datacenter. Could it be that, depending on topology, different portions of a network could be taken offline and updated independently of one another? For smaller, controlled functionality (e.g., load balancers and firewalls dedicated to a web server farm), it may be that this could provide a more reliable, albeit also more manual, update mechanism.