Scaling Internet Routers Using Optics
Citation: Keslassy, I., Chuang, S., Yu, K., Miller, D., Horowitz, M., Solgaard, O., and McKeown, N. 2003. Scaling internet routers using optics. In Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols For Computer Communications (Karlsruhe, Germany, August 25 - 29, 2003). SIGCOMM ’03. ACM, New York, NY, 189-200. DOI= http://doi.acm.org/10.1145/863955.863978
The authors are concerned with the problems facing the current generation of Internet routers, especially given the expected growth in Internet traffic. Network operators’ central offices were reportedly already full when this paper was written, so new router designs must either reduce their power density or consume less power overall. Router configurations have shifted from single-rack to multi-rack to reduce power density. However, both configurations have their own problems: multi-rack systems suffer from unpredictable performance and poor scalability, while single-rack systems in practice use sub-optimal centralized schedulers, which will not scale with increases in port count or line rate.
Power consumption limits the capacity of a single rack to about 2.5 Tb/s. In this paper, the authors present a novel architecture using optics, which they estimate can scale to 100 Tb/s with a switch fabric that consumes practically zero power, and with guaranteed throughput.
They do this by extending the Load-Balanced switch architecture described by C-S. Chang et al. to address the problems inherent in it: the requirement for a rapidly configuring switch fabric, the possibility of mis-sequenced packets, vulnerability to pathological traffic patterns, and the inability to deal with failed linecards.
The Load-Balanced switch architecture breaks a router down into three parts: input and output switching stages, separated by a middle stage of buffers maintaining Virtual Output Queues (VOQs), which smooth traffic from inputs to outputs. The input stage acts as a load balancer, spreading traffic over the VOQs, while the output stage serves each VOQ at a fixed rate. This allows the architecture to function entirely on local knowledge, with no need for a centralized scheduler or for knowledge of the state of all queues, resulting in greater reliability and a simpler implementation. I had a little difficulty understanding the justification for splitting input packets into smaller fixed-length packets; while I can see why this is necessary, I wonder how much it adds to processing requirements. Could this become a bottleneck?
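To make the two-stage idea concrete for myself, here is a minimal Python sketch of a load-balanced switch. It is an illustration under my own assumptions, not the paper's design: the class and method names are hypothetical, cells are already fixed-length, and both stages use a simple cyclic-shift connection pattern (input i to intermediate (i + t) mod N, intermediate j to output (j + t) mod N).

```python
from collections import deque


class LoadBalancedSwitch:
    """Toy N-port load-balanced switch (hypothetical names, simplified)."""

    def __init__(self, n):
        self.n = n
        # One FIFO of (output_port, payload) cells per input.
        self.inputs = [deque() for _ in range(n)]
        # voq[j][k]: cells held at intermediate port j, destined to output k.
        self.voq = [[deque() for _ in range(n)] for _ in range(n)]
        self.t = 0  # current time slot

    def enqueue(self, input_port, output_port, payload):
        self.inputs[input_port].append((output_port, payload))

    def step(self):
        """Advance one time slot; return the (output, payload) cells delivered."""
        n, t = self.n, self.t
        delivered = []
        # Second stage: intermediate j is connected to output (j + t) % n,
        # so each VOQ is served once every n slots (a fixed rate).
        for j in range(n):
            k = (j + t) % n
            if self.voq[j][k]:
                delivered.append((k, self.voq[j][k].popleft()))
        # First stage: input i is connected to intermediate (i + t) % n,
        # regardless of where the cell is going. This is the load balancing.
        for i in range(n):
            if self.inputs[i]:
                k, payload = self.inputs[i].popleft()
                self.voq[(i + t) % n][k].append(payload)
        self.t += 1
        return delivered


switch = LoadBalancedSwitch(4)
for payload in "abcd":
    switch.enqueue(0, 2, payload)  # four cells, all from input 0 to output 2

arrivals = []
for _ in range(8):
    arrivals.extend(switch.step())
print(arrivals)
```

Neither stage looks at queue state anywhere else in the switch, which is the "local knowledge only" property. In this particular run the four cells reach output 2 in the order a, c, b, d, which is exactly the mis-sequencing problem the paper goes on to address.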
The problem of packet mis-sequencing is addressed using the Full Ordered Frames First (FOFF) scheme. This scheme serves input queues holding at least N packets first (followed by any non-empty queues) in round-robin fashion, reading at most N packets at a time and transferring each packet to a different intermediate queue in the middle stage. This approach bounds the amount of mis-sequencing, and whatever mis-sequencing does occur is corrected by a re-sequencing buffer in the output stage. FOFF is resilient to pathological traffic patterns because of the way it spreads flows across the intermediate linecards.
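The queue-selection rule described above can be sketched roughly as follows. This is my simplified reading, not the paper's exact algorithm: the names are hypothetical, I assume one FIFO per output at each input, and I always start a frame at intermediate 0, ignoring any bookkeeping the real scheme does for partially filled frames.

```python
from collections import deque


class FoffInput:
    """One input linecard with a FIFO per output (hypothetical, simplified)."""

    def __init__(self, n):
        self.n = n
        self.queues = [deque() for _ in range(n)]
        self.full_ptr = 0     # round-robin pointer over queues with >= n cells
        self.partial_ptr = 0  # round-robin pointer over non-empty queues

    def enqueue(self, output_port, cell):
        self.queues[output_port].append(cell)

    def _pick(self, start, eligible):
        # Scan the queues round-robin from `start`; return the first eligible one.
        for offset in range(self.n):
            k = (start + offset) % self.n
            if eligible(self.queues[k]):
                return k
        return None

    def next_frame(self):
        """Return (output, [(intermediate, cell), ...]), or None if idle."""
        # Full frames first: prefer queues holding at least n cells.
        k = self._pick(self.full_ptr, lambda q: len(q) >= self.n)
        if k is not None:
            self.full_ptr = (k + 1) % self.n
        else:
            # Otherwise fall back to any non-empty queue.
            k = self._pick(self.partial_ptr, lambda q: len(q) > 0)
            if k is None:
                return None
            self.partial_ptr = (k + 1) % self.n
        queue = self.queues[k]
        count = min(self.n, len(queue))
        # Read at most n cells and send each to a different intermediate port.
        return k, [(j, queue.popleft()) for j in range(count)]


linecard = FoffInput(3)
for cell in ("a0", "a1"):
    linecard.enqueue(0, cell)        # partial frame for output 0
for cell in ("b0", "b1", "b2"):
    linecard.enqueue(1, cell)        # full frame for output 1
print(linecard.next_frame())         # the full frame for output 1 goes first
print(linecard.next_frame())         # then the partial frame for output 0
```

Because a full frame places exactly one cell on every intermediate port, cells of the same flow stay roughly aligned across the middle stage, which I take to be the intuition behind the bounded mis-sequencing.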
The problem of failed linecards is addressed by splitting input and output linecards into G groups, with M intermediate GxG switches, each of which maintains a fixed configuration, rotating the matching of inputs to outputs in sequence. This seems, in essence, to be a hardwired load-balancing scheme; if I understand correctly, extra capacity needs to be added to the intermediate switches in proportion to the maximum number of linecards that need to be switched out at any given time.
There were segments of this paper that I had trouble grasping completely, particularly the discussion of hybrid electro-optical switches and optical switches, since I have almost no familiarity with the EE side of EECS. Regardless, it was really interesting as a way to think through how some of the scheduling algorithms we’ve discussed would be built as hardware implementations.
You have an excellent summary of the high points above. I will try to put the paper into some context today.
Comment by Randy Katz — September 23, 2008 @ 1:23 pm