On the (in)validity of Internet topology models
Citations:
Faloutsos, M., Faloutsos, P., and Faloutsos, C. 1999. On power-law relationships of the Internet topology. In Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols For Computer Communication (Cambridge, Massachusetts, United States, August 30 - September 03, 1999). SIGCOMM ‘99. ACM, New York, NY, 251-262. DOI=http://doi.acm.org/10.1145/316188.316229
Li, L., Alderson, D., Willinger, W., and Doyle, J. 2004. A first-principles approach to understanding the internet’s router-level topology. In Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols For Computer Communications (Portland, Oregon, USA, August 30 - September 03, 2004). SIGCOMM ‘04. ACM, New York, NY, 3-14. DOI=http://doi.acm.org/10.1145/1015467.1015470
These are an interesting pair of papers, the first from what seems to be the beginning of attempts to extract “laws” of Internet topology, and the second more recent, offering constructive criticisms, and what seems to me a more realistic approach to modeling.
The Faloutsos’ paper studies the topology of the Internet circa 1997-1998, with three datasets of interdomain topology from BGP routing tables gathered at 6 month intervals, and one dataset of the topology of routers on the Internet. They analyse the data using a number of metrics: frequency of outdegree, node rank in order of decreasing outdegree, pairs of nodes and average number of nodes in a neighbourhood as a function of hop distance, eigenvalues of the graph, and order of eigenvalues. Using these metrics, they come to the conclusion that the Internet is subject to three power laws, in spite of a 45% growth over the period of their interdomain topology datasets.
First, the outdegree of a node is proportional to the rank of the node by outdegree raised to a constant rank exponent. They make the claim that this is indicative of the financial and functional considerations in adding a new edge to a node, providing a view into the dynamics of the ways in which domains interconnect. Second, the frequency of a given outdegree is proportional to the outdegree raised to the power of a constant outdegree exponent. From this, they propose the outdegree exponent as a useful metric for estimating the distribution of outdegree amongst Internet nodes. Finally, the eigenvalues of the graphs are proportional to the order raised to a constant eigen exponent. Finding that the eigen exponent is practically the same for the interdomain graphs, but significantly different for the router graph, they conclude that the eigen exponent can be used to distinguish amongst families of graphs.
In addition, they propose an approximation whereby the number of pairs of nodes within a hop distance is proportional to the hop distance raised to a hop-plot exponent. From this, they define the effective diameter of a graph as a function of the numbers of nodes and edges in the graph, and the hop-plot exponent. If accurate, this could have material implications for selection of TTL for broadcast, as the effective diameter provides a high probability estimate of the distance between any pair of nodes. It seems odd to me that they forged ahead with this particular analysis when their datasets lacked sufficient data to prove or disprove this hypothesis.
The Faloutsos’ metrics are interesting and valuable, but I find the failing of this paper in the assumption that certain observations about properties of the Internet amount to laws governing the growth of the Internet. As the second paper mentions, there have been arguments put forward that power law type distributions may arise for purely mathematical and statistical reasons. There is no more reason to believe that there are natural laws governing the global spread of the Internet than that there are natural laws governing the “free” market conditions under which interconnection takes place. The dynamics of differing regulatory regimes, the range of commercial agreements from peering to transit, varying physical geography, and overall economic development are just some of the factors that contribute to the manner in which the Internet spreads.
The second paper seems an explicit attempt to debunk some of the network models that have evolved from the Faloutsos’ work. As the authors point out, for a given distribution of node degree, there are a number of possible networks of distinctly different forms. This paper makes two broad contributions: an understanding of router-level constraints on network performance, and a definition of characteristics that form a heuristically optimal network.
The maximum number of packets that a router can process per unit time constrains the number of links it can handle, and the connection speed per link, which form what the paper calls a feasible region, and corresponding efficient frontier of possible bandwidth-degree combinations. Different kinds of demands may be placed on a router depending on where it is located in the network: core routers may handle fewer links, but larger numbers of flows, while edge routers must deal with the converse case, many links, and relatively fewer flows, aggregating traffic to the core. From this, a notion of a heuristically optimal network is put forward, with a loose mesh of high speed low degree routers at the core, and a hierarchical tree-like structure of slower high degree routers at the edge.
Performance comparisons are made for heuristically optimal, sub-optimal and Abilene-inspired networks against networks formed by two random generation mechanisms: preferential attachment and a general model of random graphs. The paper shows that performance in terms of overall network throughput as well as end-user bandwidth and router utilization is significantly higher for the heuristically optimal network, followed by the Abilene-inspired and sub-optimal networks, and finally the randomly generated networks.
A very useful concept introduced here is that of a likelihood-performance plane, plotting the performance of networks against the likelihood of a network being produced through a random generation process. As the paper demonstrates, heuristically optimal and Abilene-inspired networks have both a far lower likelihood, and a far greater performance than randomly generated networks of the same node degree distribution. As with the paper we read on modeling wireless networks, this paper draws our attention to the fact that models of the Internet topology must take into account specific engineering considerations.