Internet Routing Instability

 

Olev Kartau, HUT

October 1998

 

 

Abstract

 

This article discusses Internet routing instability. Some of its causes are listed, and the routing instability analysis carried out at the University of Michigan during 1997 and 1998 is described.

 

Routing problem causes

Introduction

Establishing and maintaining route stability within and between networks is critical for reliable connectivity. The first part of this paper briefly describes the main causes of routing instability. The second part summarizes two studies of routing instability [Instability], [Origins-Of-Instability].

 

Causes of exterior routing instability [Routing-Arch]

Instability of interior routes

Dynamic injection and removal of interior routes can also cause problems for BGP. It is recommended to inject routes into BGP only statically. Route aggregation at the border routers can also reduce the risk of problems associated with route injection.
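A minimal sketch of route aggregation using Python's standard `ipaddress` module. The four customer prefixes are hypothetical (taken from the RFC 5737 documentation ranges), and the helper name `aggregate` is illustrative. Adjacent prefixes collapse into fewer, shorter announcements; if such an aggregate is configured statically at the border router, a flap of one more-specific prefix inside the network does not change what is announced to BGP peers.

```python
import ipaddress

# Hypothetical customer prefixes that would otherwise be announced one by one.
customer_prefixes = [
    "192.0.2.0/26",
    "192.0.2.64/26",
    "192.0.2.128/25",
    "198.51.100.0/24",
]


def aggregate(prefixes):
    """Collapse adjacent or overlapping prefixes into the fewest covering routes."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]


print(aggregate(customer_prefixes))  # ['192.0.2.0/24', '198.51.100.0/24']
```

Four announcements shrink to two: the three pieces of 192.0.2.0/24 merge into one route, while 198.51.100.0/24 has no neighbor to merge with.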

Hardware failures

Software problems

Bugs: some have been found, and some have been fixed.

Insufficient horsepower

CPU performance is needed most when handling route updates or when the router starts up (an extreme case of update handling). A dangerous feedback loop appears if the CPU is too busy to handle all updates: updates are dropped, which causes more and more updates to be sent.
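The feedback loop described above can be illustrated with a toy model. Everything in it is an assumption made for illustration: the per-tick arrival rate, the processing capacity, the queue limit, and the rule that each dropped update causes `retries_per_drop` updates to be re-sent in the next tick. The function name `simulate_update_load` is hypothetical.

```python
def simulate_update_load(ticks, arrivals, capacity, queue_limit, retries_per_drop):
    """Toy model: return the number of updates offered to the router per tick.

    All parameters are illustrative assumptions, not measured values.
    """
    queue = 0   # updates waiting for the CPU
    resend = 0  # updates re-sent because earlier ones were dropped
    offered_per_tick = []
    for _ in range(ticks):
        offered = arrivals + resend
        offered_per_tick.append(offered)
        queue += offered
        queue -= min(queue, capacity)          # CPU processes up to `capacity`
        dropped = max(0, queue - queue_limit)  # overflow beyond the queue is lost
        queue -= dropped
        resend = dropped * retries_per_drop    # each loss triggers more updates
    return offered_per_tick


# Enough CPU: the offered load stays flat.
print(simulate_update_load(8, arrivals=100, capacity=120, queue_limit=50, retries_per_drop=2))
# [100, 100, 100, 100, 100, 100, 100, 100]

# Slightly too little CPU: drops feed back and the load explodes.
print(simulate_update_load(8, arrivals=150, capacity=120, queue_limit=50, retries_per_drop=2))
# [150, 150, 170, 250, 410, 730, 1370, 2650]
```

A modest 25 percent overload is enough for the offered load to grow without bound in this model, which is the qualitative behavior the paragraph warns about.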

Insufficient memory

There are well over 40,000 routes in the Internet. A typical router needs more than 32 MB of RAM to hold its routing tables.
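As a rough back-of-the-envelope check, assuming 1 MB = 2^20 bytes and that all 32 MB were devoted to routing tables, the figures above correspond to a per-route memory budget of about:

```latex
\frac{32 \times 2^{20}\ \text{bytes}}{40\,000\ \text{routes}}
  = \frac{33\,554\,432\ \text{bytes}}{40\,000\ \text{routes}}
  \approx 839\ \text{bytes per route}
```

In practice only part of the RAM holds routing tables, and a BGP router may keep a separate copy of a route for each peer that advertised it, so this is an order-of-magnitude figure only.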

Network upgrades

Problems typically appear only on the following day, when the load is higher than during the night when the operator made the change. By then it is more difficult to revert the system(s), so operators start making manual changes, which usually makes things worse.

Human error

In most cases, an administrator violates administrative policies or does not sufficiently understand the effects and side effects of a change.

Backup link overloads

These are caused by a failure of the main link. Backup links are typically not dimensioned to carry the doubled traffic.

 

 

Internet routing instability

Since the end of the NSFNet backbone in April 1995, the Internet has been growing "explosively" in both size and topological complexity. Bandwidth shortages and a lack of router capacity have led to predictions of the Internet’s imminent collapse [Internet-Collapse].

Routing instability is informally defined as the rapid change of network reachability and topology information. Routing instability has many origins (see the previous section), and many of these causes lead to a large number of routing updates.

The main symptom of route instability is the disappearance of an existing route. If the route reappears shortly afterwards, this is called flapping. With BGP (most autonomous systems exchange routing information using BGP [BGP]), flapping occurs when a router sends a routing update and then withdraws it after a short period of time, causing its peers to propagate the update and the withdrawal further into the network. As a result, router performance may suffer significantly. In extreme cases, route flaps have led to the transient loss of connectivity for large portions of the Internet.
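The definition of a flap above can be turned into a simple counter over a log of timestamped updates. This is a sketch under stated assumptions: the event format `(time, prefix, kind)`, the function name `count_flaps`, and the 60-second window are all hypothetical choices for illustration, not values taken from the studies.

```python
from typing import Dict, Iterable, Tuple

# (time_in_seconds, prefix, kind), kind is "A" (announce) or "W" (withdraw).
Event = Tuple[float, str, str]


def count_flaps(events: Iterable[Event], window: float = 60.0) -> Dict[str, int]:
    """Count withdrawals that are followed by a re-announcement of the same
    prefix within `window` seconds (an illustrative threshold)."""
    last_withdrawal = {}  # prefix -> time of its most recent withdrawal
    flaps: Dict[str, int] = {}
    for time, prefix, kind in sorted(events):
        if kind == "W":
            last_withdrawal[prefix] = time
        elif kind == "A" and prefix in last_withdrawal:
            if time - last_withdrawal.pop(prefix) <= window:
                flaps[prefix] = flaps.get(prefix, 0) + 1
    return flaps


log = [
    (0, "192.0.2.0/24", "A"),
    (10, "192.0.2.0/24", "W"),
    (25, "192.0.2.0/24", "A"),     # back after 15 s: a flap
    (30, "198.51.100.0/24", "W"),
    (400, "198.51.100.0/24", "A"),  # back after 370 s: not counted
]
print(count_flaps(log))  # {'192.0.2.0/24': 1}
```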

Routing information in BGP has two forms: announcements and withdrawals. In [Instability], the authors measured the BGP updates generated by service provider backbone routers at the major U.S. public exchange points.

The analysis was based on data collected through experimental instrumentation of key portions of the Internet infrastructure. Over the course of nine months, the authors logged BGP routing messages exchanged with the Routing Arbiter project’s route servers (Unix-based systems providing aggregate route server BGP information to client peers) at five of the major U.S. network exchange points: Mae-East, Sprint, AADS, PacBell and Mae-West. The largest public exchange, Mae-East, located near Washington, D.C., currently hosts over 60 service providers, including MCI, ANS, Sprint, BBN and UUNet. The Routing Arbiter project collected 12 gigabytes of compressed data.

 

Analysis of the gathered data leads to a number of findings:

 

 

The authors define instability as an instance of either forwarding instability or policy fluctuation. Overall, the study showed that the Internet continues to exhibit high levels of routing instability despite the increased emphasis on aggregation (combining smaller IP prefixes into a single route announcement) and the deployment of route dampening technology (refusing to believe updates that exceed certain parameters of instability).
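Route dampening is commonly implemented as a penalty that grows with every flap and decays exponentially over time. The sketch below is a minimal, illustrative version of that idea. The class name `DampenedRoute` and all four parameter values are assumptions chosen for the example, not values from [Instability] or from any particular router.

```python
# Illustrative parameters (assumptions, not taken from the studies).
PENALTY_PER_FLAP = 1000.0
SUPPRESS_LIMIT = 2000.0   # stop using the route above this penalty
REUSE_LIMIT = 750.0       # use it again once the penalty falls below this
HALF_LIFE = 15 * 60.0     # seconds for the penalty to halve


class DampenedRoute:
    def __init__(self):
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        """Halve the penalty once per HALF_LIFE elapsed since the last update."""
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / HALF_LIFE)
        self.last_update = now

    def flap(self, now):
        self._decay(now)
        self.penalty += PENALTY_PER_FLAP
        if self.penalty >= SUPPRESS_LIMIT:
            self.suppressed = True

    def usable(self, now):
        self._decay(now)
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False
        return not self.suppressed


route = DampenedRoute()
for t in (0, 60, 120):      # three flaps within two minutes
    route.flap(t)
print(route.usable(120))    # False: penalty is above the suppress limit
print(route.usable(1920))   # True: two half-lives later it is below the reuse limit
```

Using a reuse limit well below the suppress limit gives hysteresis: a suppressed route must stay quiet for a while before it is believed again, instead of toggling on every update.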

 

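The authors distinguish between three types of routing updates: forwarding instability, policy fluctuation, and pathological updates, the last being redundant updates that reflect neither a topology change nor a policy change.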

 

Analysis indicates that the majority of BGP updates consist entirely of pathological, duplicate withdrawals. Most of these withdrawals are transmitted by routers belonging to autonomous systems that never previously announced reachability for the withdrawn prefixes. Many routers withdraw an order of magnitude more routes than they announce. This illustrates an important property of inter-domain routing: the disproportionate effect that a single ISP can have on the global routing mesh.
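The kind of classification described above can be sketched as a pass over an update log that tracks which prefixes each peer is currently announcing. Assumptions: the log format `(peer_as, prefix, kind)`, the function name `classify_updates`, and the private AS numbers are hypothetical. For simplicity, a withdrawal is counted as pathological whenever the peer is not currently announcing the prefix, which covers both repeated withdrawals and withdrawals of prefixes never announced.

```python
from collections import defaultdict


def classify_updates(updates):
    """Count announcements, withdrawals and pathological withdrawals per peer.

    `updates` is an iterable of (peer_as, prefix, kind) tuples, where kind is
    "A" (announcement) or "W" (withdrawal).
    """
    announced = set()  # (peer_as, prefix) pairs currently announced
    stats = defaultdict(lambda: {"A": 0, "W": 0, "pathological_W": 0})
    for peer_as, prefix, kind in updates:
        if kind not in ("A", "W"):
            raise ValueError(f"unknown update kind: {kind!r}")
        key = (peer_as, prefix)
        stats[peer_as][kind] += 1
        if kind == "A":
            announced.add(key)
        elif key in announced:
            announced.remove(key)  # a withdrawal of a live announcement
        else:
            stats[peer_as]["pathological_W"] += 1
    return dict(stats)


log = [
    (65001, "192.0.2.0/24", "A"),
    (65001, "192.0.2.0/24", "W"),
    (65001, "192.0.2.0/24", "W"),     # duplicate withdrawal
    (65002, "198.51.100.0/24", "W"),  # never announced by AS 65002
    (65002, "203.0.113.0/24", "W"),   # never announced by AS 65002
]
print(classify_updates(log))
```

AS 65002 in this made-up log withdraws routes without ever announcing any, the pattern the study found to dominate the update traffic.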

Further analysis indicated that the majority of these extraneous, pathological withdrawals were caused by a specific BGP software implementation on at least one widely deployed commercial router. The router vendor had made a time-space tradeoff in the implementation: not to maintain state about the information advertised to its BGP peers. The resulting behavior is referred to as stateless BGP withdrawals [Instability]. Upon receipt of any topology change, a stateless BGP router transmits withdrawals to all BGP peers, regardless of whether it had previously sent the peer an announcement of that route. It is worth noting that the stateless implementation is compliant with the current IETF BGP standard.
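The time-space tradeoff can be illustrated with a toy model of a BGP speaker. This is a sketch, not any vendor's implementation: the class `BgpSpeaker`, the helper `withdrawals_sent`, and the peer names are hypothetical. The stateful variant remembers which prefixes it advertised to each peer (an Adj-RIB-Out) and withdraws only those; the stateless variant keeps no such record and therefore has to withdraw from every peer.

```python
class BgpSpeaker:
    """Toy model of how a BGP speaker sends withdrawals to its peers."""

    def __init__(self, peers, stateful):
        self.peers = list(peers)
        self.stateful = stateful
        # Per-peer record of advertised prefixes (only used when stateful).
        self.advertised = {peer: set() for peer in self.peers}
        self.sent = []  # log of (peer, kind, prefix) messages

    def announce(self, peer, prefix):
        self.sent.append((peer, "A", prefix))
        if self.stateful:
            self.advertised[peer].add(prefix)

    def withdraw(self, prefix):
        for peer in self.peers:
            if not self.stateful:
                # No memory of what was advertised: withdraw everywhere.
                self.sent.append((peer, "W", prefix))
            elif prefix in self.advertised[peer]:
                self.advertised[peer].discard(prefix)
                self.sent.append((peer, "W", prefix))


def withdrawals_sent(stateful, n_peers=10):
    peers = [f"peer{i}" for i in range(n_peers)]
    speaker = BgpSpeaker(peers, stateful)
    speaker.announce("peer0", "192.0.2.0/24")  # only one peer learned the route
    speaker.withdraw("192.0.2.0/24")           # a topology change
    speaker.withdraw("192.0.2.0/24")           # another change for the same prefix
    return sum(1 for _, kind, _ in speaker.sent if kind == "W")


print(withdrawals_sent(stateful=True))   # 1
print(withdrawals_sent(stateful=False))  # 20
```

With ten peers, the stateless speaker sends 19 withdrawals that the stateful one avoids; each of them is a withdrawal of a route the receiving peer never heard about, exactly the pathological pattern described above.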

 

 

 

The work documented in [Instability] has led, through collaboration with router vendors, to specific architectural and protocol changes in commercial Internet routers.

 

The same authors extended the work documented in [Instability] in [Origins-Of-Instability]. They collected more data at the same exchange points and were able to verify that, as a result of changes to one router vendor's software suggested by the analysis in [Instability], the volume of routing updates had decreased by an order of magnitude. The decrease was due to the suppression of pathological withdrawals. They also describe further potential changes to router software that could decrease the volume by an additional 30 percent or more.
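Taking "an order of magnitude" as a factor of ten (a simplification), the two reductions combine multiplicatively. Here V_0 denotes the original update volume, V_1 the volume after the vendor fix, and V_2 the volume after the proposed additional changes:

```latex
V_1 \approx \frac{V_0}{10}, \qquad
V_2 \approx (1 - 0.30)\, V_1 \approx 0.07\, V_0
```

That is, under these assumptions roughly 7 percent of the original update traffic would remain, a total reduction of about 93 percent.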

 

Research based on experimental measurement of real inter-domain routing is still young, but it has already produced significant improvements.

 

 

References

[Routing-Arch] Halabi B., "Internet Routing Architectures", New Riders Publishing/Cisco Press, 1997

[BGP] Lougheed K., Rekhter Y., "A Border Gateway Protocol (BGP)", RFC 1163, June 1990

[Instability] Labovitz C., Malan G., Jahanian F., "Internet Routing Instability", University of Michigan Technical Report CSE-TR-332-97; Proceedings of ACM SIGCOMM, September 1997

[Origins-Of-Instability] Labovitz C., Malan G., Jahanian F., "Origins of Internet Routing Instability", University of Michigan Technical Report CSE-TR-368-98, July 20, 1998

[IPMA] Internet Performance Measurement and Analysis Project (IPMA) http://www.merit.edu/ipma

[Internet-Collapse] Metcalfe B., "Predicting the Internet’s Catastrophic Collapse and Ghost Sites Galore in 1996", InfoWorld, December 4, 1995