BGP ROUTING PART I: BGP AND MULTI-HOMING

Everyone wants to know about BGP. What is it? How do you use it? What is it used for? We'll try to explain at least the basics of BGP in this document.

This document is Copyright Avi Freedman, 1997. Distribution of the original or modified versions for profit is prohibited, but please feel free to give it away.


Index

BGP
A WARNING
PREREQUISITES
ROUTING: INTERNAL (INTERIOR) AND EXTERNAL
SO WHY IS BGP INTERESTING?
BEING "CONNECTED" TO THE INTERNET
HARDWARE AND SOFTWARE FOR SPEAKING BGP
PEERING SESSIONS AND ASNs: PART I
WHAT DO YOU DO WITH BGP?
PEERING SESSIONS
eBGP vs. iBGP
BGP AND THE SINGLE-HOMED
AS-PATHS
AS-PATH LENGTH AND BGP ROUTE SELECTION
AS-PATH ACCESS LISTS (FILTERS)
ENTERING, MODIFYING, AND DELETING as-path access-lists
BGP METRICS (ATTRIBUTES) AND ROUTE SELECTION: INTRODUCTION
BGP PATH SELECTION PROCESS ACCORDING TO CISCO
BGP ATTRIBUTE TYPES
EGP vs. IGP
WHAT IS ROUTE FLAP AND WHY IS IT BAD?
WHAT TO KEEP IN MIND WHEN CONFIGURING BGP
BGP AND PEERING
INTERNET CONNECTIVITY WITHOUT BGP
BGP AND THE SINGLE-HOMED
BGP AND THE MULTI-HOMED
MULTI-HOMING AND LOAD-BALANCING
HOW TO ANNOUNCE YOUR NETWORKS
BEING ADVERTISED BY MULTIPLE PROVIDERS WITHOUT PI-SPACE
CONTROLLING OUTGOING DATA FLOW: "FULL ROUTING"
CONTROLLING OUTGOING DATA FLOW: "PARTIAL ROUTING": "CUSTOMER ROUTES ONLY"
SO WHAT'S TO BE DONE?
AS-PATH PADDING
QUESTIONS AND COMMENTS
THANKS TO
TO BE DONE

Sidebars

Sidebar on Cisco BGP commands
Sidebar on next-hop-self
Sidebar on Outgoing Data Flow Control Without BGP

A WARNING

This is dangerous stuff. It's always best if you can test BGP configurations in a "lab" made up of a few Cisco 2501s before implementing them in a live network connected to the Internet. Unfortunately, there's no good reference on "using BGP" to refer people to. Reading the RFCs (the Request for Comments documents that define the protocol at a low-to-mid level), or even Cisco documentation (Cisco did not invent BGP, but Cisco's BGP implementation is almost certainly the most widely used) does not really tell you enough. Many of the "routing gurus" out there got started by looking at and working on running networks, where the architecture and implementation were already done. Most of the rest, however, started with the basics and expanded their knowledge and experience as their networks grew.


PREREQUISITES

You need to know a bit about IP routing to digest this material. It also doesn't hurt to have a few of the aforementioned test routers (at least two, one configured as you and one configured as your provider). Don't be afraid to ask for help. Read your vendor's BGP documentation - all of it, even the parts you don't understand. Try to get a number of "live configs" for whatever router you're using - preferably from someone with a similar topology and similar goals.


BGP

BGP stands for Border Gateway Protocol. The "BGP" that people speak of ("Can a Cisco 2501 speak BGP?") is actually BGP4. BGP4 differs from BGP3 the same way that RIPv2 (the result of what some call "unsuccessful brain surgery" on the original RIP protocol) differs from the old RIP: both BGP4 and RIPv2 allow the announcement of "classless routes" - routes that aren't strictly on "Class A", "Class B", or "Class C" boundaries, but can instead be "subnets" or "supernets". For more information on "classless" or "CIDR" routes, see April's Boardwatch column.


ROUTING: INTERNAL (INTERIOR) AND EXTERNAL

Internal routing is the art of getting each router in your network to know how to get to every location (destination) in your network. You can do this simply, with static routes, or in a more complicated but robust way, with active internal routing protocols such as RIP, RIPv2, OSPF, and IS-IS.

It's obviously critical that any box inside your network know how to get (directly or indirectly) to any other box inside your network. Before you invite people to send data to your network, you've got to have a running and happy network to take the data.

If you default route into one or more providers, external routing isn't something you have in your network. But if you do want to "peer" with someone - or to "multi-home" to multiple providers and have a little bit more control over where your data goes on the Internet, you will be taking at least some external routes into your network (and will do so with BGP).


SO WHY IS BGP INTERESTING?

Well, as mentioned above, it's nice to have routing data for parts of the Internet in your routers.

But it is much more useful to tell people outside your network (upstream providers or "peers") about what routes (or portions of the IP address space) you "know how to get to" inside your network. The primary purpose of BGP4 (as we're studying it here) is to advertise routes to other networks ("Autonomous Systems").

An AS, or Autonomous System, is a way of referring to "someone's network". That network could be yours; a friend's; MCI's; Sprintlink's; or anyone's. Normally an AS will have someone or ones responsible for it (a point of contact, typically called a NOC, or Network Operations Center) and one or multiple "border routers" (where routers in that AS peer and exchange routes with other ASs), as well as a simple or complicated internal routing scheme so that every router in that AS knows how to get to every other router and destination within that AS.

When you "advertise" routes to other entities (ASs), one way of thinking of those route "advertisements" is as "promises" to carry data to the IP space represented in the route being advertised. For example, if you advertise 192.204.4.0/24 (the "Class C" starting at 192.204.4.0 and ending at 192.204.4.255), you promise that if someone sends you data destined for any address in 192.204.4.0/24, you know how to carry that data to its ultimate destination. The cardinal sin of BGP routing is advertising routes that you don't know how to get to. This is called "black-holing" someone - because if you advertise, or promise to carry data to, some part of the IP space that is owned by someone else, and that advertisement is more specific than the one made by the owner of that IP space, all of the data on the Internet destined for the black-holed IP space will flow to your border router. Needless to say, this makes that address space "disconnected from the 'net" for the provider that owns the space, and makes many people unhappy. The second most heinous sin of BGP routing is not having strict enough filters on the routes you advertise (more on this later). Anyway, the bottom line: Test your configs and watch out for typos. Think everything that you do through in terms of how it could screw up.
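To see why a more specific advertisement pulls traffic away from the owner of the space, here is a minimal Python sketch of "longest prefix wins" route lookup. The two-entry table, the 192.204.0.0/16 aggregate, and the "owner" / "careless-AS" labels are made up for illustration; real routers use far more efficient data structures than a list scan.

```python
import ipaddress

def best_route(table, destination):
    """Return the (prefix, next_hop) entry with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    # The most specific (longest) prefix wins, no matter who announced it.
    return max(matches, key=lambda entry: entry[0].prefixlen)

# Hypothetical table: the owner's aggregate plus a bogus, more specific route.
table = [
    (ipaddress.ip_network("192.204.0.0/16"), "owner"),
    (ipaddress.ip_network("192.204.4.0/24"), "careless-AS"),
]

print(best_route(table, "192.204.4.10"))  # the /24 wins, so this traffic is black-holed
print(best_route(table, "192.204.9.10"))  # only the /16 matches
```

Everything inside 192.204.4.0/24 follows the bogus route, while the rest of the aggregate is unaffected.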

Also, one terminology note: Classless routes are sometimes called "prefixes". When someone talks about a prefix they're talking about a route with a particular starting point and a particular specificity (length). So 207.8.96.0/24 and 207.8.96.0/20 are not the same prefix (route). We'll mostly use "route" in this document.


BEING "CONNECTED" TO THE INTERNET

Throughout this discussion it's critical to think about what it means to be "connected" to the Internet. In order to be connected to the Internet, for each host that is "on the Internet", you need to be able to:

  • Send a packet out a path that will ultimately wind up at that host, and, just as critically,

  • That host has to have a path back to you. This means that whoever provides "Internet connectivity" to that host has to have a path to you - which, ultimately, means that they have to "hear a route" which covers the section of the IP space you're using, or you will not have connectivity to the host in question.

    Take a look at Figure 1. We'll explain more of the details below, but note the "Home Dialup User". He's connected to AOL, which is served by ANS (AOL actually owns ANS). We're using 10.10.20.0/24 as an example.

    The 10.10.x.x IP addresses are often used in examples because they're "reserved" space. Most networks will "filter" the RFC 1918 reserved space (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16), so people use them in examples because they don't get you into too much trouble if you accidentally try to use them (sort of like the film industry's yyy-555-xxxx phone number convention).

    In this example, the reason that an AOL dialup user can send a packet to 10.10.20.1 (for example) is that the ISP (AS 64512) advertised that route to the two upstream providers (AS 4969 and AS 701), who in turn advertised that route to AS 690 (ANS, which provides IP service for AOL).

    Every IP address that you can get to on the Internet is reachable because someone, somewhere, has advertised a route that covers it. The corollary to this is that if there is not a generally-advertised route to cover an IP address, no one on the Internet will be able to reach it.


    HARDWARE AND SOFTWARE FOR SPEAKING BGP

    The most commonly used implementations of BGP are Cisco routers, Bay routers, and PC clones running Linux, BSD, or some other Unix variant with a program called gated to manage BGP.

    I recommend using Cisco routers (for many reasons). In particular, the Cisco implementation of BGP is relatively easy to use, get examples for, and debug - and there's a huge community of routing engineers that's familiar with the Cisco implementation and algorithms (there's much that isn't specified in the RFCs and is left up to the vendor to decide). Cisco's online documentation (UniverCD) isn't the best (it lacks a large number of case studies) but is a very good learning tool.

    PC-compatibles using gated are either the second- or third-largest community of BGP-speaking computers. You can build cheap PC routers that route Ethernet and T1 and have more than enough CPU and memory to handle all the routes you'd need for quite some time - but you've then got hardware that's not really as tested or reliable as a Cisco or Bay router. Trust me on this - the cost savings are usually not worth doing it this way. (Apologies to Riscom and ET, the leading vendors of T1 cards-for-PCs.)

    Bay routers are the second-largest community of BGP-speaking boxes - but we're talking about a very small percentage of the number of BGP-speaking Ciscos out there. Bay is cheaper than Cisco; pretty responsive to customers (though Cisco is as well); and almost all configuration is done through a GUI (windowing) interface that drives most routing engineers nuts. Bay claims they're working on a command-line interface (BCC, or "Blatant Cisco Clone"), but in the meantime most are throwing money at Cisco. (It's much easier to debug BGP or other routing problems from a telnet session or over the phone than it is to have to guide someone through a GUI to examine or reconfigure a router.) On the other hand, the Bays do have a better architecture and are finally showing themselves to be more or less as stable as Ciscos. What I've seen of BCC looks quite promising, and I promise to retract in print my slam of Bay when their command-line interface looks featureful, fast, and solid.

    We're going to talk about Cisco routers in these documents (and in this document in particular).


    PEERING SESSIONS AND ASNs: PART I

    There's a bunch of terminology associated with BGP. We already talked about Autonomous Systems (ASs). An ASN, or Autonomous System Number, is just that - a number used to represent that Autonomous System to the world. That number "identifies" your network to the world. Except for Sprintlink, most networks out there use (or at least show to the world) only one ASN.

    BGP-speaking routers exchange routes with other BGP-speaking routers via peering sessions. At a technical level, this is what it means to "peer with someone". A snippet of a Cisco "BGP clause" is:

    router bgp 64512
     neighbor 207.106.127.122 remote-as 4969
     (omitted lines)
     neighbor 137.39.10.46 remote-as 701
     (omitted lines)
    
    The "clause" starts out by saying "router bgp 64512". This means "What follows is a list of commands that describe how to speak BGP on behalf of ASN 64512". Like the 10.10.x.x addresses, 64512 is a "reserved" number - it comes from the section of the ASN space set aside for private use (ASNs go from 1-65535).

    In order to bring up a "peering session", all you need is one "neighbor ... remote-as" line per neighbor. In this example, 137.39.10.46 is the remote IP address of a UUNET router (UUNET is ASN 701) - remote, that is, with respect to the customer's router. 207.106.127.122 is the remote IP address of a Net Access router (Net Access is ASN 4969). See Fig 1 for a diagram of the network layout used in this example.

    In practice, however, you almost always use more than that one line to tell BGP how to exchange routes with that "neighbor" via that "peering session". A typical "neighbor clause" is:

    router bgp 64512
     (omitted lines)
     neighbor 207.106.127.122 remote-as 4969
     neighbor 207.106.127.122 next-hop-self
     neighbor 207.106.127.122 send-community
     neighbor 207.106.127.122 route-map prepend-once out
     neighbor 207.106.127.122 filter-list 2 in
     (omitted lines)
    

    WHAT DO YOU DO WITH BGP?

    Speaking BGP to your provider(s) and/or peers lets you do two things:

  • Make (semi-)intelligent routing decisions (decide what is the "best" path for a particular route to take outbound from your network, as opposed to simply setting a default route from your border router(s) into your provider(s)), and, more importantly,

  • Announce your routes to those providers, for them in turn to announce to others (transit) or just use internally (in the case of peers).


    PEERING SESSIONS

    The purpose of the "neighbor" clauses is to bring up "peering sessions" with neighbors. For the purposes of this document, all neighbors must be either on the other end of a leased-line from you - or on a LAN interface (Ethernet, Fast Ethernet, FDDI). It is possible to have BGP peering sessions that go over multiple "hops" - but "eBGP multihop" is a more advanced topic and has many potential pitfalls.

    Every time a neighbor session comes up, each router will evaluate every BGP route it has by running it through any filters you specify in the "neighbor" clause. Any routes that "pass" the filter are sent to the remote end.

    While the session is up, "BGP Updates" will be sent from one router to the other each time one of the routers knows about a new BGP route or needs to "withdraw" a previous announcement ("promise").

    The "sho ip bgp summ" command will show you a list of all peering sessions:

    brain.netaxs.com#sho ip bgp summ
    BGP table version is 1159873, main routing table version 1159873
    44796 network entries (98292/144814 paths) using 9596344 bytes of memory
    16308 BGP path attribute entries using 2075736 bytes of memory
    12967 BGP route-map cache entries using 207472 bytes of memory
    16200 BGP filter-list cache entries using 259200 bytes of memory
    
    Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State
    205.160.5.1     4  6313       0       0        0    0    0 never    Active
    207.106.90.1    4 64514 1145670  237369  1159873    0    0 4d03h
    207.106.91.5    4 64515    6078    5960  1159869    0    0 4d03h
    207.106.92.16   4 64512    6128    6782  1159870    0    0 4d03h
    207.106.92.17   4 64512    5962    6894  1159870    0    0 10:08:46
    206.245.159.17  4  4231  161072  276660  1159870    0    0 2d05h
    207.44.7.25     4  3564    6109  310292  1159867    0    0 22:40:50
    207.106.33.3    4 64513  164708  724571  1159866    0    0 3d23h
    207.106.33.4    4  3564    6086  274182  1159853    0    0 4d03h
    207.106.127.6   4  6078    5793  310011  1159869    0    0 2d03h
    
    This is a session summary from one of Net Access's core routers. The 6451X ASes are BGP sessions to other Net Access routers (using confederations, which we'll talk about in a future document) - those ASNs are not shown to the world.

    Most of it is pretty self-explanatory; briefly:

  • The "V" column is the BGP version number. If it is not 4, something is very wrong! BGP version 3 doesn't understand about Classless ("CIDR") routing and is thus dangerous.
  • The AS column is the remote ASN.
  • InQ is the number of messages received from the neighbor that are still waiting to be processed.
  • OutQ is the number of messages queued up to be sent to the other side.
  • The Up/Down column is the time that the session has been up (if nothing is in the State column) or down (if something is).
  • Anything in the State column indicates that the session is not up. Note: A State of Active means that the session is inactive (the router is still trying to bring it up). Just one of the nomenclature flaws of BGP.

    More on all of this below.


    eBGP vs. iBGP

    We're talking about eBGP in this document. eBGP and iBGP share the same low-level protocol for exchanging routes, and also share some of the algorithms, but eBGP is used to exchange routes between different Autonomous Systems, while iBGP is used to exchange routes within the same Autonomous System. In fact, iBGP is one of the "interior routing protocols" that you can use to do "active routing" inside your network. We'll talk more about iBGP in a future document when we cover all of the major interior routing protocols: OSPF, iBGP, IS-IS, RIP, RIPv2.

    The major difference between eBGP and iBGP is that eBGP tries like crazy to advertise every BGP route it knows to everyone - you have to put "filters" in place to stop it from doing so. iBGP is actually pretty difficult to get working because it tries like crazy not to redistribute routes - in fact, all iBGP-speakers inside your network have to peer with all other iBGP "speakers" in order to make it work. This is called a "routing mesh" and, as you can imagine, is quite a mess. If you have 20 routers, each router has to peer with every other router. The solution to this is "BGP confederations", also a topic for a future document.
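    To put a number on the "mess": in a full mesh every pair of routers needs its own session, so the count grows roughly with the square of the number of routers. A small Python sketch of the arithmetic (the function name is ours, purely for illustration):

```python
def full_mesh_sessions(routers):
    """Number of iBGP sessions when every router peers with every other router."""
    return routers * (routers - 1) // 2

for n in (5, 20, 50):
    print(n, "routers ->", full_mesh_sessions(n), "sessions")
```

    For the 20-router network mentioned above, that is 190 sessions, and each router carries 19 "neighbor" clauses.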

    Also, iBGP has major drawbacks as an IGP. The main one is the necessity to "peer up" every set of routers in your network (or in one POP if you're using confederations). Protocols like OSPF and IS-IS just "find" each other over serial and Ethernet interfaces (they're "broadcast" protocols). This can be a pain (you don't want to accidentally merge your IGP with a customer's or peer's) but turning off broadcasting on certain ports is easier than turning on peering sessions between a new router and every other router on your network. Also, iBGP doesn't do as good a job at "convergence" (closing the gap and re-routing around failed network segments) as OSPF and IS-IS.


    BGP AND THE SINGLE-HOMED

    When you have one upstream provider, it is rarely desirable to speak BGP to them. Why? Well, you only have one path out of your network. So filling your router with 45,000 BGP routes isn't going to do you any good, since all of those routes point to the same place (your one upstream provider).

    And if you have one upstream provider, it's almost guaranteed that you are using sub-allocations (CIDR delegations, to be precise) of their larger IP blocks ("aggregates"). In this case your provider is not going to advertise your more "specific" routes because:

  • It's pointless to waste slots in thousands of routers around the world - if you are in your provider's address space, other networks will get to you just as well by following the announcements of the aggregate blocks as if they also saw your more specific routes being advertised. For example, if you are using 207.106.96.0/20 out of your provider's 207.106.0.0/16 netblock, having the 207.106.96.0/20 be "out there" is redundant, since the 207.106.0.0/16 route covers that space as well. The only way to reach you is going to be through your provider - whether the outside world sends a packet to that provider based on a 207.106.0.0/16 or 207.106.96.0/20 route makes no difference - the packet still goes to the same place. So the world would prefer to not see that 207.106.96.0/20, since it takes up an extra slot in the global routing tables. (Hearing another "view" of a route takes up almost 10 times less memory than hearing another route. And only a route of the same specificity can be considered another "view" of a route.)

  • If there's always one and only one path to your network, your provider should always advertise your routes (specific or in the aggregate) to minimize CPU consumption on routers world-wide due to "route flap". Also, enough routers out there severely penalize you if your route(s) "flap" that you want your provider to always advertise you (and thus not make internal instability reflect itself on a global level). Why? If your T1 goes down and your provider is advertising you as 207.106.96.0/20, they have to withdraw that routing assertion. If you go up and down enough times to flap, you'll be "black-holed" from large sections of the Internet. But if you're behind 207.106.0.0/16, you won't be black-holed unless your provider flaps their /16 announcement (which should in theory be less likely - if it isn't, choose another provider).
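    The "aggregate already covers you" argument can be checked mechanically. This is a minimal Python sketch using the standard ipaddress module and the two prefixes from the example above; the host address 207.106.100.1 is an arbitrary address picked for illustration.

```python
import ipaddress

aggregate = ipaddress.ip_network("207.106.0.0/16")   # the provider's block
specific = ipaddress.ip_network("207.106.96.0/20")   # the customer's delegation

# Every address in the /20 is also inside the /16.
print(specific.subnet_of(aggregate))

# A host in the customer's space is reachable by following the aggregate alone.
host = ipaddress.ip_address("207.106.100.1")
print(host in specific, host in aggregate)
```

    Both checks come out true, which is why the more specific route adds nothing but an extra slot in everyone's routing table when there is only one path to you.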


    AS-PATHS

    Every time a route is advertised via BGP, it is "stamped" with the ASN of the router doing the advertising. As a route moves from Autonomous System to Autonomous System (network to network), it builds up an "AS-PATH". Each route starts out with a "null AS-PATH", represented by the regular expression "^$". See Fig 1 - the blocks that show the routes as they move from hop to hop show you the AS-PATH accumulating as the route moves from network to network.
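    Here is a minimal Python sketch of how an AS-PATH accumulates, using the ASNs from the example network (64512 for the ISP, 4969 for Net Access). The advertise function is our own illustration, not anything a router exposes; it only models the "stamping" step.

```python
def advertise(as_path, asn):
    """Return the AS-PATH the neighbor sees after `asn` advertises the route."""
    # The advertising AS puts its own number at the front (left) of the path.
    return [asn] + as_path

path = []                       # a locally originated route: the null AS-PATH, "^$"
path = advertise(path, 64512)   # the ISP announces it to its upstream providers
path = advertise(path, 4969)    # Net Access passes it on to its neighbors
print(" ".join(str(asn) for asn in path))  # 4969 64512
```

    Reading the result right to left gives the order in which the route travelled: the rightmost ASN is where the route originated.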

    The AS-PATH is useful for a number of reasons:

  • It provides a "diagnostic trace" of routing on the 'net. If you have "full routes" in one of your routers, or have "query access" to a router that does (such as telnet://route-server.cerf.net), you can find the route that encompasses a particular IP address and see which ASNs have advertised it. If you do some poking around, you can even see how a provider is actually connected (as opposed to what they might claim...)

  • It is one of a number of metrics that determine how routes "heard" via BGP are inserted into the actual IP routing table.

  • It is something that allows you to do "policy routing" of sorts (though policy routing has many different definitions, so watch out) - basically, you use the AS-PATH to filter routes. Why would you want to do this? Perhaps you only want to take UUNET, MCI, and ANS routes from one provider (because of limited memory in your router). Or perhaps you want to make sure you only send routes originating in your network. There are many reasons (which will become clear as you read on) why you'd want to filter based on the AS-PATH. While it's true that most filtering is now done with communities (a community is another number which you can stamp on a route heard or to be announced via BGP - we'll go into communities shortly), AS-PATH filtering is the best "first step" that you can work with to get comfortable with filtering routes. And if your network is fairly simple (as 90% of the networks out there are), you won't need anything fancier for quite some time.


    AS-PATH LENGTH AND BGP ROUTE SELECTION

    For routes of the same specificity, as-path length is going to be the deciding factor in choosing which of multiple routes gets used by the router (i.e. put into the IP routing table) when you're just starting out.

    See Fig 2 for a sample list of routes from an actual BGP routing table - and further explanation. Notice, though, the >'s to the left of some of the routes. The ">" indicates the route that the router currently thinks is "best" when there are multiple choices.
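    As a rough Python sketch of the "shortest AS-PATH wins" rule for routes of the same specificity: the two candidate paths below are invented, and the sketch deliberately ignores every other attribute a real router looks at before or after AS-PATH length.

```python
def pick_best(candidates):
    """Return the candidate whose AS-PATH contains the fewest ASNs."""
    return min(candidates, key=lambda route: len(route["as_path"].split()))

# Two hypothetical announcements of the same prefix, heard from two providers.
candidates = [
    {"via": "provider A", "as_path": "701 1239 6313"},
    {"via": "provider B", "as_path": "4969 6313"},
]

best = pick_best(candidates)
for route in candidates:
    marker = ">" if route is best else " "   # mimic the ">" in the BGP table
    print(marker, route["via"], route["as_path"])
```

    The two-ASN path through provider B gets the ">" here, simply because it is shorter.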


    Fig 2.

    A SNIPPET OF A BGP ROUTING TABLE

    COMING SOON TO A TUTORIAL NEAR YOU.


    AS-PATH ACCESS LISTS (FILTERS)

    We'll use Cisco commands to illustrate AS-PATH filtering and "regexp matching". Each line of a Cisco AS-PATH filter looks like:

    ip as-path access-list NNN permit regexp
    
    or:
    ip as-path access-list NNN deny regexp
    
    Where NNN is the number (same as the name in the case of as-path access-lists), and regexp is very similar to Unix "regular expressions". (See Fig 3 for a summary of regexp characters, and the O'Reilly and Associates Regexp book for more information about regular expressions).


    Fig 3
    
    Regexp characters:
    
    NNN       match the characters NNN (where each digit of NNN is
              from 0-9)
    .         match any single character (so ".*" matches anything,
              including an empty AS-PATH)
    ^         match the beginning of a string
    $         match the end of a string
    _         match any of {space, beginning of a string, or end of a string}
    _NNN_     match the "word" or "distinct number" NNN.  Thus, the regexp
              "_1_" will match the string "3561 1 64000" but not "3561".
              (The problem is that if you don't anchor NNN with "_"s on 
              either side, you might match something you don't really want to).
    
    (regexp)  enclosing a regexp in parens groups it, so that a following
              operator (such as *) applies to the whole regexp
    *         the * operator means that the previous regexp can be matched 
              0, 1, 2, or any number of times.  To be safe, only use * in 
              conjunction with parens.
              Thus, (regexp)* matches the regexp inside the parens 0 or
              any number of times.
    
    [char1char2char3]   matches any one of char1, char2, char3, etc... 
                        Each charN expression can be an actual number or other 
                        symbol, or a range (i.e. 0-9, a-z).
    
    If you want to match any of the special symbols, you can escape them by
    putting a \ in front of them.  The only special symbols you'll want to 
    escape when matching against AS-PATHs are the parens, which pop up in 
    AS-PATHs when you use BGP confederations.
    

    We'll explore regular expressions and as-path access-lists by example. Remember the first rule of Cisco access-lists: There's an implicit deny .* at the end of every access list. Even so, it never hurts to add one just to be safe (we'll do that below).

    Important note: On Ciscos, regexps are matched against the AS-PATH as if the whole thing is a string, not a sequence of numbers. Thus, as you'll see below, you need to enclose ASNs within underscores to be sure of matching only the ASN you're looking for.

    How do access-lists work? When used as a filter, each route is passed through the access-list. Each rule is listed in the order it will be applied. Once a route has been matched by any rule, the decision on whether to pass the route through the filter or to drop it (and thus not let it pass) is made immediately, and no further rules are processed.
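    The first-match behavior can be sketched in a few lines of Python. This is only a model of the evaluation order, not of how a router implements it; the evaluate function and the list names are ours, and the two regexps used here ("^$" and ".*") happen to mean the same thing in Python's re module as they do on a Cisco.

```python
import re

def evaluate(access_list, as_path):
    """Return "permit" or "deny" for as_path using first-match-wins semantics."""
    for action, regexp in access_list:
        if re.search(regexp, as_path):
            return action      # the first matching rule decides; stop here
    return "deny"              # the implicit "deny .*" at the end of every list

own_routes_only = [("permit", "^$")]
unreachable_rule = [("deny", ".*"), ("permit", "^$")]

print(evaluate(own_routes_only, ""))          # permit
print(evaluate(own_routes_only, "701 3561"))  # deny (implicit)
print(evaluate(unreachable_rule, ""))         # deny: the permit is never reached
```

    Note the last case: a rule that sits behind a rule matching everything can never take effect.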

    Example 1:

    ip as-path access-list 1 permit .*
    ip as-path access-list 1 deny .*
    
    This is a good one to have around; it permits every route to flow through the filter. The "deny .*" is completely extraneous to the filter - every route has already passed through the first line and the second line is never actually used.

    Example 2:

    ip as-path access-list 2 deny .*
    
    This is also a handy one to have around; you might well want to always remember the number of this "deny everything" access-list - the opposite of the "permit everything" list above.

    Example 3:

    ip as-path access-list 3 permit ^$
    ip as-path access-list 3 deny .*
    
    This access-list is the other of the triad of ever-handy ones: It permits only routes that originate within your AS (because of network statements or "redistribute" statements in "router bgp" clauses somewhere within your network).

    If you have these three as-path access-lists installed and remember their numbers you'll save yourself a lot of time you'd otherwise spend searching online or through config files to find where you put your "send everything"; "send nothing"; or "send only my routes" filter.

    Remember: BGP between different ASNs (eBGP) will, by default, cause a router to redistribute every BGP route that the router knows about. This could lead to VERY BAD THINGS happening. (If you redistributed all of Sprintlink's routes into UUNET, a portion of UUNET could start sending all of its Sprintlink traffic through your T1 and you'd hurt a reasonable chunk of the Internet. Both Sprintlink and UUNET do things to prevent you from doing this, but you should always be paranoid when dealing with BGP.)

    Again, the "deny .*" rule is useless here, except as a safety precaution, since the router would insert that rule anyway (remember, there's an implicit "deny .*" at the end of every Cisco filter list).

    A quick note: For those playing with BGP confederations on your own (a topic we'll talk about in a future document), note that your "permit internal routes only" filter might have to look somewhat different ("permit ^$" will no longer be enough) - something like: "ip as-path access-list 30 permit ^(\([0-9 ]*\))*$". Or you'll be using BGP communities instead of AS-PATH filtering to control which routes you redistribute. Everyone else please ignore this paragraph, unless you want to try to parse the regexp above as an exercise.

    For Examples 4 and 5, please consult Fig 4 for a list of common ASNs you'll see when examining routes. To find out who "owns" an ASN (funny concept - owning a 16-bit integer), issue a WHOIS query on "ASN NNN", where NNN is the ASN. Note: You may actually need to put quotes around the "ASN NNN", especially if you're doing the whois query from a command line.

    -----------------------------------------------------------------------------
    
    Fig 4 Common ASNs
    3561      MCI
    1239      Sprintlink (Sprintlink also uses other ASNs, but 1239 will always
              appear somewhere in the AS-PATH when looking at Sprintlink
              routes from some other provider)
    701       UUNET
    174       PSI
    1673      ANS (the old ANS ASN, 690, should be retired by now)
    1         BBN
    4200      AGIS (the old Net99 ASN, 3830, should be retired by now)
    4969      Net Access (which will appear in the examples)
    
    There are hundreds of ASNs in use in the Internet, and thousands of ASNs in use in internal networks all over the world. If you want to take a look at live ASN info, check out http://www.merit.edu/ipma/routing_table or telnet to route-server.cerf.net, a Cisco that cerf.net loads with multiple full BGP routing tables.

    -----------------------------------------------------------------------------
    
    Example 4:
    ip as-path access-list 20 permit _1_
    ip as-path access-list 20 permit _701_
    ip as-path access-list 20 permit _174_
    ip as-path access-list 20 permit _1673_
    ip as-path access-list 20 permit _4200_
    ip as-path access-list 20 deny .*
    
    The _NNN_ notation means "match NNN as a distinct word". This means that NNN must have whitespace on either side of it (or must be the first or last word - or both - in the AS-PATH).

    "_1_" would match "1"; "3561 1 6000"; and "3561 1" - but not "701". (ASN 1 is used by BBN, which has a bit of history in the Internet...)

    So - this as-path access list permits, in order, BBN, UUNET, PSI, ANS, and AGIS routes, and denies all other routes. If you had a Cisco 2501, you might want to do this to accept some routes from one of your providers in an attempt to load-balance traffic a certain way (perhaps you've noticed that provider B gets better BBN connectivity than provider A).
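    If you want to experiment with "_NNN_" matching away from a router, the following Python sketch approximates it. Python's re module has no "_" operator, so the sketch expands each underscore into "beginning of string, end of string, or a space", the three cases listed in Fig 3. The matches helper is hypothetical, it will mangle a pattern that contains a literal underscore, and Python's regexp engine is not the one a Cisco uses, so treat it as a way to build intuition rather than to validate a production filter.

```python
import re

# Stand-in for Cisco's "_": beginning of string, end of string, or a space.
BOUNDARY = r"(?:^|$| )"

def matches(cisco_regexp, as_path):
    """True if the (approximated) Cisco regexp matches anywhere in as_path."""
    return re.search(cisco_regexp.replace("_", BOUNDARY), as_path) is not None

for as_path in ("1", "3561 1 6000", "3561 1", "701"):
    print(repr(as_path), matches("_1_", as_path))

# Without the underscores, "1" matches inside other ASNs as well.
print(matches("1", "701"))
```

    The first three AS-PATHs match "_1_" and "701" does not, while the bare "1" happily matches "701". That is the reason for always anchoring ASNs with underscores.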

    Example 5:

    ip as-path access-list 21 deny _3561_
    ip as-path access-list 21 deny _1239_
    ip as-path access-list 21 permit .*
    
    This filter denies any MCI or Sprintlink route, and permits all other routes. As of 4/97, this should yield about 45,000 routes.

    This will fill up a 2501 with absolutely all of the routes it can take and still function well. It used to be that all routes on the 'net fit in a 2501 with 16 MB - and that the 2501 could still function. Then, the routes would fit in but the 2501 didn't have enough CPU. Now, all of the routes on the 'net except for MCI, Sprintlink, or both will fit in a 2501 and still let it handle at least a single T1's worth of throughput.


    ENTERING, MODIFYING, AND DELETING as-path access-lists

    The major reason we usually append an explicit "deny .*" at the end of as-path access-lists (actually, all filter-lists in Ciscos) is that if you already have an as-path access-list of a certain number (say, "as-path access-list 3" above), and you try to re-enter it, the Cisco has no way of knowing that you want to delete the old list. It simply appends the lines you type to the end of the existing list.

    So, as a security blanket, appending an explicit "deny .*" to a list ensures that lines accidentally added later cannot change what the existing list does.

    Let's say you had:

    ip as-path access-list 3 permit ^$
    
    And then you configured (perhaps as a typo, perhaps as a brain-o):

    ip as-path access-list 3 permit _1239_
    
    You would alter the functionality of an existing filter list and potentially start redistributing Sprintlink routes to your peers and/or upstream providers.

    But if you had:

    ip as-path access-list 3 permit ^$
    ip as-path access-list 3 deny .*
    
    Then adding a third rule of:

    ip as-path access-list 3 permit _1239_
    
    Would have no effect, since every route would either be permitted or denied by the time the router had finished evaluating the second rule (the "deny .*") and the third rule would never be looked at.
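
    The effect of the trailing "deny .*" can be checked with a small simulation. This is only a sketch in Python under two assumptions: that a list is evaluated top to bottom with the first matching line deciding, and that a route matching no line is denied. The function name evaluate and the sample AS-PATH are invented for the example.

```python
import re

DELIMITER = r"(?:^|$|[ ,{}()_])"  # assumed meaning of Cisco's "_"


def evaluate(access_list, as_path):
    """Return the action of the first matching line (deny if none match)."""
    for action, cisco_regex in access_list:
        if re.search(cisco_regex.replace("_", DELIMITER), as_path):
            return action
    return "deny"


unprotected = [("permit", "^$")]
protected = [("permit", "^$"), ("deny", ".*")]
typo = ("permit", "_1239_")  # the accidental extra line

sprint_route = "1239 3561"  # hypothetical route heard via Sprintlink

print(evaluate(unprotected + [typo], sprint_route))  # permit
print(evaluate(protected + [typo], sprint_route))    # deny
```

    With the unprotected list the appended line starts permitting Sprintlink routes; with the protected list it is never reached.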

    So, to modify an existing access list, either:

  • Enter a new list with a different number; modify the "router bgp" clause's "neighbor a.b.c.d filter-list NNN in" clause by just typing "neighbor a.b.c.d filter-list new-number in" (use the same method for outbound as-path filter-lists). Then, replace the old as-path access-list and change the "neighbor a.b.c.d filter-list ..." clause back to its original state. This is the safe way to do things; or:
  • If you know what you're doing, you can just enter "no ip as-path access-list NNN" to delete the list, then enter the new list (preferably via cut-and-paste or tftp, as opposed to simply typing the new list in, since any filter that refers to that list will be in a "deny .*" mode until the new list is in place). Please use the first method. If you have anything but "permit" clauses in your access-lists, you can do damage (redistribute routes you shouldn't) by not using the first method.


    BGP METRICS (ATTRIBUTES) AND ROUTE SELECTION: INTRODUCTION

    First, remember the primary rule of IP routing: The most specific route always wins.

    There are, however, rules for how a Cisco will select the "best BGP" route when there are multiple BGP route possibilities of the same specificity.

    It goes (basically):

  • Route specificity and reachability
  • BGP weight metric
  • BGP local_pref metric
  • Internally originated vs. Externally originated
  • AS-PATH length
  • BGP metric (MED)

    BGP weight, MED, and local_pref metrics are just integers associated with each route. They can be left at their defaults or set explicitly. Unless you set them yourself, it's unlikely that you'll have to worry about them.

    For "competing" BGP routes, the most likely way the router's going to pick the best route (if you aren't playing games with weights) is by looking at the AS-PATH lengths.


    BGP PATH SELECTION PROCESS ACCORDING TO CISCO

    It is:

    "BGP selects only one path as the best path. When the path is selected, 
    BGP puts the selected path in its routing table and propagates the path to 
    its neighbors. BGP uses the following criteria, in the order presented, to 
    select a path for a destination: 
    
       1. If the path specifies a next hop that is inaccessible, drop the update. 
    
       2. Prefer the path with the largest weight. 
    
       3. If the weights are the same, prefer the path with the largest local 
          preference. 
    
       4. If the local preferences are the same, prefer the path that was 
          originated by BGP running on this router. 
    
       5. If no route was originated, prefer the route that has the shortest 
          AS_path. 
    
       6. If all paths have the same AS_path length, prefer the path with the 
          lowest origin type (where IGP is lower than EGP, and EGP is lower than 
          Incomplete). 
    
       7. If the origin codes are the same, prefer the path with the lowest MED 
          attribute. 
    
       8. If the paths have the same MED, prefer the external path over the 
          internal path. 
    
       9. If the paths are still the same, prefer the path through the closest 
          IGP neighbor. 
    
      10. Prefer the path with the lowest IP address, as specified by the BGP 
          router ID."
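
    Rules 2 through 7 of the list above can be expressed as a sort key. The Python sketch below is an illustration only, not Cisco's code: it leaves out rules 1 and 8-10, and the Path class, its field names, and the sample AS-PATHs are assumptions made for the example. The defaults used (weight 0 for routes heard from other routers, local_pref 100) are the ones this document describes.

```python
from dataclasses import dataclass

# Lower rank is preferred (rule 6): IGP < EGP < Incomplete.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}


@dataclass
class Path:
    name: str
    as_path: tuple
    weight: int = 0
    local_pref: int = 100
    locally_originated: bool = False
    origin: str = "igp"
    med: int = 0


def preference_key(p):
    """Smaller key wins; the fields follow rules 2 to 7 in order."""
    return (
        -p.weight,                 # rule 2: largest weight
        -p.local_pref,             # rule 3: largest local preference
        not p.locally_originated,  # rule 4: originated on this router
        len(p.as_path),            # rule 5: shortest AS_path
        ORIGIN_RANK[p.origin],     # rule 6: lowest origin type
        p.med,                     # rule 7: lowest MED
    )


def best_path(paths):
    return min(paths, key=preference_key)


via_a = Path("provider A", as_path=(701, 3561, 6000))
via_b = Path("provider B", as_path=(1239, 6000))
print(best_path([via_a, via_b]).name)  # provider B: shorter AS_path

via_a.local_pref = 200
print(best_path([via_a, via_b]).name)  # provider A: local_pref beats AS_path
```

    The second result shows why weights and local_prefs need care: once set, they override AS-PATH length entirely.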
    
    In addition to the "core" data about a route (where in the IP space it starts; how long it is (the "specificity"); and what the next hop is), there is other data embedded in BGP routes, most of which is used either for route selection or as additional debugging information for humans.
    Fig 8: BGP attributes
    
    For more info, see:
    
    RFC 2042: Registering New BGP Attribute Types
    RFC 1997: BGP Communities Attribute
    RFC 1773: Experience with the BGP-4 protocol
    RFC 1771: A Border Gateway Protocol 4 (BGP-4)
    
    To get an RFC, go to: http://www.internic.net/rfc/rfcXXXX.txt
    
    BGP ATTRIBUTE TYPES
    
    Value Code            Possible Values
    ---- ---------------- -----------------------------------------------
      1  ORIGIN           0 (IGP); 1 (EGP); 2 (Incomplete)
                          This attribute specifies the origin of a route.
                          Straightforward except that "Incomplete" means
                          that the route got into BGP by redistribution from
                          an IGP.
      2  AS_PATH          0-N 2-byte values
                          A list of the ASNs of all ASs the route has traversed.
      3  NEXT_HOP         IP Address
                          The most critical attribute; where to send data destined
                          for this route.
      4  MULTI_EXIT_DISC  0 to 2^32-1
                          A weight; designed to go outside and inside of an ASN.
      5  LOCAL_PREF       0 to 2^32-1
                          A weight; not designed to go outside of an ASN.
      6  ATOMIC_AGGREGATE TRUE/FALSE: If present, true; otherwise, false.
                          Present if this route was not the most specific one
                          known by the advertiser.  Dangerous stuff.
      7  AGGREGATOR       {ASN, IP address} pair.
                          Data to indicate who formed the route if the route
                          is an aggregate of smaller routes.
      8  COMMUNITY        0-N 4-byte values ("communities")
                          To be covered in a future document.
      9  ORIGINATOR_ID    Used for BGP Route Reflection
                          To be covered in a future document.
     10  CLUSTER_LIST     Used for BGP Route Reflection
                          To be covered in a future document.
    

    The rules above are fairly straightforward, but use some of the route attributes that we'll be getting into in more detail in the future.

    Briefly:

    (Rule 2)

  • If you don't set them explicitly, BGP weights are 32768 for routes originated by the local router, and 0 for routes coming from other routers. The BGP weight is not actually an attribute (in that it's not redistributed from one router to another as part of a BGP route update). A higher weight is "better" (means the route will be preferred over a route with a lower weight).

    (Rule 3)

  • The local_pref is a BGP attribute, and is set to 100 by default. Again, a higher value is better.

    (Rules 2-3,5)

  • Setting weights and local_prefs gives you some control over "routing policy", but for beginners, filtering based on AS-PATH data should be more than sufficient.

    (Rule 6)

  • Origin isn't something you get to play around with. IGP means a route was injected into BGP with a "network" statement; EGP means it was learned via the older EGP protocol (the predecessor to BGP); and incomplete means it was injected into BGP by "redistributing" from an IGP.

    (Rules 7-8)

  • A MED (or "BGP metric") is Yet Another Weight you get to play with. We use MEDs internally at Net Access to tune things (because we prefer to let the router first pick the route with the shortest AS-PATH, and BGP weights and local_prefs are looked at before AS-PATH length). Again, you typically won't be setting this until you have worked more with BGP.

    (Rule 9)

  • If you run "active routing" internally (an IGP other than static routes), there's some notion kept with each route of the "distance" for each route as it's passed around your network. Let's say you have two border routers and you're selecting between two equal-specificity, equal-AS-PATH-length routes - one from each border - and that no weights, local_prefs, or MEDs have been set. This rule ensures that the router will do what is most natural - to send the packet towards the closer of the two routers advertising the route. We'll explain this more, with diagrams, in a future document, as it involves an understanding of how IGPs such as OSPF and IS-IS function.

    (Rule 10)

  • Now we're down to guessing. There has to be some tie-breaker, and since BGP router ID should be unique, Cisco chose to make this the final factor.

    For further reading, see for more details.

    We'll be talking about using these metrics in the near future. If you want to experiment in the mean-time, that document shows you how to set these metrics. Please experiment first on test or lab networks! If you've got proper filters in place, experimenting with these things won't affect the outside world - but it could make your customers very unhappy...

    Another very big caution: BGP weights and local_prefs are very powerful. Realize that if you advertise routes for a customer that you hear via BGP, you could wind up preferring an external route for that customer if you set the BGP weight or local_pref too high (or at all) for external routes. The customer won't like this - if you prefer an external route for that customer, you're not going to advertise them to your transit providers any more, which will probably not please that customer...


    EGP vs. IGP

    EGP usually means "Exterior Gateway Protocol". IGP usually means "Interior Gateway Protocol", though it can get confusing, because different people and vendors use different terminology for the same thing. Since I am a Cisco proponent, these documents use terminology used by the routing community, with a Cisco dialect.

    Routers which route IP packets have to have an "IP routing table". In that table are one or more routes of a particular {starting point, length, metric}. This IP routing table gets filled with routes heard from various sources - or configured statically (in the router's configuration store). BGP routes migrate into the IP routing table only if:

  • They are more specific than any other route of "lower preference"; or
  • They are the only route of a particular specificity.

    Here's a brief outline of the "order of preference" for filling the IP routing table. The exact order can be found in the Cisco documentation.

  • Connected routes (IP addresses and routes of router interfaces) first; then
  • Static routes (routes configured in router configurations with 'ip route' statements); then
  • Routes learned via an IGP (RIP, RIPv2, OSPF, IS-IS, ...); then
  • Routes learned via BGP and other EGPs.

    One note, though: Since static routes are really considered an "IGP" routing mechanism, there are ways to get other IGP-learned routes (say, via OSPF) to be preferred over static routes, but again - if you don't play with weights, this shouldn't be a worry.


    WHAT IS ROUTE FLAP AND WHY IS IT BAD?

    When you "assert" a route - saying "I know how to get to 192.204.4.0/24" based on some internal knowledge that you actually do know how to get to 192.204.4.0/24 - the natural (and previously-thought-to-be-correct) thing to do is to "withdraw" that assertion if you in fact no longer know how to get to 192.204.4.0/24.

    But look at what happens when you withdraw that assertion. Your provider(s) must then also withdraw that assertion. And then their provider(s) and peer(s) must do the same. All in all, thousands of routers around the world now have to look at that route and decide if they have a next-best path in their BGP (or other routing) table, and insert it as the current best path in their IP routing table. This consumes many CPU-seconds on routers that are sometimes very busy.

    In fact, it was consuming so much CPU time a few years ago that Sean Doran of Sprintlink said "this must stop" and a few people came up with an idea (which Cisco implemented in record time) to "damp"(en) the "route flap"s. You'll hear people say "damp" and "dampen". There's no real consensus about which is the correct term.

    What this means in practice today is that if your routes flap more than one or two complete up-down-up cycles, you will be dampened by many providers for at least an hour or so. So even if you're only "single-homed", you will be dampened if your provider withdraws your routes every time your t1 flips up and down a few times because some Bell guy tripped over a wire.
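
    To get a feel for how dampening behaves, here is a rough Python sketch of a penalty that grows with each flap and decays exponentially over time. All of the numbers (penalty per flap, suppress and reuse limits, half-life) are illustrative assumptions, not values taken from this document; each provider picks its own, and many are harsher.

```python
import math

# Illustrative values only (assumptions); real providers choose their own.
PENALTY_PER_FLAP = 1000
SUPPRESS_LIMIT = 2000      # above this the route is suppressed
REUSE_LIMIT = 750          # below this the route is used again
HALF_LIFE_MINUTES = 15.0   # the penalty halves every 15 minutes


def decayed(penalty, minutes):
    """Penalty remaining after the given number of quiet minutes."""
    return penalty * 0.5 ** (minutes / HALF_LIFE_MINUTES)


def minutes_until_reuse(penalty):
    """Quiet time needed before the penalty decays to the reuse limit."""
    if penalty <= REUSE_LIMIT:
        return 0.0
    return HALF_LIFE_MINUTES * math.log2(penalty / REUSE_LIMIT)


penalty = 3 * PENALTY_PER_FLAP  # three flaps in quick succession
print("suppressed:", penalty > SUPPRESS_LIMIT)
print("minutes until reuse:", round(minutes_until_reuse(penalty), 1))
```

    Under these assumed values, three quick flaps are enough to get a route suppressed, and every further flap while suppressed pushes the reuse time out again.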

    So do not ask your upstream provider to announce you unless it makes a difference (the benefit of being multiply-announced outweighs the possible negative effects of being dampened due to instability in either your or your provider's network).


    WHAT TO KEEP IN MIND WHEN CONFIGURING BGP

    When you're bringing up a new BGP session, or considering how to do BGP in general, the things to keep in mind for each peer are:

  • What routes do you want them to hear? Do you want to "tune" your announcements somehow (more on this later)? The most important thing is to ensure that you do not redistribute routes that you are not providing "Internet connectivity" to; and

  • What do you want to do with the routes that you hear via the session? Do you want to "tune them"? Only take some? Take them all? Do you have the memory and CPU in your router to really do what you want?


    BGP AND PEERING

    Actually, we'll devote a whole document to this in a month or two.

    What we're talking about in this document is BGP and transit - getting global transit from upstream providers as opposed to peering, which is just mutual sharing of customer routes.


    INTERNET CONNECTIVITY WITHOUT BGP

    Let's review what happens when you are connected to the Internet without speaking BGP to your provider.

  • You create a default route towards your upstream provider, and all non-local packets go out the interface specified by the route; and

  • Your provider probably put static routes towards you on their side, and redistributes those static routes into their IGP, and then probably redistributes their IGP into BGP - unless all of their BGP is done statically (more on this in a future document).

    Basically, if you have any address space "inside" of your provider's larger "netblock" or "aggregate", you won't be advertised to the outside world specifically - your provider will just advertise their larger block. If you have any other networks (an old Class C; customers with address space; etc...) your provider will just statically announce those routes to the world and statically route them inside their network to your leased-line/router interface(s).

    With BGP, your provider gives you all of the routes they have (the easy part), and listens to your route announcements and then redistributes some or all of those to their peers and customers. This is the hard part (for them - just worry about understanding and configuring your end for now). The net difference is "just" that they may start advertising a more specific route (no mean task in a complicated network designed, as most networks are, to prevent the accidental "leaking" of more specific routes) or that the routes that they normally advertise for you under just their ASN will now have your ASN attached as well.


    BGP AND THE SINGLE-HOMED

    If you've only got one upstream provider, why speak BGP to them? Well, you could say "practice", but in general, no upstream provider's going to waste their time configuring BGP with you (since it generally involves a fair amount of behind-the-scenes work on their part) unless you have a good reason.

    And you don't really need "full routes" so that you can "run defaultless" if you're single-homed. Since every packet destined for the Internet (as opposed to your internal network) is going to go out the same router interface, it doesn't matter whether it's via one default route or via searching a list of 45,000 or more routes heard via BGP.

    The only really valid reason is that you want to be able to have more control in advertising your routes. Of course, you'll have to argue around the flap argument even if you have your own provider-independent address space (if you're singly-connected to the 'net, why bother all of the routers in the world by telling them whether you're reachable or not currently?) and the routing-table space argument (if you're in your provider's IP space or "aggregate announcement", why pollute the routing tables with an extra few routes by announcing your routes more specifically?).

    You're on your own for the answers to these questions. If you think you have a good case, either talk to your current or potential provider, or perhaps send a question off to the inet-access list and see if anyone can help.

    If you do want to configure BGP and are single-homed, follow the instructions on how to announce your networks (routes), and either filter all incoming routes - or accept them if you feel you really want to.


    BGP AND THE MULTI-HOMED

    OK, so you're multi-homed. What is the most important thing about BGP to you? The ability to have it announce routes. Getting "full" or "partial" routes from your providers is "cool" - and may even be useful - but you can do almost as well by just load-balancing all outgoing traffic in either a "round-robin" or "route-caching" manner. (More on this later in this document).

    So the most important thing about being multi-homed is the ability to have your routes advertised to your providers - and by them to their providers and peers (i.e. to "the rest of the Internet"). Doing this basic level of route advertisement is not hard. You just have to do it in a paranoid way.

    If you screw up BGP routing you may get slapped down pretty hard. Screwups with BGP route advertisements can be felt all over the Internet. To repeat: Screwups with BGP route advertisements can be felt all over the Internet. If your provider is smart, they will also implement "filters" to prevent you from screwing them and the Internet up. But don't count on it.

    If you were to announce a route that was more specific than, say, the otherwise-best route for Yahoo's web servers, you would black-hole Yahoo for a period of time. Needless to say, they would not be very happy with you. The solution is to do good filtering on your end - and for your provider to also do excellent filtering wherever possible.

    Before you start playing with BGP, you might really want to wait and read the "Configuring a Cisco Router" document (also coming out in the next few months). If you do go ahead and are implementing BGP for the first time, get a friend or another provider to review your proposed configs for you before implementing them. And for a summary of BGP-related Cisco commands, see the BGP Cisco Commands sidebar.


    MULTI-HOMING AND LOAD-BALANCING

    Generally, the goal of multi-homing is to use both connections in a sane manner and "load-balance" them somehow. Ideally, you'd like roughly half the traffic to go in and out of each connection. You'd also like "fail-over" routing, where if one connection goes down the other one keeps you connected to the Internet. In an ideal network, you'd be able to have any one of your connections to the 'net go down and still maintain connectivity and speed.

    We'll talk a bit about how you load-balance incoming and outgoing traffic to and from your network. Incoming traffic is controlled by how you announce your routes to the world (packets will flow into your network because someone out there heard and is using a route announcement). Outgoing traffic is controlled by the routes that you allow to flow into your border router(s) - and is thus much easier to control and tune.


    HOW TO ANNOUNCE YOUR NETWORKS

    We'll now describe the safest way to announce your routes via BGP.

    There are many other ways, some of which we'll talk about in future documents. The way we at Net Access do it is by redistributing from our IGP (IS-IS), through a filter list, into BGP. While we do run BGP inside our network, it's strictly to pass external route announcements through the various parts of our network - no internal routes are ever passed from one of our routers to another one of our routers with BGP. But when we first started speaking BGP, we set our routers up the way described below.

    You'll always set "next-hop-self" on all peering sessions. See the sidebar on next-hop-self for an explanation.

    The safest way to announce your routes with BGP is to configure everything statically. You can think of the process described below as turning networks into route announcements.

    To do this:

  • For each network you want to announce, add a static route pointing at the interface Loopback0, with a weight higher than any other static route for that network (higher numbers for static route weights mean that the routes are less preferred).

  • Configure a router BGP clause like the one below, with static network statements to announce your routes, and "sanity filters" in place to make sure you only announce your routes and only take the routes you want.

    For example, let's say you're routing the following networks (also called "netblocks" sometimes):

    170.100.0.0/16 (a /16 has a netmask of 255.255.0.0)
    192.204.44.0/24 (a /24 has a netmask of 255.255.255.0)
    206.8.128.0/17 (a /17 has a netmask of 255.255.128.0)
    207.126.0.0/18 (a /18 has a netmask of 255.255.192.0)

    You'd first configure your router with:

    int Loopback0
     descr Loopback interface for routes to be nailed to.
    ip route 170.100.0.0 255.255.0.0 Loopback0 10
    ip route 192.204.44.0 255.255.255.0 Loopback0 10
    ip route 206.8.128.0 255.255.128.0 Loopback0 10
    ip route 207.126.0.0 255.255.192.0 Loopback0 10
    

    Then:

    ip as-path access-list 2 deny .*
    ip as-path access-list 3 permit ^$
    ip as-path access-list 3 deny .*
    
    router bgp 64512
     network 170.100.0.0 mask 255.255.0.0
     network 192.204.44.0 mask 255.255.255.0
     network 206.8.128.0 mask 255.255.128.0
     network 207.126.0.0 mask 255.255.192.0
     neighbor  remote-as 
     neighbor  next-hop-self
     neighbor  filter-list 3 out
     neighbor  filter-list 2 in
    

    Explanation:

    This method "statically nails down" the route announcements being advertised with the "network" statements. In order to nail them down, there must be: (1) Underlying static routes with the same netmask as each route being advertised with a network statement; and (2) Those underlying static routes must not go away. The purpose of the Loopback0 routes is to ensure that even if an existing primary route which matches the netmask of the route being announced (and this is often not the case) goes away, the Loopback0 route (with a weight of 10, which means it's only a "backup" route to any route without a weight at the end) will kick in and keep the BGP route advertisement stable. (Loopback0 routes always stay installed since there's no physical interface to go down and cause the route to be withdrawn - the interface Loopback0 will always be up, so the routes pointed to them will always be installed.)
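
    Because each "network" statement must have an underlying static route with exactly the same netmask, it can help to generate both from a single list of netblocks rather than typing them twice. The Python sketch below is one possible way to do that; the function name nailed_down_config is made up, and the output should still be reviewed before it goes anywhere near a router.

```python
import ipaddress

NETBLOCKS = ["170.100.0.0/16", "192.204.44.0/24",
             "206.8.128.0/17", "207.126.0.0/18"]


def nailed_down_config(netblocks, asn, backup_weight=10):
    """Build matching Loopback0 static routes and BGP network statements."""
    # ip_network() raises ValueError if a prefix has host bits set,
    # which catches typos such as 206.8.129.0/17.
    nets = [ipaddress.ip_network(block) for block in netblocks]
    lines = [
        f"ip route {net.network_address} {net.netmask} Loopback0 {backup_weight}"
        for net in nets
    ]
    lines.append(f"router bgp {asn}")
    lines += [f" network {net.network_address} mask {net.netmask}" for net in nets]
    return lines


print("\n".join(nailed_down_config(NETBLOCKS, 64512)))
```

    Generating both sets of lines from the same list keeps the netmasks in step, which is the condition the "network" statements depend on.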

    This example uses a "deny everything" incoming filter, so it will only announce routes - it won't accept any. If you want to accept all incoming routes, replace the "filter-list 2 in" with "filter-list 1 in". Actually, you could just not specify an "inbound as-path filter" - and the effect would be the same - but it's better by far to be explicit about these things.

    To add more peers, just create another similar neighbor statement. Ciscos give you 30 seconds to finish typing the neighbor statement before they start trying to establish the session. It is critical that you get those "neighbor somebody filter-list xxx .." statements in there by then. The best way by far to do it is to either cut and paste or tftp in a complete neighbor statement to the router.

    Here's an example of a completely filled-in bgp clause, based on the example above (note that the 64512 is a fictitious ASN).

    router bgp 64512
     network 170.100.0.0 mask 255.255.0.0
     network 192.204.44.0 mask 255.255.255.0
     network 206.8.128.0 mask 255.255.128.0
     network 207.126.0.0 mask 255.255.192.0
     neighbor 207.106.127.45 remote-as 4969
     neighbor 207.106.127.45 next-hop-self
     neighbor 207.106.127.45 filter-list 3 out
     neighbor 207.106.127.45 filter-list 2 in
     neighbor 137.10.10.121 remote-as 701
     neighbor 137.10.10.121 next-hop-self
     neighbor 137.10.10.121 filter-list 3 out
     neighbor 137.10.10.121 filter-list 2 in
    


    BEING ADVERTISED BY MULTIPLE PROVIDERS WITHOUT PI-SPACE

    Remember April 1997's document on getting provider-independent (PI) space? The reason it's so important to have "your own" ip space is that without it multi-homing is quite tricky and requires a lot of cooperation from your original provider. Why?

    Let's say you are using 207.106.96.0/20. Your provider (let's call him oldprovider) has 207.106.0.0/16. So oldprovider announces only 207.106.0.0/16 to the world. There is no advertisement for 207.106.96.0/20 in this case - any packet destined to 207.106.96.0/20 will be picked up by the less specific (more general) route 207.106.0.0/16.

    Now you want to multi-home. So you buy a T1 from newprovider. You set up BGP with both oldprovider and newprovider. Suddenly, the world sees two routes for you:

    207.106.0.0/16, advertised by oldprovider; and 207.106.96.0/20, advertised by newprovider.

    Remember, the most specific route always wins, so newprovider will wind up carrying almost all, if not all, of your incoming traffic! In fact, certain parts of oldprovider's network may actually prefer newprovider's t1 to get to you!
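
    The "most specific route wins" rule is easy to demonstrate with the two announcements above. This Python sketch uses the standard ipaddress module; the ROUTES table and the chosen_route helper are invented for the illustration and say nothing about how a real router stores its table.

```python
import ipaddress

# The two routes the world sees in the example above.
ROUTES = {
    ipaddress.ip_network("207.106.0.0/16"): "oldprovider",
    ipaddress.ip_network("207.106.96.0/20"): "newprovider",
}


def chosen_route(destination):
    """Return who carries traffic to destination: the longest prefix wins."""
    addr = ipaddress.ip_address(destination)
    candidates = [net for net in ROUTES if addr in net]
    if not candidates:
        return None
    best = max(candidates, key=lambda net: net.prefixlen)
    return ROUTES[best]


print(chosen_route("207.106.100.1"))  # inside your /20: newprovider
print(chosen_route("207.106.1.1"))    # elsewhere in the /16: oldprovider
```

    Every address inside 207.106.96.0/20 is also inside 207.106.0.0/16, yet the /16 never gets a chance for those addresses as long as the /20 is being announced.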

    The problem is that most large-ish providers use something called "aggregate-address statements" - and they certainly have some sort of filter to keep the more specific routes floating around inside of their networks from being advertised to the world. Remember, the world only wants to hear about 207.106.0.0/16 if the little, more specific routes inside of 207.106.0.0 are not multi-homed.

    So what does oldprovider have to do? Blow holes in their "filter". One way or another, it's going to take modifications in oldprovider's 'border' routers to make incoming load-balancing work properly for you - and oldprovider may not want to do this. Basically, everywhere that oldprovider peers with anyone else (and this is usually at least 5-10 places), they have to modify their aggregation statements or other filters to "allow" your more specific route announcement to pass through.

    This is why it's important to choose a primary provider based on how cooperative they'll be when you want to multi-home.


    CONTROLLING OUTGOING DATA FLOW: "FULL ROUTING"

    Believe it or not, you don't need BGP to balance the flow of traffic from your network (outbound traffic). There are many arguments for and against, but it's true that if you are multi-homed and have a sufficiently studly router (a Cisco 4500, 4700, 70x0, 720x, or 75xx will do, but Cisco 4000s and 2501s will not), accepting full BGP routing from your multiple providers is a Good Thing. See the sidebar for an explanation of how to balance outbound traffic without BGP.

    There are a couple of reasons. First, each provider obviously knows best the way to get to its customers. Meaning, if you're multi-homed to Sprintlink and UUNET, you always want to send data to Sprintlink customers out your Sprintlink T1 and data to UUNET customers out your UUNET T1. Second, though AS-PATH length is a pretty poor selection tool, it's what we've got right now - and it does bear some relation to an indicator of how "close" a given provider is to some other provider.

    So filling your router with routes from all of your upstream providers means that, for routes of the same specificity, AS-PATH length will decide which one actually gets used. See Fig 7 for examples and explanation.


    CONTROLLING OUTGOING DATA FLOW: "PARTIAL ROUTING": "CUSTOMER ROUTES ONLY"

    If you can't take full routes from your providers, you're going to have to either not use BGP to balance outbound traffic - or take less than full routes.

    The minimum set of "less than full" routes you'll want to take is customer routes from each provider (from each provider, get only the routes for them and their customers). This is a problem if your providers include Sprintlink and MCI, however, since Sprintlink and MCI customer routes together are such a large percentage of "full routes" that you can't really put Sprintlink and MCI routes in Cisco 2501s or 4000s either. You should, however, be able to put Sprintlink and any other few sets of customer routes or MCI and any other few sets in even a 2501 or 4000.

    The problem is getting just customer routes (also called "peering routes"). You can tell your providers to only send you customer routes - and most providers that do a significant amount of BGP can do this pretty easily - but if any one of your providers screws up (changes a filter list slowly, for example) then they may blast more than enough routes at you to "melt your router". Unfortunately, when many brands of routers (Ciscos included) run out of memory, they don't just shut down BGP routing - or crash and restart. Ciscos, in particular, do not handle running out of memory gracefully at all, and will gleefully consume so much memory with routing data that basic command functionality gets trashed and someone needs to physically power cycle the router.


    SO WHAT'S TO BE DONE?

    Get customer routes from your providers - but put sanity filters in place to protect yourself. For each provider, build an as-path access-list to use as a filter of what you will not accept from them. Let's say you're triply-homed to Sprintlink, UUNET, and Net Access. Use something like the following:

    (Ciscos use ! at the beginning of a line to denote a comment line.)

    ! Filter everything but Sprintlink (ASN 1239) from Sprintlink
    ip as-path access-list 40 deny _3561_
    ip as-path access-list 40 deny _701_
    ip as-path access-list 40 deny _1673_
    ip as-path access-list 40 deny _174_
    ip as-path access-list 40 deny _1_
    ip as-path access-list 40 deny _4200_
    ip as-path access-list 40 permit .*
    ! Filter everything but UUNET (ASN 701) from UUNET
    ip as-path access-list 41 deny _3561_
    ip as-path access-list 41 deny _1239_
    ip as-path access-list 41 deny _1673_
    ip as-path access-list 41 deny _174_
    ip as-path access-list 41 deny _1_
    ip as-path access-list 41 deny _4200_
    ip as-path access-list 41 permit .*
    ! Filter the major providers from Net Access
    ip as-path access-list 42 deny _3561_
    ip as-path access-list 42 deny _1239_
    ip as-path access-list 42 deny _701_
    ip as-path access-list 42 deny _1673_
    ip as-path access-list 42 deny _174_
    ip as-path access-list 42 deny _1_
    ip as-path access-list 42 deny _4200_
    ip as-path access-list 42 permit .*
    
    router bgp 64512
     
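     neighbor <sprintlink-peer-ip> remote-as 1239
     neighbor <sprintlink-peer-ip> next-hop-self
     neighbor <sprintlink-peer-ip> filter-list 3 out
     neighbor <sprintlink-peer-ip> filter-list 40 in
     neighbor <uunet-peer-ip> remote-as 701
     neighbor <uunet-peer-ip> next-hop-self
     neighbor <uunet-peer-ip> filter-list 3 out
     neighbor <uunet-peer-ip> filter-list 41 in
     neighbor <netaxs-peer-ip> remote-as 4969
     neighbor <netaxs-peer-ip> filter-list 3 out
     neighbor <netaxs-peer-ip> filter-list 42 in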
    

    That will ensure that even if Sprintlink, UUNET, or Net Access screws up and blasts all of the routes they know about at you, you'll still take their customer routes but won't take the vast majority of other routes from them. The big providers (Sprintlink, MCI, UUNET, ANS, PSI, BBN, and AGIS) make up the vast majority of routes - 80-85% or more of the routes out there.
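
    To see why these lists behave that way, here is a rough Python sketch of how an as-path access-list is evaluated: entries are checked in order, the first match decides, and a path that matches nothing is denied. It assumes the Cisco "_" wildcard matches a space, a comma, braces, parentheses, or the start or end of the AS-PATH string; the function names and the two sample paths (with made-up customer ASNs 64999 and 64998) are purely illustrative.

```python
import re

# What the Cisco "_" wildcard is assumed to match: start or end of the
# string, or one of the delimiter characters space , { } ( ).
_DELIM = r"(?:^|$|[ ,{}()])"


def cisco_to_python(pattern):
    """Translate a Cisco-style AS-path regex to Python (handles "_" only)."""
    return pattern.replace("_", _DELIM)


def filter_permits(access_list, as_path):
    """Return True if the first matching entry is a permit.

    access_list is a list of (action, cisco_regex) pairs checked in order.
    A path that matches no entry is denied (the implicit deny at the end).
    """
    for action, pattern in access_list:
        if re.search(cisco_to_python(pattern), as_path):
            return action == "permit"
    return False


# as-path access-list 40 from the text (the inbound filter for Sprintlink).
LIST_40 = [
    ("deny", "_3561_"), ("deny", "_701_"), ("deny", "_1673_"),
    ("deny", "_174_"), ("deny", "_1_"), ("deny", "_4200_"),
    ("permit", ".*"),
]

# Hypothetical Sprintlink customer route: no denied ASN in the path.
print(filter_permits(LIST_40, "1239 64999"))
# Hypothetical route Sprintlink learned from AS 3561: caught by a deny line.
print(filter_permits(LIST_40, "1239 3561 64998"))
```

    Note that "_1_" only matches the ASN 1 as a whole number, so a path containing 1239 or 11 is not caught by that line.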

    Note: If you're a Sprintlink customer, you'll probably be peering with AS 179x - or at least some ASN other than 1239. Sprintlink uses a separate ASN for each major POP (as do many other providers) - but unlike with other providers, these ASNs are visible to the outside world. Any non-Sprintlink customer route (any route from the outside world), though, will still have the ASN 1239 (which is Sprintlink's "peering" ASN) in the AS-PATH. The bottom line is that instead of "remote-as 1239" in the configuration above, you'll have whatever ASN Sprintlink actually has you peer with.


    AS-PATH PADDING

    Some people just aren't content to leave things the way nature intended them. Bored routing engineers are very dangerous. If you don't give them work to do they'll either sit and read news or Cisco documentation - or start optimizing ("tuning") routing.

    AS-PATH padding is probably the most widely-used BGP tuning method, and we'll go into it in more detail next month.

    Basically, if you make sure not to set weights or local_prefs, AS-PATH length is going to decide which of multiple BGP routes of the same specificity will be preferred. So if you want to make one path preferred or another one not preferred, you can "pad" the AS-PATH with extra ASNs to make one path look longer than another. This is done with route-maps, which we'll talk more about next month.
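
    The idea can be sketched in a few lines of Python. This is a toy model, not router code: it assumes weights and local_prefs are unset and looks only at AS-PATH length, ignoring the later tie-breakers a real router would apply. The function names are made up for illustration; 64512 is the ASN from the configuration earlier, and 1239 and 701 are Sprintlink and UUNET.

```python
MY_ASN = 64512  # the ASN used in the "router bgp" example earlier


def announce(my_asn, extra_prepends=0):
    """AS-PATH we send to a provider: our ASN, plus any padding copies."""
    return [my_asn] * (1 + extra_prepends)


def heard_via(provider_asn, announced_path):
    """AS-PATH after the provider adds its own ASN and passes the route on."""
    return [provider_asn] + announced_path


def best_by_as_path(routes):
    """Pick the route with the shortest AS-PATH (ties go to the first listed)."""
    return min(routes, key=lambda r: len(r["as_path"]))


# Pad the announcement to UUNET twice; announce to Sprintlink unpadded.
routes = [
    {"via": "UUNET", "as_path": heard_via(701, announce(MY_ASN, 2))},
    {"via": "Sprintlink", "as_path": heard_via(1239, announce(MY_ASN))},
]

for r in routes:
    print(r["via"], r["as_path"])
print("preferred:", best_by_as_path(routes)["via"])
```

    With the padding in place, anyone comparing the two paths on length alone sees the UUNET path as two hops longer and prefers the Sprintlink one.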


    QUESTIONS AND COMMENTS

    I expect that this document will generate a lot of questions. Please do not send them to freedman@netaxs.com. Please use either the inet-access list, which I and many of my routing-geek friends patrol regularly, or bgp@netaxs.com. Thanks.


    THANKS TO

    In no particular order:

    Thanks to Alexis Rosen at Panix (alexis@panix.com), who sent me some last-minute suggestions for clarification and pointed out an ugly factual error. Thanks to John Hawkinson (jhawk@panix.com) of BBN, who told me about something new called BGP in 1993 at a Science Fiction convention in the DC area. Thanks to Dave Siegel (dsiegel@rtd.net) who's shared his BGP experience with others since 1995. And thanks to Alec Peterson (ahp@hilander.com) for reviewing this document - and who explored some of the more advanced BGP features (oh, the joy of route-maps) using my network when I didn't have the time.


    Sidebar on next-hop-self

    If you've followed the "peering and transit" discussions, you may have heard of the "next-hop-self issue". Here's the problem.

    Ciscos keep the originating address of a route intact in the next-hop field when they pass it from eBGP peer to eBGP peer. (And ditto for iBGP, but we're talking about eBGP here.) It turns out that this behavior is sometimes useful in large networks where there's an IGP running to tell every router which way to send a packet when the matching route says it came from 192.41.177.x (some other provider's MAE-East router); 192.157.69.x (some other provider's Pennsauken router); etc...

    But this is really subtle and can screw you up big-time. In the best case you'll piss someone off (if you forget to set "next-hop-self" in an exchange-point peering environment). In the worst case you'll cause routing loops for yourself (examples of this will be given when we talk more about IGPs).

    Setting next-hop-self causes a Cisco to override the originating address of a route and stamp instead its own address as the "next-hop" part of the route.

    Remember that the critical parts of a route are: What the base IP address is; how big the route is (the specificity or netmask); and what destination (next-hop) to use to send data to the IP space represented by the route.

    We'll use an exchange point environment to illustrate next-hop-self. Refer to the figure (XXX) below. When AS 4969 advertises 250.20.0.0/16 to AS 64500, AS 4969 sets next-hop-self, so the next-hop is 192.41.177.87 (AS 4969's mae-east IP address).

    Now, AS 64500 advertises it to AS 64600 (see the top diagram) without next-hop-self. When AS 64600 processes the route and installs it into the IP routing table, the next-hop used will be 192.41.177.87.

    But AS 64600 doesn't peer with AS 4969 - yet it's going to send data to a route advertised by AS 4969 - right to AS 4969's router. People generally do not like this. In this case, AS 4969 might discover this "behavior" by running a few careful probes of other routers at mae-east. AS 4969 would then look to see how it hears AS 64600 (who is announcing AS 64600 to AS 4969) and see if they're the culprits. If AS 4969 really wants to, it can find out who the culprit is by passing a bogus route or two to each peer in turn, and see when AS 64600's router starts using the bogus route.

    The solution is for 64500 to use next-hop-self as well (see the bottom diagram). In this case, the route as heard by 64600 has 192.41.177.NNN (AS 64500's mae-east IP address) in the next-hop field - though the AS-PATH and certain other fields still show that AS 4969 is the origin of the route. So when AS 64600 wants to send data to AS 4969 based on this route it'll "bounce the traffic off of" AS 64500's router. Some people don't even like this (since it's a form of providing service to downstream customers over the "shared medium" of the exchange-point switches), but it's not going to be as strenuously objected to as not using next-hop-self.
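
    The two cases can be modeled with a small Python sketch. This is only an illustration of what happens to the next-hop field, not of how a router implements it; the function name and the dictionary layout are invented for the example, and AS 64500's address is left as the literal placeholder "192.41.177.NNN" used above.

```python
def advertise(route, sender_asn, sender_ip, next_hop_self):
    """Return the route as the sender's eBGP neighbor would hear it.

    The sender's ASN is always added to the front of the AS-PATH. The
    next-hop is replaced with the sender's address only when
    next_hop_self is set; otherwise it is passed along untouched.
    """
    return {
        "prefix": route["prefix"],
        "as_path": [sender_asn] + route["as_path"],
        "next_hop": sender_ip if next_hop_self else route["next_hop"],
    }


AS4969_IP = "192.41.177.87"    # AS 4969's mae-east address, from the text
AS64500_IP = "192.41.177.NNN"  # placeholder for AS 64500's address

origin = {"prefix": "250.20.0.0/16", "as_path": [], "next_hop": None}

# AS 4969 announces to AS 64500 with next-hop-self.
at_64500 = advertise(origin, 4969, AS4969_IP, next_hop_self=True)

# Top diagram: AS 64500 passes the route on without next-hop-self.
without = advertise(at_64500, 64500, AS64500_IP, next_hop_self=False)
# Bottom diagram: AS 64500 passes the route on with next-hop-self.
with_nhs = advertise(at_64500, 64500, AS64500_IP, next_hop_self=True)

print(without["as_path"], without["next_hop"])
print(with_nhs["as_path"], with_nhs["next_hop"])
```

    In both cases AS 64600 hears the same AS-PATH; the only difference is whose router it ends up sending the traffic to.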


    Sidebar on Outgoing Data Flow Control Without BGP

    Without BGP, your only way to send data out (and the way 90% or more of the ISPs out there run their networks) is to default route into your provider(s).

    Any packet not destined to the inside of the ISP's network will then hit the "wildcard", or "default" route, and be sent out the router interface towards the provider(s).

    There are a few ways you can do this.

    Outgoing Data Flow: Option 1

    Option 1 is to default to one provider and install a "backup default" to your other provider. On a Cisco, this is done with:

    ip route 0.0.0.0 0.0.0.0 Serial0
    ip route 0.0.0.0 0.0.0.0 Serial1 10
    
    This says: "The default route (0.0.0.0/0, or 0.0.0.0, netmask 0.0.0.0) goes out Serial0 with a weight of 0" (if you don't put a 4th field in an "ip route" statement on a Cisco, it'll assume a weight of 0), and "Another default route is out Serial1, with a weight of 10". The lower the weight, the more preferred the route. (Cisco's own name for this 4th field is the "administrative distance".)

    If you do it this way, the route with the lower weight (the one out Serial0) will be used whenever Serial0 is up. If Serial0 goes down for some reason (actually, if the "line protocol" on Serial0 goes down), that route will be invalidated and will go away, so the Cisco will look for the next-best route, which will be the route through Serial1. Even though it has a higher weight, it's the only valid route left to consider, so it'll "win".
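
    The selection logic for Option 1 amounts to "lowest weight among the routes whose interface is up". A minimal Python sketch of that rule follows; the function and field names are made up for illustration, and the weights 0 and 10 are simply the values used in the text.

```python
def active_default(routes, link_up):
    """Return the interface of the best usable default route, or None.

    routes is a list of {"interface": ..., "weight": ...} entries;
    link_up maps interface name -> True/False (line protocol state).
    Routes on down interfaces are dropped, then the lowest weight wins.
    """
    valid = [r for r in routes if link_up[r["interface"]]]
    if not valid:
        return None
    return min(valid, key=lambda r: r["weight"])["interface"]


defaults = [
    {"interface": "Serial0", "weight": 0},   # primary default
    {"interface": "Serial1", "weight": 10},  # backup default
]

print(active_default(defaults, {"Serial0": True, "Serial1": True}))
print(active_default(defaults, {"Serial0": False, "Serial1": True}))
```

    The first call picks Serial0; the second, with Serial0 down, falls back to Serial1 because it is the only valid route left.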

    Outgoing Data Flow: Option 2

    Option 2 is to default equally to both providers. However, there's a catch. If you just do:

    ip route 0.0.0.0 0.0.0.0 Serial0
    ip route 0.0.0.0 0.0.0.0 Serial1
    
    You will almost certainly not be happy with the result! Unless "ip route-cache" is set on the interfaces in question, the Cisco will simply "round-robin" outgoing packets, sending packet N out Serial0 and packet N+1 out Serial1. Why is this bad? Well, if you are sending data to site X, and site X is on Provider A's network (and let's say that Provider A is at the other end of Serial0), data sent to site X out Serial0 may arrive in 10ms. Data sent to site X out Serial1 may arrive in 30-100ms. This means that packets 1 and 3 could arrive before packet 2 in a pathologically worst-case scenario. Or even packets 1, 3, 5, and 7 could arrive before packet 2 does. This kind of out-of-order (or even worse, packet-lossy) performance spells doom for IP traffic.
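
    A quick back-of-the-envelope simulation in Python shows the reordering. It assumes packets are sent 1ms apart (an arbitrary figure chosen for the example) and uses the 10ms and 30ms one-way delays mentioned above; the function name is made up.

```python
def arrival_order(n_packets, latency_ms, send_interval_ms=1):
    """Round-robin packets over the links; return packet numbers as they arrive.

    latency_ms maps interface name -> one-way delay in ms. Packet 1 goes
    out the first interface listed, packet 2 out the second, and so on.
    """
    links = list(latency_ms)
    arrivals = []
    for i in range(n_packets):
        link = links[i % len(links)]
        arrival_time = i * send_interval_ms + latency_ms[link]
        arrivals.append((arrival_time, i + 1))
    return [packet for _, packet in sorted(arrivals)]


order = arrival_order(8, {"Serial0": 10, "Serial1": 30})
print(order)
```

    Under these assumptions every packet sent out Serial0 arrives before the first packet sent out Serial1.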

    The fix is easy, however:

    int Serial0
     ip route-cache
    int Serial1
     ip route-cache
    
    Note, though, that if you are using any Cisco bigger than a 2500 series, the "ip route-cache" command might be "ip route-cache cbus" or "ip route-cache optimum" or some other command.

    And actually, many Ciscos come pre-configured with "ip route-cache" set on all of the interfaces - but even so, it doesn't hurt to be explicit.

    If you do this, the Cisco will keep a cache of all destinations you're sending packets to, and will "lock in" each destination to one specific interface. In general, this method leads to decent load-balancing (in the 40/60 to 50/50 split range). The worst case in this scenario is not IP degradation, but poor use of your additional bandwidth (which can, of course, lead to IP degradation if you need your second outgoing pipe because your first has a tendency to get full). Anyway, this kind of load-balancing works pretty well and is what people use when they can't accept "full BGP routes" from multiple providers.
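
    Conceptually, the cache behaves something like the Python sketch below. This is a simplified model and not Cisco's actual implementation: it assumes each new destination is handed to the next interface in turn and then stays there. The class name and the two destination addresses are made up for illustration.

```python
from itertools import cycle


class RouteCache:
    """Toy per-destination cache over several equal default routes."""

    def __init__(self, interfaces):
        self._next_interface = cycle(interfaces)  # used for new destinations
        self._cache = {}                          # destination -> interface

    def interface_for(self, destination):
        """Return the interface for a destination, locking it in on first use."""
        if destination not in self._cache:
            self._cache[destination] = next(self._next_interface)
        return self._cache[destination]


cache = RouteCache(["Serial0", "Serial1"])

# Interleave packets to two hypothetical destinations.
for dest in ["192.0.2.1", "198.51.100.1", "192.0.2.1", "198.51.100.1"]:
    print(dest, "->", cache.interface_for(dest))
```

    Every packet to a given destination leaves by the same interface, so packets within one conversation stay in order, while different destinations still spread across both lines.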


    TO BE DONE

    aggregate-address
    transit
    bgp and peering
    bgp: the provider's side: filtering
    as-path padding
    sync