Friday 1 January 2016

BGP PIC CORE

Happy New Year Folks!

In one of the previous posts, we looked at the EIGRP FRR and OSPF LFA features, which help achieve fast convergence.

There is a similar feature in BGP called PIC (Prefix Independent Convergence). It speeds up the convergence of the FIB in failover conditions. BGP works differently from any IGP: it is designed to carry hundreds of thousands of routes in the routing table, so fast failover also works differently in BGP. There are a couple of ways to implement PIC in BGP: "PIC Core" and "PIC Edge". We will look into both of these options.

Let's look at the topology below.
We have CE1 and CE2 with loopback IPs 1.1.1.1/24 and 8.8.8.8/24 respectively, representing the customer's LAN ranges. We are running a standard Layer 3 IPVPN in the service provider core. Each router in the core has a loopback IP which is advertised in the IGP, and the VPNv4 neighbourships are built using those loopbacks. The router R4 (P1) is the route reflector for VPNv4 prefixes, with all the PEs being its clients.

This is what the physical topology looks like, in case you want to build the lab yourself!



As mentioned before, we are running a BGP-free core, i.e. apart from the route reflector R4, only the PE routers maintain a BGP table.

R4 learns the prefix 8.8.8.8/32 from both R6 and R7. It chooses the path via R6 as the best path and advertises it to R2.
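The post doesn't say why R4 prefers R6. One plausible explanation, assuming both paths carry identical attributes and R4 has the same IGP cost to both PEs, is the final lowest-router-ID tie-breaker. The Python sketch below is a heavily simplified subset of the BGP best-path algorithm (it skips weight, origin, eBGP vs iBGP, cluster-list length and more); the attribute values and the router IDs 6.6.6.6/7.7.7.7 are assumptions.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """One VPNv4 path to 8.8.8.8/32 as seen by the route reflector R4."""
    advertiser: str
    router_id: str        # hypothetical: assumed equal to the PE loopback
    local_pref: int = 100
    as_path_len: int = 1
    med: int = 0
    igp_metric: int = 20  # hypothetical: equal IGP cost from R4 to R6 and R7


def best_path(paths):
    """Heavily simplified best-path: higher local-pref, shorter AS path,
    lower MED, lower IGP metric to the next hop, then lowest router ID."""
    def rank(path):
        return (
            -path.local_pref,
            path.as_path_len,
            path.med,
            path.igp_metric,
            ipaddress.ip_address(path.router_id),
        )
    return min(paths, key=rank)


paths = [Path("R7", "7.7.7.7"), Path("R6", "6.6.6.6")]
print(best_path(paths).advertiser)  # R6
```

If any earlier attribute differed (for example a lower IGP metric towards R7), that attribute would decide instead and the router ID would never be consulted.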





So for R2, the iBGP next-hop to reach 8.8.8.8/32 will be 6.6.6.6. R2 will now check its routing table to find the IGP path to 6.6.6.6.



As all the link bandwidths are the same, the OSPF costs are equal and R2 can reach R6 either via R4 or via R5.

If we increase the "OSPF Cost" of the link between R2 and R4, R2 will prefer the path via R5 to reach R6.
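To see why raising the cost moves the path, here is a small Dijkstra sketch in Python that tracks every equal-cost first hop. The topology is a hypothetical subset of the lab (only R2, R4, R5 and R6), with an assumed OSPF cost of 10 per link and an assumed raised cost of 100 on R2-R4. Real OSPF costs are set per interface and direction; this model treats them as symmetric for simplicity.

```python
import heapq

# Hypothetical subset of the lab core, every link assumed to cost 10.
LINKS = [("R2", "R4", 10), ("R2", "R5", 10), ("R4", "R6", 10), ("R5", "R6", 10)]
# Same core after raising the R2-R4 cost to a hypothetical 100.
RAISED_LINKS = [("R2", "R4", 100), ("R2", "R5", 10), ("R4", "R6", 10), ("R5", "R6", 10)]


def build_graph(links):
    """Turn (router, router, cost) tuples into an undirected adjacency map."""
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def shortest_paths(graph, source):
    """Dijkstra that also tracks every equal-cost first hop from the source."""
    dist = {source: 0}
    first_hops = {source: set()}
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        for nbr, cost in graph[node].items():
            new_dist = d + cost
            # Leaving the source, the first hop is the neighbour itself;
            # further away, inherit the first hops of the node we came from.
            hops = {nbr} if node == source else first_hops[node]
            if nbr not in dist or new_dist < dist[nbr]:
                dist[nbr] = new_dist
                first_hops[nbr] = set(hops)
                heapq.heappush(heap, (new_dist, nbr))
            elif new_dist == dist[nbr]:
                first_hops[nbr] |= hops  # equal-cost path: keep both
    return dist, first_hops


for name, links in (("equal costs", LINKS), ("R2-R4 raised", RAISED_LINKS)):
    dist, hops = shortest_paths(build_graph(links), "R2")
    print(name, dist["R6"], sorted(hops["R6"]))
# equal costs 20 ['R4', 'R5']
# R2-R4 raised 20 ['R5']
```

With equal costs both neighbours are valid first hops towards R6; once R2-R4 is more expensive, only R5 remains.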





The CEF table on R2 shows the next hop, outgoing interface and MPLS label.


So, to recap:

- The prefix 8.8.8.8/32 is reachable via two PEs, R6 and R7.
- Both R6 and R7 advertise it to R4, which is the route reflector.
- R4 chooses R6 as the best path to reach 8.8.8.8/32.
- R4 then advertises this VPNv4 prefix to R2.
- R2 looks at the next-hop IP 6.6.6.6, which is the loopback of R6.
- To reach 6.6.6.6, R2 looks at its local routing table and finds that it is reachable via R5.
- Now from R2, any traffic for the destination 8.8.8.8 will be sent to next-hop 10.1.25.5 out of interface Ethernet1/0.

i.e. 8.8.8.8 --> Next-Hop 10.1.25.5, Outgoing Interface Ethernet1/0
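The recursive lookup in the recap can be sketched in Python: the BGP next-hop is resolved through the IGP routes, and the result is copied into a flat FIB entry for the prefix. The names (`BGP_TABLE`, `IGP_RIB`, `resolve`, `build_flat_fib`) are hypothetical, and the MPLS label stack that CEF also stores is left out because the label values aren't given in the post.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjacency:
    next_hop: str   # directly connected IGP neighbour
    interface: str  # outgoing interface on R2


# R2's VPNv4 table: prefix -> BGP next-hop (the loopback of R6).
BGP_TABLE = {"8.8.8.8/32": "6.6.6.6"}

# R2's IGP routes: destination -> adjacency (the path via R5 from the post).
IGP_RIB = {ipaddress.ip_network("6.6.6.6/32"): Adjacency("10.1.25.5", "Ethernet1/0")}


def resolve(rib, address):
    """Longest-prefix match of a BGP next-hop against the IGP routes."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in rib if addr in net]
    if not matches:
        raise LookupError(f"no IGP route to {address}")
    return rib[max(matches, key=lambda net: net.prefixlen)]


def build_flat_fib(bgp_table, rib):
    """Copy the resolved next-hop/interface into every prefix entry."""
    return {prefix: resolve(rib, nh) for prefix, nh in bgp_table.items()}


fib = build_flat_fib(BGP_TABLE, IGP_RIB)
print(fib["8.8.8.8/32"])
# Adjacency(next_hop='10.1.25.5', interface='Ethernet1/0')
```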

The traffic flow would be R1-->R2-->R5-->R6-->R8. 

What would happen if R5 becomes unavailable?



There is a second path to R6 via R4, which means this outage will not affect the BGP session between R2 and R4, or even between R4 and R6.

The traffic flow would be R1-->R2-->R4-->R6-->R8.



The only thing that changes from R2's perspective is the next-hop IP and the outgoing interface.

i.e. 8.8.8.8 --> Next-Hop 10.1.24.4, Outgoing Interface Ethernet0/1
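The failover can be modelled the same way: remove R5 from the graph and rerun the shortest-path calculation from R2 to R6. This Python sketch assumes the same hypothetical four-router subset, with R2-R4 already raised to an assumed cost of 100 and costs treated as symmetric; the next-hop/interface pairs are the ones from the post.

```python
import heapq

# Hypothetical core links with the R2-R4 cost already raised (to 100 here).
LINKS = [("R2", "R4", 100), ("R2", "R5", 10), ("R4", "R6", 10), ("R5", "R6", 10)]

# R2's adjacencies, taken from the post.
ADJACENCIES = {"R4": ("10.1.24.4", "Ethernet0/1"), "R5": ("10.1.25.5", "Ethernet1/0")}


def first_hop(links, source, target, failed=frozenset()):
    """Dijkstra from source to target, ignoring failed routers.
    Returns (first-hop neighbour, total cost)."""
    graph = {}
    for a, b, cost in links:
        if a in failed or b in failed:
            continue
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    heap = [(0, source, "")]
    done = set()
    while heap:
        dist, node, hop = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            return hop, dist
        for nbr, cost in graph.get(node, {}).items():
            if nbr not in done:
                # The first hop is fixed once we leave the source.
                heapq.heappush(heap, (dist + cost, nbr, hop or nbr))
    raise LookupError(f"{target} unreachable from {source}")


for failed in (frozenset(), frozenset({"R5"})):
    hop, cost = first_hop(LINKS, "R2", "R6", failed)
    print(sorted(failed), hop, cost, ADJACENCIES[hop])
# [] R5 20 ('10.1.25.5', 'Ethernet1/0')
# ['R5'] R4 110 ('10.1.24.4', 'Ethernet0/1')
```

Only the IGP result changes; the BGP next-hop 6.6.6.6 stays the same, which is exactly the case PIC Core is designed for.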



It may look like a very small change, and the IGP can converge really fast; however, it also results in changes to the BGP table and CEF. The next-hop and outgoing interface have to be updated for every prefix in the BGP table.

In this example we only have one prefix, but what if we have 500K routes in the BGP table? How long will it take to change the next-hop IP and outgoing interface for each of those prefixes in the CEF table?
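To make the scaling problem concrete, this Python sketch models a flat FIB in which every prefix stores its own copy of the next-hop and interface. The 500K synthetic prefixes (11.x.x.x/32) are hypothetical and all assumed to share BGP next-hop 6.6.6.6; the point is only the number of writes, not real CEF timing.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjacency:
    next_hop: str
    interface: str


# Hypothetical table size and prefixes; all assumed to share BGP next-hop 6.6.6.6.
PREFIX_COUNT = 500_000
PREFIXES = [f"{ipaddress.IPv4Address(0x0B000000 + i)}/32" for i in range(PREFIX_COUNT)]

VIA_R5 = Adjacency("10.1.25.5", "Ethernet1/0")
VIA_R4 = Adjacency("10.1.24.4", "Ethernet0/1")


def reconverge_flat(fib, new_adjacency):
    """Flat FIB: every prefix holds its own next-hop/interface,
    so an IGP change means rewriting every entry."""
    writes = 0
    for prefix in fib:
        fib[prefix] = new_adjacency
        writes += 1
    return writes


fib = {prefix: VIA_R5 for prefix in PREFIXES}
print(reconverge_flat(fib, VIA_R4))  # 500000
```

The work grows linearly with the size of the table, which is what "prefix dependent" convergence means.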

PIC Core resolves this problem by creating something called a "pointer". A pointer is a combination of next-hop and outgoing interface.

So in our example, let's say 

Pointer A = 10.1.25.5 Ethernet1/0

Now, instead of storing the next-hop and interface itself, each prefix will refer to a pointer.
e.g.
The normal FIB: 8.8.8.8 --> 10.1.25.5, Ethernet1/0
FIB with PIC Core enabled: 8.8.8.8 --> Pointer A = 10.1.25.5, Ethernet1/0

In the failure condition, we only update the next-hop and interface of Pointer A, i.e. 8.8.8.8 --> Pointer A = 10.1.24.4, Ethernet0/1 (only one change).

Even with 500K prefixes, we still only have to make one change to the pointer, instead of changing the next-hop/interface for 500K prefixes!
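The pointer idea can be sketched in Python with a shared mutable object: every prefix references the same `Pointer`, so a single update is visible to all of them. This is a conceptual model under the same hypothetical 500K-prefix assumption (all prefixes sharing BGP next-hop 6.6.6.6), not Cisco's actual CEF data structures.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Pointer:
    """Shared next-hop/interface object (mutable on purpose)."""
    next_hop: str
    interface: str


# Hypothetical prefixes; all assumed to share BGP next-hop 6.6.6.6, so they
# all reference the same pointer object ("Pointer A" in the post).
PREFIX_COUNT = 500_000
PREFIXES = ["8.8.8.8/32"] + [
    f"{ipaddress.IPv4Address(0x0B000000 + i)}/32" for i in range(PREFIX_COUNT - 1)
]

pointer_a = Pointer("10.1.25.5", "Ethernet1/0")
fib = {prefix: pointer_a for prefix in PREFIXES}


def reconverge_hierarchical(pointer, next_hop, interface):
    """Hierarchical FIB: update the shared pointer once; every prefix that
    references it sees the new next-hop/interface immediately."""
    pointer.next_hop = next_hop
    pointer.interface = interface
    return 1  # one write, regardless of how many prefixes use the pointer


writes = reconverge_hierarchical(pointer_a, "10.1.24.4", "Ethernet0/1")
print(len(fib), writes, fib["8.8.8.8/32"])
# 500000 1 Pointer(next_hop='10.1.24.4', interface='Ethernet0/1')
```

The trade-off is one extra level of indirection on every lookup in exchange for a failover cost that no longer depends on the number of prefixes.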

To enable PIC Core on the PE, apply this command in global configuration mode:

cef table output-chain build favor convergence-speed



To disable PIC Core, you can replace "convergence-speed" with "memory-utilization". 

Please remember that in the above example, BGP is NOT reconverging. PIC Core deals with IGP failures; it cannot handle BGP failures. For those we need PIC Edge, which we will discuss in the next post.

6 comments:

  1. Excellent post. It looks simple, but when we consider the current 585K BGP routes, the impact of a link failure will be massive in terms of long convergence time. Very well explained. Thanks. Will wait for the next post about PIC Edge to see how to deal with BGP failures.

  2. Simply put, BGP PIC Core is the same idea as an "indirect next-hop", right?

  3. One question: is this applied only on PEs?

  4. This should be applied on every router where BGP can converge using BGP PIC Edge.
    That mainly means PE devices.
