Reading the SRND on the routed access layer, there are a few key takeaways.
On page 5 it says: “Cisco recommends a routed network core in all cases.”
Let’s examine the reasoning behind this statement. First of all, what’s the alternative? You can have a layer-2 core where your devices simply switch frames at layer-2. Think about the problems with this design.
- Troubleshooting is hard
- Traffic is not as deterministic
- Engineering based on metrics is difficult
Why do I think troubleshooting is harder than in layer-3 networks? Think about the tools you have available to troubleshoot layer-2 issues. You’ve got CAM tables, spanning-tree outputs, and CDP/LLDP messages. That’s about it. Layer-2 ping exists, but what’s the next step if that fails? Layer-3 networks have a lot more management and troubleshooting tools available.
Traffic is not necessarily deterministic either. You rely on the spanning-tree algorithms to determine a loop-free path through the network. This means manipulation of root bridges and link costs. It’s difficult to ensure a particular path in the event of a failure. You need to ensure you know which ports are blocking/forwarding and so on. Layer-3 cores don’t have this problem – all links can be forwarding.
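To make the blocking/forwarding point concrete, here is a minimal sketch of my own, not something from the SRND. It picks a loop-free tree with a plain breadth-first search from the root, which is a simplification: real spanning tree uses root path cost and bridge/port ID tie-breakers. The topology and the names `core1`, `core2`, `dist1`, `dist2` are hypothetical.

```python
from collections import deque

def spanning_tree_links(links, root):
    """Return the set of links kept forwarding by a BFS tree rooted at `root`.

    Simplification: every link has equal cost and ties are broken by name,
    so this only approximates what real spanning tree would choose.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    visited = {root}
    forwarding = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in sorted(neighbors[node]):
            if peer not in visited:
                visited.add(peer)
                forwarding.add(frozenset((node, peer)))
                queue.append(peer)
    return forwarding

# Hypothetical topology: two core switches, two distribution switches.
links = [
    ("core1", "core2"),
    ("core1", "dist1"),
    ("core1", "dist2"),
    ("core2", "dist1"),
    ("core2", "dist2"),
]

forwarding = spanning_tree_links(links, root="core1")
blocked = [link for link in links if frozenset(link) not in forwarding]

print("forwarding links:", len(forwarding))
print("blocked links:", blocked)
```

With `core1` as root, both uplinks toward `core2` end up blocked in this sketch, so two of the five links carry no traffic until something fails. In a routed core all five links would be usable.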
I would argue that traffic engineering is difficult as well. With MST you can keep particular VLANs forwarding on one set of links and the other VLANs forwarding on a different set, but that’s difficult to scale. Traffic engineering is easier in layer-3 networks, which have a standard set of tools to implement and troubleshoot it.
With that said, let’s examine some of the other items in the SRND with respect to the CCDE written outline.
- Layer 2 Down Detection – use point-to-point fiber connections where possible, because failure detection is very quick. You should also examine the topics of debounce and carrier-delay. Alongside these, implement IP event dampening on all interfaces to minimize disruption to your network when a link fails repeatedly. The dampening feature is very similar to BGP’s route dampening feature.
- For all media types – SONET and point-to-point fiber both detect failures very quickly; other media types are not quite as quick. If you can, use BFD on your interfaces, as this will reduce failure detection time. In multipoint networks this may be the only way to get subsecond failure detection.
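Since IP event dampening borrows the penalty and half-life idea from BGP route dampening, a small sketch of the decay arithmetic may help. This is my own illustration, not taken from the SRND. The numbers used (a penalty of 1000 per flap, a 5-second half-life, a reuse threshold of 1000) are illustrative assumptions, so check the values on your own platform.

```python
import math

def decayed_penalty(penalty, elapsed_s, half_life_s):
    """Penalty remaining after `elapsed_s` seconds of exponential decay."""
    return penalty * 0.5 ** (elapsed_s / half_life_s)

def seconds_until_reuse(penalty, reuse_threshold, half_life_s):
    """Time for the penalty to decay down to the reuse threshold."""
    if penalty <= reuse_threshold:
        return 0.0
    return half_life_s * math.log2(penalty / reuse_threshold)

# Illustrative assumptions: 1000 penalty per flap, 5 s half-life, reuse at 1000.
PENALTY_PER_FLAP = 1000
HALF_LIFE_S = 5
REUSE_THRESHOLD = 1000

# Four flaps in quick succession (decay between flaps ignored for simplicity).
accumulated = 4 * PENALTY_PER_FLAP
print(decayed_penalty(accumulated, 5, HALF_LIFE_S))
print(seconds_until_reuse(accumulated, REUSE_THRESHOLD, HALF_LIFE_S))
```

The point is that an interface that flaps several times stays suppressed for multiple half-lives, which keeps the routing protocol from reacting to every bounce.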
- Fast hello timers – OSPF and EIGRP provide the following:
- OSPF: ip ospf dead-interval minimal hello-multiplier <multiplier> (typically 4)
- EIGRP: ip hello-interval eigrp <AS> 1 (1-second hellos)
- EIGRP: ip hold-time eigrp <AS> 3 (3-second hold time)
- OSPF, EIGRP, IS-IS, BGP – IS-IS will need further research, but I guess it has mechanisms similar to OSPF’s. BGP has several things that can be tweaked to decrease convergence time:
- BGP path MTU discovery
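Coming back to the fast hello settings for a moment, the arithmetic behind them is simple enough to sketch. With `dead-interval minimal` the OSPF dead interval is one second and the hello multiplier sets how many hellos are sent within it. For EIGRP the hold time bounds how long a dead neighbor can go unnoticed. The function names below are my own, purely for illustration.

```python
def ospf_fast_hello_interval_ms(hello_multiplier):
    """Hello spacing when the dead interval is fixed at 1 second."""
    return 1000 / hello_multiplier

def hellos_missed_before_down(hello_interval_s, hold_time_s):
    """How many consecutive hellos must be lost before the neighbor drops."""
    return hold_time_s / hello_interval_s

# OSPF with a hello multiplier of 4: one hello every 250 ms, 1 s dead interval.
print(ospf_fast_hello_interval_ms(4))

# EIGRP with a 1 s hello and 3 s hold time: three missed hellos, about 3 s.
print(hellos_missed_before_down(1, 3))
```

Either way you are still in the range of a second or more, which is why BFD is attractive where the platform supports it.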
- Fast SPF Timers – OSPF has several in the SRND:
- SPF throttle tuning
- timers throttle spf
- Best practice: 10 100 500
- LSA throttle tuning
- timers throttle lsa
- Best practice: 10 100 5000
- OSPF, IS-IS – this may be a typo?
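The three values given to `timers throttle spf` and `timers throttle lsa` are a start interval, a hold interval, and a maximum wait, all in milliseconds. A rough sketch of how the wait grows while the network keeps changing is below. It is my own simplification: it assumes an event arrives during every hold interval and ignores the reset back to the start value once the network has been quiet.

```python
def throttle_waits(start_ms, hold_ms, max_ms, events):
    """Wait before each of `events` consecutive runs under constant churn.

    Simplified model: the first run waits `start_ms`, the second waits
    `hold_ms`, and each later wait doubles until capped at `max_ms`.
    """
    waits = []
    next_hold = hold_ms
    for n in range(events):
        if n == 0:
            waits.append(start_ms)
        else:
            waits.append(min(next_hold, max_ms))
            next_hold *= 2
    return waits

# The SPF values listed above: start 10 ms, hold 100 ms, max 500 ms.
print(throttle_waits(10, 100, 500, 6))

# The LSA values listed above: start 10 ms, hold 100 ms, max 5000 ms.
print(throttle_waits(10, 100, 5000, 8))
```

The first change is acted on almost immediately, and the router only backs off when changes keep arriving, which is the behavior you want for fast convergence without melting the CPU during a flap storm.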
- Recursion and Convergence – the issue they’re talking about here is that OSPF’s convergence time increases as more routers exist in the network. You can increase the number of links/routes within the OSPF area without taking as big a hit as you would from an increase in the number of routers. The SPF calculation recurses on each type 1 LSA created by every router in the network, which increases convergence time.
- Impact of Third Party Next Hop & BGP recursion – have a look at this diagram.