
Cisco Documentation

Nexus 7000 FabricPath
Cisco FabricPath Design Guide: Using FabricPath with an Aggregation and Access Topology
Configuring FabricPath Switching
Cisco FabricPath Best Practices (PDF)

What are the challenges?

FabricPath is specifically targeted at data centers because of several unique challenges:

• Layer 2 Adjacency - Unlike the campus, where we've pushed Layer 3 to the closet, Data Centers truly have a need for large Layer 2 domains. VMware especially has made this even more critical, because in order to take advantage of vMotion and DRS (two critical features), every VMware host must have access to ALL of the same VLANs.

• Resiliency is key - the Data Center has to have the ability to be ALWAYS up. Redundant paths make this possible.

• Spanning tree addresses the issues with redundant paths, but comes with tons of caveats. As the L2 network scales, convergence time increases, and it's complicated (and sometimes dangerous) to configure all of the tweaks that make it perform better (such as PortFast, UplinkFast, etc.). Also, traditional spanning tree blocks redundant links, which cuts bandwidth in half and cripples another Data Center need: bandwidth scalability.

• vPC limitations - vPCs are great, and they address the blocked links. But they come with several caveats, such as complicated matching configuration, orphan ports, no routing protocol traversal, etc. Even in a vPC scenario we still have to run spanning tree; we're just eliminating loops, and if I were to plug a non-vPC switch into the core, it's still going to cause a convergence. Finally, they only scale to two core devices.

• Bandwidth scalability - Sure, the Nexus 7018 can scale tremendously large, but it's also a massive box. If we use vPCs we are still limited to two core boxes. This sounds like overkill, but it's quickly becoming a more popular design among larger customers. What if, in order to scale bandwidth in the core, we could just add a third or a fourth, smaller box?

What is FabricPath?

Originally I was worried about having to learn a completely new protocol, but the truth is that most of us already know all of the concepts that make FabricPath work. Think about routing to the access layer and why we like that design.

• Routing protocols truly eliminate spanning tree.

• They are very quick to converge, and the addition of a single node doesn't affect any other part of the network.

• With equal-cost multipath routing, I can scale bandwidth extremely easily by adding another core device and simply adding links. All of the links will be active and all of the links will be load-balanced.

There you go - you just learned FabricPath. FabricPath is based on the TRILL standard with a few Cisco bonuses, and builds on the concept of "what if we could route a Layer 2 packet instead of switching it." Under the covers, FabricPath uses the IS-IS protocol, a MAC-in-MAC encapsulation, and routing tables to achieve all of the magic. In short, you now have all of the benefits of Layer 3 to the access switch and none of the caveats of vPCs, while still being able to span VLANs. Oh, and the configuration is extremely simple.

What do I need to use FabricPath?

F-Series line cards in a Nexus 7000, and Nexus 5500 series switches (plus 2Ks) in the access. The environment doesn't have to be homogeneous: portions of it can run FabricPath while others are still running traditional vPC or spanning tree. It's as simple as that.
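To give a feel for how simple the configuration is, a minimal FabricPath enablement on NX-OS looks roughly like the following sketch (the switch ID, VLAN range and interface are made up for this example, and the exact steps vary slightly by platform and NX-OS release):

install feature-set fabricpath
feature-set fabricpath
fabricpath switch-id 10
vlan 100-199
  mode fabricpath
interface ethernet 1/1
  switchport mode fabricpath

Repeat the equivalent on each FabricPath switch (each with a unique switch ID), and IS-IS takes care of the rest.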

FabricPath vs. TRILL

Today there are some key differentiators between Cisco's proprietary FabricPath technology and what the competitors could bring with TRILL. What it amounts to is that FabricPath is ready for deployment today, while the standard still has some functional gaps.

In short, the big ones: all of the core switches can act as a default gateway at the same time (using GLBP), vPC+ can be used on the access switches to extend active-active connectivity to non-FabricPath-speaking servers, and conversational MAC learning allows an extremely scalable setup.

FabricPath vs. vPC

You may note that FabricPath is definitely a replacement for vPC. More than that, it's really a replacement for traditional L2 network topologies. vPC is really an attempt to trick a spanning-tree topology, which struggles with loop prevention when there are multiple active paths to multiple switches.

There is one place in an FP topology, however, where you would still want to use vPCs, and that is from the access switch to the server itself, because there aren't any NICs or vSwitches that currently understand FP, but plenty that understand LACP. For this case there is an extension of vPC called vPC+, a FabricPath-aware vPC that bridges between an access-layer switch running FP and a server that is unaware of it but still needs multiple active uplinks.

FabricPath for Layer 2 DC Interconnect

The requirement for layer 2 interconnect between data centre sites is very common these days. The pros and cons of doing L2 DCI have been discussed many times in other blogs and forums, so I won't revisit them here. Basically, there are a number of technology options for achieving this, including EoMPLS, VPLS, back-to-back vPC and OTV. All of these technologies have their advantages and disadvantages, so the decision often comes down to factors such as scalability, skillset and platform choice.

Now that FabricPath is becoming more widely deployed, it is also starting to be considered by some as a potential L2 DCI technology. In theory, this looks like a good bet: easy configuration and no Spanning Tree extended between sites, so it should be a no-brainer, right? Of course, things are never that simple, so let's look at some things you need to consider if you are evaluating FabricPath as a DCI solution.

1. FabricPath requires direct point-to-point WAN links

A technology such as OTV uses MAC-in-IP tunnelling to transport layer 2 frames between sites, so you simply need to ensure that end-to-end IP connectivity is available. As a result, OTV is very flexible and can run over practically any network as long as it is IP enabled. FabricPath, on the other hand, requires a direct layer 1 link between the sites (e.g. dark fibre), so it is somewhat less flexible. Bear in mind that you also lose some of the features associated with an IP network; for example, there is currently no support for BFD over FabricPath.

2. Your multi-destination traffic will be "hairpinned" between sites

In order to forward broadcast, unknown unicast and multicast traffic through a FabricPath network, a multi-destination tree is built. This tree generally needs to "touch" each and every FabricPath node so that multi-destination traffic is correctly forwarded. Each multi-destination tree in a FabricPath network must elect a root switch (this is controllable through root priorities, and it's good practice to use this), and all multi-destination traffic must flow through this root. How does this affect things in a DCI environment? The main thing to remember is that there will generally be a single multi-destination tree spanning both sites, and that the root for that tree will exist on one site or the other. The following diagram shows an example.



In the above example, there are two sites, each with two spine switches and two edge switches. The root for the multi-destination tree is on Spine-3 in Site B. For the hosts connected to the two edge switches in site A, broadcast traffic could follow the path from Edge-1 up to Spine-1, then over to Spine-3 in Site B, then to Spine-4, and then back down to the Spine-2 and Edge-2 switches in Site A before reaching the other host. Obviously there could be slightly different paths depending on topology, e.g. if the Spine switches are not directly interconnected. In future releases of NX-OS, the ability to create multiple FabricPath topologies will alleviate this issue to a certain extent, in that groups of "local" VLANs can be constrained to a particular site, while allowing "cross-site" VLANs across the DCI link.

3. First Hop Routing localisation support is limited with FabricPath

When stretching L2 between sites, it's sometimes desirable to implement "FHRP localisation", which usually involves blocking HSRP using port ACLs or similar so that hosts at each site use their local gateways rather than traversing the DCI link and being routed at the other site. The final point to be aware of is that achieving FHRP localisation is slightly more difficult when using FabricPath for layer 2 DCI. On the Nexus 5500, FHRP localisation is supported using "mismatched" HSRP passwords at each site (you can't use port ACLs for this purpose on the 5K). However, if you have any other FabricPath switches in your domain which aren't acting as an L3 gateway (e.g. at a third site), then that approach won't work and is not supported.

This is because FabricPath will send the HSRP packets sourced from the virtual MAC address at each site with the local switch ID as the FabricPath source. Other FabricPath switches in the domain will see the same vMAC behind two source switch IDs and will toggle between them, making the solution unusable. Also, bear in mind that FHRP localisation with FabricPath isn't (at the time of writing) supported on the Nexus 7000.
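As a rough sketch of the "mismatched password" approach on the Nexus 5500, the idea is simply that the gateways at each site use a different HSRP authentication string, so the two sites never accept each other as valid HSRP peers (the VLAN, group number, virtual IP and key strings below are hypothetical):

Site A gateways:
interface vlan 100
  hsrp 100
    authentication md5 key-string SITE-A
    ip 10.1.100.1

Site B gateways:
interface vlan 100
  hsrp 100
    authentication md5 key-string SITE-B
    ip 10.1.100.1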

The issues noted above do not mean that FabricPath cannot be used as a method for extending layer 2 between sites. In some scenarios, it can be a viable alternative to the other DCI technologies as long as you are aware of the caveats above.

A vPC implementation on FabricPath: Introduction to vPC+

Virtual Port Channel (vPC) is a technology that has been around for a few years on the Nexus range of platforms. With the introduction of FabricPath, an enhanced version of vPC, known as vPC+, was released. At first glance the two technologies look very similar, however there are a couple of differences between them which allow vPC+ to operate in a FabricPath environment. So for those of us deploying FabricPath, why can't we just use regular vPC?

Let's look at an example. The following drawing shows a simple FabricPath topology with three switches, two of which are configured in a (standard) vPC pair.



A single server (MAC A) is connected using vPC to S10 and S20, so traffic sourced from MAC A can potentially take either link in the vPC, towards S10 or S20. If we now look at S30's MAC address table, which switch is MAC A accessible behind? The MAC table only allows a one-to-one mapping between MAC address and switch ID, so which one is chosen? Is it S10 or S20? The answer is that it could be either, and it is even possible that MAC A could "flip flop" between the two switch IDs.

In a FabricPath implementation, such a "flip flop" situation breaks traffic flows. So clearly we have an issue with using regular vPC to dual-attach hosts or switches to a FabricPath domain. How do we resolve this? We use vPC+ instead.

vPC+ solves this issue by introducing an additional element, the "virtual switch". The virtual switch sits "behind" the vPC+ peers and is essentially used to represent the vPC+ domain to the rest of the FabricPath environment. The virtual switch has its own FabricPath switch ID and looks, for all intents and purposes, like a normal FabricPath edge device to the rest of the infrastructure.



In the above example, vPC+ is now running between S10 and S20, and a virtual switch S100 now exists behind the physical switches. When MAC A sends traffic through the FabricPath domain, the encapsulated FabricPath frames will have a source switch ID of the virtual switch, S100. From the point of view of S30 (and other remote switches), MAC A is now accessible behind a single switch, S100. This enables multi-pathing in both directions between the Classical Ethernet and FabricPath domains. Note that the virtual switch needs a FabricPath switch ID assigned to it (just like a physical switch does), so you need to take this into account when planning your switch ID allocations throughout the network. For example, each access "Pod" would now contain three switch IDs rather than two; in a large environment this could make a difference.

Much of the terminology is common to both vPC and vPC+, such as Peer-Link, Peer-Keepalive, etc., and it is configured in a very similar way. The major differences, with a configuration sketch after the list, are:

• In vPC+, the Peer-Link is now configured as a FabricPath core port (i.e. switchport mode fabricpath)

• A FabricPath switch ID is configured under the vPC+ domain configuration (fabricpath switch-id <id>); remember to configure the same switch ID on both peers!

• Both the vPC+ Peer-Link and member ports must reside on F series linecards.
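Putting that together, a hedged sketch of a vPC+ peer's configuration might look like the following (the domain number, switch ID, keepalive address and port channels are hypothetical; both peers need the matching fabricpath switch-id under the vPC domain):

vpc domain 10
  peer-keepalive destination 192.168.1.2 source 192.168.1.1
  fabricpath switch-id 100
interface port-channel 1
  switchport mode fabricpath
  vpc peer-link
interface port-channel 11
  switchport mode trunk
  vpc 11

Here port-channel 1 is the FabricPath peer-link and port-channel 11 is a Classical Ethernet vPC member toward the server.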

vPC+ also provides the same active/active HSRP forwarding functionality found in regular vPC; this means that (depending on where your default gateway functionality resides) either peer can be used to forward traffic into your L3 domain. If your L3 gateway functionality resides at the FabricPath spine layer, vPC+ can also be used there to provide the same active/active behaviour.

Mapping a FabricPath Local ID to an Outbound Interface

When a FabricPath edge switch needs to send a frame to a remote MAC address, it performs a MAC address table lookup and finds an entry of the form SWID.SSID.LID. The SWID represents the switch ID of the remote FabricPath edge switch, the SSID represents the sub-switch ID (which is only used in vPC+), and the LID (Local ID) represents the outbound port on the remote edge switch. However, the method by which these LIDs are derived doesn't seem to be very well documented, and this had been bugging me for a while. So I decided to dig in and see if I could find out a bit more about the way LIDs are used on the Nexus switches.

I found a somewhat cryptic statement along the following lines: "for N7K the LID is the port index of the ingress interface, for N5K LID most of the time will be 0". Let's see what we can make of that.

The acronym LID stands for "Local ID" and, as the name implies, it has local significance to the switch that a particular MAC address resides on. As such, it is up to the implementation to determine how to derive a unique LID to represent its ports. Apparently, the Nexus 5000 and Nexus 7000 engineering teams did not talk to each other to agree on some consistent method of assigning the LIDs, but each created their own platform-specific implementation.

The interface represented by the LID is an ingress interface from the perspective of the edge switch that inserts the LID into the outer source address. For a switch sending to that MAC address, it represents the egress port on the destination edge switch.

For the N5K I couldn't really find much more than the fact that the LID will usually be 0, although there may be some exceptions. For the N7K, the LID maps to the "port index" of the ingress interface.

So I decided to get into the lab and see if I could find some commands that would help me establish the relation between the LID and the outbound interface on the edge switch. I created a very simple FabricPath network and performed a couple of pings to generate some MAC address table entries.

Let's have a look at a specific entry in the MAC address table of a Nexus 7000:


So, for example, let's zoom in on the MAC address 0005.73e9.fcfc. According to the table, frames for this destination should be sent to SWID.SSID.LID "16.0.14". From the SWID part, we can see that the MAC address resides on the switch with ID "16". To find the corresponding switch hostname, we can use the following command:


So we jump to switch N7K-2-pod6 and perform another MAC address table lookup:


Now we know that the outbound interface for the MAC address on the destination edge switch is Ethernet 3/15. So how can we map the LID "14" to this interface?

Since the LID corresponds to the "port index" of the interface in question, how can we find that port index? The port index is an internal identifier for the interface, also referred to as the LTL, and there are some show commands to determine these LTLs. For example, if we wanted to know the LTL for interface E3/15, we could issue the following command:


Here we find that the LTL for the interface is 0xe, which equals 14 in decimal. This shows that the LID is actually the decimal representation of the LTL. (FabricPath switch-IDs, subswitch-IDs and Local IDs are represented in decimal by default).

This lookup can also be performed in reverse. If we take the LID and convert it to its hexadecimal representation of 0xe, we can find the corresponding interface as follows:


So through use of these two commands, we can map a FabricPath LID to an interface and vice versa on a Nexus 7000.
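For reference, the two lookups used above take roughly this form on the Nexus 7000 (these are internal show commands, so treat the exact syntax and output as an assumption that may vary by NX-OS release):

show system internal pixm info interface ethernet 3/15
show system internal pixm info ltl 0xe

The first returns the LTL (port index) for a given interface; the second maps an LTL value back to its interface.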

FabricPath Authentication in NX-OS

First and foremost, it is assumed that you already have a basic working knowledge of FabricPath. FabricPath is Cisco's scalable Layer 2 solution that eliminates Spanning Tree Protocol, adds some enhancements that are sorely needed in L2 networks, such as Time To Live (TTL) and Reverse Path Forwarding (RPF) checks, and uses IS-IS as its control plane protocol. It's the fact that FabricPath uses IS-IS that makes it very easy and familiar for customers to enable authentication in their fabric. If you have ever configured authentication for a routing protocol in Cisco IOS or NX-OS, this will be similar, with all of your favorites like key chains, key strings and hashing algorithms. Hopefully that nugget of information doesn't send you into a tailspin of despair.

With FabricPath there are two levels of authentication that can be enabled. The first is at the domain level for the entire switch (or VDC!). A mismatch here will prevent routes from being learned. It is important to note that IS-IS adjacencies can still be formed at the interface level even when the domain authentication is mismatched; the domain-level authentication covers LSP and SNP exchange, not the hello PDUs on the interfaces.

If you are not careful, you can blackhole traffic during the implementation of authentication, just like you would with any other routing protocol.

A quick order of operation to enable domain level authentication would be to define a key-chain with keys which contain key-strings defined underneath. The key strings are the actual password and NX-OS allows you to define multiple key-strings so you can rotate passwords as needed and even includes nerd knobs for setting start and end times. After the key chains are defined, they are applied to the FabricPath domain. Let's quit typing and let the CLI do the talking.

We start with a VDC that has FabricPath enabled and is in a fabric with other devices, but doesn't have authentication enabled. We can see we have not learned any routes.


We can also see we are adjacent to some other devices, but note that we do not see their names under System ID, just the MAC addresses. This is a quick hint that something is amiss with the control plane. They are in bold and red below.


Now we'll add the authentication, starting with the key-chain, which we'll call "domain"; then we define key 0 with a key-string of "domain" (not very creative, am I?) and finally apply it to the FabricPath domain default.
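In configuration terms, that sequence looks roughly like this (a sketch only; verify the exact syntax against your NX-OS release):

key chain domain
  key 0
    key-string domain
fabricpath domain default
  authentication-type md5
  authentication key-chain domain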


Now let's see what that does for us. Much happier now aren't we?


The exact same sequence applies to interface-level authentication and looks like the CLI below. We can see that we have two non-functioning states here, INIT and LOST. INIT is from me removing the key-chain and flapping the interface (shut/no shut), and LOST is from me removing the pre-defined key chain and the adjacency going down to N7K-1-Agg1.


Now we'll add our key chain and key string.
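A hedged sketch of the interface-level equivalent (the key-chain name and interface here are hypothetical):

key chain int-auth
  key 0
    key-string int-auth
interface ethernet 2/1
  fabricpath isis authentication-type md5
  fabricpath isis authentication key-chain int-auth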


A quick check shows us we're happily adjacent to our switches.


Finally, a quick command to check the FabricPath authentication status on your device is below:


With this simple exercise you've configured FabricPath authentication. Not too bad, and very effective. As always when configuring passwords on a device, cutting and pasting from a common text file is important to avoid empty white space at the end of passwords and other nuances that can lead you down the wrong path. In general, I would expect a company implementing FabricPath authentication to configure both domain and interface level authentication.

A Way to tell the Root of the FabricPath tree



Remember that with IS-IS there are two authentication methods: the actual hello (adjacency) authentication, and the LSP authentication. Here is a sample config of both of these.



The config, as you can see above, is quite simple. Don't forget that with key chains you can specify an accept lifetime and a send lifetime. In our case we are not going to; when you don't specify them, they are simply assumed to be infinite.
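If you did want to set explicit lifetimes, a key chain might look something like the following sketch (the names, key string and dates are hypothetical, and the lifetime syntax should be checked against your NX-OS release):

key chain DCI-AUTH
  key 1
    key-string MySecret
    accept-lifetime 00:00:00 Jan 01 2016 infinite
    send-lifetime 00:00:00 Jan 01 2016 infinite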


You can verify your ISIS authentication:



Next, if you want to actually configure the LSPs to be authenticated:



You can then verify that this is configured:


A big hint that your authentication is working for hellos but not for LSPs is that the hostnames don't come up correctly in your IS-IS adjacency table.

FabricPath Load Balancing

First of all, it helps if we establish a few items of terminology. The first thing to remember is that FabricPath supports multiple topologies, so you can actually break out particular FabricPath-enabled VLANs to use a particular topology. However, this is only available in certain versions of NX-OS and is quite advanced, so we will be skipping this advanced configuration.

However, the concept of "trees" also exists in FabricPath. Trees are used for the distribution of "multidestination" traffic, that is, traffic that does not have a single destination; perfect examples of this would be multicast, unknown unicast and other flooded traffic types.

The first multidestination tree, tree 1, is normally selected for unknown unicast and broadcast frames, except when used in combination with vPC+, but we will ignore that detail for now.

Multicast traffic is load-balanced across both trees based on a hashing function (which uses the source and destination IP addresses). You can see which tree the traffic is going to take on a Nexus 7000 with the following command.


The FTAG is the important key here: the FTAG correlates to the "tree". The FTAG is used because it's an available field in the FabricPath header that can identify the frame and tell the switches "use this tree to distribute the traffic".

Now, the whole point of this option is scalability, especially with large multicast traffic domains. Using this option you can increase link utilization for multicast traffic by having the traffic load-balance across two "root" trees (yes, this is FabricPath, so we don't really have a root like we do in spanning tree, but for multidestination traffic we kind of have to).

You can actually tell, using the following command, what port your switch is going to use for a particular FTAG/MTREE:


As you can see from the above, there are two separate paths that the switch is taking, one for each of the trees, based on where the root of each tree lies.

So how is the root of each tree chosen? It's based on:

• Root priority (highest wins, default is 64)
• Switch ID (highest wins, default is randomly chosen but can be manually assigned)
• System ID (tie-breaker)

There will always be two separate roots, one for each tree, but as you can imagine, the elected root might not be the most optimal choice, so you can configure the root priority. The switch with the highest root priority will become the root for FTAG 1, and second place will become the root for FTAG 2.
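For example, forcing a particular switch to win the root election is just a matter of raising its priority under the FabricPath IS-IS domain; a minimal sketch (the value is hypothetical; higher wins, default is 64):

fabricpath domain default
  root-priority 255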


N7K1 is now the root for this tree. You can attempt to verify this in a few ways; the first is to look at the show fabricpath mroute ftag 1 command we used previously. Let's just quickly get our topology clear:



As you can see from the above, we have multiple connections between SW3 and SW2, and then a single connection from each of SW2 and SW3 up to N7K1.


Let's check out our mroute routing:


You can tell from the above that neither of the switches will ever send unknown unicast (which, remember, is placed into FTAG 1) directly to each other; instead they will always forward it up the tree to N7K1, which is the root for this tree.

From N7K1's perspective:


It is responsible for forwarding the traffic back down, so if an unknown unicast or a multicast frame that was hashed to FTAG 1 comes from SW2, it will go up to N7K1 and then back down towards SW3.

Let's make SW2 the root for FTAG 2 by manually configuring SW3 to have a lower root priority.


Let's take a look at the FTAG distribution now.


Let's check it out on the n71k:


OK, so now SW2 is the root for FTAG 2, and any frames from N7K1 will come down to SW2 first, which in turn will distribute them to SW3. Now, there is one bit of that output that might make you say "what gives?": I have four connections between SW2 and SW3, so why is traffic not load-balancing across those equal-cost links?

FabricPath only performs ECMP for KNOWN unicast frames.

OK, here's one more way you can determine the root of an MTREE:


The key point in this output is the line I have highlighted:
"Note: The metric mentioned for multidestination tree is from the root of that tree to that switch-id"

What this is saying is that when you're looking at this output, you're being told the values for the topology tree as if you were running the command on the root of each tree itself. So let's take a closer look at a switch, Switch 3, which is not the root for either FTAG.


The metric for reaching Switch-ID 1, which this switch reaches via Eth1/17, is 0... because Switch 1 _is_ the root for this FTAG.

Same again for tree 2: the root of the tree is Switch-ID 2, which is out eth1/8 and has a metric of 0, because obviously Switch-ID 2's metric to reach itself is 0.

Let's now look at unicast load balancing.

So let's look at the default unicast load balancing right now on our switches with multiple equal-cost links (remember, FabricPath only supports load balancing across equal-cost links).


We can see that our links are being equally balanced. How are they balanced?



They are load-balanced based on a combination of values, as shown above. These include:

• layer-3: Include only Layer 3 input (source or destination IP address)

• layer-4: Include only Layer 4 input (source or destination TCP and UDP ports, if available)

• mixed: Include both Layer 3 and Layer 4 input (default).

• source: Use only source parameters (layer-3, layer-4, or mixed).

• destination: Use only destination parameters (layer-3, layer-4, or mixed).

• source-destination: Use both source and destination parameters (layer-3, layer-4, or mixed).

• symmetric: Sort the source and destination tuples before entering them in the hash function (source-to-destination and destination-to-source flows hash identically) (default).

• xor: Perform an exclusive OR operation on the source and destination tuples before entering them in the hash function.

• include-vlan: Include the VLAN ID of the frame (default).

• rotate-amount: Specify the number of bytes to rotate the hash string before it is entered in the hash function.

Each of these values is relatively straightforward: you can specify whether to look at the Layer 3 or Layer 4 source/destination information, or a mixture (the default); you can specify that you only want to look at the source, or the destination, or both; and you can control whether the hash function produces the same value for source-to-destination traffic and the returning destination-to-source traffic. The VLAN ID can also be included in your combinations, and last but not least, the rotate-amount controls some of the mathematics of the hash function that we will get into.
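These knobs are set with the global fabricpath load-balance unicast command; as a hedged illustration only (keyword spelling and availability differ between the Nexus 5500 and 7000 and between NX-OS releases, so verify on your platform):

fabricpath load-balance unicast layer-4

A show fabricpath load-balance should then confirm the parameters in effect.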

Let's use our favorite command to look at this more closely:


We can see that we only changed one tiny parameter, the port number, and all of a sudden the traffic will load-balance across another link. Great! Looks pretty good so far, right?

Let's check out what that symmetric option does for us; check this out:


Here we have swapped the source and destination ports and IP addresses around, and we are provided with exactly the same CRC hash, which leads us to exactly the same output interface!

Let's see if that is also true on the N7k:


If we change the length that the hash key is based on, the rotate amount, our hash key will change.


So now we have a different hash key generated, based on a longer rotate-amount.
This is apparently used simply to make sure that identical or near-identical traffic flows in different VDCs distribute the traffic differently from each other. It adds a longer hash value (in this case, it takes a number of bytes from the VDC MAC address) to increase the likelihood that the hash will differ between the VDCs.

Check this out for size:

Two totally separate VDCs are shown here, and what we do is change the rotate-amount on each of them to 0 (nothing), then ask each of them to show what it thinks the hash key is.


As you can see, the hash is identical, which means our traffic would flow over the same paths in both of these VDCs, which we may not want. So we can use the rotate-amount to increase how much of the VDC MAC address is used in the hashing function.

Note that just because FabricPath only supports equal-cost load balancing doesn't mean that we can't go through intermediate switches and still have load balancing. Here is an example of this.


In the above example, we have modified the metric on N7K1 so that SW2 and SW3, which have interfaces eth1/5 - 8 to each other, also see the route via N7K1 as a valid path between each other. We did this by modifying the metrics like so:
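The change itself is a per-interface FabricPath IS-IS metric, roughly of this form (the interface and metric value here are hypothetical):

interface ethernet 1/1
  fabricpath isis metric 15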


Notice that the total cost of these links is now 40 (25 + 15) for SW2, which means SW2 now considers it an alternative equal-cost path.

Over on SW3, since we have not modified the default metric, it will still load balance via the 4 links, not 5.


That is, until we change the metric:




by aryoba
last modified: 2015-08-19 12:09:18