VXLAN on Nexus NX-OS Flood and Learn

What is VXLAN?

In this post, we are going to talk about VXLAN, its use cases, and the deployment of VXLAN with multicast flood and learn on a Cisco Nexus 9k. Before jumping into the configuration, let’s cover some theory. What is VXLAN? VXLAN stands for Virtual eXtensible Local Area Network. Technically, it is an overlay that tunnels Layer 2 frames across a Layer 3 network. VXLAN is an open, multi-vendor technology, and you can find the details in RFC 7348: https://tools.ietf.org/html/rfc7348

VXLAN expands the VLAN space. As you know, a VLAN ID is a 12-bit field, which gives us 4096 values, but the VXLAN network identifier is 24 bits wide, which gives us 16,777,216 segments, far more than a data center is likely to need. You might ask: can’t I extend my VLANs over MPLS VPLS or Q-in-Q tunnels? You can, but VXLAN is a better fit for the data center environment. If you want to extend a VLAN from one data center to another, VXLAN is a solution. Also, let’s say you have a three-tier architecture: a web tier, an application tier, and a database tier. If you follow traditional Layer 2 expansion, your broadcast domain gets bigger and your spanning-tree domain grows with it. With a VXLAN deployment, you decrease your broadcast domain size, and you do not need spanning tree for loop prevention between the switches, because the fabric is routed. Besides, VXLAN traffic can use Layer 3 ECMP across the underlay. Scaling enhancements include optimizations for the control plane, MAC learning, ARP tables, and BUM replication.

Here are some VXLAN terms you need to know.
1) VNI / VNID – VXLAN network identifier. It replaces the VLAN ID as the segment identifier.
2) VTEP – VXLAN tunnel endpoint. The device, hardware or software, that performs VXLAN encapsulation and decapsulation. Cisco uses the NVE (network virtualization edge) interface, which is the tunnel interface, as the logical representation of the VTEP.
3) VXLAN gateway – A device that forwards traffic between VXLAN segments, or between a VXLAN segment and a VLAN. It can do both Layer 2 (bridging) and Layer 3 (routing) forwarding.
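To make the VNI more concrete, here is a minimal Python sketch of the 8-byte VXLAN header as laid out in RFC 7348: one flags byte with the "I" bit set, 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. The function names are my own and only for illustration; the VNI value 10010 is the one used later in this post.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08   # the "I" flag: the VNI field is valid
MAX_VNI = 2**24 - 1           # largest value a 24-bit VNI can hold


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for the given VNI."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError(f"VNI {vni} does not fit in 24 bits")
    # Word 1: flags in the top byte, then 24 reserved bits.
    # Word 2: VNI in the top 24 bits, then 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the I flag."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set, VNI is not valid")
    return vni_word >> 8


header = pack_vxlan_header(10010)
print(header.hex())        # 0800000000271a00
print(unpack_vni(header))  # 10010
```

A 12-bit VLAN ID would not survive this encoding for values above 4095, while the VNI field has room for anything up to 16,777,215.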

 

We can describe the basic VXLAN workflow as follows:
The VTEP receives an ARP request from a local host.
If the VTEP has no entry for the destination (a miss), the next step is to find the remote VTEP. There are three different ways of finding a remote VTEP:

Multicast flood and learn
Ingress replication
MP-BGP L2VPN EVPN

In this post, we will deploy multicast flood and learn.
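The flood-and-learn idea can be sketched in a few lines of Python. This is a toy model, not how NX-OS implements it: the `Vtep` class, the `send` helper, and the host MAC addresses are all hypothetical, and the multicast group is modeled simply as "every other VTEP in the list". The VTEP IPs are the leaf loopbacks used later in this post.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class Vtep:
    ip: str
    # Remote MAC address -> IP of the VTEP it was learned from.
    remote_macs: dict = field(default_factory=dict)

    def next_hops(self, dst_mac, group_members):
        """Return the VTEP IPs that should receive the encapsulated frame."""
        if dst_mac in self.remote_macs:
            # Known unicast: one copy, sent straight to the owning VTEP.
            return [self.remote_macs[dst_mac]]
        # Broadcast or unknown unicast: flood to the multicast group, which
        # delivers a copy to every other VTEP that joined it.
        return [v.ip for v in group_members if v is not self]

    def learn(self, src_mac, src_vtep_ip):
        """Data-plane learning: inner source MAC maps to the outer source IP."""
        self.remote_macs[src_mac] = src_vtep_ip


def send(src_vtep, src_mac, dst_mac, group_members):
    """Forward one frame and let every receiving VTEP learn from it."""
    targets = src_vtep.next_hops(dst_mac, group_members)
    by_ip = {v.ip: v for v in group_members}
    for ip in targets:
        by_ip[ip].learn(src_mac, src_vtep.ip)
    return targets


leaf1, leaf2, leaf3 = Vtep("172.24.1.3"), Vtep("172.24.1.4"), Vtep("172.24.1.5")
fabric = [leaf1, leaf2, leaf3]
HOST1, HOST2 = "00:00:00:00:00:01", "00:00:00:00:00:02"  # hypothetical MACs

print(send(leaf1, HOST1, BROADCAST, fabric))  # ARP request is flooded
print(send(leaf2, HOST2, HOST1, fabric))      # ARP reply is unicast
print(send(leaf1, HOST1, HOST2, fabric))      # later traffic is unicast too
```

The first frame goes to both remote leafs, and after that each side has learned where the other host lives, so the remaining frames go to a single VTEP.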

Caveats 

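For an existing fabric, you have to configure an MTU of at least 1554 bytes on the underlay links, so that a full-size encapsulated frame fits without fragmentation. Here is the calculation:

Outer IPv4 header – 20 bytes
UDP header – 8 bytes
VXLAN header – 8 bytes
Original Ethernet header – 18 bytes (14 bytes, plus 4 bytes when an 802.1Q tag is carried)
Original Ethernet payload – 1500 bytes
Total – 1554 bytes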
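The same arithmetic as a short Python sketch, in case you want to rerun it for a different host payload size. The dictionary keys and the function name are mine; the byte counts are the ones listed above, and 9000 is the MTU configured on the fabric links later in this post.

```python
# Bytes added around the original Ethernet payload, as listed above.
OVERHEAD_BYTES = {
    "outer IPv4 header": 20,
    "UDP header": 8,
    "VXLAN header": 8,
    "original Ethernet header": 18,
}


def required_underlay_mtu(payload_bytes: int = 1500) -> int:
    """Smallest underlay MTU that carries the payload without fragmentation."""
    return payload_bytes + sum(OVERHEAD_BYTES.values())


print(required_underlay_mtu())          # 1554
print(required_underlay_mtu() <= 9000)  # True
```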

 

For the VXLAN underlay network, OSPF, IS-IS, or EIGRP can be used. I am going to use OSPF. What are we trying to accomplish? The main purpose of the IGP in a VXLAN deployment is to build reachability between VTEPs. Our tunnel endpoint is going to be a loopback IP, and it must be a /32 address. The fabric links are unnumbered and borrow the loopback0 address, so the OSPF adjacencies are sourced from those loopback IP addresses. The links are configured as point-to-point (medium p2p), which simplifies adjacency formation.
Let’s move on to the configuration steps. For this demonstration I used EVE-NG, with CSR1000V routers as the host devices. The following features need to be enabled on the Nexus 9k:

 

feature nxapi
feature ospf
feature pim
feature vn-segment-vlan-based
feature nv overlay
feature bfd

Spine 1

interface loopback0
ip address 172.24.1.1/32
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode

router ospf 1
bfd
log-adjacency-changes

interface Ethernet1/1
no switchport
mtu 9000
medium p2p
ip unnumbered loopback0
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
no shutdown

interface Ethernet1/2
no switchport
mtu 9000
medium p2p
ip unnumbered loopback0
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
no shutdown

interface Ethernet1/3
no switchport
mtu 9000
medium p2p
ip unnumbered loopback0
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
no shutdown

Spine 2

router ospf 1
bfd
log-adjacency-changes

interface loopback0
ip address 172.24.1.2/32
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode

interface Ethernet1/1
no switchport
mtu 9000
medium p2p
ip unnumbered loopback0
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
no shutdown

interface Ethernet1/2
no switchport
mtu 9000
medium p2p
ip unnumbered loopback0
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
no shutdown

interface Ethernet1/3
no switchport
mtu 9000
medium p2p
ip unnumbered loopback0
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
no shutdown

Leaf 1
router ospf 1
bfd
log-adjacency-changes

interface loopback0
ip address 172.24.1.3/32
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode

interface Ethernet1/1
no switchport
mtu 9000
medium p2p
ip unnumbered loopback0
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
no shutdown

interface Ethernet1/2
no switchport
mtu 9000
medium p2p
ip unnumbered loopback0
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
no shutdown

Leaf 2

router ospf 1
bfd
log-adjacency-changes

interface loopback0
ip address 172.24.1.4/32
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode

interface Ethernet1/1
no switchport
mtu 9000
medium p2p
ip unnumbered loopback0
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
no shutdown

interface Ethernet1/2
no switchport
mtu 9000
medium p2p
ip unnumbered loopback0
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
no shutdown

Leaf 3

router ospf 1
bfd
log-adjacency-changes

interface loopback0
ip address 172.24.1.5/32
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode

interface Ethernet1/1
no switchport
mtu 9000
medium p2p
ip unnumbered loopback0
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
no shutdown

interface Ethernet1/2
no switchport
mtu 9000
medium p2p
ip unnumbered loopback0
ip router ospf 1 area 0.0.0.0
ip pim sparse-mode
no shutdown
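Since the underlay configuration is identical on every switch apart from the loopback address and the number of fabric ports, it can be generated instead of typed. Below is a small Python sketch that renders the same commands shown above; the function name and the `DEVICES` table are my own, and the addresses and ports are the ones from this lab.

```python
def render_underlay(loopback_ip: str, fabric_ports: list) -> str:
    """Render the OSPF/PIM underlay config for one spine or leaf."""
    lines = [
        "router ospf 1",
        "bfd",
        "log-adjacency-changes",
        "",
        "interface loopback0",
        f"ip address {loopback_ip}/32",
        "ip router ospf 1 area 0.0.0.0",
        "ip pim sparse-mode",
    ]
    for port in fabric_ports:
        # Every fabric link is a routed, unnumbered point-to-point link.
        lines += [
            "",
            f"interface {port}",
            "no switchport",
            "mtu 9000",
            "medium p2p",
            "ip unnumbered loopback0",
            "ip router ospf 1 area 0.0.0.0",
            "ip pim sparse-mode",
            "no shutdown",
        ]
    return "\n".join(lines)


# Device name -> (loopback0 address, fabric-facing ports), as in this lab.
DEVICES = {
    "Spine 1": ("172.24.1.1", ["Ethernet1/1", "Ethernet1/2", "Ethernet1/3"]),
    "Spine 2": ("172.24.1.2", ["Ethernet1/1", "Ethernet1/2", "Ethernet1/3"]),
    "Leaf 1": ("172.24.1.3", ["Ethernet1/1", "Ethernet1/2"]),
    "Leaf 2": ("172.24.1.4", ["Ethernet1/1", "Ethernet1/2"]),
    "Leaf 3": ("172.24.1.5", ["Ethernet1/1", "Ethernet1/2"]),
}

for name, (loopback_ip, ports) in DEVICES.items():
    print(name)
    print(render_underlay(loopback_ip, ports))
    print()
```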

With the OSPF and interface configuration complete, let’s take a look at the routing table.

Leaf1# show ip route
IP Route Table for VRF "default"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>

172.24.1.1/32, ubest/mbest: 1/0
*via 172.24.1.1, Eth1/1, [110/5], 00:00:27, ospf-1, intra
172.24.1.2/32, ubest/mbest: 1/0
*via 172.24.1.2, Eth1/2, [110/5], 00:11:55, ospf-1, intra
172.24.1.3/32, ubest/mbest: 2/0, attached
*via 172.24.1.3, Lo0, [0/0], 00:14:29, local
*via 172.24.1.3, Lo0, [0/0], 00:14:29, direct
172.24.1.4/32, ubest/mbest: 2/0
*via 172.24.1.1, Eth1/1, [110/45], 00:00:27, ospf-1, intra
*via 172.24.1.2, Eth1/2, [110/45], 00:00:27, ospf-1, intra
172.24.1.5/32, ubest/mbest: 2/0
*via 172.24.1.1, Eth1/1, [110/45], 00:00:27, ospf-1, intra
*via 172.24.1.2, Eth1/2, [110/45], 00:00:27, ospf-1, intra

Up to this point, we have configured the underlay network for VXLAN. Now it’s time to talk about PIM in the VXLAN underlay. VXLAN BUM traffic (broadcast, unknown unicast, and multicast) is sent as IP multicast, which means multicast reachability is required between the VTEPs. When I was configuring OSPF, I also enabled PIM sparse mode on each interface. In the output from Spine 1 and Spine 2, you can see that the PIM neighborships are established.

Spine1# show ip pim neighbor
PIM Neighbor Status for VRF "default"
Neighbor        Interface            Uptime    Expires   DR       Bidir-  BFD    ECMP Redirect
                                                         Priority Capable State  Capable
172.24.1.3      Ethernet1/1          00:36:45  00:01:16  1        yes     n/a    no
172.24.1.4      Ethernet1/2          00:37:33  00:01:41  1        yes     n/a    no
172.24.1.5      Ethernet1/3          00:38:21  00:01:43  1        yes     n/a    no

Spine2# show ip pim neighbor
PIM Neighbor Status for VRF "default"
Neighbor        Interface            Uptime    Expires   DR       Bidir-  BFD    ECMP Redirect
                                                         Priority Capable State  Capable
172.24.1.3      Ethernet1/1          00:50:26  00:01:21  1        yes     n/a    no
172.24.1.4      Ethernet1/2          00:47:35  00:01:38  1        yes     n/a    no
172.24.1.5      Ethernet1/3          00:43:59  00:01:24  1        yes     n/a    no

After enabling PIM sparse mode on the interfaces, the next step is to establish PIM bidir reachability between the VTEPs. The Spine 1 loopback IP is going to be the rendezvous point (RP). After I paste the following command on each spine and leaf, every device agrees that Spine 1 is our rendezvous point.

ip pim rp-address 172.24.1.1 group-list 224.0.0.0/4 bidir

Leaf1# show ip pim rp
PIM RP Status Information for VRF "default"
BSR disabled
Auto-RP disabled
BSR RP Candidate policy: None
BSR RP policy: None
Auto-RP Announce policy: None
Auto-RP Discovery policy: None

RP: 172.24.1.1, (1),
uptime: 00:00:44 priority: 255,
RP-source: (local),
group ranges:
224.0.0.0/4 (bidir)

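Now it is time to map the VLAN to a VXLAN segment (VNI). Note that Leaf 3 uses VLAN 20 while Leaf 1 and Leaf 2 use VLAN 10. Because all three VLANs map to VNI 10010, they belong to the same Layer 2 segment.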

Leaf 1
vlan 10
vn-segment 10010

Leaf 2
vlan 10
vn-segment 10010

Leaf 3
vlan 20
vn-segment 10010
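The mappings above show that the VLAN ID only matters locally on each leaf, while the VNI is what ties the leafs together. As a rough illustration, this Python sketch groups the leafs by VNI; the `VLAN_TO_VNI` table mirrors the configuration above, and the function name is hypothetical.

```python
# Per-leaf VLAN -> VNI mappings, copied from the configuration above.
VLAN_TO_VNI = {
    "Leaf 1": {10: 10010},
    "Leaf 2": {10: 10010},
    "Leaf 3": {20: 10010},
}


def members_of_segment(mappings: dict, vni: int) -> dict:
    """Return {leaf: local VLAN} for every leaf that maps a VLAN to the VNI."""
    return {
        leaf: vlan
        for leaf, vlans in mappings.items()
        for vlan, mapped_vni in vlans.items()
        if mapped_vni == vni
    }


print(members_of_segment(VLAN_TO_VNI, 10010))
# {'Leaf 1': 10, 'Leaf 2': 10, 'Leaf 3': 20}
```

All three leafs end up in the same segment even though Leaf 3 uses a different local VLAN number.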

The next step is to create a network virtualization edge (NVE) interface, specify the VTEP source interface, add the VNI as a member, and specify a multicast group for BUM replication.

Leaf 1
interface nve1
source-interface loopback0
member vni 10010
mcast-group 225.1.2.3
no shutdown

Leaf 2
interface nve1
source-interface loopback0
member vni 10010
mcast-group 225.1.2.3
no shutdown

Leaf 3
interface nve1
source-interface loopback0
member vni 10010
mcast-group 225.1.2.3
no shutdown

Each device (leaf) can have only one NVE interface. If you need to bridge another VLAN, assign a new segment number (VNI) to that VLAN and add it under nve1 as another member vni with a multicast group of its own.
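One thing worth checking whenever you pick a multicast group for a VNI is that the group falls inside the range served by the rendezvous point, otherwise BUM traffic for that VNI has no RP. Here is a minimal Python sketch of that check using the standard `ipaddress` module; the function name is mine, the 224.0.0.0/4 range is the group-list from the `ip pim rp-address` command earlier in this post, and the 239.0.0.0/8 range in the last line is just a hypothetical narrower group-list.

```python
import ipaddress

# Group range served by the RP, from the "ip pim rp-address" command.
RP_GROUP_RANGE = ipaddress.ip_network("224.0.0.0/4")


def group_covered_by_rp(group: str, rp_range=RP_GROUP_RANGE) -> bool:
    """True if the address is multicast and inside the RP's group range."""
    address = ipaddress.ip_address(group)
    return address.is_multicast and address in rp_range


print(group_covered_by_rp("225.1.2.3"))  # True
print(group_covered_by_rp("10.1.2.3"))   # False
# With a narrower, hypothetical group-list the same group is not covered.
print(group_covered_by_rp("225.1.2.3", ipaddress.ip_network("239.0.0.0/8")))  # False
```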

Leaf1# show inter status | in nve
nve1 -- connected -- auto auto --

Leaf2# show inter status | in nve
nve1 -- connected -- auto auto --

Leaf3# show inter status | in nve
nve1 -- connected -- auto auto --

Host 1 can now ping the other hosts (192.168.10.2 and 192.168.10.3) over the VXLAN tunnel.
Host1#ping 192.168.10.2 repeat 10
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 192.168.10.2, timeout is 2 seconds:
!!!!!!!!!!
Success rate is 100 percent (10/10), round-trip min/avg/max = 30/33/37 ms
Host1#ping 192.168.10.3 repeat 10
Type escape sequence to abort.
Sending 10, 100-byte ICMP Echos to 192.168.10.3, timeout is 2 seconds:
!!!!!!!!!!
Success rate is 100 percent (10/10), round-trip min/avg/max = 29/32/43 ms
Host1#

As you can see, the VTEP tunnels are up and running, and VXLAN bridging works.
Common verification commands are:
show interface nve <id>
show mac address-table
show nve peers
show nve vni

Leaf1# show nve vni
Codes: CP - Control Plane        DP - Data Plane
       UC - Unconfigured         SA - Suppress ARP
       SU - Suppress Unknown Unicast

Interface VNI      Multicast-group   State Mode Type [BD/VRF]      Flags
--------- -------- ----------------- ----- ---- ------------------ -----
nve1      10010    225.1.2.3         Up    DP   L2 [10]

Leaf2# show nve vni
Codes: CP - Control Plane        DP - Data Plane
       UC - Unconfigured         SA - Suppress ARP
       SU - Suppress Unknown Unicast

Interface VNI      Multicast-group   State Mode Type [BD/VRF]      Flags
--------- -------- ----------------- ----- ---- ------------------ -----
nve1      10010    225.1.2.3         Up    DP   L2 [10]

Leaf3# show nve vni
Codes: CP - Control Plane        DP - Data Plane
       UC - Unconfigured         SA - Suppress ARP
       SU - Suppress Unknown Unicast

Interface VNI      Multicast-group   State Mode Type [BD/VRF]      Flags
--------- -------- ----------------- ----- ---- ------------------ -----
nve1      10010    225.1.2.3         Up    DP   L2 [20]
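If you collect this output from many leafs, a small parser makes it easier to confirm that every VNI is up and in data-plane mode. The following Python sketch handles only the row format shown above; the regular expression and the function name are my own, and the sample text is the Leaf 3 row from this lab, so treat it as a starting point rather than a complete parser.

```python
import re

# One data row of "show nve vni", e.g. "nve1 10010 225.1.2.3 Up DP L2 [20]".
ROW = re.compile(
    r"^(?P<interface>nve\d+)\s+(?P<vni>\d+)\s+(?P<group>\S+)\s+"
    r"(?P<state>\S+)\s+(?P<mode>\S+)\s+(?P<type>L[23])\s+\[(?P<bd>[^\]]+)\]"
)


def parse_nve_vni(output: str) -> list:
    """Return one dict per VNI row; header and legend lines are skipped."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            row = match.groupdict()
            row["vni"] = int(row["vni"])
            rows.append(row)
    return rows


SAMPLE = """\
Interface VNI      Multicast-group   State Mode Type [BD/VRF]      Flags
--------- -------- ----------------- ----- ---- ------------------ -----
nve1      10010    225.1.2.3         Up    DP   L2 [20]
"""

for row in parse_nve_vni(SAMPLE):
    healthy = row["state"] == "Up" and row["mode"] == "DP"
    print(row["vni"], row["group"], row["bd"], healthy)  # 10010 225.1.2.3 20 True
```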

                                                                                                                                                       

Author Kamil Rasulov CCIE#53983
