Neutron Layer 3 High Availability

L3 Agent Low Availability

Today, you can use multiple network nodes to achieve load sharing, but not high availability or redundancy. Assuming three network nodes, creation of new routers will be scheduled and distributed amongst those three nodes. However, if a node drops, all routers on that node will cease to exist, as will any traffic normally forwarded by those routers. Neutron, in the Icehouse release, doesn't support any built-in solution to this problem.
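To make the pain concrete, here is roughly what recovering from a failed network node looks like today, using the standard neutron CLI (the agent and router IDs below are placeholders); every router has to be moved by hand, or by a script that does the equivalent:

# find the dead L3 agent and the routers it was hosting
neutron agent-list
neutron router-list-on-l3-agent <dead-agent-id>

# move each router to a surviving agent
neutron l3-agent-router-remove <dead-agent-id> <router-id>
neutron l3-agent-router-add <live-agent-id> <router-id>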

A Detour to the DHCP Agent

DHCP agents are a different beast altogether – The DHCP protocol allows for the co-existence of multiple DHCP servers all serving the same pool, at the same time.

By changing:

neutron.conf:
dhcp_agents_per_network = X

The DHCP scheduler will schedule X DHCP agents per network. So, for a deployment with three network nodes, and dhcp_agents_per_network set to 2, every Neutron network will be served by two DHCP agents out of the three. How does this work?
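As a quick sanity check, you can list which DHCP agents ended up hosting a given network with the standard CLI (the network name and agent ID below are placeholders):

# which agents serve this network?
neutron dhcp-agent-list-hosting-net <network-name>

# and the reverse: which networks does a given agent serve?
neutron net-list-on-dhcp-agent <agent-id>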

[Figure: DHCP HA topology]

First, let’s take a look at the story from a baremetal perspective, outside of the cloud world. When a workstation connects to the 10.0.0.0/24 subnet, it broadcasts a DHCP discover. Both DHCP servers dnsmasq1 and dnsmasq2 (Or other implementations of a DHCP server) receive the broadcast and respond with an offer for 10.0.0.2. Assuming that the first server’s response was received by the workstation first, it will then broadcast a request for 10.0.0.2, and specify the IP address of dnsmasq1 – 10.0.0.253. Both servers receive the broadcast, but only dnsmasq1 responds with an ACK. Since all DHCP communication is via broadcasts, server 2 also receives the ACK, and can mark 10.0.0.2 as taken by AA:BB:CC:11:22:33, so as not to offer it to other workstations. To summarize, all communication between clients and servers is done via broadcasts, and thus the state (What IPs are used at any given time, and by whom) can be distributed across the servers correctly.

In the Neutron case, the assignment from MAC to IP is configured on each dnsmasq server beforehand, when the Neutron port is created. Thus, both dnsmasq servers’ leases files will hold the AA:BB:CC:11:22:33 to 10.0.0.2 mapping before the DHCP request is even broadcast. As you can see, DHCP HA is supported at the protocol level.
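As an illustration, the DHCP agent renders every known port into the configuration it hands to dnsmasq, so each dnsmasq instance serving the network starts out with the same mapping. A typical host-file entry looks something like the following (the path and hostname are illustrative, not taken from a specific deployment):

# /var/lib/neutron/dhcp/<network-id>/host, passed to dnsmasq via --dhcp-hostsfile
AA:BB:CC:11:22:33,host-10-0-0-2.openstacklocal,10.0.0.2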

Back to the Lowly Available L3 Agent

L3 agents don’t (Currently) have any of these fancy tricks that DHCP offers, and yet the people demand high availability. So what are the people doing?

Rescheduling Routers Takes a Long, Long Time

The solutions people use today boil down to monitoring the L3 agents and rescheduling their routers to surviving nodes when one fails. They all suffer from a substantial failover time, if only for the simple fact that configuring a non-trivial number of routers on the new node(s) takes quite a while. Thousands of routers take hours to finish the rescheduling and configuration process. The people demand fast failover!

Distributed Virtual Router

DVR has multiple documents explaining how it works in detail, so I won’t repeat them here.

The gist is that it moves routing to the compute nodes, rendering the L3 agent on the network nodes pointless. Or does it?

Ideally you would use DVR together with L3 HA. Floating IP traffic would be routed directly by your compute nodes, while SNAT traffic would go through the HA L3 agents on the network nodes.

Layer 3 High Availability

The Juno-targeted L3 HA solution uses the popular Linux keepalived tool, which implements VRRP internally. First, then, let’s discuss VRRP.

What is VRRP, how does it work in the physical world?

Virtual Router Redundancy Protocol is a first hop redundancy protocol – It aims to provide high availability of the network’s default gateway, or the next hop of a route. What problem does it solve? In a network topology with two routers providing internet connectivity, you could configure half of the network’s hosts with the first router’s IP address as their default gateway, and the other half with the second router’s.

[Figure: Router HA topology before VRRP]

This would provide load sharing, but what happens if one router loses connectivity? Herein comes the idea of a virtual IP address, or a floating address, which is configured as the network’s default gateway. The active router configures the virtual IP address (Or VIP for short) on its internal, LAN facing interface, and responds to ARP requests for it with a virtual MAC address. The network computers already have entries in their ARP caches (For the VIP + virtual MAC address pair) and have no reason to resend an ARP request. During a failover, the standby routers stop receiving VRRP hello messages from the master and thus perform an election process, with the winner becoming the new active router and the others remaining standby. The newly elected active router sends a gratuitous ARP request, proclaiming to the network that the VIP + MAC pair now belong to it. The switches comprising the network move the virtual MAC address from the old port to the new one.

[Figure: Switch moves the virtual MAC to the new port]

By doing so, traffic to the default gateway reaches the correct (New) active router. Note that this approach does not accomplish load sharing, in the sense that all traffic is forwarded through the active router. (In the Neutron use case, load sharing is not accomplished at the individual router level, but at the node level, assuming a non-trivial number of routers). How does one accomplish load sharing at the router resolution? VRRP groups: The VRRP header includes a Virtual Router Identifier, or VRID, so several VRRP groups can coexist on the same network, each with its own VIP. Configure two groups so that the first router is the master of group 1 and a standby of group 2, while the second router plays the opposite roles. Half of the network hosts will configure the first VIP as their default gateway, and the other half the second. In the case of a failure, the VIP previously found on the failing router will transfer to the other one.

[Figure: Router HA with two VRRP groups]

The observant reader will have identified a problem – What if the active router loses connectivity to the internet? Will it remain as the active router, unable to route packets? VRRP adds the capability to monitor the external link and relinquish its role as the active router in case of a failure.

[Figure: Router HA with external link tracking]
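To make this concrete, here is a minimal keepalived.conf sketch for the first of the two physical routers described above; the interface names, addresses and priorities are made up for illustration. It is the master of the first VRRP group, a standby of the second, and it tracks the external interface so that losing the uplink triggers a failover:

vrrp_instance VR_GROUP_1 {
    state MASTER
    interface eth1                  # LAN facing interface
    virtual_router_id 1
    priority 100                    # highest priority in group 1
    advert_int 1
    track_interface {
        eth0                        # external, internet facing link
    }
    virtual_ipaddress {
        10.0.0.253/24 dev eth1      # first VIP, default gateway for half the hosts
    }
}
vrrp_instance VR_GROUP_2 {
    state BACKUP
    interface eth1
    virtual_router_id 2
    priority 50                     # lower priority, the master of this group is the second router
    advert_int 1
    track_interface {
        eth0
    }
    virtual_ipaddress {
        10.0.0.254/24 dev eth1      # second VIP
    }
}

The second router would mirror this configuration with the MASTER/BACKUP states and priorities swapped.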

Note: As far as IP addressing goes, it’s possible to operate in two modes:

  1. Each router gets an IP address, regardless of its VRRP state. The master router is configured with the VIP as an additional, or secondary, address.
  2. Only the VIP is configured. That is: The master router will hold the VIP while the slaves will have no IPs configured whatsoever.

VRRP – The Dry Facts

VRRP is specified in RFC 3768 (version 2) and RFC 5798 (version 3). It runs directly on top of IP with protocol number 112, advertisements are sent to the 224.0.0.18 multicast address, and the virtual MAC address has the form 00:00:5E:00:01:{VRID}, where the last octet is the group’s Virtual Router Identifier.

Back to Neutron-land

L3 HA starts a keepalived instance in every router namespace. The different router instances talk to one another via a dedicated HA network, one per tenant. This network is created under a blank tenant to hide it from the CLI and GUI, but it is otherwise a normal Neutron tenant network, using the deployment’s default segmentation technology. HA routers have an ‘HA’ device in their namespace: When an HA router is created, it is scheduled to a number of network nodes, and a port belonging to the tenant’s HA network is created on each of those nodes. keepalived traffic is forwarded through the HA device (As specified in the keepalived.conf file used by the keepalived instance in the router namespace). Here’s the output of ‘ip address’ in the router namespace:

$ sudo ip netns exec qrouter-b30064f9-414e-4c98-ab42-646197c74020 ip address
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default 
    ...
2794: ha-45249562-ec:  mtu 1500 qdisc noqueue state UNKNOWN group default 
    link/ether 12:34:56:78:2b:5d brd ff:ff:ff:ff:ff:ff
    inet 169.254.0.2/24 brd 169.254.0.255 scope global ha-45249562-ec
       valid_lft forever preferred_lft forever
    inet6 fe80::1034:56ff:fe78:2b5d/64 scope link 
       valid_lft forever preferred_lft forever
2795: qr-dc9d93c6-e2:  mtu 1500 qdisc noqueue state UNKNOWN group default 
    link/ether ca:fe:de:ad:be:ef brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/24 scope global qr-dc9d93c6-e2
       valid_lft forever preferred_lft forever
    inet6 fe80::c8fe:deff:fead:beef/64 scope link 
       valid_lft forever preferred_lft forever
2796: qg-843de7e6-8f:  mtu 1500 qdisc noqueue state UNKNOWN group default 
    link/ether ca:fe:de:ad:be:ef brd ff:ff:ff:ff:ff:ff
    inet 19.4.4.4/24 scope global qg-843de7e6-8f
       valid_lft forever preferred_lft forever
    inet6 fe80::c8fe:deff:fead:beef/64 scope link 
       valid_lft forever preferred_lft forever

That is the output for the master instance. The same router on another node would have no IP address on the ha, qr, or qg devices. It would have no floating IPs or routing entries. These are persisted as configuration values in keepalived.conf, and when keepalived detects the master instance failing, these addresses (Or: VIPs) are configured by keepalived on the appropriate devices. Here’s an example of keepalived.conf, for the same router shown above:

vrrp_sync_group VG_1 {
    group {
        VR_1
    }
    notify_backup "/path/to/notify_backup.sh"
    notify_master "/path/to/notify_master.sh"
    notify_fault "/path/to/notify_fault.sh"
}
vrrp_instance VR_1 {
    state BACKUP
    interface ha-45249562-ec
    virtual_router_id 1
    priority 50
    nopreempt
    advert_int 2
    track_interface {
        ha-45249562-ec
    }
    virtual_ipaddress {
        19.4.4.4/24 dev qg-843de7e6-8f
    }
    virtual_ipaddress_excluded {
        10.0.0.1/24 dev qr-dc9d93c6-e2
    }
    virtual_routes {
        0.0.0.0/0 via 19.4.4.1 dev qg-843de7e6-8f
    }
}

What are those notify scripts? These are scripts that keepalived executes upon transition to master, backup, or fault. Here are the contents of the master script:

#!/usr/bin/env bash
neutron-ns-metadata-proxy --pid_file=/tmp/tmpp_6Lcx/tmpllLzNs/external/pids/b30064f9-414e-4c98-ab42-646197c74020/pid --metadata_proxy_socket=/tmp/tmpp_6Lcx/tmpllLzNs/metadata_proxy --router_id=b30064f9-414e-4c98-ab42-646197c74020 --state_path=/opt/openstack/neutron --metadata_port=9697 --debug --verbose
echo -n master > /tmp/tmpp_6Lcx/tmpllLzNs/ha_confs/b30064f9-414e-4c98-ab42-646197c74020/state

The master script simply opens up the metadata proxy, and writes the state to a state file, which can be later read by the L3 agent. The backup and fault scripts kill the proxy and write their respective states to the aforementioned state file. This means that the metadata proxy will be live only on the master router instance.
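For completeness, the backup script reconstructed from that description would look roughly like the following; the exact paths and signal handling are implementation details of the L3 agent, so treat this as a sketch rather than the literal file:

#!/usr/bin/env bash
# stop the metadata proxy that the master script started
kill $(cat /tmp/tmpp_6Lcx/tmpllLzNs/external/pids/b30064f9-414e-4c98-ab42-646197c74020/pid)
# record the new state so the L3 agent can read it later
echo -n backup > /tmp/tmpp_6Lcx/tmpllLzNs/ha_confs/b30064f9-414e-4c98-ab42-646197c74020/state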

Aren’t We Forgetting the Metadata Agent?

Simply enable the agent on every network node and you’re good to go.
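In practice that means running neutron-metadata-agent on each network node, pointed at the same Nova metadata service. Something along these lines (option values and service names vary per distribution and deployment):

# /etc/neutron/metadata_agent.ini on every network node
nova_metadata_ip = <controller-ip>
metadata_proxy_shared_secret = <shared-secret>

# make sure the agent is running
service neutron-metadata-agent restart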

Future Work & Limitations

Usage & Configuration

neutron.conf:
l3_ha = True
max_l3_agents_per_router = 2
min_l3_agents_per_router = 2

l3_ha controls the default, while the CLI allows an admin (And only admins) to override that setting on a per-router basis:

neutron router-create --ha=<True|False> router1
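Once the router is created, you can verify that it was scheduled to more than one L3 agent, and that keepalived is running inside the router namespace on each hosting node:

neutron l3-agent-list-hosting-router router1

# on each hosting network node
ip netns | grep qrouter
ps aux | grep keepalived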
