Attention: Please review the active-active topology blueprint first (Active-Active, N+1 Amphorae Setup):
https://blueprints.launchpad.net/octavia/+spec/l3-active-active
This blueprint describes an L3 active-active distributor implementation to support the Octavia active-active topology. The L3 active-active distributor will leverage the capabilities of a layer 3 Clos network fabric to distribute traffic to an Amphora Cluster of one or more amphoras. Specifically, the design will leverage Equal Cost Multipath (ECMP) load sharing with anycast routing to achieve traffic distribution across the Amphora Cluster. In this reference implementation, the BGP routing protocol will be used to inject anycast routes into the L3 fabric.
In order to scale a single VIP address across multiple active amphoras, a distributor is required to balance the traffic. By leveraging the existing capabilities of a modern L3 network, we can use the network itself as the distributor. This approach has several advantages: traffic is distributed at the speed of the fabric hardware, and there is no dedicated distributor appliance to become a bottleneck or single point of failure.
Note: Items marked with [P2] refer to lower priority features to be designed / implemented only after initial release.
The diagram below shows how 2..n amphora instances from multiple tenants interact with the L3 network distributor.
Management Front-End
Internet Network Networks
(World) ║ (provider)
║ ║ ┌─────────────────────────────┐ ║
║ ║ │ Amphora of Tenant A │ ║
┌──╨──────────┐ ║ ┌────┬┴──────────┬──────────────────┴┬───╨┐
│ │ ╠══════╡MGMT│ns: default│ns: amphora-haproxy│f.e.│
│ │ ║ │ IP ├-----------┼-------------------┤ IP │
│ │ ║ └────┤ BGP │ Anycast VIP ├───╥┘
│ │ ║ │ Speaker │ (loopback) │ ║
│ │ ║ └───────────┴──────────────╥────┘ ║
│ │ ║ | ║ ║
│ │ ║ | ║ ║
│ │ Peering Session 1..* | ║ ║
│ │---------------------------+ ║ ║
│ │ {anycast VIP}/32 next-hop {f.e. IP} ║ ║
│ │ ║ ║ ║
│ │ ║ ┌─────────────────────────╨───┐ ║
│ │ ║ │ Amphora of Tenant B │ ║
│ │ ║ ┌────┬┴──────────┬──────────────────┴┬───╨┐
│ ╞════════╬══════╡MGMT│ns: default│ns: amphora-haproxy│f.e.│
│ │ ║ │ IP ├-----------┼-------------------┤ IP │
│ │ ║ └────┤ BGP │ Anycast VIP ├───╥┘
│ │ ║ │ Speaker │ (loopback) │ ║
│ │ ║ └───────────┴──────────────╥────┘ ║
│ Distributor │ ║ | ║ ║
│ (L3 Network)│ ║ | ║ ║
│ │ Peering Session 1..* | ║ ║
│ │---------------------------+ ║ ║
│ │ {anycast VIP}/32 next-hop {f.e. IP} ║ ║
│ │ ║ ║ ║
│ │ ║ ┌─────────────────────────╨───┐ ║
│ │ ║ │ Amphora of Tenant C │ ║
│ │ ║ ┌────┬┴──────────┬──────────────────┴┬───╨┐
│ │ ╚══════╡MGMT│ns: default│ns: amphora-haproxy│f.e.│
│ │ │ IP ├-----------┼-------------------┤ IP │
│ │ └────┤ BGP │ Anycast VIP ├────┘
│ │ │ Speaker │ (loopback) │
│ │ └───────────┴──────────────╥────┘
│ │ | ║
│ │ | ║
│ │ Peering Session 1..* | ║
│ │---------------------------+ ║
│ │ {anycast VIP}/32 next-hop {f.e. IP} ║
│ │ ║
│ ╞═══════════════════════════════════════════════Anycast
└─────────────┘ 1..* Network
Whenever a new active-active amphora is instantiated, it will create one or more BGP peering sessions over the lb-mgmt-net to the L3 fabric. The fabric-side BGP peer will need a neighbor definition in order to allow peering sessions from the amphoras. To ease configuration, a neighbor statement allowing peers from the entire lb-mgmt-net IP prefix range can be defined:
neighbor 10.10.10.0/24
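For example, on a fabric device running FRR (one possible peer implementation; the AS numbers here are illustrative assumptions, not part of this spec), a dynamic neighbor range avoids configuring each amphora peer individually:

  router bgp 64512
   ! Accept BGP sessions from any amphora on the lb-mgmt-net.
   neighbor AMPHORAE peer-group
   neighbor AMPHORAE remote-as 65000
   bgp listen range 10.10.10.0/24 peer-group AMPHORAE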
The BGP peer can either be a route reflector (RR) or any other network device that will redistribute routes learned from the amphora BGP speaker. To help with scaling, it is possible to peer with the ToR switch of the rack in which the amphora instance is provisioned. The configuration can be simplified by creating an anycast loopback interface on each ToR switch, which provides a consistent BGP peer IP regardless of which rack or hypervisor is hosting the amphora instance.
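A minimal sketch of such a loopback in FRR-style syntax, using an illustrative address; every ToR is configured with the same IP, so amphoras always peer with one well-known address:

  interface lo
   ! Same anycast peer address on every ToR switch.
   ip address 10.10.255.1/32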
Once a peering session is established between an amphora and the L3 fabric, the amphora will need to announce its anycast VIP with a next-hop address of its front-end network IP. The front-end network (provider) IP must be routable and reachable from the L3 network in order to be used.
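A minimal FRR-style sketch of this announcement, assuming the illustrative addresses used above (anycast VIP 203.0.113.10, ToR anycast peer 10.10.255.1, front-end IP 192.0.2.15); the actual speaker implementation is not mandated by this spec:

  ! The VIP lives on a loopback in another namespace, so disable
  ! the RIB check before announcing it with a rewritten next-hop.
  router bgp 65000
   no bgp network import-check
   neighbor 10.10.255.1 remote-as 64512
   address-family ipv4 unicast
    network 203.0.113.10/32
    neighbor 10.10.255.1 route-map SET-FRONTEND-NH out
   exit-address-family
  !
  route-map SET-FRONTEND-NH permit 10
   set ip next-hop 192.0.2.15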
In order to leverage ECMP for distributing traffic across multiple amphoras, multiple equal-cost routes must be installed into the network for the anycast VIP. This requires the L3 network to have Multipath BGP enabled, so that BGP installs multiple paths rather than selecting a single best path.
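On FRR-based fabric devices, for instance, this is a single knob (the path limit of 8 is an arbitrary example):

  router bgp 64512
   address-family ipv4 unicast
    ! Install up to 8 equal-cost BGP paths instead of a single best path.
    maximum-paths 8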
After the amphoras in a cluster are initialized there will be an ECMP group with multiple equal-cost routes for the anycast VIP. The data flow for traffic is highlighted below:
- Traffic will ingress into the L3 network fabric with a destination IP address of the anycast VIP.
- If this is a new flow, the flow will get hashed to one of the next-hop addresses in the ECMP group.
- The packet will get sent to the front-end IP address of the amphora instance that was selected from the above step.
- The amphora will accept the packet and send it to the back-end server over the front-end network or over a directly attached back-end (tenant) network.
- The amphora will receive the response from the back-end server and forward it to the next-hop gateway of the front-end (provider) network, using the anycast VIP as the source IP address.
- All subsequent packets belonging to the same flow will get routed through the same path.
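Conceptually, the resulting ECMP group is equivalent to the following Linux multipath route (illustrative front-end IPs for a three-amphora cluster):

  ip route add 203.0.113.10/32 \
      nexthop via 192.0.2.15 \
      nexthop via 192.0.2.16 \
      nexthop via 192.0.2.17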
Adding or removing members of an L3 active-active amphora cluster will result in flow remapping, as different paths will be selected due to rehashing. It is recommended to enable the resilient hashing feature on ECMP groups in order to minimize flow remapping.
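How resilient hashing is enabled is vendor specific; on Cumulus Linux switches, for example, it is a switchd setting (shown as an assumption about one platform, not a requirement of this design):

  # /etc/cumulus/switchd.conf
  resilient_hash_enable = TRUE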
The diagrams below show the interaction between an amphora instance that is serving as a distributor and the L3 network. In this example we are peering with the ToR switch in order to disseminate anycast VIP routes into the L3 network.
+------------------------------------------------+
| Initialize Distributor on Amphora |
+------------------------------------------------+
| |
| +---------------+ +---------------+ |
| |1 | |4 | |
| | Amphora | | Ready to | |
| | (boot) | | announce | |
| | | | VIP(s) | |
| +-------+-------+ +-------+-------+ |
| | ^ |
| | | |
| | | |
| | | |
| | | |
| v | |
| +-------+-------+ +-------+-------+ |
| |2 | |3 Establish | |
| | Read Config | | BGP connection| |
| | Drive +----------->+ to ToR(s) | |
| | (BGP Config) | | (BGP Speaker) | |
| +---------------+ +---------------+ |
| |
+------------------------------------------------+
+------------------------------------------------+
| Register AMP to Distributor or Listener Start |
+------------------------------------------------+
| |
| +---------------+ +---------------+ |
| |5 | |8 | |
| | Amphora | | Amphora | |
| | BGP Speaker | | (Receives VIP | |
| |(Announce VIP) | | Traffic) | |
| +-------+-------+ +-------+-------+ |
| | ^ |
| | | |
| |BGP Peering | |
| |Session(s) | |
| | | |
| v | |
| +-------+-------+ +-------+-------+ |
| |6 | |7 | |
| | ToR(s) | | L3 Fabric | |
| |(Injects Route +----------->+ Accepts Route | |
| | into Fabric) | | (ECMP) | |
| +---------------+ +---------------+ |
| |
+------------------------------------------------+
+------------------------------------------------+
| Unregister AMP to Distributor or Listener Stop |
+------------------------------------------------+
| |
| +---------------+ +---------------+ |
| |9 | |12 | |
| | Amphora | | Amphora | |
| | BGP Speaker | |(No longer sent| |
| |(Withdraw VIP) | | VIP traffic) | |
| +-------+-------+ +-------+-------+ |
| | ^ |
| | | |
| |BGP Peering | |
| |Session(s) | |
| | | |
| v | |
| +-------+-------+ +-------+-------+ |
| |10 | |11 | |
| | ToR(s) | | L3 Fabric | |
| |(Removes Route +----------->+ Removes Route | |
| | from Fabric) | | (ECMP) | |
| +---------------+ +---------------+ |
| |
+------------------------------------------------+
TBD
Add the following column to the existing vip table:

- (String(36), nullable=True)
Add table distributor with the following columns:

- (String(36), nullable=False)
- (String(36), nullable=False): for the distributor described here, the value will be L3_BGP.
- (String(36), nullable=True)
Update existing table amphora. An amphora can now serve as a distributor, a load balancer, or both. The vrrp_* columns will be renamed to frontend_* in order to make the purpose of this interface more apparent and to better represent other use cases besides active/standby.

- (String(36), nullable=True)
- (String(36), nullable=True)
- (String(64), nullable=True)
- (String(36), nullable=True)
- (String(36), nullable=True)
- (String(16), nullable=True)
- (Integer, nullable=True)
- (Integer, nullable=True)
Use existing table amphora_health with the following columns:

- amphora_id (String(36), nullable=False)
- last_update (DateTime, nullable=False)
- busy (Boolean, nullable=False)
Add table amphora_registration with the below columns. This table determines the role of the amphora. The amphora can be dedicated as a distributor, dedicated as a load balancer, or perform a combined role of load balancer and distributor. A distributor amphora can be registered to multiple load balancers.

- (String(36), nullable=False)
- (String(36), nullable=False)
- (String(36), nullable=True)
Add table distributor_l3_bgp_speaker with the following columns:

- (String(36), nullable=False)
- (Integer, nullable=False): 4 or 6.
- (Integer, nullable=False)
Add table distributor_l3_bgp_peer with the following columns:

- (String(36), nullable=False)
- (String(64), nullable=False)
- (Integer, nullable=False)
- (String(16), nullable=True): md5. An additional parameter will need to be set in the octavia configuration file by the admin to set the md5 authentication password that will be used with the md5 auth type.
- (Integer, nullable=True): 1-254.
- (Integer, nullable=True)
- (Integer, nullable=True)
Add table distributor_l3_bgp_peer_registration with the following columns:

- (String(36), nullable=False)
- (String(36), nullable=False)
Add table distributor_l3_amphora_bgp_speaker_registration with the following columns:

- (String(36), nullable=False)
- (String(36), nullable=False)
Add table distributor_l3_amphora_vip_registration with the following columns:

- (String(36), nullable=False)
- (String(36), nullable=False)
- (String(64), nullable=False)
- (String(36), nullable=True)
The following extended amphora API calls will be implemented for amphoras running as a dedicated distributor:
Register Amphora
This call will result in the BGP speaker announcing the anycast VIP into the L3 network with a next-hop of the front-end IP of the amphora being registered. Prior to this call, the load balancing amphora will have to configure the anycast VIP on the loopback interface inside the amphora-haproxy namespace.
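For example, with the illustrative VIP used earlier, the load balancing amphora would configure the loopback with a standard iproute2 command (a sketch, not the mandated mechanism):

  ip netns exec amphora-haproxy ip addr add 203.0.113.10/32 dev lo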
Unregister Amphora
The BGP speaker will withdraw the anycast VIP route for the specified amphora from the L3 network. After the route is withdrawn, the anycast VIP IP will be removed from the loopback interface on the load balancing amphora.
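The loopback cleanup is the inverse of the registration step (same illustrative VIP):

  ip netns exec amphora-haproxy ip addr del 203.0.113.10/32 dev lo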
List Amphora
Will return a list of all amphora IDs and their anycast VIP routes currently being advertised by the BGP speaker.
[P2] Drain Amphora
All new flows will get redirected to other members of the cluster and existing flows will be drained. Once the active flows have been drained, the BGP speaker will withdraw the anycast VIP route from the L3 network and unconfigure the VIP from the lo interface.
[P2] Register VIP
This call will be used for registering anycast routes for non-amphora endpoints, such as for UDP load balancing.
[P2] Unregister VIP
This call will be used for unregistering anycast routes for non-amphora endpoints, such as for UDP load balancing.
[P2] List VIP
Will return a list of all non-amphora anycast VIP routes currently being advertised by the BGP speaker.
The distributor inherently supports multi-tenancy, as it is simply providing traffic distribution across multiple amphoras. Network isolation on a per tenant basis is handled by the amphoras themselves, as they service only a single tenant. Further isolation can be provided by defining separate anycast network(s) on a per tenant basis. Firewall or ACL policies can then be built around these prefixes.
To further enhance BGP security, route-maps, prefix-lists, and communities can be used to control which routes a particular BGP peer is allowed to advertise into the L3 network. An MD5 password and GTSM can provide additional protection by limiting unauthorized BGP peers on the L3 network.
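A sketch of these controls in FRR-style syntax, reusing the illustrative anycast range 203.0.113.0/24 and the peer-group from the earlier examples; all names and values are placeholders:

  ! Only accept host routes from the tenant anycast range.
  ip prefix-list ANYCAST-VIPS seq 10 permit 203.0.113.0/24 ge 32
  !
  route-map FROM-AMPHORAE permit 10
   match ip address prefix-list ANYCAST-VIPS
  !
  router bgp 64512
   neighbor AMPHORAE password <md5-secret>
   ! GTSM: only accept peers one hop away.
   neighbor AMPHORAE ttl-security hops 1
   address-family ipv4 unicast
    neighbor AMPHORAE route-map FROM-AMPHORAE in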
The API-Ref documentation will need to be updated for load balancer create. An additional optional parameter, frontend_network_id, will be added. If set, this parameter will result in the primary interface inside the amphora-haproxy namespace being created on the specified network. The default behavior is to provision this interface on the VIP subnet.
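For illustration, a load balancer create request with the proposed parameter might look like the following (field values are placeholders; all fields other than frontend_network_id are existing load balancer create fields):

  POST /v2/lbaas/loadbalancers
  {
      "loadbalancer": {
          "name": "lb1",
          "vip_subnet_id": "<vip-subnet-uuid>",
          "frontend_network_id": "<frontend-network-uuid>"
      }
  }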