
A Use Case for Policy Routing with KVM and Open vSwitch

In an earlier post, I provided an introduction to policy routing as implemented in recent versions of Ubuntu Linux (and possibly other distributions as well), and I promised that in a future post I would provide a practical application of its usage. This post looks at that practical application: how—and why—you would use Linux policy routing in an environment running OVS and a Linux hypervisor (I’ll assume KVM for the purposes of this post).

Before I get into the "how," let's first discuss the "why." Let's assume that you have a KVM+OVS environment and are leveraging tunnels (GRE or otherwise) for some guest domain traffic. Recall from my post on traffic patterns with Open vSwitch that tunnel traffic is generated by the OVS process itself, and therefore the Linux host's IP routing table determines which interface that tunnel traffic will use. But what if you need the tunnel traffic to be handled differently than the host's management traffic? What if you need a default route for tunnel traffic that uses one interface, but a different default route for your separate management network that uses its own interface? This is where policy routing comes in. Using source routing (i.e., policy routing based on the source of the traffic), you can easily define a separate table for tunnel traffic that has its own default route while still allowing management traffic to use the host's main routing table.

Let’s take a look at how it’s done. In this example, I’ll make the following assumptions:

  • I’ll assume that you’re running host management traffic through OVS, as I outlined here. I’ll use the name mgmt0 to refer to the management interface that’s running through OVS for host management traffic. We’ll use the IP address 192.168.100.10 for the mgmt0 interface.

  • I'll assume that you're running tunnel traffic through an OVS interface named tep0. (This helps provide some consistency with my walk-through on using GRE tunnels with OVS.) We'll use the IP address 192.168.200.10 for the tep0 interface.

  • I’ll assume that the default gateway on each subnet uses the .1 address on that subnet.

With these assumptions out of the way, let’s look at how you would set this up.

First, you’ll create a custom policy routing table, as outlined here. I’ll use the name “tunnel” for my new table:

echo 200 tunnel >> /etc/iproute2/rt_tables
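
As a quick sanity check, the ip utility should now recognize the table by name; the table itself will simply be empty until routes are added to it in the next step:

ip route show table tunnel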

Next, you’ll need to modify /etc/network/interfaces for the tep0 interface so that a custom policy routing rule and custom route are installed whenever this interface is brought up. The new configuration stanza would look something like this:

auto tep0
iface tep0 inet static
  address 192.168.200.10
  netmask 255.255.255.0
  network 192.168.200.0
  broadcast 192.168.200.255
  # Send traffic sourced from the tep0 address to the "tunnel" table...
  post-up ip rule add from 192.168.200.10 lookup tunnel
  # ...and give that table its own default route via the tunnel subnet's gateway
  post-up ip route add default via 192.168.200.1 dev tep0 table tunnel

(Click here for the same information as a GitHub Gist.)
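
Once tep0 has been brought up (via ifup or a reboot), you can confirm that the post-up commands took effect. A quick check, assuming the addressing above, would look something like this:

ip rule list
ip route show table tunnel

The output of ip rule list should include an entry directing traffic from 192.168.200.10 to the tunnel table, and ip route show table tunnel should show the default route via 192.168.200.1.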

Finally, you’ll want to ensure that mgmt0 is properly configured in /etc/network/interfaces. No special configuration is required there, just the use of the gateway directive to install the default route. Ubuntu will install the default route into the main table automatically, making it a “system-wide” default route that will be used unless a policy routing rule dictates otherwise.
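
For reference, the corresponding mgmt0 stanza might look something like this; it's just a sketch that assumes the 192.168.100.10 address mentioned earlier and a /24 netmask to match tep0:

auto mgmt0
iface mgmt0 inet static
  address 192.168.100.10
  netmask 255.255.255.0
  gateway 192.168.100.1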

With this configuration in place, you now have a system that:

  • Can communicate via mgmt0 with other systems in other subnets via the default gateway of 192.168.100.1.

  • Can communicate via tep0 to establish tunnels with other hypervisors in other subnets via the 192.168.200.1 gateway.
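
To see the policy routing decision for yourself, ip route get with an explicit source address shows which gateway the kernel would select. The destination used below (192.168.210.10) is purely a hypothetical hypervisor in another subnet:

# 192.168.210.10 is a hypothetical remote hypervisor used for illustration
ip route get 192.168.210.10 from 192.168.200.10
ip route get 192.168.210.10 from 192.168.100.10

The first lookup should resolve via 192.168.200.1 (the tunnel table), while the second should resolve via 192.168.100.1 (the main table).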

This approach requires only the initial configuration (which could, quite naturally, be automated via a tool like Puppet) and does not require adding routes as the environment scales to include new subnets for other hypervisors (whether for management or tunnel traffic). Thus, organizations can follow recommended practices for building scalable L3 networks with reasonably-sized L2 domains without sacrificing connectivity to/from the hypervisors in the environment.

(By the way, this is something that is not easily accomplished in the vSphere world today. ESXi has only a single routing table for all VMkernel interfaces, which means that management traffic, vMotion traffic, VXLAN traffic, etc., are all bound by that single routing table. To achieve full L3 connectivity, you’d have to install specific routes into the VMkernel routing table on each ESXi host. When additional subnets are added for scale, each host would have to be touched to add the additional route.)

Hopefully this gives you an idea of how Linux policy routing could be effectively used in environments leveraging virtualization, OVS, and overlay protocols. Feel free to add your thoughts, ideas, corrections, or questions in the comments below. Courteous comments are always welcome! (Please disclose vendor affiliations where applicable.)
