OpenStack + LXC + OpenContrail

Update: The patch is available in nova-contrail-vif-driver.

I spent the last two days integrating LXC into OpenContrail and OpenStack (Icehouse). OpenStack does not really support LXC officially. However, it is still included in nova and works via the libvirt driver. So I edited /etc/nova/nova.conf and enabled lxc for libvirt.

[libvirt]
vif_driver=nova_contrail_vif.contrailvif.VRouterVIFDriver
virt_type=lxc

First I just tried to boot a regular KVM image, and it turned out that it almost works. For lxc, the image is attached via qemu-nbd and mounted at /var/lib/nova/instances/7849061d-740f-4727-9fa8-eca84bb3d77b/rootfs, but nova does not take partitions into account. So the mount failed:

2014-08-22 14:37:28.456 28050 ERROR nova.virt.disk.api [req-4982f8a8-cb29-4080-a0be-3a40d5a3e197 eb4ba78065e04df194d3a28f98a6eeac eaef37b7d8c24725bfc8c4e4090b4d97] Failed to mount container filesystem '<nova.virt.disk.api._DiskImage object at 0x43d5350>' on '/var/lib/nova/instances/77d6a580-eaba-4e8f-99e2-f9810df74f24/rootfs':

--
Failed to mount filesystem: Unexpected error while running command.
Command: sudo nova-rootwrap /etc/nova/rootwrap.conf mount /dev/nbd8 /var/lib/nova/instances/77d6a580-eaba-4e8f-99e2-f9810df74f24/rootfs
Exit code: 32
Stdout: ''
Stderr: 'mount: block device /dev/nbd8 is write-protected, mounting read-only\nmount: you must specify the filesystem type\n'

Not much is needed to fix this. Edit /usr/lib/python2.7/dist-packages/nova/virt/disk/api.py, line 380, and add partition=1:

img = _DiskImage(image=image, use_cow=use_cow, mount_dir=container_dir, partition=1)
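Why partition=1 helps: a typical KVM image carries a partition table, so the filesystem starts at the first partition and not at the beginning of the device, and mount cannot guess a filesystem type there. A minimal sketch for checking a raw image, assuming a classic MBR layout and 512-byte sectors (qcow2 images would need to be converted first, and GPT is not handled). The function name mbr_partitions is made up for illustration and is not part of nova:

```python
import struct

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def mbr_partitions(first_sector):
    """Return (number, type, start_lba, num_sectors) for used MBR entries.

    `first_sector` is the first 512 bytes of a raw disk image. This is only
    a heuristic: an empty list means no MBR boot signature or no used entry.
    """
    if len(first_sector) < SECTOR_SIZE or first_sector[510:512] != b"\x55\xaa":
        return []
    partitions = []
    for index in range(4):
        # The four 16-byte partition entries start at offset 446.
        entry = first_sector[446 + 16 * index: 446 + 16 * (index + 1)]
        part_type = entry[4]
        # Start LBA and sector count are little-endian 32-bit values.
        start_lba, num_sectors = struct.unpack("<II", entry[8:16])
        if part_type != 0 and num_sectors > 0:
            partitions.append((index + 1, part_type, start_lba, num_sectors))
    return partitions


# Usage (path is a placeholder):
# with open("disk.raw", "rb") as image:
#     print(mbr_partitions(image.read(SECTOR_SIZE)))
```

If this returns at least one entry, mounting the whole device is likely to fail in the way shown above, while mounting the first partition should work.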

Restart nova-compute. The image can now be mounted, but the instance still will not start because of the network.

I created a VIF driver to set up the necessary devices. It is based on the nova driver of OpenContrail, but uses bridges instead of tap devices: https://github.com/chrigl/nova-lxc-contrail-vif

Install this driver:

$ git clone git@github.com:chrigl/nova-lxc-contrail-vif.git
$ cd nova-lxc-contrail-vif
$ pip install .

And enable it in /etc/nova/nova.conf:

[libvirt]
vif_driver=nova_lxc_contrail_vif.contrailvif.VRouterVIFDriver
virt_type=lxc
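To double-check that both options really ended up in the [libvirt] section, something like the following can be used. It is only a sketch: nova itself reads its configuration through oslo.config, and Python's configparser is just a close-enough approximation for a quick look. The helper name libvirt_settings is made up for this example:

```python
import configparser


def libvirt_settings(conf_text):
    """Return the options of the [libvirt] section as a dict."""
    parser = configparser.ConfigParser(
        interpolation=None,            # values may contain '%' characters
        strict=False,                  # tolerate repeated options
        default_section="__unused__",  # treat [DEFAULT] as a normal section
    )
    parser.read_string(conf_text)
    if not parser.has_section("libvirt"):
        return {}
    return dict(parser.items("libvirt"))


# Usage:
# with open("/etc/nova/nova.conf") as conf:
#     print(libvirt_settings(conf.read()))
```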

Basically, that's it. The instances should spawn and get an internal IP address. If they are in the same network, they should be able to reach each other.

This is purely a proof of concept, and I didn't run any tests inside those containers. So I don't yet know anything about the enforced limits, such as RAM, CPU or disk space.

However, it was fun (and a bit painful, for sure) digging into several parts of OpenContrail, libvirt, lxc and OpenStack. After the unsuccessful tests in OpenStack, I decided to start from the bottom up. First I installed lxc to get a feeling for how it works, how limits are set, and especially how network interfaces are assigned to a container. Hint: yes, it is possible to assign bare ethernet devices to a container. I will describe later why I had to implement this using bridges.

Stepping one layer up, I defined some LXCs using bare libvirt without any OpenStack on top, first with the libvirt default network, then with bridges, because that is well documented. Trying to assign a real ethernet device failed, since this is not yet supported by libvirt o_O. There is already a patch for libvirt: http://www.redhat.com/archives/libvir-list/2014-February/msg00234.html

I had learned a lot so far. The next step was to look into the OpenContrail network driver for nova. By default, this driver creates tap devices and passes them into KVM VMs via virtio and passthrough. However, as I had learned, libvirt is not able to pass a network device to an LXC directly. So I came up with the idea of simply putting a bridge on top of the tap device. This didn't work at first either, since OpenContrail determines the state of the VM from the state of the tap device (UP|DOWN). But since the LXC was not running directly on top of this device, it was DOWN all the time, even when I tried to enable it with ip link set tapXYZ up. I started tcpdump and saw data on the bridge I had created, but not on the device assigned to the bridge!? So the really, really last try for today was registering the bridge instead of the tap device with OpenContrail. And it magically worked!
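To watch the UP/DOWN state of the tap device and the bridge side by side, the operational state that the kernel exposes in sysfs is handy. A small sketch, assuming a Linux host. I don't know whether OpenContrail queries the state in exactly this way, so this only observes the devices from the outside, and link_state is a made-up helper name:

```python
import os


def link_state(device, sysfs_root="/sys/class/net"):
    """Return the kernel's operational state of a network device.

    Reads <sysfs_root>/<device>/operstate, e.g. 'up', 'down' or 'unknown'.
    Returns None if the device does not exist.
    """
    path = os.path.join(sysfs_root, device, "operstate")
    try:
        with open(path) as handle:
            return handle.read().strip()
    except FileNotFoundError:
        return None


# Usage (device names are placeholders):
# for name in ("tapXYZ", "brXYZ"):
#     print(name, link_state(name))
```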

Fun fact: After assigning the bridge to OpenContrail, the created LXCs behaved strangely. Really strangely. They started, but didn't find the metadata service. It felt like it sometimes worked and sometimes didn't. While debugging by pinging the gateway IP, it turned out that the network works as long as tcpdump is running. When I stopped my tcpdump session, the ping got stuck. It took me some time to realize that tcpdump switches promiscuous mode on at start and off at exit. The quite simple solution: turn promiscuous mode on by default.
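The promiscuous flag can be checked without starting tcpdump (which would change the very thing being observed). A sketch, assuming a Linux host with sysfs; both helper names are invented for this example, and the command is the plain iproute2 way of switching the flag on by hand, not necessarily what the driver does internally:

```python
import os

IFF_PROMISC = 0x100  # interface flag as defined in <linux/if.h>


def is_promiscuous(device, sysfs_root="/sys/class/net"):
    """Return True if the device's flags in sysfs include IFF_PROMISC."""
    path = os.path.join(sysfs_root, device, "flags")
    with open(path) as handle:
        # The file contains a hex string such as '0x1103'.
        flags = int(handle.read().strip(), 16)
    return bool(flags & IFF_PROMISC)


def promisc_on_command(device):
    """Build the iproute2 command that enables promiscuous mode."""
    return ["ip", "link", "set", "dev", device, "promisc", "on"]


# Usage (needs root; the bridge name is a placeholder):
# import subprocess
# if not is_promiscuous("brXYZ"):
#     subprocess.check_call(promisc_on_command("brXYZ"))
```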

[update] If you build your own images, make sure they contain /sbin/init. nova uses this as the init process:

<domain type='lxc' id='22415'>
  [...]
  <os>
    <type arch='x86_64'>exe</type>
    <init>/sbin/init</init>
    <cmdline>console=tty0 console=ttyS0 console=ttyAMA0</cmdline>
  </os>
  [...]
</domain>
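A quick way to verify an image against this is to read the init path from the domain XML and look for it below the mounted rootfs. A minimal sketch with made-up helper names; the XML would come from something like virsh dumpxml, and the rootfs path is the one nova mounts under /var/lib/nova/instances:

```python
import os
import xml.etree.ElementTree as ET


def init_path(domain_xml):
    """Return the text of <os><init> from a libvirt domain XML, or None."""
    root = ET.fromstring(domain_xml)
    return root.findtext("./os/init")


def rootfs_has_init(rootfs, init):
    """Check that `init` (absolute path inside the container) exists
    below the mounted root filesystem directory `rootfs`."""
    candidate = os.path.join(rootfs, init.lstrip("/"))
    # lexists: /sbin/init is often a symlink whose absolute target
    # only resolves inside the container, not on the host.
    return os.path.lexists(candidate)


# Usage (paths are placeholders):
# init = init_path(open("domain.xml").read())
# print(init, rootfs_has_init("/var/lib/nova/instances/<uuid>/rootfs", init))
```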