The libvirtd nightmare

Preamble

Coming from outdated OpenVZ containers, one would surely think that migrating every one of those old containers to virtual machines with native and mainstream Linux Kernel KVM technology would be stable, easy to manage and absolutely not prone to corruption, kernel panics and the like.

Dream on.

The chosen setup: one powerful server with lots of RAM and hard-disk space, running Debian Stretch on an AMD Ryzen 7 processor with native SVM and IOMMU support, KVM support coming straight out of the stable kernel Debian ships out-of-the-box, and libvirtd to make things easier to manage. So far so good: we managed (easily) to migrate the old-fashioned OpenVZ containers to native KVM machines, and we tested them successfully using qemu-system. After that, some trivial stuff: virt-install to define them, and the virsh list, virsh start and virsh autostart commands to integrate them with the libvirtd daemon. And, by the way, every single VM running as a non-privileged user.
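
For reference, registering each converted VM looked roughly like the sketch below. The VM name, sizes and disk path are made up for illustration; only the commands and options are standard virt-install/virsh ones.

virt-install --name test --memory 2048 --vcpus 2 \
    --disk path=/srv/kvm/test.qcow2,format=qcow2 \
    --import --os-variant debian9 --noautoconsole
virsh autostart test      # have libvirtd bring the VM up automatically
virsh start test
virsh list                # should now show "test" as running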

All of the VMs were running like a charm and smoothly thanks to their bare-metal fast solid state disk. Stable and reliable.

Dream on.

And then, all hell broke loose

After some time with all the VMs running with no issues at all, I logged into the host server using the non-privileged account the VMs were running as. Everything looked normal: the total amount of physical RAM used, the paging and so on. Then, for the heck of it, I escalated privileges (I ran su) to perform some basic sysadmin stuff on the box. At some point, instead of returning to the non-privileged shell with <CTRL+d>, exit or whatever, I did this:

su - kvmuser

Then, magically, another instance of every single VM marked as “autostart” in libvirtd started up, all at once. In no time at all, all the VMs started complaining about data integrity and instability on their file systems. No wonder: we now had two instances of qemu-system per VM, accessing the very same qcow2 image! Soon enough, the filesystems were mounted read-only on the first set of VM instances. At some point, funny things started to happen and data corruption became the norm. Chaos aplenty. Data loss, and a headache like I have never experienced in my whole life as a sysadmin.
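
In hindsight, a check along these lines would have exposed the double writers right away (the image path is hypothetical, the tools are stock Linux ones):

pgrep -af qemu-system        # more than one process per VM name is one too many
lsof /srv/kvm/test.qcow2     # list every PID holding the image open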

What the fuck man?

What is wrong with libvirtd

Once the VMs were restored and everything was again under perfect control, I did some research and tests. We had two main problems here: the first one was not having any locking preventing two QEMU processes from opening the very same image read/write. Heck, as far as I’m concerned, there are different ways to implement that. The second issue was, of course, why on earth libvirtd was starting the already running VMs a second time.
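
On the locking front, two options worth noting (sketches only, I am not claiming to have validated them on this exact Stretch box): QEMU 2.10 and later lock the image file themselves and refuse a second writer, and libvirt ships a lock daemon, virtlockd, that the system libvirtd can be told to use:

# /etc/libvirt/qemu.conf (system daemon)
lock_manager = "lockd"

systemctl enable --now virtlockd.socket
systemctl restart libvirtd

Whether virtlockd also covers per-user session daemons like ours is a separate question.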

On another box mirroring the exact same setup, I logged in as kvmuser, listed the running VMs, then became root, ran su - kvmuser, and ran virsh list again. Instead of getting the same list, I got a suspicious and buggy-looking empty one! See and marvel:

kvmuser@kvmbox:~$ virsh list
 Id    Name                           State
----------------------------------------------------
 1     Test                           running

kvmuser@kvmbox:~$ su
Password:
root@kvmbox:/home/kvmuser# su - kvmuser
kvmuser@kvmbox:~$ virsh list
 Id    Name                           State
----------------------------------------------------
So that was it. The second virsh list command did not see the same “Test” VM running, although a virsh list --all did show all the defined ones alright.

Tracing the socket

The virsh command connects to the libvirtd daemon, of course, and libvirtd listens for connections from clients on a socket; unless you tell it otherwise, that socket is a UNIX socket. So I used strace to check whether both virsh commands were talking to the same socket, something I suspected was not the case. The first time, when virsh was correctly seeing the “Test” VM running, this is what I got:

connect(5, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(5, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(6, {sa_family=AF_UNIX, sun_path="/run/user/1001/libvirt/libvirt-sock"}, 110) = 0

So the socket was /run/user/1001/libvirt/libvirt-sock. Then I became root, ran su - kvmuser again, and traced the second virsh list command:

connect(5, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(5, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
connect(6, {sa_family=AF_UNIX, sun_path="/home/kvmuser/.cache/libvirt/libvirt-sock"}, 110) = 0

The second time, the UNIX socket used was a different one, which explained why the virsh command did not see the same running VMs as the first one!
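
For anyone wanting to reproduce the traces above, an invocation along these lines is enough (standard strace options; the grep is just there to cut the noise):

strace -f -e trace=connect virsh list 2>&1 | grep sun_path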

A trivial fix

According to the libvirtd manpage:

$XDG_RUNTIME_DIR/libvirt/libvirt-sock
The socket libvirtd will use.

If $XDG_RUNTIME_DIR is not set in your environment, libvirtd will use $HOME/.cache

So, the first time, $XDG_RUNTIME_DIR was correctly set, and thus the socket was created under the /run/user/UID/ directory. After running su and then su - kvmuser, though, the variable was not set at all, so libvirtd used the UNIX socket under /home/$USER/.cache, as shown above.
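
This is easy to see from the root shell; a check along these lines (mirroring the session above) shows the variable simply vanishing after su -:

root@kvmbox:/home/kvmuser# su - kvmuser
kvmuser@kvmbox:~$ echo "${XDG_RUNTIME_DIR:-unset}"
unset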

This absolutely trivial thing rendered all of our KVMs corrupt in a matter of seconds. To fix it, I added the following line to the .bashrc file of the “kvmuser” user:

export XDG_RUNTIME_DIR=/run/user/$UID
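
After that, repeating the su dance should land virsh on the same socket again; something like this (UID 1001 as in the trace above) is what to expect:

root@kvmbox:/home/kvmuser# su - kvmuser
kvmuser@kvmbox:~$ echo $XDG_RUNTIME_DIR
/run/user/1001
kvmuser@kvmbox:~$ virsh list
 Id    Name                           State
----------------------------------------------------
 1     Test                           running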

And now, everything (but locking) works like a charm.

So far.