Virtualization basics and an introduction to KVM

Let's assume you are a young but still poor student, which means that out of all possible platforms you have only a Windows PC and a PS4. One fine day you decide to shape up and become a programmer, but wise people on the Internet have told you that you won't become a good engineer without Linux. You can't make Fedora your one and only system, because you still need Windows for games and Facebook, and lack of experience or plain fear keeps you from installing Linux as a second system.

Or let's assume you have already grown up and are now in charge of the servers at a big company, and one day you notice that most of them aren't even half loaded. You can't put more apps and data on the same servers for security reasons, and the cost of maintaining and supporting the growing server farm keeps increasing.

Or, say, you already have a beard and glasses, you're an engineering director, and you aren't happy that developers have to wait a whole two months for a new server in order to deploy a new application. How do you make progress in such circumstances?

Or maybe you're an architect who has designed a new complex system for business analytics processing. Your system includes things like Elasticsearch, Kafka, Spark and many, many others, and every component must be isolated from the others, configured wisely and able to communicate with the rest. You're a good engineer, so you understand that it's not enough to just install this whole zoo on your own system. You need to deploy it in an environment as close as possible to the future production one and, if possible, make sure your groundwork will then run seamlessly on the production servers.

So, what should you do in all these delicate situations? Right: use virtualization.

It is virtualization that lets you run many operating systems side by side, completely isolated from each other, on one and the same hardware.

A little bit of history. The first virtualization technologies appeared back in the 1960s, but real demand for them arose only in the 1990s, as the number of servers kept growing. That's when the need arose to use all the hardware effectively and to streamline updates, application deployment, security, and disaster recovery.

Let's skip the long and painful history of how the various virtualization technologies and methods developed; a curious reader will find supplementary materials on it at the end of the article. The important thing is what it has all led to: three main approaches to virtualization.

Approaches to virtualization

Whatever the approach and the technology, virtualization always involves a host machine and a hypervisor installed on it, which runs the guest machines.

Depending on the technology used, a hypervisor can be either a separate piece of software installed directly on the hardware or a part of the OS.

A curious reader who loves buzzwords will, after a couple of paragraphs, start to mumble that their favorite Docker containers are virtualization, too. We will talk about container technologies next time, but yes, curious reader, you're right: containers are a kind of virtualization too, only at the level of the resources of one and the same operating system.

There are three ways for VMs to communicate with the hardware:

Dynamic translation

In this case, VMs are not aware that they are, in fact, virtual. The hypervisor catches all commands from the VM on the fly and processes them, replacing them with safer ones, then returns the results to the VM. Such an approach evidently comes at a performance cost, but in return it can virtualize any OS, since the guest OS needs no modification. Dynamic translation is used by VMware, a leader in commercial virtualization software.

Paravirtualization

With paravirtualization, the source code of the guest OS is deliberately modified so that all instructions are executed as efficiently and safely as possible. In this case the VM always knows that it is virtual. The advantage is better performance. The disadvantage is that this way you can't virtualize macOS, Windows, or any other OS whose source code you don't have access to. Paravirtualization is used, in one form or another, in Xen and KVM, for example.

Hardware virtualization

Processor makers eventually realized that the x86 architecture is poorly suited to virtualization, because it was originally designed to run only one OS at a time. That's why, after dynamic translation from VMware and paravirtualization from Xen had already appeared, Intel and AMD started producing processors with hardware-assisted virtualization.

At first it didn't improve performance much, because the first releases focused mainly on improving the processor architecture. Now, however, more than 10 years after Intel VT-x and AMD-V appeared, hardware virtualization is not merely on par with the other solutions but even outperforms them.

And KVM (Kernel-based Virtual Machine), which we will use later, also uses and requires hardware virtualization.

Kernel-based Virtual Machine

KVM is a virtualization solution built directly into the Linux kernel; it is on par with other solutions in functionality and surpasses them in usability. Moreover, KVM is an open-source technology, although one that Red Hat pushes forward at full speed (in terms of both code and marketing) and ships in its own products.

This, by the way, is one of the many reasons why we insist on Red Hat distributions.

The creators of KVM focused on hardware virtualization from the start and didn't try to reinvent everything else. A hypervisor is essentially a small operating system that has to be able to work with memory, the network, and so on. Linux already does all this perfectly well, which is why using the Linux kernel as a hypervisor is a logical and effective solution. Every KVM virtual machine is just a separate Linux process, whose security is ensured by SELinux/sVirt and whose resources are allocated using cgroups.

We will talk about SELinux and cgroups in another article, so don't be scared if these words don't mean anything to you.

KVM doesn't just work as a part of the Linux kernel: starting with kernel version 2.6.20, KVM is part of the mainline kernel. In other words, if you have Linux installed, you already have KVM. Convenient, right?

It's worth mentioning that in the field of public cloud platforms Xen dominates more than completely. For instance, it is Xen that AWS EC2 and Rackspace use. That's because Xen appeared earlier than the others and was the first to reach a sufficient level of performance.

Although KVM relies on hardware virtualization, it can use paravirtualized drivers for some I/O devices, which improves performance in certain use cases.

libvirt

We are already pretty close to the practical part of the article; all that's left is to look at one more open-source tool: libvirt.

libvirt is a set of tools that provides a single API for many different virtualization technologies. When using libvirt, you don't have to worry about the complicated "backend": Xen, KVM, VirtualBox, or whatever. Moreover, you can use libvirt from Ruby programs (and also Python, C++ and many others). You can also connect to virtual machines remotely over secure channels.

By the way, libvirt is developed by Red Hat. Have you already installed Fedora Workstation as your main OS?

Let's create a VM

libvirt is just an API, and it's up to the user to decide how to work with it. There are a lot of options. We will use several basic tools. Don't forget: we insist on using Red Hat distributions (CentOS, Fedora, RHEL), and the commands below were tested on exactly one of these systems. Other Linux distributions may differ slightly.

First of all, let's check whether hardware virtualization is supported. Actually, everything will work without it too, but much more slowly.

grep -E --color=auto 'vmx|svm|0xc0f' /proc/cpuinfo # if it doesn't print anything, then you don't have the support :(
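
The same check can be wrapped in a tiny helper that also says which vendor extension was found. This is only a sketch: virt_flavor is a made-up name, and it assumes the flags show up as whole words (vmx for Intel VT-x, svm for AMD-V) in a cpuinfo-style file.

```shell
# virt_flavor: hypothetical helper, not part of any package.
# Prints "intel" if the vmx flag is present, "amd" for svm, "none" otherwise.
# Reads /proc/cpuinfo unless another cpuinfo-style file is passed as $1.
virt_flavor() (
    cpuinfo=${1:-/proc/cpuinfo}
    if grep -qw vmx "$cpuinfo" 2>/dev/null; then
        echo intel
    elif grep -qw svm "$cpuinfo" 2>/dev/null; then
        echo amd
    else
        echo none
    fi
)

# Check the machine this script is running on.
virt_flavor
```

The answer also tells you which vendor-specific module to expect later: kvm_intel or kvm_amd.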

KVM is a Linux kernel module, so we need to check whether it is already loaded and, if not, load it.

lsmod | grep kvm # kvm, kvm_intel, kvm_amd. If it doesn't print anything, you need to load the modules.

# If the modules are not loaded
sudo modprobe kvm
sudo modprobe kvm_intel # or sudo modprobe kvm_amd

Hardware virtualization may be disabled in the BIOS. So if the kvm_intel/kvm_amd modules won't load, check the BIOS settings.
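
A quick way to look at both the modules and the device node they create might be the following sketch. kvm_modules is a hypothetical helper name; /dev/kvm is the character device the kernel exposes once KVM is loaded.

```shell
# kvm_modules: hypothetical helper that filters lsmod-style text on stdin
# and prints only the KVM module names (kvm, kvm_intel, kvm_amd).
kvm_modules() {
    awk '$1 == "kvm" || $1 == "kvm_intel" || $1 == "kvm_amd" { print $1 }'
}

# Show which KVM modules are loaded, if lsmod is available on this system.
if command -v lsmod >/dev/null 2>&1; then
    lsmod | kvm_modules
fi

# With the modules loaded, the /dev/kvm character device should exist.
if [ -c /dev/kvm ]; then
    echo "/dev/kvm is present"
else
    echo "/dev/kvm is missing"
fi
```

If the generic kvm module shows up but /dev/kvm is still missing, the vendor-specific module is the first thing to check.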

Now let's install the necessary packages. The easiest way to do this is by installing a group of packages at once:

yum group list "Virtual*"

The list of groups depends on the OS you use. Mine was called Virtualization, so it can be installed with sudo yum group install "Virtualization". The virsh tool is used for managing virtual machines from the console. Use the command virsh list --all to check whether you already have any VMs (plain virsh list shows only running ones). Most likely you don't.

If you don't like the console, there is also virt-manager, quite a convenient GUI for VMs.

virsh can create VMs only from XML files, whose format is described in the libvirt documentation. Luckily, there are also virt-manager and the virt-install command. You can explore the GUI yourself, and here is an example of virt-install usage:

sudo virt-install --name mkdev-vm-0 \
--location ~/Downloads/CentOS-7-x86_64-Minimal-1511.iso \
--memory=1024 --vcpus=1 \
--disk size=8

Instead of specifying the disk size, you can create the disk in advance via virt-manager, or with the help of virsh and an XML file. Above, I used a CentOS 7 minimal ISO, which is pretty easy to find on the CentOS website.

Now one important question remains: how do you connect to the machine you've created? The easiest way is via virt-manager: just double-click the created machine, and a window with a SPICE connection will open. There you will see the OS installation screen.

By the way, KVM can do nested virtualization: a VM inside of a VM. We need to go deeper!

After you install the OS manually, you will immediately wonder how this whole process can be automated. For this we need a tool called Kickstart, which is intended for the automatic initial configuration of an OS. A Kickstart config is a simple text file in which you can specify the OS configuration, including various scripts to be executed after installation.

But where do you get one? Do you have to write it from scratch? Of course not: since we have already installed CentOS 7 inside our VM, we just need to connect to it and find the file /root/anaconda-ks.cfg, which is a Kickstart config for creating a copy of this OS. You just need to copy and edit it.

But just copying the file isn't interesting, so we'll add a little something to it. You see, by default we won't be able to connect to the VM console from the host machine's console. To make that possible, we need to edit the GRUB bootloader config. So let's add the following section at the end of the Kickstart file:

%post --log=/root/grubby.log
/sbin/grubby --update-kernel=ALL --args="console=ttyS0"
%end

As you might guess, %post runs after the OS is installed. The grubby command updates the GRUB config so that it becomes possible to connect to the console.
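
If you edit Kickstart files often, appending this section by hand gets tedious. Here is a minimal sketch of doing it from the shell; add_console_post is a hypothetical helper name, and it assumes that a file already containing console=ttyS0 needs no changes.

```shell
# add_console_post: hypothetical helper that appends the %post section to
# the Kickstart file given as $1, unless the file already mentions console=ttyS0.
add_console_post() (
    ks=$1
    if ! grep -q 'console=ttyS0' "$ks"; then
        cat >> "$ks" <<'EOF'

%post --log=/root/grubby.log
/sbin/grubby --update-kernel=ALL --args="console=ttyS0"
%end
EOF
    fi
)

# Example run against a throwaway file standing in for a copied anaconda-ks.cfg.
ks=$(mktemp)
printf 'lang en_US.UTF-8\n' > "$ks"
add_console_post "$ks"
add_console_post "$ks"   # the second call changes nothing
cat "$ks"
rm -f "$ks"
```

The guard makes the helper safe to run repeatedly on the same file, which is handy when the config is regenerated by a script.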

By the way, you can enable console access at VM creation time. To do this, pass one more argument to the virt-install command: --extra-args="console=ttyS0". After this, you can install the OS itself in interactive text mode right from your host machine's console, connecting to the VM via virsh console immediately after it is created. This is especially convenient when you are creating VMs on a remote physical server.

Now we can apply the config! virt-install lets you pass additional arguments, including the path to a Kickstart file, when creating a VM.

sudo virt-install --name mkdev-vm-1 \
--location ~/Downloads/CentOS-7-x86_64-Minimal-1511.iso \
--initrd-inject /path/to/ks.cfg \
--extra-args ks=file:/ks.cfg \
--memory=1024 --vcpus=1 --disk size=8
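
One detail in this command is easy to miss: --initrd-inject places the file in the root of the installer's initrd, so the name after ks=file:/ has to match the file name of the injected config. The sketch below derives that name automatically and also adds console=ttyS0. print_install_cmd is a hypothetical helper that only prints the command without running it, and it assumes the arguments contain no spaces.

```shell
# print_install_cmd: hypothetical helper, prints a virt-install command line
# without executing anything. Assumes the arguments contain no spaces.
#   $1 = VM name, $2 = path to the ISO, $3 = path to the Kickstart file
print_install_cmd() (
    name=$1 iso=$2 ks=$3
    ks_name=$(basename "$ks")   # the file lands in the initrd root under this name
    printf 'sudo virt-install --name %s \\\n' "$name"
    printf '  --location %s \\\n' "$iso"
    printf '  --initrd-inject %s \\\n' "$ks"
    printf '  --extra-args "ks=file:/%s console=ttyS0" \\\n' "$ks_name"
    printf '  --memory=1024 --vcpus=1 --disk size=8\n'
)

# Print the command for a second VM; review it, then paste it into the shell.
print_install_cmd mkdev-vm-1 CentOS-7-x86_64-Minimal-1511.iso /path/to/ks.cfg
```

Printing instead of executing keeps the sketch harmless and lets you check the arguments first.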

After the second VM is created (completely automatically), you will be able to connect to it from the console with the command virsh console vm_id (the VM name works in place of the ID as well). You can find the vm_id in the list of all VMs printed by virsh list.
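
When scripting around this, it helps to know whether the machine is actually running before you try to attach to its console. Here is a small sketch that reads the table printed by virsh list --all; vm_state is a hypothetical helper name, and the parsing assumes the usual three-column layout (Id, Name, State) with two header lines. virsh also has a domstate subcommand that reports the state directly, so treat this purely as an illustration of working with the list output.

```shell
# vm_state: hypothetical helper. Reads "virsh list --all"-style output on
# stdin and prints the state of the VM whose name is given as $1.
vm_state() {
    awk -v n="$1" 'NR > 2 && $2 == n {
        $1 = ""; $2 = ""        # drop the Id and Name columns
        sub(/^ +/, "")          # trim the leading spaces left behind
        print                   # what remains is the state, e.g. "shut off"
    }'
}

# Typical use on a host with libvirt installed (not executed here):
#   sudo virsh list --all | vm_state mkdev-vm-1
```

A state such as "shut off" contains a space, which is why the helper prints everything after the name rather than a single column.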

One of the advantages of KVM/libvirt is its great documentation, including the documentation written by Red Hat. The dear reader is invited to study it with due curiosity.

Of course, creating virtual machines like this, manually from the console, and only then configuring them with Kickstart is not the most convenient way. In the next articles we will look at a lot of cool tools that can simplify and completely automate system configuration.

What's next?

It's impossible to cover everything you need to know about virtualization in one article. We have looked at several use cases of virtualization and its advantages, dug into the details of how it works, got acquainted with what is, in my opinion, the best solution for such tasks (KVM), and even created and configured a VM.

It's important to understand that virtual machines are the bricks in the huge buildings of modern cloud technologies. They are what allows applications to grow automatically and without limit, as quickly as possible and making maximum use of all the resources.

However powerful and rich in services AWS may be, at its base are virtual machines on top of Xen. Every time you create a new droplet on DigitalOcean, you create a VM. Practically all the websites you use are hosted inside virtual machines. The simplicity and flexibility of VMs make it possible not only to build production systems but also to make local development and testing ten times easier, especially when a system uses a lot of components.

We have learned to create only one machine – that's not bad for testing one application. But what if we need several virtual machines at once? How would they communicate? How would they find each other? We will have to find out how networks work, how they work in terms of virtualization and what components are involved in this work and need configuring – in the next article.

Further reading