Andrey Savochkin leads the development of the kernel portion of OpenVZ, an operating system-level server virtualization solution. In this interview, Andrey offers a thorough explanation of what virtualization is and how it works. He also discusses the differences between hardware-level and operating system-level virtualization, going on to compare OpenVZ to VServer, Xen and User Mode Linux.
Andrey is now working to get OpenVZ merged into the mainline Linux kernel, explaining, "virtualization makes the next step in the direction of better utilization of hardware and better management, the step that is comparable with the step between single-user and multi-user systems." The complete OpenVZ patchset weighs in at around 70,000 lines, approximately 2MB, but has been broken into smaller logical pieces to aid in discussion and to help with merging.
Jeremy Andrews: Please share a little about yourself and your background...
Andrey Savochkin: I live in Moscow, Russia, and work for SWsoft. My two major interests in life are mathematics and computers, and I was unable to decide for a long time which one I preferred.
I studied at Moscow State University, which has quite a strong mathematical school, and got my M.Sc. degree in 1995 and my Ph.D. degree in 1999. The final decision between mathematics and computers came at the time of my postgraduate study, and my Ph.D. thesis was completely in the computer science area, exploring some security aspects of operating systems and software intended to be used on computers with Internet access.
Jeremy Andrews: What is your involvement with the OpenVZ project?
Andrey Savochkin: The OpenVZ project has kernel and userspace parts. For the kernel part, we have been using a development model close to the model of the mainstream Linux kernel, and for a long time I accumulated and reviewed OpenVZ kernel patches and prepared "releases". Certainly, I've been contributing a lot of code to OpenVZ.
Jeremy Andrews: What do you mean when you say that your development model is close to the kernel development model?
Andrey Savochkin: The Linux kernel development model implies that developers can't directly add their changes to the main code branch; instead, they publish their changes. Other developers can review and provide comments, and, more importantly, there is a dedicated person who reviews all the changes, asks for corrections or clarifications, and finally incorporates the changes into the main code branch. This model is extremely rare in the production of commercial software, and in the open source software world only some projects use it. The Linux kernel has been using this model quite effectively from the beginning.
In my opinion, this model is very valuable for software that has high reliability requirements and, at the same time, is complex and difficult to debug by traditional means (such as debuggers, full state dump on failure, and so on).
Jeremy Andrews: OpenVZ is described as an "Operating System-level server virtualization solution". What does this mean?
Andrey Savochkin: First, it is a virtualization solution, that is, it enables multiple environments (compartments) on a single physical server, and each environment looks like and provides the same functionality as a dedicated server. We call these environments Virtual Private Servers (VPSs), or Virtual Environments (VEs). VPSs on a single physical server are isolated from each other, and they are also isolated from the physical hardware. Isolation from the hardware makes it possible to implement, on top of OpenVZ, automated migration of VPSs between servers that does not require any reconfiguration to run the VPSs on very different hardware. A fair and efficient resource management mechanism is also included, as one of the most important components of a virtualization solution.
Second, OpenVZ is an operating system-level solution, virtualizing access to the operating system, not to the hardware. There are many well-known hardware-level virtualization solutions, but operating system-level virtualization architecture gives many advantages over them. OpenVZ has better performance in some areas, considerably better scalability and VPS density, and provides unique management options in comparison with hardware-level virtualization solutions.
Jeremy Andrews: How many VPSs can you have on one piece of hardware?
Andrey Savochkin: That depends on the hardware and the "size" of the VPSs and the applications in them. For experimental purposes OpenVZ can run hundreds of small VPSs at the same time; in a production environment, tens of VPSs. Virtuozzo has higher density and can run hundreds of production VPSs.
Jeremy Andrews: When you talk about the migration of VPSs between servers, do you mean that a VPS can be running on one server and then migrate to another server where it will continue running, somewhat like a cluster?
Andrey Savochkin: An OpenVZ VPS will be stopped and started again, so there will be some downtime. But this migration doesn't require any reconfiguration or other manual intervention related to IP addresses, drivers, partitions, device names or anything else. That means, in the first place, that taking hardware offline for maintenance or upgrade, replacing hardware and similar tasks become much more painless, and this is a definite advantage of virtualization. Then, since OpenVZ makes it possible to fully automate operations on a VPS as a whole, it makes implementing load balancing (as well as fail-over and other features of clustering) easier.
Virtuozzo has additional functionality called Zero-Downtime Migration. It provides the ability to migrate a VPS from one server to another without downtime, without restart of processes and preserving network connections. This functionality will be released as part of OpenVZ in April.
Jeremy Andrews: Can you explain how the resource management mechanism works?
Andrey Savochkin: In virtualization solutions resource management has two main requirements. First, it should cover enough resources to provide good isolation and security (and the isolation and security properties of resource management are one of the main differentiators between OpenVZ and VServer). Next, resource management should be flexible enough to allow high utilization of hardware when the resource demands of VPSs or virtual machines change.
OpenVZ resource management operates the following resource groups:
Each group may have multiple resources, like low memory and high memory, or disk blocks and disk inodes. Resource configuration can be specified in terms of upper limits (which may be soft or hard limits, and impose an upper boundary on the consumption of the corresponding resource), in terms of shares (or weights) for resource distribution, or in terms of guarantees (the amount of resources guaranteed no matter what other VPSs are doing).
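How limits, shares and guarantees can interact is easier to see in a small sketch. The Python below is only an illustration under stated assumptions, not OpenVZ code: `VpsConfig`, `allocate` and all the numbers are hypothetical, it handles a single resource, it assumes positive weights, and it assumes the guarantees add up to no more than the total.

```python
from dataclasses import dataclass


@dataclass
class VpsConfig:
    # Hypothetical per-VPS settings for one resource (illustration only).
    name: str
    guarantee: float  # amount reserved for this VPS regardless of other VPSs
    weight: float     # share used to divide spare capacity (assumed > 0)
    limit: float      # hard upper boundary on consumption


def allocate(total, configs, demand):
    """Divide `total` units of one resource among VPSs.

    Assumes the guarantees add up to no more than `total`.
    """
    # The useful ceiling of each VPS is its demand capped by its hard limit.
    ceiling = {c.name: min(demand[c.name], c.limit) for c in configs}
    # Step 1: satisfy demand up to the guarantee.
    alloc = {c.name: min(ceiling[c.name], c.guarantee) for c in configs}
    spare = total - sum(alloc.values())
    # Step 2: divide spare capacity by weight among VPSs that still want more.
    active = [c for c in configs if alloc[c.name] < ceiling[c.name]]
    while active and spare > 1e-9:
        weight_sum = sum(c.weight for c in active)
        still_hungry = []
        handed_out = 0.0
        for c in active:
            want = ceiling[c.name] - alloc[c.name]
            give = min(want, spare * c.weight / weight_sum)
            alloc[c.name] += give
            handed_out += give
            if want - give > 1e-9:
                still_hungry.append(c)
        spare -= handed_out
        active = still_hungry
    return alloc


configs = [
    VpsConfig("web", guarantee=20, weight=2, limit=80),
    VpsConfig("mail", guarantee=20, weight=1, limit=40),
    VpsConfig("idle", guarantee=20, weight=1, limit=40),
]
demand = {"web": 90, "mail": 30, "idle": 5}
print(allocate(100, configs, demand))
```

In this made-up scenario the idle VPS uses only the 5 units it asks for, the mail VPS gets its full demand of 30, and the busy web VPS ends up with roughly 65 units, well above its guarantee but below its limit of 80.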
Jeremy Andrews: What are some common uses of server virtualization?
Andrey Savochkin: Just a few examples:
Server consolidation -- moving the content of multiple servers into VPSs on a single server to reduce management (and hardware) costs.
Disaster Recovery -- providing redundant environments for replication and fast data and application recovery.
Improving server security -- by creating multiple VPSs and moving different services (HTTP, FTP, mail) into different VPSs.
Creation of multiple environments and replication of environments for software testing and development.
Hosting -- hosting service providers use Virtuozzo/OpenVZ to bridge the gap between shared and dedicated services, and to exceed both. Typical Virtuozzo/OpenVZ based hosting services include VPSs and Dynamic Servers which provide isolation, root access and guaranteed and burstable resources to customers.
Jeremy Andrews: What prevents multiple operating systems running on the same server using OpenVZ from affecting each other?
Andrey Savochkin: Isolation between multiple VPSs includes the separation of processes and similar objects, as well as resource control.
Let's first speak about separation of processes and similar objects. There are two possible approaches to this separation: access control and separation of namespace. The former means that when someone tries to access an object, the kernel checks whether he has access rights; the latter means that objects live in completely different spaces (for example, per-VPS lists), do not have pointers to objects in spaces other than their own and, thus, nobody can get access to objects to which he isn't supposed to get the access.
OpenVZ uses both of these two approaches, choosing the approaches so that they do not reduce performance and efficiency and do not degrade isolation.
In the theory of security, there are strong arguments in favor of both of these approaches. For a long period of time, different military and national security agencies preferred the first approach in their publications and solutions, accompanying it with logging. Many authors on different occasions have advocated the second approach. In our specific task, virtualization of the Linux kernel, I believe that the most important step is to identify the objects that need to be separated, and this step is exactly the same for both approaches. However, depending on the object type and data structures, these two approaches differ in performance and resource consumption. For search in long lists, for example, namespace separation is better, but for large hash tables access control is better. So, the way the isolation is implemented in OpenVZ provides both safety and efficiency.
Resource control is the other very important part of VPS isolation.
Jeremy Andrews: When relying on namespace separation, what prevents a process in one VPS from writing to a random memory address that just happens to be used by another VPS?
Andrey Savochkin: Processes can't access physical memory at random addresses. They only have their virtual address space and, additionally, can get access to some named objects: processes identified by a numeric ID, files identified by their path and so on. The idea of namespace separation is to make sure that a process can identify only those objects that it is authorized to access. For other objects, the process won't get a "permission denied" error; instead, it will simply be unable to see them.
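The difference between access control and namespace separation can be sketched with two toy process tables. This is a minimal illustration, not kernel or OpenVZ code: the class names, the string VPS identifiers and the choice of Python exceptions are all assumptions made for the example.

```python
class AccessControlTable:
    """One global table; every lookup checks the caller's rights."""

    def __init__(self):
        self._owner_by_pid = {}

    def add(self, pid, vps_id):
        self._owner_by_pid[pid] = vps_id

    def lookup(self, caller_vps, pid):
        owner = self._owner_by_pid.get(pid)
        if owner is None:
            raise LookupError("no such process")
        if owner != caller_vps:
            # The object is visible, but the rights check fails.
            raise PermissionError("permission denied")
        return (owner, pid)


class NamespaceTable:
    """One table per VPS; objects of other VPSs cannot even be named."""

    def __init__(self):
        self._pids_by_vps = {}

    def add(self, pid, vps_id):
        self._pids_by_vps.setdefault(vps_id, set()).add(pid)

    def lookup(self, caller_vps, pid):
        # Only the caller's own table is searched.
        if pid not in self._pids_by_vps.get(caller_vps, set()):
            raise LookupError("no such process")
        return (caller_vps, pid)


for table in (AccessControlTable(), NamespaceTable()):
    table.add(100, "vps1")
    try:
        table.lookup("vps2", 100)
    except PermissionError as err:
        print(type(table).__name__, "->", err)
    except LookupError as err:
        print(type(table).__name__, "->", err)
```

With the access-control table, a caller in another VPS learns that process 100 exists but is refused; with the per-VPS tables, the same lookup behaves exactly as if the process did not exist.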
Jeremy Andrews: Can you explain a little about how resource control provides virtual private server isolation?
Andrey Savochkin: Resource control is closely related to resource management. It ensures that one VPS can't harm others through excessive use of some resources. If one VPS were able to easily take down the whole server by exhausting some system resource, we couldn't say that VPSs are really isolated from each other. In implementing resource control, we in OpenVZ tried to prevent not only situations where one VPS can bring down the whole server, but also the possibility of causing a significant performance drop for other VPSs.
One part of resource control is accounting and management of CPU, memory, disk quota, and other resources used by each VPS. The other part is virtualization of system-wide limits. For instance, Linux provides a system-wide limit on the number of IPC shared memory segments. For complete isolation, this limit should apply to each VPS separately - otherwise, one VPS can use all IPC segments and other VPSs will get nothing. But certainly, the most difficult part of resource control is accounting and management of resources like CPU and system memory.
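The idea of applying a formerly system-wide limit to each VPS separately can be sketched as a per-VPS counter. The Python below is a hedged illustration only: `SegmentLimiter`, its fields and the limit of two segments are hypothetical and do not reflect the actual OpenVZ data structures.

```python
class SegmentLimiter:
    """Counts shared memory segments per VPS instead of system-wide."""

    def __init__(self, limit_per_vps):
        self.limit_per_vps = limit_per_vps
        self.held = {}     # segments currently held, keyed by VPS id
        self.failcnt = {}  # refused allocations, keyed by VPS id

    def allocate(self, vps_id):
        used = self.held.get(vps_id, 0)
        if used >= self.limit_per_vps:
            # Only the VPS that hit its own limit is refused.
            self.failcnt[vps_id] = self.failcnt.get(vps_id, 0) + 1
            return False
        self.held[vps_id] = used + 1
        return True

    def release(self, vps_id):
        if self.held.get(vps_id, 0) > 0:
            self.held[vps_id] -= 1


limiter = SegmentLimiter(limit_per_vps=2)
greedy = [limiter.allocate("vps1") for _ in range(3)]
other = limiter.allocate("vps2")
print(greedy, other)  # → [True, True, False] True
```

The greedy VPS is stopped at its own limit, while the second VPS can still allocate a segment; with a single shared counter the second VPS could have been left with nothing.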
Jeremy Andrews: How does OpenVZ improve upon other virtualization projects, such as VServer?
Andrey Savochkin: First of all, OpenVZ is a completely different project from VServer and has a different code base.
OpenVZ has a bigger feature set (including, for example, netfilter support inside VPSs) and significantly better isolation, Denial-of-Service protection and general reliability. Better isolation and DoS protection come from the OpenVZ resource management system, which includes a hierarchical CPU scheduler and the User Beancounter patch to control the usage of memory and internal kernel objects. Also, we've invested a lot of effort in creating a quality assurance system, and now we have people who manually test OpenVZ as well as a large automated testing system.
Virtuozzo, a virtualization solution built on the same core as OpenVZ, provides many more features, has better performance characteristics and includes many additional management capabilities and tools.
Jeremy Andrews: What are some examples of hardware-level virtualization solutions?
Andrey Savochkin: VMware, Xen, User Mode Linux.
Jeremy Andrews: How does OpenVZ compare to Xen?
Andrey Savochkin: OpenVZ has certain advantages over Xen.
OpenVZ makes it possible to use system resources such as memory and disk space much more efficiently, and because of that has better performance on memory-critical workloads. OpenVZ does not run a separate kernel in each VPS and saves memory on kernel internal data. However, an even bigger gain in efficiency comes from dynamic resource allocation. Using Xen, you need to specify in advance the amount of memory for each virtual machine and create a disk device and filesystem for it, and your ability to change settings later on the fly is very limited. When running multiple VPSs, at any given moment some VPSs are handling a load burst and are busy, some are less busy and some are idle, hence the dynamic assignment of resources in OpenVZ can significantly improve the utilization of resources. With Xen, you have to slice the server for the worst-case scenario and maximal resource usage by each VPS; with OpenVZ you can usually slice based on average usage.
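The worst-case versus average-case argument is simple arithmetic, sketched below in Python. All figures are hypothetical and chosen only for illustration, and the 20% headroom kept free for load bursts is an assumption of this sketch, not a number from OpenVZ or Xen.

```python
def vms_with_static_slices(ram_mb, peak_mb):
    """Each virtual machine gets a fixed slice sized for its peak usage."""
    return ram_mb // peak_mb


def vpss_with_dynamic_sharing(ram_mb, average_mb, headroom_percent=20):
    """VPSs share memory dynamically; sizing is based on average usage.

    `headroom_percent` of the RAM is kept free to absorb bursts
    (an assumption made for this sketch).
    """
    usable_mb = ram_mb * (100 - headroom_percent) // 100
    return usable_mb // average_mb


ram_mb = 16384      # hypothetical server with 16GB of RAM
peak_mb = 1024      # hypothetical worst-case usage of one environment
average_mb = 256    # hypothetical average usage of one environment

print(vms_with_static_slices(ram_mb, peak_mb))        # → 16
print(vpss_with_dynamic_sharing(ram_mb, average_mb))  # → 51
```

The exact ratio depends entirely on how far peak usage exceeds average usage and on how much headroom is reserved; the sketch only shows why sizing by averages allows a higher density.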
OpenVZ provides more management capabilities and management tools. To start, OpenVZ has the out-of-the-box ability to immediately create VPSs based on various Linux distributions, without preparing disk images, installing hundreds of packages and so on. But most importantly, OpenVZ has the ability to access files and start programs inside a VPS from the host system. It means that a damaged VPS (one that has lost network access or is unbootable) can be easily repaired from the host system, and that a lot of operations related to management, configuration or software upgrades inside VPSs can be easily scripted and executed from the host system. In short, managing Xen virtual machines is like managing separate servers, but managing a group of VPSs on one computer is more like managing a single multi-user server.
An operating system inside a Xen virtual machine is not necessarily able to use all capabilities of the hardware; for instance, support for SMP and more than 4GB of RAM inside virtual machines will appear only in Xen 3.0. OpenVZ is as scalable as Linux when hardware capabilities increase. SMP and more than 4GB of RAM have been supported in OpenVZ from the very beginning. Recently we've built OpenVZ for the x86_64 platform, and it was a straightforward job that did not require going into architecture details. So, OpenVZ is far more hardware independent than Xen, and hence is able to start using new hardware capabilities much faster.
There is one point where Xen will have a certain advantage over OpenVZ. In version 3.0, Xen is going to allow running Windows virtual machines on a Linux host system (but this isn't possible in the stable branch of Xen).
Again, I need to note that the above describes my opinion about the main differences between OpenVZ and Xen. Virtuozzo has many additions to OpenVZ, and, for instance, there is a Virtuozzo for Windows solution.
Jeremy Andrews: How does OpenVZ compare to User Mode Linux?
Andrey Savochkin: What I've said before about the advantages of OpenVZ over Xen also applies when OpenVZ is compared with User Mode Linux.
The unique feature of User Mode Linux is that you can run it under standard debuggers for studying Linux kernel in depth. In other aspects, User Mode Linux does not have as many features as Xen, and Xen is superior in performance and stability.
Jeremy Andrews: Is OpenVZ portable? That is, can we expect to see the technology ported to other kernels?
Andrey Savochkin: Well, OpenVZ is portable between different Linux kernels (but the amount of effort to port between two kernels certainly depends on how different the kernels are). On our FTP site there are OpenVZ ports to the SLES 10 and Fedora Core 5 kernels. The ideas of OpenVZ are broadly portable, and we even had them implemented on the FreeBSD kernel (but by now this FreeBSD port has been dropped).
Jeremy Andrews: Why was the FreeBSD port dropped?
Andrey Savochkin: We decided to focus on the Linux version to implement new ideas as fast as possible.
Jeremy Andrews: How widely used is OpenVZ?
Andrey Savochkin: OpenVZ in its current form has just been released to the public, but we've already got a considerable number of downloads (and questions). Virtuozzo, a superset of OpenVZ, already has a large number of installations. I'd estimate that currently 8,000+ servers with 400,000 VPSs on them run Virtuozzo/OpenVZ code.
Jeremy Andrews: Is there any plan to try and get OpenVZ merged into the mainline Linux kernel?
Andrey Savochkin: Yes, we'd like to get it merged into the mainstream Linux and are working in that direction. Virtualization makes the next step in the direction of better utilization of hardware and better management, the step that is comparable with the step between single-user and multi-user systems. Virtualization will be in greater demand with the growth of hardware capabilities, such as multi-core systems that are currently on the Intel roadmap. So, I believe that when OpenVZ is merged into the mainstream, Linux will instantly become more attractive and more convenient in many usage scenarios. That's why I think the OpenVZ project is such an interesting project, and that's why I've invested so much of my time into it.
Jeremy Andrews: How large are the changes required in the Linux kernel to support OpenVZ? Can they be broken into small logical pieces?
Andrey Savochkin: The current size of the OpenVZ kernel patch is about 2MB (70,000 lines). This size is not small, but it is less than 10% of the average size of the changes between minor versions in the 2.6 kernel branch (e.g., 2.6.12 to 2.6.13). The OpenVZ patch, split into major parts, is presented here. OpenVZ code can also be viewed and downloaded from the GIT repository at http://git.openvz.org/. One of the large parts (about 25%) is various stability fixes, which we are submitting to the mainstream. Then comes virtualization itself, general management of resources, the CPU scheduler, and so on.
Jeremy Andrews: What efforts have been made so far to try and get OpenVZ merged into the kernel?
Andrey Savochkin: The OpenVZ patch was split into smaller pieces, easier for us to explain and for the community to accept. Then, in the last couple of months, some virtualization pieces have been sent to the linux-kernel mailing list and actively discussed there.
The biggest argument was whether we want "partial" virtualization, where VPSs can have, for example, an isolated network but a common filesystem space. In my personal opinion, in some perfect world such partial virtualization would be ok. But in real life, subsystems of the Linux kernel have a lot of dependencies on each other: every subsystem interacts with the proc filesystem, for example. Virtualization is cheap, so it's easier to have complete isolation, both from the implementation point of view and then for use and management of VPSs by users.
The process of submitting OpenVZ patches into the mainstream keeps going. Also, we are working with SuSE, Red Hat (RHEL and Fedora Core), Xandros, and Mandriva to include OpenVZ in their distributions and make it available and well supported for the maximum number of users.
Jeremy Andrews: What do you think is the biggest obstacle that could keep OpenVZ from being merged into the mainline Linux kernel?
Andrey Savochkin: I don't see any serious obstacles. OpenVZ code is available, its functionality has been proven to be very useful - I think it is now running on 8,000+ servers. So, it is just a matter of continuing the discussion to make everyone involved agree what exactly we want to have in Linux and how technically we want to organize these new capabilities.
Jeremy Andrews: You've referred to OpenVZ as a subset of Virtuozzo. What is Virtuozzo, and what does it add over OpenVZ?
Andrey Savochkin: OpenVZ is SWsoft's contribution to the community. Virtuozzo is a commercial product, built on the same core backend, with many additional features and management tools.
Virtuozzo provides much more efficient resource sharing through VZFS filesystem, and better scalability and higher VPS per node density because of that; new generation resource and service level management; different system of OS and application templates; tools for VPS migration between nodes and for conversion of a dedicated server into a VPS; monitoring, statistics and traffic accounting tools; additional management APIs and various GUI and Web-based tools, including self-management and recovery tools for VPS users and owners.
Jeremy Andrews: Are there plans to eventually release any of this additional functionality under the GPL?
Andrey Savochkin: SWsoft, the company that I work for, is very positive about the Open Source movement, and has been contributing a lot of code to Open Source. OpenVZ is a big piece of code contributed to the community, and people working for our company have submitted many fixes to the mainstream Linux kernel not related to OpenVZ. I believe it is very likely that many parts of our additional code working on top of OpenVZ will eventually also be released under the GPL.
Jeremy Andrews: Why was a subset of Virtuozzo, OpenVZ, released as open source?
Andrey Savochkin: The company believes that this code should belong to the community. The strength of Linux is that innovations spread fast and become available to everyone, and we should be in line with it. Moreover, we believe that virtualization must and will become a part of the OS, and we want to speed up this process.
When it comes to the kernel parts of the code, GPL license just requires them to be released under GPL.
Jeremy Andrews: How many people from SWsoft are working on OpenVZ?
Andrey Savochkin: I don't think anyone at SWsoft works on OpenVZ 100% of his time. But, I guess, 15 to 20 people from SWsoft have made significant contributions to OpenVZ.
Jeremy Andrews: Do all improvements to Virtuozzo that could also benefit OpenVZ get merged into OpenVZ?
Andrey Savochkin: If some improvements were made in the course of Virtuozzo development but belong to the OpenVZ part, they would certainly be released. Everything that is related to core functionality, to virtualization, isolation and protection between VPSs is immediately pushed from Virtuozzo to OpenVZ.
Jeremy Andrews: What other kernel projects have you contributed to?
Andrey Savochkin: Well, I've been contributing to the Linux kernel here and there since 1996. Historically, the area where I contributed the most code was networking, including TCP, routing and other parts.
Jeremy Andrews: What are some examples of the networking code that you've contributed?
Andrey Savochkin: Many pieces here and there. I maintained the eepro100 driver for some time, I wrote the inetpeer cache, contributed some pieces to the window management algorithm and MTU discovery in TCP, to the routing code, and so on.
Well, OpenVZ, and especially its resource management part, will be another major contribution of mine.
Jeremy Andrews: How do you enjoy spending your free time when you're not working on OpenVZ?
Andrey Savochkin: I like reading and read a lot. In music, I'm very fond of the Baroque period and try to attend every such concert in Moscow. When I have time for a longer vacation, I enjoy diving.
Jeremy Andrews: Thanks for all your time in answering my questions!