What is the problem that is being solved?
Sometimes we want to share hardware. Operating systems and timesharing systems were built to solve this, but they rely on a large shared system area, so they work well only in a trusted environment among users who trust each other. Sometimes, however, we want to guarantee full hardware isolation and to run applications requiring different operating systems on the same hardware. The state of the art for this had significant overhead, which rendered it impractical for many use-cases. The authors notice that many of these overheads are not fundamental and can be overcome through a couple of trade-offs. They then design a system that is able to run 100 VMs simultaneously with under 1% overhead.
What are the key results?
This paper has a lot of hidden treasures that remain the state of the art for virtualization to this day. One of the contributions of Xen that had a big impact, and that also delivered its biggest performance win, was the way it implemented virtual memory. The authors note that traditional systems such as VMware virtualize memory through shadow page tables. This has a lot of overhead for both reads and writes, since the virtualization environment needs to be constantly involved. VMware promises full virtualization and therefore has to do this. The authors notice that most of the time paravirtualization with ABI compatibility is all that is necessary, and this is what Xen strives to achieve. They take advantage of this and modify the way guest OSes handle page tables. This requires a small change to each kernel they consider but provides immense gains. In their design, Xen does not have to be involved in memory reads, which can be handled directly by hardware. Xen is only involved in page table modifications, which is fundamentally necessary when there is no hardware support for virtualization (and at the time x86 had none). Unlike in VMware, allocating new address spaces for new processes is cheap in Xen, since page tables are initialized and filled in a guest-managed memory region, and once initialization is done they are yielded to Xen for further management.
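The validation step described above can be sketched as a toy model. This is a simplification, not Xen's implementation: the `Hypervisor` class and its bookkeeping are invented for illustration, though `mmu_update` is named after Xen's actual hypercall. The key idea it shows is that guests read page tables directly, but writes must go through the hypervisor, which checks that the guest owns the frames it references.

```python
# Toy model of Xen-style paravirtualized page-table updates.
# Reads go straight through hardware; only page-table *writes* are
# routed to the hypervisor as hypercalls and validated there.

class Hypervisor:
    def __init__(self):
        self.frame_owner = {}           # machine frame -> owning domain
        self.page_table_frames = set()  # frames registered as page tables

    def grant_frame(self, frame, dom):
        # Assign a machine frame to a guest domain's reservation.
        self.frame_owner[frame] = dom

    def pin_page_table(self, frame, dom):
        # Guest initialized this frame itself, then yields it to the
        # hypervisor to be used (read-only from now on) as a page table.
        assert self.frame_owner.get(frame) == dom
        self.page_table_frames.add(frame)

    def mmu_update(self, dom, pt_frame, index, target_frame):
        # Hypercall: validate a proposed page-table entry, then apply it.
        if pt_frame not in self.page_table_frames:
            raise PermissionError("not a registered page table")
        if self.frame_owner.get(target_frame) != dom:
            raise PermissionError("domain does not own target frame")
        # ...write the validated entry into the real page table here...
        return "ok"
```

In the real interface these updates are batched per hypercall, amortizing the guest-to-hypervisor transition cost over many entries.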
Another important contribution of Xen is its goal of allowing mutually distrustful guest OSes to share hardware. In combination with minimal performance overhead, this has been the key enabler of modern IaaS cloud infrastructure.
What are some of the limitations and how might this work be improved?
Xen was designed particularly for x86, but many of the ideas it proposed have grown beyond x86 and far beyond the scope it was meant for. The way Xen handles CPU, memory, or I/O device sharing, for example, is quite universal and constitutes a valuable contribution to the virtualization world in general. At times, however, the paper overfits to the particular limitations of the x86 of its day. The way it handles TLB misses, for example, is a nice hack but is not nearly as relevant as the rest of the presented content. I understand that at the time x86 was considered a never-changing monolith, but identifying ways in which hardware support could improve Xen, instead of optimizing for particular hardware accidents, would have been a more valuable knowledge contribution.
How might this work have long term impact?
As mentioned earlier, Xen was one of the key enablers of today's cloud infrastructure. It seems AWS could have been built right after this paper was published without needing to read anything else, and it is a little surprising that it took another three years to launch. Many of the things implemented in Xen are now provided by the underlying hardware, but the logical structure proposed by Xen is the bedrock of virtualization to this day.