I spent a few days comparing various hypervisors under the same workload and on the same hardware. This is a very specific workload, and results might differ for other workloads.
I wanted to share it here because many of us run very modest hardware, and getting the most out of it is probably something others are interested in, too. I also wanted to share it because maybe someone finds a flaw in the configurations I ran, which might boost things up.
If you do not want to go to the post / read all of that, the very quick summary is that XCP-ng was the quickest and KVM the slowest. There is also a summary at the bottom of the post with some graphs if that interests you. For everyone else who reads the whole post, I hope it gives some useful insights for your self-hosting endeavours.
Interesting article, but man, the random capitalization thing going on there is distracting as hell.
XCP-ng might have the edge against bare metal because Windows by default uses Virtualization-Based Security (VBS), which itself relies on virtualization. Under XCP-ng the guest can't use that, since nested virtualization can't be enabled.
Disclaimer: I’m a maintainer of the control plane used by xcp-ng
Oooh, that explains it! I wondered what was going on. Thank you very much. And thank you for working on XCP-ng, it is a fantastic platform :-)
Not being an expert, it seems as though your setup is Windows-centric, whereas KVM tends to shine on Linux.
Yes, it is Windows-centric because that is where the workload I need to run lives. It would be cool to see a similar comparison with a Linux workload that puts strain on CPU, memory and disk.
What I am missing is ESXi/vSphere. It would be quite important for the few people that have access to the eval resources to set it up.
Same for the BSD versions. I think it's called bhyve?
Sure, ESXi would have been interesting. I thought about that, but I did not test it because it is not interesting to me anymore from a business perspective. And I am not keen on using it in my homelab, so I left that out and used that time to do something relaxing. It's my holiday right now :-)
VMware's predatory practices make me wary of their products. I'm avoiding it for all new installs.
Totally fair. Just as a suggestion.
Why do you need vSphere for self hosting?
You ask for a “why deploy this [software]” in this community?
Anyway…Simply: Why not? =)
Proxmox does clustering and should have most of the same features. While you are welcome to run whatever you want I think vSphere is getting a bit pricey.
Not even just pricey, but unpurchasable in many cases. Broadcom is really fucking it up
They are just making a living.
In fact, their customers were stealing from them previously.
- unnamed VMware rep
I discovered a few months ago that XCP-NG does not support NFS shares, which was a huge dealbreaker for me. Additionally, my notes from my last test indicated that I could not mount existing drives without erasing them. I'm aware that I could have spun up a TrueNAS or other file-sharing server to bypass this, but maybe not if the system won't mount the drives in the first place so it can pass them to the TrueNAS. I also had issues with their Xen Orchestra, which I will talk about shortly below. They also, at the time, used an out-of-date CentOS build which, unless I'm missing something, is no longer supported under that branding.
The one test I did, which was for a KVM setup, was my Home Assistant installation. I have that running in Proxmox, and comparatively it did seem to run faster than my Proxmox instance does. But that may be attributed to Home Assistant being the sole VM on the system, with no other services running (aside from XCP-NG's).
Their Xen Orchestra was a bit frustrating for me to install as well, and having some of the services locked behind a 14-day trial was a drawback for me. I believe they are working on the front-end GUI to negate the need for this, but the last time I tried to get things to work, it didn't let me access it.
I had a rough start with XCP-ng too. One issue I had was the NIC in my OptiPlex, which worked… but was super slow. So the initial installation of the XO VM (to manage XCP-ng) took over an hour. After switching to a USB NIC with another Realtek chip, networking was no issue anymore.
For management, Xen Orchestra can be self-built; it is quite easy and works mostly without any additional knowledge or work if you know the right tools. Tom Lawrence posted a video I followed, and building my own XO is now quite easy and quick (sorry for it being a YT link): https://www.youtube.com/watch?v=fuS7tSOxcSo
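For anyone curious what that self-built route looks like, this is roughly the manual "from the sources" procedure. It's only a sketch: the video may use a helper script instead, and the exact package list and required Node.js version change over time, so check the official XO docs before copying any of it.

    # Build dependencies (assumption: Debian/Ubuntu; the exact list varies by release)
    sudo apt-get install -y git build-essential redis-server libpng-dev \
        python3-minimal libvhdi-utils lvm2 cifs-utils nfs-common
    # Node.js LTS and Yarn are also required; install them however you prefer

    # Fetch and build Xen Orchestra from source
    git clone -b master https://github.com/vatesfr/xen-orchestra
    cd xen-orchestra
    yarn
    yarn build

    # Copy the sample config and start xo-server (it listens on port 80 by
    # default, so run it with the needed privileges or behind a reverse proxy)
    cd packages/xo-server
    mkdir -p ~/.config/xo-server
    cp sample.config.toml ~/.config/xo-server/config.toml
    yarn start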
What are your disk settings for the KVM environments? We use KVM at work and found that the default configuration loses you a lot of performance on disk operations.
Switching from SATA to SCSI driver, and then enabling queues (set the number equal to your number of cores) dramatically speeds up all disk operations, large and small.
On mobile right now but I’ll try to add some links to the KVM docs later.
That's a very good question. The test system is running Apache CloudStack with KVM at the moment, and I have yet to figure out how to see which disk/controller mode the VM is using. I will dig a bit to see if I can find out. It would be interesting to re-run the tests if it is not SCSI.
Edit: I did a 'virsh dumpxml <vmname>' and the disk part looks like this:
    <devices>
      <emulator>/usr/bin/qemu-system-x86_64</emulator>
      <disk type='file' device='disk'>
        <driver name='qemu' type='qcow2' cache='none'/>
        <source file='/mnt/0b89f7ac-67a7-3790-9f49-ad66af4319c5/8d68ee83-940d-4b68-8b28-3cc952b45cb6' index='2'/>
        <backingStore/>
        <target dev='sda' bus='sata'/>
        <serial>8d68ee83940d4b688b28</serial>
        <alias name='sata0-0-0'/>
        <address type='drive' controller='0' bus='0' target='0' unit='0'/>
      </disk>
It is SATA… now I need to figure out how to change that configuration ;-)
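In case it saves you a search: at the plain libvirt level the change would look roughly like this (a sketch with placeholder names; the big caveat is that CloudStack may regenerate the domain XML and overwrite manual edits, so it probably has to be solved inside ACS in the end):

    # Edit the domain XML directly (libvirt level only; treat this as an
    # experiment, since CloudStack can rebuild the VM definition over it)
    virsh edit <vmname>

    # In the <disk> element, point the target at the SCSI bus and remove the
    # old <alias>/<address> lines so libvirt regenerates them:
    #     <target dev='sda' bus='scsi'/>
    #
    # And make sure a virtio-scsi controller with multiqueue exists:
    #     <controller type='scsi' index='0' model='virtio-scsi'>
    #       <driver queues='6'/>   <!-- match the VM's vCPU count -->
    #     </controller>

    # A Windows guest needs the virtio-scsi driver (virtio-win ISO) installed
    # before it can boot from that disk; then power-cycle the VM:
    virsh shutdown <vmname>
    virsh start <vmname>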
I am only messing with KVM/libvirt for fun, so I have no professional experience in this, but wouldn't you want to use virtio disks for best performance?
I don't work professionally in that field either. To answer your question: of course I would use whatever gives me the best performance. Why it is set like this is beyond my knowledge. What you basically do in Apache CloudStack when you do not have a template yet is: you upload an ISO, and in this process you have to tell ACS what it is (Windows Server 2022, Ubuntu 24, etc.). From my understanding, those pre-defined OS types you can select and "attach" to an ISO seem to include the specifics for when you create a new instance (VM) in ACS, and they seem to set the controller to SATA. Why? I do not know. I tried to pick another OS type (I think it was called Windows SCSI), but in the end it still ended up being a VM with the disks bound to the SATA controller, despite the VM having an additional SCSI controller that was not attached to anything.
This can probably be fixed on the command line, but I was not able to figure it out yesterday when I had a bit of spare time to tinker with it again. I would like to see if this makes a big difference in that specific workload.
Unfortunately I’m not very familiar with Cloudstack or Proxmox; we’ve always worked with KVM using virt-manager and Cockpit.
Our usual method is to remove the default hard drive, reattach the qcow file as a SCSI device, and then we modify the SCSI controller that gets created to enable queuing. I’m sure at some point I should learn to do all this through the command line, but it’s never really been relevant to do so.
The relevant sections look like this in one of our prod VMs:
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/XXX.qcow2' index='1'/>
      <backingStore/>
      <target dev='sdb' bus='scsi'/>
      <alias name='scsi0-0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <driver queues='6'/>
      <alias name='scsi0'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </controller>
The <driver queues='X'/> line is the part you have to add. The number should equal the number of cores assigned to the VM.
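One extra note: after editing the persistent XML, a guest reboot alone is not enough; the domain has to be fully stopped and started again before the new controller definition is live. A quick way to confirm it took effect (VM name is a placeholder):

    # Power-cycle so libvirt re-reads the persistent definition
    virsh shutdown <vmname>    # wait until it is actually off
    virsh start <vmname>
    # The live XML should now show the multiqueue virtio-scsi controller
    virsh dumpxml <vmname> | grep -A 2 "virtio-scsi"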
See the following for more on tuning KVM:
- https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/epub/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-blockio-multi-queue_virtio-scsi#sect-Multiqueue_virtio-scsi
- https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/7/epub/virtualization_tuning_and_optimization_guide/sect-virtualization_tuning_optimization_guide-blockio-multi-queue_virtio-scsi#sect-Virtualization_Tuning_Optimization_Guide-Introduction-7_Improvements
Thank you very much. I spent another two hours yesterday reading up on that and creating other VMs and templates, but I have not yet been able to attach the boot disk to a SCSI controller and make it boot. I would really have liked to see if this change would bring it on par with Proxmox (I wonder now what the defaults for Proxmox are), but even then, it would still be much slower than with Hyper-V or XCP-ng. If I find time, I will look into this again.
I’d suggest maybe testing with a plain Debian or Fedora install. Just enable KVM and install virt-manager, and create the environment that way.
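Something like this is all it usually takes on a stock Debian/Ubuntu box (package names are from memory and may differ per release; on Fedora the equivalent is the dnf virtualization group):

    # Install QEMU/KVM, libvirt and virt-manager (Debian/Ubuntu package names)
    sudo apt-get install -y qemu-system-x86 libvirt-daemon-system virt-manager
    # Allow your user to talk to the system libvirt daemon, then log out and back in
    sudo adduser "$USER" libvirt
    # Sanity check: the hypervisor should answer and list the (empty) domain list
    virsh -c qemu:///system list --all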
I just can't figure out how to create a VM in ACS with SCSI controllers. I am able to add a SCSI controller to the VM, but the boot disk is always connected to the SATA controller. I tried to follow this thread (https://lists.apache.org/thread/op2fvgpcfcbd5r434g16f5rw8y83ng8k) and create a template, and I am sure I am doing something wrong, but I just cannot figure it out :-(
Kinda surprised to see XCP topping the charts, time to benchmark my server currently running Proxmox I guess.
It would be cool to see how Linux-centric workloads behave on those hypervisors. Juuust in case you plan to invest some time into that ;-)
Interesting stuff, thanks.