KVM QEMU Guest Performance Tuning

For over three years, I have been trying to run Windows apps and games in a Windows guest on a Linux host. I was not satisfied with the performance until very recently. It has been a long journey, but a rewarding challenge, and I would like to share what I have learned along the way.

A little background. I play PUBG (an FPS game) competitively. PUBG is a great game, but it is horribly optimized. As a result, PUBG is the most sensitive game on the market to CPU, cache, and memory performance, and because I play competitively I want a stable, high frame rate. In a high-level match such as a scrim or competition, in the late stage of the game when around 50 players are within 1 kilometer of you, the CPU load is so immense that no CPU as of today (Intel Alder Lake or AMD Zen 3) can keep it running at a stable 160 Hz. So every bit of performance matters, which makes this an excellent CPU optimization challenge. The other side of the challenge is that QEMU's documentation is not great, and I found it difficult to get useful information from the internet.

For example, I learned about one QEMU emulated device from a Reddit post, but when I tried googling for details about it, I got nothing. It almost seemed like the thing did not exist. QEMU is a great tool, but it can still be a mystery to many enthusiasts like you and me.

Before we get into the details, here are a few key observations for readers who already have some experience with QEMU and just want the TL;DR version:

  • I have tried both AMD (Zen 2 and Zen 3) and Intel (Alder Lake) CPUs. Relative to bare metal, Intel shows less performance degradation than AMD in a KVM QEMU Windows guest (Linux kernel 5.18 + QEMU 7.0.0). This might be a misconception on my part despite my efforts, but I have seen other people share similar impressions. (Note that performance here mostly means cache latency. For workloads that are not cache-sensitive, AMD CPUs are actually great.)
  • QEMU audio. If you have a USB audio interface, you do not want to pass it to the guest through a QEMU-emulated USB controller: audio workloads require very tight timing (CPU cycles), and QEMU's emulated USB is not great for that. You will have to either a) pass it through via VFIO PCIe for no-compromise audio (pass through the entire USB controller, if your system has more than one), or b) route the audio to the Linux host via QEMU `-audiodev` through PulseAudio or another audio backend, with a limited sample format (I remember something like 16-bit/48 kHz, though this may improve).
  • CPU isolation and CPU pinning. CPU pinning is a must. Pinning locks each virtual CPU thread emulated by QEMU to one physical CPU thread. Without it, the host may move vCPU threads between physical threads, and the guest CPU scheduler cannot know how best to allocate jobs to its CPU threads. There are several ways to achieve this. You first need to understand how your CPU topology/NUMA layout is represented (different hardware combinations yield slightly different CPU id numbering). All the methods I know of leverage cgroups, directly or indirectly.
  • Guess where the QEMU emulator's own workload runs? Yes, on the CPUs that you allocated to the guest. In most situations this is not a big deal. However, if you want every last bit of CPU consistency and uninterrupted CPU cycles, as when playing PUBG in a Windows guest, it is a very big deal. If you do not address it, you will get stutters in demanding games, because the precious guest CPU cycles have to take care of the emulator workload every few seconds. One thing I have found helpful is to add dedicated iothreads for your I/O workload.
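
To make the pinning point concrete, here is a minimal Python sketch of how the topology step could look. It is not taken from the repo mentioned in this post. It reads the standard Linux sysfs file `thread_siblings_list` to find which logical CPU ids share a physical core, keeps a few cores for the host and the emulator threads, and maps guest vCPUs one-to-one onto the rest. The function names, the `host_cores` parameter, and the idea of taking vCPU thread ids from QMP `query-cpus-fast` are my own assumptions for illustration.

```python
import os
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel-style CPU list such as "0-3,8,10-11" into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_core_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return one tuple of logical CPU ids per physical core, read from sysfs."""
    groups = set()
    pattern = "cpu[0-9]*/topology/thread_siblings_list"
    for path in Path(sysfs_root).glob(pattern):
        # Sibling threads report the same list, so the set removes duplicates.
        groups.add(tuple(parse_cpu_list(path.read_text())))
    return sorted(groups)


def plan_pinning(core_groups, host_cores=1):
    """Split cores between host and guest (hypothetical helper).

    The first `host_cores` cores stay with the host and the emulator threads.
    The threads of the remaining cores are mapped to vCPU 0, 1, 2, ... in
    order, so sibling threads end up on adjacent vCPU numbers.
    """
    host_cpus = [cpu for group in core_groups[:host_cores] for cpu in group]
    guest_cpus = [cpu for group in core_groups[host_cores:] for cpu in group]
    return host_cpus, dict(enumerate(guest_cpus))


def pin_thread(tid, cpu):
    """Pin one thread (by Linux thread id) to a single host CPU.

    Assumption: the vCPU thread ids come from somewhere like the QMP
    command query-cpus-fast, which is not shown here.
    """
    os.sched_setaffinity(tid, {cpu})
```

On a 4-core, 8-thread CPU whose sibling pairs are (0,4), (1,5), (2,6), (3,7), `plan_pinning(read_core_groups(), host_cores=1)` would leave CPUs 0 and 4 to the host and pin vCPUs 0 to 5 to CPUs 1, 5, 2, 6, 3, 7. Keeping sibling threads on adjacent vCPUs only helps if the guest topology is also declared with two threads per core, and the numbering on your machine may well differ.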

If you are ready for the detailed methods, please visit this git repo. You might want to focus on the `vfio.txt` and `vm.sh` files first. I have tested this on Arch Linux, Fedora, and Ubuntu. I expect the implementation to be similar across distros, apart from a few commands.

I also want to share some helpful readings:

  • CPU isolation: https://www.suse.com/c/cpu-isolation-practical-example-part-5/
  • QEMU command line guide: https://archive.fosdem.org/2018/schedule/event/vai_qemu_jungle/

Feel free to start a discussion on the GitHub repo or leave questions for me. I wish you a good journey.

