Requirement: Virtualization
- Hypervisor transposes device from host address space to VM address space
- IOMMU only accounts for DMA translation
- Virtualization required for:
  - Mapping device MMIO and I/O port resources
  - Translating interrupt address spaces
  - Topology translations
  - External dependencies
PCI Resource Mapping
PCI BARs
- Describes resource type and features
  - MMIO vs. I/O port, 32-bit vs. 64-bit, prefetchable, etc.
- Exposes resource size
- Locates resources within address space…
User cannot relocate the physical resource, so BARs must be virtualized and mapped into the VM address space
PCI BAR Virtualization
- Pass-through type and feature bits
- Emulate BAR sizing interface
- Map resources into VM address space
- Host resource address remains static
Emulate the standard programming model defined by PCI
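The BAR virtualization steps above can be sketched as a small model. This is an illustrative sketch only, not QEMU or VFIO code: the `VirtualBar` class and `probe_size` helper are hypothetical names, and the model assumes a 32-bit memory BAR whose low four bits carry the type and feature flags.

```python
class VirtualBar:
    """Emulated 32-bit memory BAR (hypothetical model, not QEMU code)."""

    FLAG_MASK = 0xF  # low 4 bits of a memory BAR: type and feature flags

    def __init__(self, size, flags, host_addr):
        if size < 16 or size & (size - 1):
            raise ValueError("BAR size must be a power of two >= 16")
        self.size = size
        self.flags = flags & self.FLAG_MASK  # passed through from the device
        self.host_addr = host_addr           # host address remains static
        self.guest_addr = 0                  # address programmed by the guest

    def write(self, value):
        # Address bits below the BAR size are hardwired to zero, so a
        # write of all ones leaves only the size mask set.
        self.guest_addr = value & ~(self.size - 1) & 0xFFFFFFFF

    def read(self):
        return self.guest_addr | self.flags


def probe_size(bar):
    """Size a BAR as a guest would: save, write ones, read back, restore."""
    saved = bar.read()
    bar.write(0xFFFFFFFF)
    mask = bar.read() & ~VirtualBar.FLAG_MASK & 0xFFFFFFFF
    bar.write(saved)
    return (~mask & 0xFFFFFFFF) + 1


bar = VirtualBar(size=0x10000, flags=0x8, host_addr=0xFE000000)
size = probe_size(bar)   # guest discovers the size without touching hardware
bar.write(0xC0000000)    # guest picks its own address for the mapping
```

The guest-visible address changes freely while `host_addr` never does; the hypervisor only updates where the resource appears in the VM address space.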
Except…
Is PCI configuration space the only way to access these registers?
Config Space Backdoors
- PCI defines the configuration address space and register definition
- Register implementation is not defined
- Implementations may expose configuration space registers via multiple address spaces
Example Backdoors
- Config space mirrors in MMIO BAR spaces
- Window/data registers to config space through I/O port BARs
- VGA register hooks to BAR offsets
Result: PCI config space virtualization is not sufficient for all devices
Needed:
Infrastructure for virtualizing config space registers across all address spaces
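A window/data backdoor can be closed by redirecting it to the same virtualized config space the guest sees through normal config cycles. The sketch below is a simplified illustration: `ConfigWindowQuirk` and its register layout are hypothetical, and a 4 KB config space with dword-aligned window accesses is assumed.

```python
class ConfigWindowQuirk:
    """Address/data window redirected to virtual config space.

    Hypothetical register layout; real devices differ.
    """

    def __init__(self, virtual_config):
        self.config = virtual_config  # bytearray of emulated config space
        self.address = 0

    def write_address(self, value):
        # Assumed: dword-aligned offsets within a 4 KB config space.
        self.address = value & 0xFFC

    def read_data(self):
        raw = self.config[self.address:self.address + 4]
        return int.from_bytes(raw, "little")

    def write_data(self, value):
        raw = (value & 0xFFFFFFFF).to_bytes(4, "little")
        self.config[self.address:self.address + 4] = raw


# Virtual config space holding the guest-programmed BAR0 value at 0x10.
vconfig = bytearray(4096)
vconfig[0x10:0x14] = (0xC0000000).to_bytes(4, "little")

quirk = ConfigWindowQuirk(vconfig)
quirk.write_address(0x10)
guest_view = quirk.read_data()  # guest sees its own BAR0, not the host's
```

Without the quirk, the same read through the window would return the host BAR address and break the guest's view of the device.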
Our first quirk…
VFIOQuirks
- QEMU-based quirks
- Infrastructure within VFIOPCIDevice for handling device specific quirks
- Helpers to facilitate common mirrors and windows to PCI config space
- Extensive use of MemoryRegions
- VFIOQuirks are sub-regions within the default region mapping
Problem solved?
- KVM maps MMIO at PAGE_SIZE granularity
- Any quirk within an otherwise directly mapped page traps the entire page through QEMU
What other registers reside within the page?
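The page granularity problem can be made concrete with a little arithmetic. The helper below is a hypothetical illustration (`trapped_pages` is not a QEMU function) and assumes a 4 KB default page size; it reports which pages of a BAR lose their direct mapping once quirks are placed inside them.

```python
PAGE_SIZE = 4096  # assumed default; some architectures use 64 KB pages


def trapped_pages(bar_size, quirks, page_size=PAGE_SIZE):
    """Return sorted page indices of a BAR that must trap to the emulator.

    quirks is a list of (offset, length) byte ranges within the BAR.
    """
    pages = set()
    for offset, length in quirks:
        if offset < 0 or length <= 0 or offset + length > bar_size:
            raise ValueError("quirk range lies outside the BAR")
        first = offset // page_size
        last = (offset + length - 1) // page_size
        pages.update(range(first, last + 1))
    return sorted(pages)


# A 256-byte config mirror at offset 0x1800 of a 64 KB BAR.
small_pages = trapped_pages(0x10000, [(0x1800, 0x100)])
# The same quirk with 64 KB pages traps the whole BAR.
large_pages = trapped_pages(0x10000, [(0x1800, 0x100)], page_size=0x10000)
```

With 4 KB pages the 256-byte quirk traps one full page, so every other register in that page also takes the slow path; with 64 KB pages the entire BAR does.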
Case Study: NVIDIA
- MSI “ACK” required to retrigger interrupt
- MMIO write through BAR0 config mirror
- Increases MSI re-triggering latency (partially addressed with ioeventfd handling)
But device resource mappings are not the only ranges needing virtualization…
Interrupt Address Space
Message Signaled Interrupts (MSI) trigger by writing pre-programmed data to a pre-programmed address
Both of these require address space virtualization
MSI is entirely configured through PCI configuration space, already virtualized…
MSI-X
Address and data are specified per vector within a vector table resident in an MMIO BAR
MSI-X Vector Table
- Resides within MMIO BAR
- Requires address space translation
- Opportunities for backdoors
- Page size virtualization granularity
  - Keep performance critical registers out of the page holding MSI-X structures
  - Some architectures use 64KB page size
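How much of a BAR gets trapped along with the MSI-X vector table depends on the table placement and the host page size. The sketch below is a hypothetical helper, not part of QEMU; it relies only on the 16-byte vector table entry size from the PCI specification and counts the non-table bytes that share trapped pages with the table.

```python
MSIX_ENTRY_SIZE = 16  # bytes per vector table entry


def msix_collateral_bytes(table_offset, num_vectors, bar_size, page_size):
    """Bytes outside the vector table that land in the same trapped pages."""
    table_end = table_offset + num_vectors * MSIX_ENTRY_SIZE
    if table_offset < 0 or num_vectors <= 0 or table_end > bar_size:
        raise ValueError("vector table does not fit in the BAR")
    trap_start = (table_offset // page_size) * page_size
    # Round the end up to a page boundary, clamped to the BAR size.
    trap_end = min(-(-table_end // page_size) * page_size, bar_size)
    return (trap_end - trap_start) - (table_end - table_offset)


# 8 vectors at offset 0x2000 of a 64 KB BAR.
with_4k = msix_collateral_bytes(0x2000, 8, 0x10000, 4096)
with_64k = msix_collateral_bytes(0x2000, 8, 0x10000, 65536)
```

A result of zero means the table has its pages to itself; anything larger is register space that may be forced onto the emulated path.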
MSI-X Vector Table Recommendations
- Use an exclusive BAR for MSI-X data structures
- Or, allow ample alignment of structures (64KB)
- For existing devices, “x-msix-relocation”:
  - QEMU vfio-pci device option
  - Modifies MSI-X capability to relocate to new BAR or extend existing BAR for alignment
Topology & Firmware Considerations
- Does the device driver depend on or assume PCI capability presence or offsets?
  - Virtualization may change, move, or hide capabilities
- Does the device depend on other devices, firmware tables, or reserved memory?
- Intel IGD (GVT-d): All of the above!
- Integrated vs Discrete mindset
Lessons
- Avoid “backdoors” to PCI config space
- Use VFIOQuirks to fill virtualization gaps
- Be mindful of virtualization granularity
- Avoid external dependencies and hardwired assumptions
Corollary: Assigned device config space accesses may be slower than bare metal