forked from U-interrupt/qemu
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

virtio,pc,pci: features, cleanups, fixes

vhost-user enabled on non-linux systems
beginning of nvme sriov support
bigger tx queue for vdpa
virtio iommu bypass
FADT flag to detect legacy keyboards

Fixes, cleanups all over the place

Signed-off-by: Michael S. Tsirkin <[email protected]>

# gpg: Signature made Mon 07 Mar 2022 22:43:31 GMT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "[email protected]"
# gpg: Good signature from "Michael S. Tsirkin <[email protected]>" [full]
# gpg: aka "Michael S. Tsirkin <[email protected]>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (47 commits)
  hw/acpi/microvm: turn on 8042 bit in FADT boot architecture flags if present
  tests/acpi: i386: update FACP table differences
  hw/acpi: add indication for i8042 in IA-PC boot flags of the FADT table
  tests/acpi: i386: allow FACP acpi table changes
  docs: vhost-user: add subsection for non-Linux platforms
  configure, meson: allow enabling vhost-user on all POSIX systems
  vhost: use wfd on functions setting vring call fd
  event_notifier: add event_notifier_get_wfd()
  pci: drop COMPAT_PROP_PCP for 2.0 machine types
  hw/smbios: Add table 4 parameter, "processor-id"
  x86: cleanup unused compat_apic_id_mode
  vhost-vsock: detach the virqueue element in case of error
  pc: add option to disable PS/2 mouse/keyboard
  acpi: pcihp: pcie: set power on cap on parent slot
  pci: expose TYPE_XIO3130_DOWNSTREAM name
  pci: show id info when pci BDF conflict
  hw/misc/pvpanic: Use standard headers instead
  headers: Add pvpanic.h
  pci-bridge/xio3130_downstream: Fix error handling
  pci-bridge/xio3130_upstream: Fix error handling
  ...

Signed-off-by: Peter Maydell <[email protected]>

# Conflicts:
#   docs/specs/index.rst
Showing 66 changed files with 1,229 additions and 174 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1819,7 +1819,6 @@ F: docs/specs/acpi_hw_reduced_hotplug.rst | |
|
||
ACPI/VIOT | ||
M: Jean-Philippe Brucker <[email protected]> | ||
R: Ani Sinha <[email protected]> | ||
S: Supported | ||
F: hw/acpi/viot.c | ||
F: hw/acpi/viot.h | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,115 @@ | ||
PCI SR/IOV EMULATION SUPPORT | ||
============================ | ||
|
||
Description | ||
=========== | ||
SR/IOV (Single Root I/O Virtualization) is an optional extended capability | ||
of a PCI Express device. It allows a single physical function (PF) to appear as multiple | ||
virtual functions (VFs) for the main purpose of eliminating software | ||
overhead in I/O from virtual machines. | ||
|
||
QEMU now implements the basic common functionality to enable an emulated device | ||
to support SR/IOV. No fully implemented device exists in QEMU yet, but a | ||
proof-of-concept hack of the Intel igb can be found here: | ||
|
||
git://github.com/knuto/qemu.git sriov_patches_v5 | ||
|
||
Implementation | ||
============== | ||
Implementing emulation of an SR/IOV capable device typically consists of | ||
implementing support for two device classes: the "normal" physical device | ||
(PF) and the virtual device (VF). From QEMU's perspective, the VFs are just | ||
like other devices, except that some of their properties are derived from | ||
the PF. | ||
|
||
A virtual function differs from a physical function in that the BAR | ||
space for all VFs is defined by the BAR registers in the PF's SR/IOV | ||
capability. All VFs have the same BARs and BAR sizes. | ||
|
||
The address of an access to one of these virtual BARs is then computed as | ||
|
||
<VF BAR start> + <VF number> * <BAR sz> + <offset> | ||
|
||
From our emulation perspective this means that there is a separate call for | ||
setting up a BAR for a VF. | ||
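The address formula above can be written out as a small helper. This is only a sketch: the function name is hypothetical and not part of QEMU, and it assumes that the VF number is counted from 0 and that the offset stays inside a single VF's BAR.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Address targeted by an access at `offset` into a BAR of VF `vf_number`:
 *   <VF BAR start> + <VF number> * <BAR sz> + <offset>
 * Assumption: vf_number is 0-based. The name is illustrative only.
 */
static uint64_t vf_bar_access_addr(uint64_t vf_bar_start, uint32_t vf_number,
                                   uint64_t bar_size, uint64_t offset)
{
    /* The access must not spill into the next VF's copy of the BAR. */
    assert(offset < bar_size);
    return vf_bar_start + (uint64_t)vf_number * bar_size + offset;
}
```

Because every VF has the same BAR size, the per-VF regions are simply laid out back to back starting at the address programmed into the PF's VF BAR register.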
|
||
1) To enable SR/IOV support in the PF, it must be a PCI Express device, so | ||
you need to add a PCI Express capability in the normal PCI | ||
capability list. You might also want to add an ARI (Alternative | ||
Routing-ID Interpretation) capability to indicate that your device | ||
supports functions beyond its "own" function space (0-7), | ||
which is necessary to support more than 7 functions, or | ||
if functions extend beyond offset 7 because they are placed at an | ||
offset > 1 or have stride > 1. | ||
|
||
... | ||
#include "hw/pci/pcie.h" | ||
#include "hw/pci/pcie_sriov.h" | ||
|
||
pci_your_pf_dev_realize( ... ) | ||
{ | ||
... | ||
int ret = pcie_endpoint_cap_init(d, 0x70); | ||
... | ||
pcie_ari_init(d, 0x100, 1); | ||
... | ||
|
||
/* Add and initialize the SR/IOV capability */ | ||
pcie_sriov_pf_init(d, 0x200, "your_virtual_dev", | ||
vf_devid, initial_vfs, total_vfs, | ||
fun_offset, stride); | ||
|
||
/* Set up individual VF BARs (parameters as for normal BARs) */ | ||
pcie_sriov_pf_init_vf_bar( ... ); | ||
... | ||
} | ||
|
||
For cleanup, you simply call: | ||
|
||
pcie_sriov_pf_exit(device); | ||
|
||
which will delete all the virtual functions and associated resources. | ||
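The ARI remark in step 1 can be made concrete. The SR-IOV specification places VF n (counting from 1) at routing ID PF + First VF Offset + (n - 1) * VF Stride. The sketch below assumes that the fun_offset and stride arguments of pcie_sriov_pf_init() correspond to those two fields; the helper names are hypothetical and not part of QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* devfn (device/function byte) of the n-th VF, with n counted from 1. */
static uint32_t vf_devfn(uint8_t pf_devfn, uint16_t first_vf_offset,
                         uint16_t vf_stride, uint16_t n)
{
    assert(n >= 1);
    return (uint32_t)pf_devfn + first_vf_offset + (uint32_t)(n - 1) * vf_stride;
}

/*
 * True if the last of `num_vfs` VFs falls outside functions 0-7 of the
 * PF's own device number, which is the case the text says calls for ARI.
 * Without ARI, the low 3 bits of devfn are the function number and the
 * upper 5 bits are the device number.
 */
static bool vfs_need_ari(uint8_t pf_devfn, uint16_t first_vf_offset,
                         uint16_t vf_stride, uint16_t num_vfs)
{
    if (num_vfs == 0) {
        return false;
    }
    /* Offsets only grow with n, so checking the last VF is sufficient. */
    uint32_t last = vf_devfn(pf_devfn, first_vf_offset, vf_stride, num_vfs);
    return (last >> 3) != ((uint32_t)pf_devfn >> 3);
}
```

With the PF at function 0, an offset of 1 and a stride of 1, seven VFs still fit in functions 1-7, while an eighth VF would land at devfn 8.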
|
||
2) Similarly, in the implementation of the virtual function, you need to | ||
make it a PCI Express device and add a similar set of capabilities, | ||
except for the SR/IOV capability. Then you need to set up the VF BARs as | ||
subregions of the PF's SR/IOV VF BARs by calling | ||
pcie_sriov_vf_register_bar() instead of the normal pci_register_bar() call: | ||
|
||
pci_your_vf_dev_realize( ... ) | ||
{ | ||
... | ||
int ret = pcie_endpoint_cap_init(d, 0x60); | ||
... | ||
pcie_ari_init(d, 0x100, 1); | ||
... | ||
memory_region_init(mr, ... ); | ||
pcie_sriov_vf_register_bar(d, bar_nr, mr); | ||
... | ||
} | ||
|
||
Testing on Linux guest | ||
====================== | ||
The easiest approach is to use a device driver that supports sysfs-based SR/IOV | ||
enabling. Support for this was added in Linux kernel v3.8, so not all drivers | ||
support it yet. | ||
|
||
To enable 4 VFs for a device at 01:00.0: | ||
|
||
modprobe yourdriver | ||
echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs | ||
|
||
You should now see 4 VFs with lspci. | ||
To turn SR/IOV off again (the standard requires you to turn it off before you can enable | ||
another VF count, and the emulation enforces this): | ||
|
||
echo 0 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs | ||
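The same sysfs sequence can be driven from a small C helper inside the guest. This is a minimal sketch, not a tool shipped with QEMU or the kernel: the function names are hypothetical, and it assumes the driver exposes sriov_numvfs as described above. It writes 0 before the new count because the VF count cannot be changed while VFs are enabled.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Build ".../<domain>:<bus>:<dev>.<fn>/sriov_numvfs"; 0 on success, -1 if buf is too small. */
static int sriov_numvfs_path(char *buf, size_t len, unsigned domain,
                             unsigned bus, unsigned dev, unsigned fn)
{
    int n = snprintf(buf, len,
                     "/sys/bus/pci/devices/%04x:%02x:%02x.%x/sriov_numvfs",
                     domain, bus, dev, fn);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write one VF count to the sysfs attribute; 0 on success, -1 on error. */
static int write_vf_count(const char *path, unsigned count)
{
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        fprintf(stderr, "%s: %s\n", path, strerror(errno));
        return -1;
    }
    int ok = fprintf(f, "%u\n", count) > 0;
    if (fclose(f) != 0) {
        ok = 0;
    }
    return ok ? 0 : -1;
}

/* Disable SR/IOV first, then enable the requested number of VFs. */
static int set_vf_count(const char *path, unsigned count)
{
    if (write_vf_count(path, 0) != 0) {
        return -1;
    }
    return count ? write_vf_count(path, count) : 0;
}
```

For the device used in the examples above, the path is built with domain 0, bus 1, device 0 and function 0, and set_vf_count(path, 4) mirrors the two echo commands.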
|
||
Older drivers typically provide a max_vfs module parameter | ||
to enable it at load time: | ||
|
||
modprobe yourdriver max_vfs=4 | ||
|
||
To disable the VFs again, simply unload the driver: | ||
|
||
rmmod yourdriver |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,200 @@ | ||
ACPI ERST DEVICE | ||
================ | ||
|
||
The ACPI ERST device is utilized to support the ACPI Error Record | ||
Serialization Table, ERST, functionality. This feature is designed for | ||
storing error records in persistent storage for future reference | ||
and/or debugging. | ||
|
||
The ACPI specification[1], in Chapter "ACPI Platform Error Interfaces | ||
(APEI)", and specifically subsection "Error Serialization", outlines a | ||
method for storing error records into persistent storage. | ||
|
||
The format of error records is described in the UEFI specification[2], | ||
in Appendix N "Common Platform Error Record". | ||
|
||
While the ACPI specification allows for an NVRAM "mode" (see | ||
GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES) where non-volatile RAM is | ||
directly exposed for direct access by the OS/guest, this device | ||
implements the non-NVRAM "mode". This non-NVRAM "mode" is what is | ||
implemented by most BIOS (since flash memory requires programming | ||
operations in order to update its contents). Furthermore, as of the | ||
time of this writing, Linux only supports the non-NVRAM "mode". | ||
|
||
|
||
Background/Motivation | ||
--------------------- | ||
|
||
Linux uses the persistent storage filesystem, pstore, to record | ||
information (e.g. dmesg tail) upon panics and shutdowns. Pstore is | ||
independent of, and runs before, kdump. In certain scenarios (i.e. | ||
hosts/guests with root filesystems on NFS/iSCSI where networking | ||
software and/or hardware fails, and thus kdump fails), pstore may | ||
contain information available for post-mortem debugging. | ||
|
||
Two common storage backends for the pstore filesystem are ACPI ERST | ||
and UEFI. Most BIOS implement ACPI ERST. UEFI is not utilized in all | ||
guests. With QEMU supporting ACPI ERST, it becomes a viable pstore | ||
storage backend for virtual machines (as it is now for bare metal | ||
machines). | ||
|
||
Enabling support for ACPI ERST facilitates a consistent method to | ||
capture kernel panic information in a wide range of guests: from | ||
resource-constrained microvms to very large guests, and in particular, | ||
in direct-boot environments (which would lack UEFI run-time services). | ||
|
||
Note that Microsoft Windows also utilizes the ACPI ERST for certain | ||
crash information, if available[3]. | ||
|
||
|
||
Configuration|Usage | ||
------------------- | ||
|
||
To use ACPI ERST, a memory-backend-file object and acpi-erst device | ||
can be created, for example: | ||
|
||
qemu ... \ | ||
-object memory-backend-file,id=erstnvram,mem-path=acpi-erst.backing,size=0x10000,share=on \ | ||
-device acpi-erst,memdev=erstnvram | ||
|
||
For proper operation, the ACPI ERST device needs a memory-backend-file | ||
object with the following parameters: | ||
|
||
- id: The id of the memory-backend-file object is used to associate | ||
this memory with the acpi-erst device. | ||
- size: The size of the ACPI ERST backing storage. This parameter is | ||
required. | ||
- mem-path: The location of the ACPI ERST backing storage file. This | ||
parameter is also required. | ||
- share: The share=on parameter is required so that updates to the | ||
ERST backing store are written to the file. | ||
|
||
and ERST device: | ||
|
||
- memdev: Is the object id of the memory-backend-file. | ||
- record_size: Specifies the size of the records (or slots) in the | ||
backend storage. Must be a power of two value greater than or | ||
equal to 4096 (PAGE_SIZE). | ||
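The record_size constraint stated above (a power of two, at least 4096) can be checked with a couple of bit operations. A minimal sketch with a hypothetical helper name; it is not taken from the QEMU sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if record_size is a power of two and at least 4096 (PAGE_SIZE). */
static bool erst_record_size_ok(uint32_t record_size)
{
    /* A power of two has exactly one bit set, so x & (x - 1) is zero. */
    bool power_of_two = record_size != 0 &&
                        (record_size & (record_size - 1)) == 0;
    return power_of_two && record_size >= 4096;
}
```

A check like this rejects values such as 2048 (too small) and 12288 (not a power of two) while accepting 4096 and 8192.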
|
||
|
||
PCI Interface | ||
------------- | ||
|
||
The ERST device is a PCI device with two BARs, one for accessing the | ||
programming registers, and the other for accessing the record exchange | ||
buffer. | ||
|
||
BAR0 contains the programming interface consisting of ACTION and VALUE | ||
64-bit registers. All ERST actions/operations/side effects happen on | ||
the write to the ACTION, by design. Any data needed by the action must | ||
be placed into VALUE prior to writing ACTION. Reading the VALUE | ||
simply returns the register contents, which can be updated by a | ||
previous ACTION. | ||
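The ACTION/VALUE protocol described above can be illustrated with a toy register model. This is not the QEMU device: the two action codes and all names are hypothetical and exist only to show that writing VALUE has no side effect, that work happens on the write to ACTION, and that reading VALUE returns whatever the last action left there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical action codes, chosen for this sketch only. */
enum {
    TOY_ACTION_SET_RECORD_ID = 1,
    TOY_ACTION_GET_RECORD_COUNT = 2,
};

struct toy_erst {
    uint64_t action;        /* last value written to ACTION */
    uint64_t value;         /* the VALUE register */
    uint64_t record_id;     /* internal state set by an action */
    uint64_t record_count;  /* internal state reported by an action */
};

/* Writing VALUE only latches the operand; nothing else happens. */
static void toy_write_value(struct toy_erst *d, uint64_t v)
{
    d->value = v;
}

/* Reading VALUE returns the register contents as they currently are. */
static uint64_t toy_read_value(const struct toy_erst *d)
{
    return d->value;
}

/* All side effects happen here, on the write to ACTION. */
static void toy_write_action(struct toy_erst *d, uint64_t a)
{
    d->action = a;
    switch (a) {
    case TOY_ACTION_SET_RECORD_ID:
        d->record_id = d->value;      /* consume the operand */
        break;
    case TOY_ACTION_GET_RECORD_COUNT:
        d->value = d->record_count;   /* publish the result */
        break;
    default:
        break;
    }
}
```

A guest driver following this pattern always performs its accesses in the order VALUE write, ACTION write, VALUE read.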
|
||
BAR1 contains the 8KiB record exchange buffer, which is the | ||
implemented maximum record size. | ||
|
||
|
||
Backend Storage Format | ||
---------------------- | ||
|
||
The backend storage is divided into fixed size "slots", 8KiB in | ||
length, with each slot storing a single record. Not all slots need to | ||
be occupied, and they need not be occupied in a contiguous fashion. | ||
The ability to clear/erase specific records allows for the formation | ||
of unoccupied slots. | ||
|
||
Slot 0 contains a backend storage header that identifies the contents | ||
as ERST and also facilitates efficient access to the records. | ||
Depending upon the size of the backend storage, additional slots will | ||
be designated to be a part of the slot 0 header. For example, at 8KiB, | ||
the slot 0 header can accommodate 1021 records. Thus a storage size | ||
of 8MiB (8KiB * 1024) requires an additional slot for use by the | ||
header. In this scenario, slot 0 and slot 1 form the backend storage | ||
header, and records can be stored starting at slot 2. | ||
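The 1021 figure and the extra header slot at 8MiB follow from simple arithmetic, sketched below. The helper names are hypothetical, and the sketch assumes 0x18 bytes of fixed header fields followed by one 8-byte record_id per slot in the storage (header slots included), which is how the numbers in the text work out.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed header fields occupy 0x18 bytes; record_id[0] starts right after. */
#define ERST_HDR_FIXED_BYTES 0x18u

/* Number of record_id entries that fit in slot 0 alone. */
static uint64_t ids_in_first_slot(uint32_t record_size)
{
    assert(record_size > ERST_HDR_FIXED_BYTES);
    return (record_size - ERST_HDR_FIXED_BYTES) / sizeof(uint64_t);
}

/* Number of leading slots the header needs for a given storage size. */
static uint64_t header_slots(uint64_t storage_size, uint32_t record_size)
{
    assert(record_size > ERST_HDR_FIXED_BYTES);
    uint64_t total_slots = storage_size / record_size;
    uint64_t header_bytes = ERST_HDR_FIXED_BYTES +
                            total_slots * sizeof(uint64_t);
    /* Round up to a whole number of slots. */
    return (header_bytes + record_size - 1) / record_size;
}
```

For 8KiB slots, (8192 - 24) / 8 gives 1021 entries in slot 0, and an 8MiB store has 1024 slots, so its 8216-byte header spills into a second slot.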
|
||
Below is an example layout of the backend storage format (for storage | ||
size less than 8MiB). The size of the storage is a multiple of 8KiB, | ||
and contains N number of slots to store records. The example below | ||
shows two records (in CPER format) in the backend storage, while the | ||
remaining slots are empty/available. | ||
|
||
:: | ||
|
||
Slot Record | ||
<------------------ 8KiB --------------------> | ||
+--------------------------------------------+ | ||
0 | storage header | | ||
+--------------------------------------------+ | ||
1 | empty/available | | ||
+--------------------------------------------+ | ||
2 | CPER | | ||
+--------------------------------------------+ | ||
3 | CPER | | ||
+--------------------------------------------+ | ||
... | | | ||
+--------------------------------------------+ | ||
N | empty/available | | ||
+--------------------------------------------+ | ||
|
||
The storage header consists of some basic information and an array | ||
of CPER record_id's to efficiently access records in the backend | ||
storage. | ||
|
||
All fields in the header are stored in little endian format. | ||
|
||
:: | ||
|
||
+--------------------------------------------+ | ||
| magic | 0x0000 | ||
+--------------------------------------------+ | ||
| record_offset | record_size | 0x0008 | ||
+--------------------------------------------+ | ||
| record_count | reserved | version | 0x0010 | ||
+--------------------------------------------+ | ||
| record_id[0] | 0x0018 | ||
+--------------------------------------------+ | ||
| record_id[1] | 0x0020 | ||
+--------------------------------------------+ | ||
| record_id[...] | | ||
+--------------------------------------------+ | ||
| record_id[N] | 0x1FF8 | ||
+--------------------------------------------+ | ||
|
||
The 'magic' field contains the value 0x524F545354535245. | ||
|
||
The 'record_size' field contains the value 0x2000, 8KiB. | ||
|
||
The 'record_offset' field points to the first record_id in the array, | ||
0x0018. | ||
|
||
The 'version' field contains 0x0100, the first version. | ||
|
||
The 'record_count' field contains the number of valid records in the | ||
backend storage. | ||
|
||
The 'record_id' array fields are the 64-bit record identifiers of the | ||
CPER record in the corresponding slot. Stated differently, the | ||
location of a CPER record_id in the record_id[] array provides the | ||
slot index for the corresponding record in the backend storage. | ||
|
||
Note that, for example, with a backend storage less than 8MiB, slot 0 | ||
contains the header, so the record_id[0] will never contain a valid | ||
CPER record_id. Instead slot 1 is the first available slot and thus | ||
record_id[1] may contain a CPER record_id. | ||
|
||
A 'record_id' of all 0s or all 1s indicates an invalid record (i.e. the | ||
slot is available). | ||
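The header fields described above can be decoded from raw storage bytes as follows. This is a sketch under stated assumptions, not the QEMU implementation: the struct and function names are hypothetical, and the position of the sub-fields inside each 64-bit row (record_size at 0x08, record_offset at 0x0C, version at 0x10, reserved at 0x12, record_count at 0x14) is inferred by reading the diagram with the most significant bits on the left.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ERST_MAGIC 0x524F545354535245ULL  /* bytes "ERSTSTOR" in storage */

struct erst_header {
    uint64_t magic;
    uint32_t record_size;
    uint32_t record_offset;
    uint16_t version;
    uint16_t reserved;
    uint32_t record_count;
};

/* Load an nbytes-wide little-endian integer, independent of host order. */
static uint64_t load_le(const uint8_t *p, unsigned nbytes)
{
    uint64_t v = 0;
    for (unsigned i = 0; i < nbytes; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/* Decode the fixed header fields; returns true if the magic matches. */
static bool erst_parse_header(const uint8_t *buf, size_t len,
                              struct erst_header *h)
{
    if (len < 0x18) {
        return false;
    }
    h->magic         = load_le(buf + 0x00, 8);
    h->record_size   = (uint32_t)load_le(buf + 0x08, 4); /* assumed offset */
    h->record_offset = (uint32_t)load_le(buf + 0x0c, 4); /* assumed offset */
    h->version       = (uint16_t)load_le(buf + 0x10, 2); /* assumed offset */
    h->reserved      = (uint16_t)load_le(buf + 0x12, 2); /* assumed offset */
    h->record_count  = (uint32_t)load_le(buf + 0x14, 4); /* assumed offset */
    return h->magic == ERST_MAGIC;
}

/* A record_id of all 0s or all 1s marks the slot as available. */
static bool erst_slot_is_free(uint64_t record_id)
{
    return record_id == 0 || record_id == UINT64_MAX;
}

/* Byte offset of a slot: the record_id[] index is the slot index. */
static uint64_t erst_slot_offset(uint32_t slot, uint32_t record_size)
{
    return (uint64_t)slot * record_size;
}
```

Stored in little endian, the magic value spells the ASCII string "ERSTSTOR" at the start of the backing file, which makes the file easy to recognize in a hex dump.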
|
||
|
||
References | ||
---------- | ||
|
||
[1] "Advanced Configuration and Power Interface Specification", | ||
version 4.0, June 2009. | ||
|
||
[2] "Unified Extensible Firmware Interface Specification", | ||
version 2.1, October 2008. | ||
|
||
[3] "Windows Hardware Error Architecture", specifically | ||
"Error Record Persistence Mechanism". |