|
| 1 | +KVM/ARM VGIC Forwarded Physical Interrupts |
| 2 | +========================================== |
| 3 | + |
| 4 | +The KVM/ARM code implements software support for the ARM Generic |
| 5 | +Interrupt Controller's (GIC's) hardware support for virtualization by |
| 6 | +allowing software to inject virtual interrupts to a VM, which the guest |
| 7 | +OS sees as regular interrupts. The code is famously known as the VGIC. |
| 8 | + |
| 9 | +Some of these virtual interrupts, however, correspond to physical |
| 10 | +interrupts from real physical devices. One example could be the |
| 11 | +architected timer, which itself supports virtualization, and therefore |
| 12 | +lets a guest OS program the hardware device directly to raise an |
| 13 | +interrupt at some point in time. When such an interrupt is raised, the |
| 14 | +host OS initially handles the interrupt and must somehow signal this |
| 15 | +event as a virtual interrupt to the guest. Another example could be a |
| 16 | +passthrough device, where the physical interrupts are initially handled |
| 17 | +by the host, but the device driver for the device lives in the guest OS |
| 18 | +and KVM must therefore somehow inject a virtual interrupt on behalf of |
| 19 | +the physical one to the guest OS. |
| 20 | + |
| 21 | +These virtual interrupts corresponding to a physical interrupt on the |
| 22 | +host are called forwarded physical interrupts, but are also sometimes |
| 23 | +referred to as 'virtualized physical interrupts' and 'mapped interrupts'. |
| 24 | + |
| 25 | +Forwarded physical interrupts are handled slightly differently compared |
| 26 | +to virtual interrupts generated purely by a software emulated device. |
| 27 | + |
| 28 | + |
| 29 | +The HW bit |
| 30 | +---------- |
| 31 | +Virtual interrupts are signalled to the guest by programming the List |
| 32 | +Registers (LRs) on the GIC before running a VCPU. The LR is programmed |
| 33 | +with the virtual IRQ number and the state of the interrupt (Pending, |
| 34 | +Active, or Pending+Active). When the guest ACKs and EOIs a virtual |
| 35 | +interrupt, the LR state moves from Pending to Active, and finally to |
| 36 | +inactive. |
| 37 | + |
| 38 | +The LRs include an extra bit, called the HW bit. When this bit is set, |
| 39 | +KVM must also program an additional field in the LR, the physical IRQ |
| 40 | +number, to link the virtual with the physical IRQ. |
| 41 | + |
| 42 | +When the HW bit is set, KVM must EITHER set the Pending OR the Active |
| 43 | +bit, never both at the same time. |
| 44 | + |
| 45 | +Setting the HW bit causes the hardware to deactivate the physical |
| 46 | +interrupt on the physical distributor when the guest deactivates the |
| 47 | +corresponding virtual interrupt. |
| 48 | + |
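As an illustration of the rules above, the following C sketch packs a forwarded interrupt into a list register value. The bit positions follow the GICv2 GICH_LR layout; the macro and function names (LR_HW, lr_encode_forwarded(), lr_phys_irq()) are made up for this example and are not the kernel's identifiers. Rejecting an LR with neither state bit set is also a choice of the sketch, since such an LR would simply be empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GICv2 GICH_LR field layout. The names are local to this sketch. */
#define LR_ID_MAX        0x3ffu          /* IRQ numbers are 10 bits wide      */
#define LR_PHYSID_SHIFT  10              /* bits [19:10]: physical IRQ (HW=1) */
#define LR_PHYSID_MASK   (LR_ID_MAX << LR_PHYSID_SHIFT)
#define LR_PENDING       (1u << 28)
#define LR_ACTIVE        (1u << 29)
#define LR_HW            (1u << 31)

/*
 * Build an LR value for a forwarded physical interrupt (HW bit set).
 * Returns false if an IRQ number does not fit, or if the caller asks for
 * both Pending and Active (forbidden with HW=1) or for neither (assumption
 * of this sketch: an LR without state is treated as an invalid request).
 */
static bool lr_encode_forwarded(uint32_t virq, uint32_t pirq,
                                bool pending, bool active, uint32_t *lr)
{
    uint32_t val;

    if (virq > LR_ID_MAX || pirq > LR_ID_MAX)
        return false;
    if (pending == active)
        return false;

    /* Virtual IRQ in bits [9:0], physical IRQ linked in bits [19:10]. */
    val = LR_HW | (pirq << LR_PHYSID_SHIFT) | virq;
    val |= pending ? LR_PENDING : LR_ACTIVE;

    /* Postcondition: never Pending+Active when the HW bit is set. */
    assert((val & (LR_PENDING | LR_ACTIVE)) != (LR_PENDING | LR_ACTIVE));

    *lr = val;
    return true;
}

/* Extract the physical IRQ number linked to a HW=1 list register. */
static uint32_t lr_phys_irq(uint32_t lr)
{
    return (lr & LR_PHYSID_MASK) >> LR_PHYSID_SHIFT;
}
```

For example, forwarding physical IRQ 27 as virtual IRQ 27 in the Pending state yields an LR with the HW bit, the Pending bit, and both ID fields set to 27.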
| 49 | + |
| 50 | +Forwarded Physical Interrupts Life Cycle |
| 51 | +---------------------------------------- |
| 52 | + |
| 53 | +The state of forwarded physical interrupts is managed in the following way: |
| 54 | + |
| 55 | + - The physical interrupt is acked by the host, and becomes active on |
| 56 | + the physical distributor (*). |
| 57 | + - KVM sets the LR.Pending bit, because this is the only way the GICV |
| 58 | + interface is going to present it to the guest. |
| 59 | + - LR.Pending will stay set as long as the guest has not acked the interrupt. |
| 60 | + - LR.Pending transitions to LR.Active on the guest read of the IAR, as |
| 61 | + expected. |
| 62 | + - On guest EOI, the *physical distributor* active bit gets cleared, |
| 63 | + but the LR.Active is left untouched (set). |
| 64 | + - KVM clears the LR on VM exits when the physical distributor |
| 65 | + active state has been cleared. |
| 66 | + |
| 67 | +(*): The host handling is slightly more complicated. For some forwarded |
| 68 | +interrupts (shared), KVM directly sets the active state on the physical |
| 69 | +distributor before entering the guest, because the interrupt is never actually |
| 70 | +handled on the host (see details on the timer as an example below). For other |
| 71 | +forwarded interrupts (non-shared) the host does not deactivate the interrupt |
| 72 | +when the host ISR completes, but leaves the interrupt active until the guest |
| 73 | +deactivates it. Leaving the interrupt active is allowed, because Linux |
| 74 | +configures the physical GIC with EOIMode=1, which causes EOI operations to |
| 75 | +perform only a priority drop, without deactivation, allowing the GIC to |
| 76 | +receive other interrupts of the default priority. |
| 77 | + |
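The life cycle described in this section can be modelled as a small state machine. The sketch below is a toy model only: the struct, the enum and all function names are hypothetical, and each function stands for one bullet of the list rather than for any real KVM or GIC interface.

```c
#include <assert.h>
#include <stdbool.h>

/* State of the list register holding the forwarded interrupt. */
enum lr_state { LR_EMPTY, LR_STATE_PENDING, LR_STATE_ACTIVE };

/* Toy model: one forwarded interrupt, tracked in two places. */
struct fwd_irq {
    bool phys_active;      /* active bit on the physical distributor */
    enum lr_state lr;      /* state held in the list register        */
};

/* The host acks the physical interrupt: it becomes active. */
static void host_ack(struct fwd_irq *irq)
{
    irq->phys_active = true;
}

/* KVM queues the interrupt for the guest by setting LR.Pending. */
static void kvm_inject(struct fwd_irq *irq)
{
    /* A forwarded interrupt must be physically active when injected. */
    assert(irq->phys_active);
    irq->lr = LR_STATE_PENDING;
}

/* Guest read of the IAR: LR.Pending becomes LR.Active. */
static void guest_read_iar(struct fwd_irq *irq)
{
    if (irq->lr == LR_STATE_PENDING)
        irq->lr = LR_STATE_ACTIVE;
}

/* Guest EOI: only the physical active bit is cleared, the LR stays set. */
static void guest_eoi(struct fwd_irq *irq)
{
    if (irq->lr == LR_STATE_ACTIVE)
        irq->phys_active = false;
}

/* On VM exit, KVM clears the LR once the physical active state is gone. */
static void kvm_sync_on_exit(struct fwd_irq *irq)
{
    if (irq->lr != LR_EMPTY && !irq->phys_active)
        irq->lr = LR_EMPTY;
}
```

Walking one interrupt through host_ack(), kvm_inject(), guest_read_iar(), guest_eoi() and kvm_sync_on_exit() in that order leaves the model back in its initial state. A VM exit taken before the guest EOI leaves the LR untouched.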
| 78 | + |
| 79 | +Forwarded Edge and Level Triggered PPIs and SPIs |
| 80 | +------------------------------------------------ |
| 81 | +Forwarded physical interrupts should always be active on the |
| 82 | +physical distributor when injected to a guest. |
| 83 | + |
| 84 | +Level-triggered interrupts will keep the interrupt line to the GIC |
| 85 | +asserted, typically until the guest programs the device to deassert the |
| 86 | +line. This means that the interrupt will remain pending on the physical |
| 87 | +distributor until the guest has reprogrammed the device. Since we |
| 88 | +always run the VM with interrupts enabled on the CPU, a pending |
| 89 | +interrupt will exit the guest as soon as we switch into the guest, |
| 90 | +preventing the guest from ever making progress as the process repeats |
| 91 | +over and over. Therefore, the active state on the physical distributor |
| 92 | +must be set when entering the guest, preventing the GIC from forwarding |
| 93 | +the pending interrupt to the CPU. As soon as the guest deactivates the |
| 94 | +interrupt, the physical line is sampled by the hardware again and the host |
| 95 | +takes a new interrupt if and only if the physical line is still asserted. |
| 96 | + |
| 97 | +Edge-triggered interrupts do not exhibit the same problem with |
| 98 | +preventing guest execution that level-triggered interrupts do. One |
| 99 | +option is to not use the HW bit at all, and inject edge-triggered interrupts |
| 100 | +from a physical device as pure virtual interrupts. But that would |
| 101 | +potentially slow down handling of the interrupt in the guest, because a |
| 102 | +physical interrupt occurring in the middle of the guest ISR would |
| 103 | +preempt the guest for the host to handle the interrupt. Additionally, |
| 104 | +if you configure the system to handle interrupts on a separate physical |
| 105 | +core from that running your VCPU, you still have to interrupt the VCPU |
| 106 | +to queue the pending state onto the LR, even though the guest won't use |
| 107 | +this information until the guest ISR completes. Therefore, the HW |
| 108 | +bit should always be set for forwarded edge-triggered interrupts. With |
| 109 | +the HW bit set, the virtual interrupt is injected and additional |
| 110 | +physical interrupts occurring before the guest deactivates the interrupt |
| 111 | +simply mark the state on the physical distributor as Pending+Active. As |
| 112 | +soon as the guest deactivates the interrupt, the host takes another |
| 113 | +interrupt if and only if there was a physical interrupt between injecting |
| 114 | +the forwarded interrupt to the guest and the guest deactivating the |
| 115 | +interrupt. |
| 116 | + |
| 117 | +Consequently, whenever we schedule a VCPU with one or more LRs with the |
| 118 | +HW bit set, the interrupt must also be active on the physical |
| 119 | +distributor. |
| 120 | + |
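The reasoning in this section boils down to one rule: the GIC signals an interrupt to the CPU only while it is pending and not active. The following C sketch models that rule for a single physical interrupt. It is a deliberately simplified model with hypothetical names; in particular it assumes that a level-triggered interrupt is pending exactly while its line is asserted, and it ignores priorities and software writes to the pending state.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of one interrupt on the physical distributor. */
struct phys_irq {
    bool level_triggered;  /* true: level-triggered, false: edge-triggered */
    bool line_asserted;    /* level only: current state of the input line  */
    bool edge_latched;     /* edge only: an edge arrived and was not acked */
    bool active;           /* active bit on the physical distributor       */
};

static bool is_pending(const struct phys_irq *p)
{
    return p->level_triggered ? p->line_asserted : p->edge_latched;
}

/* The GIC forwards the interrupt to the CPU only if pending and not active. */
static bool signalled_to_cpu(const struct phys_irq *p)
{
    return is_pending(p) && !p->active;
}

/* A device raises an edge; if the IRQ is active this gives Pending+Active. */
static void device_edge(struct phys_irq *p)
{
    p->edge_latched = true;
}

/* Ack on the host: the latched edge is consumed, the IRQ becomes active. */
static void host_ack(struct phys_irq *p)
{
    p->edge_latched = false;
    p->active = true;
}

/*
 * The guest deactivates the virtual interrupt and the HW bit clears the
 * physical active state. Returns true if the host takes a new interrupt.
 */
static bool guest_deactivate(struct phys_irq *p)
{
    assert(p->active);     /* only an active interrupt can be deactivated */
    p->active = false;
    return signalled_to_cpu(p);
}
```

In this model a level-triggered interrupt with its line still asserted is signalled again as soon as it is deactivated, while an edge-triggered interrupt is signalled again only if device_edge() ran while the interrupt was active.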
| 121 | + |
| 122 | +Forwarded LPIs |
| 123 | +-------------- |
| 124 | +LPIs, introduced in GICv3, are always edge-triggered and do not have an |
| 125 | +active state. They become pending when a device signals them, and as |
| 126 | +soon as they are acked by the CPU, they are inactive again. |
| 127 | + |
| 128 | +It therefore doesn't make sense, and is not supported, to set the HW bit |
| 129 | +for physical LPIs that are forwarded to a VM as virtual interrupts, |
| 130 | +typically virtual SPIs. |
| 131 | + |
| 132 | +For LPIs, there is no other choice than to preempt the VCPU thread if |
| 133 | +necessary, and queue the pending state onto the LR. |
| 134 | + |
| 135 | + |
| 136 | +Putting It Together: The Architected Timer |
| 137 | +------------------------------------------ |
| 138 | +The architected timer is a device that signals interrupts with |
| 139 | +level-triggered semantics. The timer hardware is directly accessed by VCPUs |
| 140 | +which program the timer to fire at some point in time. Each VCPU on a |
| 141 | +system programs the timer to fire at different times, and therefore the |
| 142 | +hardware is multiplexed between multiple VCPUs. This is implemented by |
| 143 | +context-switching the timer state along with each VCPU thread. |
| 144 | + |
| 145 | +However, this means that a scenario like the following is entirely |
| 146 | +possible, and in fact, typical: |
| 147 | + |
| 148 | +1. KVM runs the VCPU |
| 149 | +2. The guest programs the timer to fire in T+100 |
| 150 | +3. The guest is idle and executes WFI (wait-for-interrupt) |
| 151 | +4. The hardware traps to the host |
| 152 | +5. KVM stores the timer state to memory and disables the hardware timer |
| 153 | +6. KVM schedules a soft timer to fire in T+(100 - time since step 2) |
| 154 | +7. KVM puts the VCPU thread to sleep (on a waitqueue) |
| 155 | +8. The soft timer fires, waking up the VCPU thread |
| 156 | +9. KVM reprograms the timer hardware with the VCPU's values |
| 157 | +10. KVM marks the timer interrupt as active on the physical distributor |
| 158 | +11. KVM injects a forwarded physical interrupt to the guest |
| 159 | +12. KVM runs the VCPU |
| 160 | + |
| 161 | +Notice that KVM injects a forwarded physical interrupt in step 11 without |
| 162 | +the corresponding interrupt having actually fired on the host. That is |
| 163 | +exactly why we mark the timer interrupt as active in step 10, because |
| 164 | +the active state on the physical distributor is part of the state |
| 165 | +belonging to the timer hardware, which is context-switched along with |
| 166 | +the VCPU thread. |
| 167 | + |
| 168 | +If the guest does not idle because it is busy, the flow looks like this |
| 169 | +instead: |
| 170 | + |
| 171 | +1. KVM runs the VCPU |
| 172 | +2. The guest programs the timer to fire in T+100 |
| 173 | +3. At T+100 the timer fires and a physical IRQ causes the VM to exit |
| 174 | + (note that this initially only traps to EL2 and does not run the host ISR |
| 175 | + until KVM has returned to the host). |
| 176 | +4. With interrupts still disabled on the CPU coming back from the guest, KVM |
| 177 | + stores the virtual timer state to memory and disables the virtual hw timer. |
| 178 | +5. KVM looks at the timer state (in memory) and injects a forwarded physical |
| 179 | + interrupt because it concludes the timer has expired. |
| 180 | +6. KVM marks the timer interrupt as active on the physical distributor |
| 181 | +7. KVM enables the timer, enables interrupts, and runs the VCPU |
| 182 | + |
| 183 | +Notice that again the forwarded physical interrupt is injected to the |
| 184 | +guest without having actually been handled on the host. In this case, the |
| 185 | +physical interrupt is never actually seen by the host, because the timer is |
| 186 | +disabled upon guest return, and the virtual forwarded interrupt is |
| 187 | +injected on the KVM guest entry path. |
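Both flows hinge on two small computations on the timer state that KVM saved to memory: how long the soft timer should sleep (step 6 of the idle flow) and whether the timer has already expired (the decision made in the busy flow). The C sketch below shows one way to express them. The ENABLE and IMASK bit positions match the architected timer control register; the struct and function names are hypothetical, and time is counted in abstract ticks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Control register bits of the architected timer. */
#define TIMER_CTL_ENABLE  (1u << 0)
#define TIMER_CTL_IMASK   (1u << 1)

/* Timer state as saved to memory on guest exit (hypothetical layout). */
struct saved_timer {
    uint32_t ctl;          /* control register              */
    uint64_t cval;         /* compare value, in timer ticks */
};

/* What the guest does in step 2: arm the timer to fire 'delta' ticks from now. */
static void guest_program_timer(struct saved_timer *t, uint64_t now,
                                uint64_t delta)
{
    assert(delta <= UINT64_MAX - now);   /* the compare value must not wrap */
    t->cval = now + delta;
    t->ctl = TIMER_CTL_ENABLE;
}

/* Expired means: enabled, not masked, and the compare value was reached. */
static bool timer_should_fire(const struct saved_timer *t, uint64_t now)
{
    return (t->ctl & TIMER_CTL_ENABLE) &&
           !(t->ctl & TIMER_CTL_IMASK) &&
           t->cval <= now;
}

/* Delay for the soft timer: the time left until the compare value. */
static uint64_t ticks_until_expiry(const struct saved_timer *t, uint64_t now)
{
    return t->cval > now ? t->cval - now : 0;
}
```

If the guest programs the timer at tick 1000 to fire 100 ticks later and traps at tick 1040, the soft timer would be armed for the remaining 60 ticks, which corresponds to T+(100 - time since step 2) in the idle flow.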