Is it appropriate to specify an exception vs interrupt priority? #544

Closed
GregAC opened this issue Jul 21, 2020 · 11 comments

@GregAC
Contributor

GregAC commented Jul 21, 2020

The current privileged spec specifies:

> Synchronous exceptions are of lower priority than all interrupts

I feel this may be better off being left as implementation defined or the meaning of priority in this instance should have further clarification.

Consider the following case:

  1. Core fetches a load instruction and begins its execution
  2. Core issues a load request on its external memory interface (i.e. it's not being satisfied by a cache if one exists)
  3. Before a response to the load request is received an interrupt is raised
  4. When the response to the load request does appear it signals a bus error which results in a synchronous exception (either interrupt and bus error response are seen on the same cycle or bus error response occurs after interrupt)

The obvious interpretation of the line I quote from the spec would be that in this scenario we should take the interrupt (whether we see the bus error response on the same cycle or not). However what do we then do with our pending load/pending exception on bus error?

If the response to the load has yet to be received, we need to keep some state around so we know there is a pending load response, along with some information about the instruction. Similarly, if we saw the error response in the same cycle as the interrupt, we need some state to indicate a 'pending' exception, with logic to handle the special pending load -> pending exception transition when the response returns. For larger cores this may be perfectly fine and may indeed just 'fall out' of the existing micro-architectural design of a superscalar out-of-order core. At the other end of the spectrum, with tiny 2 or even 1 stage in-order pipelines, this adds extra area and complexity.

Then there's also the question of how a 'pending exception' should be dealt with. One answer would be to take the exception immediately upon returning from an interrupt with an mret that jumps back to the code that saw the exception, though I don't think the architecture strictly specifies that an interrupt handler must end with mret (it is probably inadvisable for software to use a different mechanism). You also have to deal with nested cases, where you have a pending exception and then see another coincident interrupt and exception, giving a second pending exception. For further complexity, you could end up having the same PC both times. Clearly a big hardware stack of pending exceptions is not something you want to architecturally mandate.

In some cases, on taking the interrupt we could set mepc to the currently executing load and ignore any response received for it, so that we simply repeat the load on return from the interrupt. However, whether this is permissible is highly system/implementation dependent. For example, repeating the load may be fine when loading from some 'ordinary' memory, but would not be fine when the load targets a device register where a read triggers some action (such as popping from a FIFO), and clearly wouldn't be fine for stores.
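To make the replay hazard concrete, here is a minimal Python sketch. `FifoDevice` and `load_with_replay` are hypothetical names invented for this illustration and do not come from any core or from the spec. The model assumes the first response is dropped when the interrupt is taken and the load is simply reissued on return.

```python
from collections import deque


class FifoDevice:
    """Hypothetical device register: every read pops one entry from a FIFO."""

    def __init__(self, items):
        self._fifo = deque(items)

    def read(self):
        # The read itself is the side effect: the popped entry is gone for good.
        return self._fifo.popleft()


def load_with_replay(read, interrupted):
    """Model a load that is reissued if an interrupt is taken while it is in flight."""
    value = read()
    if interrupted:
        # The first response is ignored and mepc points back at the load,
        # so the load runs a second time after the handler returns.
        value = read()
    return value


# Ordinary memory: the replayed load returns the same value, so it is invisible.
ram = {0x1000: 42}
ram_value = load_with_replay(lambda: ram[0x1000], interrupted=True)

# Device register: the entry popped by the first (discarded) read is lost.
fifo = FifoDevice(["first", "second", "third"])
fifo_value = load_with_replay(fifo.read, interrupted=True)
```

In this model the software only ever sees `"second"` from the FIFO, which is why replay cannot be a general answer.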

Another way to interpret the prioritization is that it only applies between instructions, i.e. you wait until an instruction's execution has been resolved (so for a load/store you know whether it will succeed or see a bus error) before taking an interrupt. In this sense the exception occurs 'before' the interrupt, so you would take the exception first. If instead you see an interrupt in the same cycle in which you would send a load or store request, you prevent that request from going out so that the interrupt can be taken before a potential exception.

This is still complex to apply across different microarchitectures, though. In a big out-of-order core you are always executing multiple instructions, so there is no neat dividing point at which you can pay attention to interrupts. For a small core, preventing a load or store request in the same cycle you see an interrupt may introduce a nasty timing path (from interrupt in -> memory request out).

Ultimately I think the asynchronous vs synchronous nature of these two things means you can't sensibly specify a priority between them without adding extra baggage to the architecture to explain how things are synchronized (and potentially requiring this 'pending exception' concept). It is better left to implementors to decide. You could imagine that a core wanting the lowest possible interrupt latency may be happy to implement some kind of 'pending exception' idea, or simply to take a non-recoverable general 'system error' exception so that the interrupt can be prioritized, given that this case should be very rare and bus errors are generally software issues rather than an expected event in a correct program.

Incidentally I came across this whilst looking at an issue on the lowRISC Ibex core (some discussion here: lowRISC/ibex#1034) which is an in-order 2 or 3 stage pipeline. There I plan to deal with this scenario by waiting for the load or store response and taking the synchronous exception if a bus error is seen (the interrupt then occurring in the synchronous exception handler when interrupts are re-enabled).

@allenjbaum

allenjbaum commented Jul 21, 2020 via email

@sorear
Contributor

sorear commented Jul 22, 2020

"synchronous exception" in the above refers to idempotent failures that occur instead of executing an instruction, e.g. page faults. The purpose of that sentence is that low-privilege code in a fault loop, executing an infinite sequence of ecalls or misaligned accesses or whatever, must not be able to starve high-privilege code of interrupts.

@GregAC
Contributor Author

GregAC commented Jul 22, 2020

> High performance implementations treat bus error as asynchronous, because they've already completed subsequent instructions by the time the bus error is reported (even not-so-high performance implementations do).

Sure. I'd argue you can build a high-performance implementation that takes synchronous exceptions on bus errors (on loads at least), but my issue concerns the scenario where a bus error results in a synchronous exception. So if you're using an asynchronous exception for this, then this issue doesn't apply. You could still end up with multiple pending asynchronous bus error exceptions as I described above, but you could just say your asynchronous exception will only give the details (PC + Addr) of the first bus error seen.
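The "details of the first bus error only" idea can be sketched as a sticky capture record. This is only an illustration: `FirstBusErrorRecord` and its methods are hypothetical and do not correspond to any CSR or register defined by the spec or by a particular core.

```python
class FirstBusErrorRecord:
    """Hypothetical capture record for an asynchronous bus error exception."""

    def __init__(self):
        self.pending = False
        self.pc = None
        self.addr = None

    def report(self, pc, addr):
        # Only the first error since the last acknowledge is recorded;
        # later errors keep the exception pending but lose their details.
        if not self.pending:
            self.pending, self.pc, self.addr = True, pc, addr

    def acknowledge(self):
        # Software reads the details and clears the record.
        details = (self.pc, self.addr)
        self.pending, self.pc, self.addr = False, None, None
        return details


record = FirstBusErrorRecord()
record.report(pc=0x80000010, addr=0x40000000)
record.report(pc=0x80000014, addr=0x40000004)  # second error: details dropped
first_error = record.acknowledge()
```

The trade-off in this sketch is that no stack of pending errors is needed, at the cost of software never learning the PC and address of any error after the first.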

> The synchronous exceptions can be re-executed and possibly succeed; bus errors can't (by my definition of bus error).

I think this is a reasonable definition of bus error, but the specification does not define a bus error at all, simply a defined synchronous exception called 'Load access fault' with no requirements placed on what a 'Load access fault' must be. A reasonable interpretation here seems to be that a load or store that results in a bus error produces a synchronous exception with a 'Load access fault' or 'Store/AMO access fault' cause.

> "synchronous exception" in the above refers to idempotent failures that occur instead of executing an instruction, e.g. page faults.

But this isn't mandated in the specification. There is no real definition of what a 'synchronous exception' is; indeed, there is space in the cause table for custom use. An implementer is free to define some synchronous exception that only occurs on a load or store and depends upon the system's response to the request, for instance. That response could be a bus error or something else entirely. In this situation, dealing with an interrupt that occurs after the request but before the response whilst strictly sticking to the specification is problematic, as I've outlined. Perhaps the intent is that no synchronous exception should have these properties (implying that bus errors must result in an asynchronous exception).

> The purpose of that sentence is that low-privilege code in a fault loop, executing an infinite sequence of ecalls or misaligned accesses or whatever, must not be able to starve high-privilege code of interrupts.

Well if the interrupt vs synchronous exception priority was implementation defined it'd be perfectly possible for the interrupt to take priority over the ecall/misaligned access exception and enter the handler. If the low-privilege code is intentionally hitting a bus error in a loop then you'd still enter the interrupt handler just a little later as soon as the exception handler re-enabled interrupts.

@sorear
Contributor

sorear commented Jul 22, 2020

> I think this is a reasonable definition of bus error, but the specification does not define a bus error at all, simply a defined synchronous exception called 'Load access fault' with no requirements placed on what a 'Load access fault' must be. A reasonable interpretation here seems to be that a load or store that results in a bus error produces a synchronous exception with a 'Load access fault' or 'Store/AMO access fault' cause.

I think you found a typo in the spec. "Access faults" in the "Machine level CSRs" chapter are the same thing as "access exceptions" in the "Physical memory protection" chapter; it is not simply a placeholder for user-defined errors.

> there is no real definition of what a 'synchronous exception is'

General problem with the spec unfortunately.

> If the low-privilege code is intentionally hitting a bus error in a loop then you'd still enter the interrupt handler just a little later as soon as the exception handler re-enabled interrupts.

Some kernels, e.g. seL4, never enable interrupts in kernel code and the next opportunity to take an interrupt is after returning to user.

@sorear
Contributor

sorear commented Jul 22, 2020

> Well if the interrupt vs synchronous exception priority was implementation defined it'd be perfectly possible for the interrupt to take priority over the ecall/misaligned access exception and enter the handler.

The purpose of a specification is to define enough for software to be able to work using only defined behavior. A kernel like seL4 needs timer and external hardware interrupts to take priority over software exceptions that can be generated by user code; to leave that implementation-defined renders the specification roughly as useful as if the existence of add had been left implementation-defined.
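A toy Python model of the starvation concern may help. It assumes a kernel that handles every trap with interrupts masked, so the only point at which an interrupt can be taken is a user-mode instruction boundary, and it assumes user code in which every instruction faults. The function name and the `interrupts_win` policy flag are hypothetical and exist only for this sketch.

```python
def faults_before_interrupt(interrupts_win, max_user_instructions=1000):
    """Return how many faulting user instructions run before a pending interrupt is taken.

    Returns None if the interrupt is still waiting after max_user_instructions.
    """
    irq_pending = True  # e.g. a timer interrupt that is already pending
    for executed in range(max_user_instructions):
        # User-mode instruction boundary: the only place interrupts are enabled,
        # because the modelled kernel never unmasks them while handling a trap.
        if irq_pending and interrupts_win:
            return executed
        # Otherwise the instruction executes and faults (ecall, misaligned
        # access, ...). The kernel handles it with interrupts masked and
        # returns to user mode, where the next instruction faults again.
    return None


with_priority = faults_before_interrupt(interrupts_win=True)
without_priority = faults_before_interrupt(interrupts_win=False)
```

Under these assumptions the interrupt is taken before any faulting instruction when interrupts win at the boundary, and is never taken within the bound when exceptions win.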


I think what you and I actually want here is to declare:

> Synchronous exceptions generated by an instruction are of lower priority than all interrupts pending and enabled at the time the instruction was fetched.

I don't think we really specify the timing when interrupts become pending, but this is what cores actually do, and if an interrupt is ready before the memory access is initiated you don't initiate a memory access.

@GregAC
Contributor Author

GregAC commented Jul 22, 2020

> A kernel like seL4 needs timer and external hardware interrupts to take priority over software exceptions that can be generated by user code; to leave that implementation-defined renders the specification roughly as useful as if the existence of add had been left implementation-defined.

I'm not familiar with seL4, but if the exception prioritization were implementation defined as proposed, then in the hostile/broken user-code case you suggest, where there is a tight loop of faulting loads or similar, the interrupt handler would be executed immediately after the exception handler returns with an mret or sret (assuming the implementation chose that prioritization). So the user code won't be able to starve the kernel of interrupts. It would be exactly like the case where the interrupt appears one cycle after the user code has executed the faulting load.

> Synchronous exceptions generated by an instruction are of lower priority than all interrupts pending and enabled at the time the instruction was fetched.

Yes, I think we want something like that, though what 'fetched' means isn't straightforward. I think wording around 'instruction boundaries' or 'beginning of instruction execution' may be better, e.g.

> Synchronous exceptions generated by an instruction are of lower priority than all interrupts pending and enabled when the instruction begins its execution
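One way to read the proposed wording is sketched below in Python. It is an illustration only: `execute_load` and its parameters are hypothetical, the pending and enabled sets are plain bitmasks sampled once when the load begins execution, and any global enable is assumed to be folded into the enabled mask.

```python
def execute_load(pending_at_start, enabled_at_start, irq_raised_in_flight, bus_error):
    """Return (trap, memory_request_issued, irq_left_pending) for one load."""
    if pending_at_start & enabled_at_start:
        # An interrupt was pending and enabled when the load began: take it
        # and never issue the memory request, so there is nothing to replay.
        return ("interrupt", False, False)
    # The request goes out. An interrupt raised from here on is not considered
    # for this instruction; it stays pending until the next instruction begins.
    trap = "exception" if bus_error else None
    return (trap, True, irq_raised_in_flight)


# The scenario from the top of the thread: interrupt raised mid-flight, then bus error.
scenario = execute_load(pending_at_start=0b0, enabled_at_start=0b1,
                        irq_raised_in_flight=True, bus_error=True)

# Interrupt already pending when the load begins: the load is never issued.
early = execute_load(pending_at_start=0b1, enabled_at_start=0b1,
                     irq_raised_in_flight=False, bus_error=True)
```

In this reading the mid-flight interrupt never competes with the load's exception, so no 'pending exception' state is needed.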

Some clarification on what should be done on a bus error would also be useful. I'd argue we shouldn't constrain that it must be an asynchronous/imprecise exception but if the 'access fault' cause is specifically for PMP failures another spec defined cause for a generic 'bus error' or 'system fault' or something would be useful. Otherwise everyone will choose their own cause value which doesn't seem like a good idea for something that is a generic issue that will be seen on many systems. You'd be unable to write a kernel that can see an exception was due to a bus error without having to know implementation specifics.

@jscheid-ventana
Contributor

Can this be closed, as it seems to not have traction?

@allenjbaum

allenjbaum commented Aug 10, 2021 via email

@GregAC
Contributor Author

GregAC commented Aug 11, 2021

My confusion/concern here ultimately stemmed from trying to literally apply the architecture to micro-architectural events. In this case the processor is awaiting a response to a memory transaction and will take a synchronous exception if an error response is seen. Whilst it is waiting, an interrupt is raised (as in, the physical wire into the processor changes state). Taking that interrupt before you can deal with the potential synchronous exception is tricky/impossible: you end up with a half-completed memory instruction that needs to trap if an error response comes back, but you can't replay it, so it's not as easy as pointing mepc at it. You would need a new concept of a half-completed instruction/pending synchronous error at PC X.

However, it is easily resolved in an architecturally permissible manner. Just because an interrupt line has physically changed state doesn't mean the change is architecturally visible yet. You simply state that pending interrupts become visible as you begin instruction execution, so ignoring a change on an interrupt line whilst in the middle of an instruction is permissible. Perhaps some wording saying it's implementation defined when a change on an interrupt line becomes architecturally visible would be useful?

I do think there should be some mention of bus errors in the specification, even if just to state that it's implementation defined and that some may choose an asynchronous interrupt while others use a synchronous exception.

@aswaterman
Member

The thread has bifurcated into a discussion of the desire for standardized handling of bus errors and a desire for clarification of the originally quoted remark that "Synchronous exceptions are of lower priority than all interrupts".

The bus-error topic will be addressed by an RVI TG.

To the original question:

The arrival of most interrupts is asynchronous to the instruction stream, and consequently, implementations are usually offered the luxury of deferring the taking of such interrupts until it's convenient for them to do so. Only at defined points are implementations required to constrain this behavior. Although we're still working on the language, the specific situation you described isn't one of those events, and so it would be valid to take the exception over the interrupt.

What, then, was the point of the remark that "Synchronous exceptions are of lower priority than all interrupts"? That describes what happens when the instruction stream is synchronized to the list of pending interrupts. One such event is executing an MRET. If an interrupt is pending in the MRET's target privilege mode at the time of the MRET, then that interrupt must be taken before any synchronous exception that would've occurred as a result of execution in the target privilege mode. This isn't in conflict with your example.
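The MRET case can be sketched as follows. This is a simplified Python model, not spec text: it assumes no interrupt delegation (so every interrupt targets M-mode), ignores the ordering among different interrupts, and the function name is hypothetical. It relies on MRET restoring the privilege mode from `mstatus.MPP` and `mstatus.MIE` from `mstatus.MPIE`, and on M-mode interrupts being globally enabled whenever the hart runs below M-mode.

```python
M_MODE, S_MODE, U_MODE = 3, 1, 0  # privilege encodings used in mstatus.MPP
MTI = 1 << 7                      # machine timer interrupt bit in mip/mie


def first_trap_after_mret(mip, mie, mpie, mpp, first_instr_faults):
    """Return the first trap taken once an MRET has executed (simplified model)."""
    # MRET restores the privilege mode from MPP and the MIE bit from MPIE.
    priv, mstatus_mie = mpp, mpie
    # M-mode interrupts are globally enabled below M-mode, or in M-mode with MIE set.
    globally_enabled = priv < M_MODE or mstatus_mie
    if globally_enabled and (mip & mie):
        # The instruction stream is synchronized to the pending interrupts
        # here, so the interrupt beats any exception in the target mode.
        return "interrupt"
    if first_instr_faults:
        return "exception"
    return None


to_user_irq_pending = first_trap_after_mret(MTI, MTI, mpie=False, mpp=U_MODE,
                                            first_instr_faults=True)
to_user_no_irq = first_trap_after_mret(0, MTI, mpie=True, mpp=U_MODE,
                                       first_instr_faults=True)
to_machine_masked = first_trap_after_mret(MTI, MTI, mpie=False, mpp=M_MODE,
                                          first_instr_faults=True)
```

Only the first call returns to a mode where the timer interrupt is both pending and enabled, so only there does the interrupt come before the faulting instruction's exception.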

So, I think there's nothing actionable here, and in a roundabout way you'll actually get what you want. I'm going to close this issue. Feel free to open subsequent issues that are more targeted (but let's keep them to a single topic).

@allenjbaum

allenjbaum commented Aug 11, 2021 via email
