[Question] mepc value on instruction access fault exception #938

Closed
stnolting opened this issue Dec 16, 2022 · 14 comments

@stnolting
Contributor

stnolting commented Dec 16, 2022

Setup: simple rv32i architecture, machine-mode only

If the CPU encounters an instruction access fault exception while accessing a specific address (for example 0x00000204), what value should be written to the mepc CSR?

As far as I understand, the trap value CSR mtval will be set to the address that caused the exception - 0x00000204 in this case. But what about mepc? The spec. says:

When a trap is taken into M-mode, mepc is written with the virtual address of the instruction that was interrupted or that encountered the exception

But in this case there is no instruction that was interrupted, nor one that explicitly caused the exception (so nothing like an ebreak instruction that explicitly raises an exception) - the exception was rather caused by the instruction fetch.

Should mepc also be set to 0x00000204 (which feels somehow redundant as mtval already shows this address) or should mepc be set to the previously-executed instruction (so 0x00000200 in this case)?

Or is this something that is platform-defined? 😅

@scottj97
Contributor

The address of the instruction that was attempted to be fetched goes into mepc.

mtval is not necessarily the same. If compressed instructions are supported, an instruction could be half on one page, and the second half could cause a page fault. So for example you could have mepc of 0x1ffe and mtval of 0x2000.
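
The page-crossing case above can be sketched as a small Python model. This is only an illustration under stated assumptions: the 4 KiB page size, the function name `fetch_fault_csrs`, and the `faulting_pages` set are hypothetical and not taken from the spec. It only shows which addresses would end up in mepc and mtval.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def fetch_fault_csrs(pc, insn_len, faulting_pages):
    """Return (mepc, mtval) for a faulting fetch, or None if the fetch succeeds.

    pc             -- address of the first byte of the instruction
    insn_len       -- instruction length in bytes (2 or 4)
    faulting_pages -- hypothetical set of page numbers whose fetch faults
    """
    # Walk the instruction in 16-bit parcels, lowest address first.
    for offset in range(0, insn_len, 2):
        addr = pc + offset
        if addr // PAGE_SIZE in faulting_pages:
            # mepc: start of the instruction; mtval: the portion that faulted.
            return pc, addr
    return None


# A 4-byte instruction at 0x1ffe whose second half lies in a faulting page.
mepc, mtval = fetch_fault_csrs(0x1FFE, 4, faulting_pages={2})
print(hex(mepc), hex(mtval))
```

With an aligned 4-byte instruction at 0x204 in a faulting region, the same model returns 0x204 for both values, which is the situation from the original question.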

@stnolting
Contributor Author

mtval is not necessarily the same. If compressed instructions are supported, an instruction could be half on one page, and the second half could cause a page fault. So for example you could have mepc of 0x1ffe and mtval of 0x2000.

According to the priv. spec. instruction access fault and instruction page fault exception have the same priority (see cutout below). From my understanding mepc, mtval, mcause, ... get either updated according to an instruction access fault, or get updated according to an instruction page fault (mutually exclusive).

[Image: cutout of the exception priority table (Table 3.7) from the priv. spec.]

@scottj97
Contributor

According to the priv. spec. instruction access fault and instruction page fault exception have the same priority (see cutout below). From my understanding mepc, mtval, mcause, ... get either updated according to an instruction access fault, or get updated according to an instruction page fault (mutually exclusive).

Yes, so what? I was explaining why mepc and mtval are two separate registers and not redundant.

Although the same thing could happen with an access fault on the second half of a page crossing.

@stnolting
Contributor Author

Yes, so what? I was explaining why mepc and mtval are two separate registers and not redundant.

Of course they are not redundant. I'm just trying to understand the behavior of mepc.

I thought that for async. exceptions mepc always gets updated with the address of the actually executed instruction that caused the exception. But in the case of an instruction access fault, there is no instruction that has been executed / attempted to be executed.

So, in my understanding that meant mepc would get the address of the last executed instruction right before the instruction fetch access fault occurred.

Seems like I got something wrong 😅

@scottj97
Contributor

I thought that for async. exceptions mepc always gets updated with the address of the actually executed instruction that caused the exception. But in the case of an instruction access fault, there is no instruction that has been executed / attempted to be executed.

So, in my understanding that meant mepc would get the address of the last executed instruction right before the instruction fetch access fault occurred.

Seems like I got something wrong 😅

An access fault is not an async exception. If an instruction's execution causes a fault then yes its PC would end up in mepc. But if the problem occurs during fetch, it doesn't make sense to point mepc to the previous successfully completed instruction, because that instruction had nothing to do with the exception.

@stnolting
Contributor Author

But if the problem occurs during fetch, it doesn't make sense to point mepc to the previous successfully completed instruction, because that instruction had nothing to do with the exception.

That's the point! 😉

The instruction access fault mepc value has the interrupt bit cleared. And as interrupts basically are async. exceptions (?) I thought "instruction access fault" was an async. exception just like all the others (e.g. ebreak).

I tried to set up a table that shows which values will be written to mcause, mepc and mtval for all kinds of traps (link):

[Image: table of the values written to mcause, mepc and mtval for each kind of trap]

Maybe it would be good to have something like that in the RISC-V ISA manual - even if some points might be highly implementation-defined 😉

Anyway, seems like I was on the wrong track here... Thanks for clearing that up!

@scottj97
Contributor

The instruction access fault mepc value has the interrupt bit cleared. And as interrupts basically are async. exceptions (?) I thought "instruction access fault" was an async. exception just like all the others (e.g. ebreak).

ebreak is also not an async exception.

By "interrupt bit" I assume you are talking about the top bit of mcause? When 0, it means it is not an interrupt, so why would you assume these exceptions have async behavior like interrupts? The caption on Table 3.7 you pasted above expressly says these are synchronous.

An async interrupt means the timing of the trap is not connected to the specific sequence of instructions that were executing. It can come at any time and was not necessarily caused by anything in the instruction pipeline.
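
To make the role of the top mcause bit concrete, here is a minimal Python sketch of splitting an mcause value into its two fields. It assumes rv32 (so the Interrupt bit is bit 31); the helper name `decode_mcause` is made up for illustration.

```python
XLEN = 32  # assumption: rv32, so the Interrupt bit is bit 31


def decode_mcause(mcause):
    """Split an mcause value into (is_interrupt, code)."""
    is_interrupt = bool((mcause >> (XLEN - 1)) & 1)
    code = mcause & ((1 << (XLEN - 1)) - 1)
    return is_interrupt, code


# Top bit 0, code 1: instruction access fault, a synchronous exception.
print(decode_mcause(0x00000001))
# Top bit 1, code 7: machine timer interrupt, an asynchronous event.
print(decode_mcause(0x80000007))
```

The bit only says whether the trap is an interrupt. It does not by itself say which address lands in mepc.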

@stnolting
Contributor Author

ebreak is also not an async exception.

Sorry, typo. This is what I actually meant:

The instruction access fault mepc value has the interrupt bit cleared. And as interrupts basically are async. exceptions (?) I thought "instruction access fault" was a SYNC. exception just like all the others (e.g. ebreak).


By "interrupt bit" I assume you are talking about the top bit of mcause?

Right.

When 0, it means it is not an interrupt, so why would you assume these exceptions have async behavior like interrupts?

That is the point :D

If instruction access fault has the interrupt bit cleared, I thought it updates mepc like other sync exceptions. But it turns out that it does not. It rather updates mepc like an async exception by writing the address of the interrupted instruction to it.

An async interrupt means the timing of the trap is not connected to the specific sequence of instructions that were executing. It can come at any time and was not necessarily caused by anything in the instruction pipeline.

I understood that.

@stnolting
Contributor Author

Another example to (hopefully) illustrate the cause of my confusion: 😅

  • An "instruction address misaligned" exception will be raised when jumping to a misaligned address. So mepc will contain "the address of the instruction that caused the exception" - so the address of the jump instruction.
  • (this is what I learned now) An "instruction access fault" exception will be raised by a faulty instruction fetch. So mepc will contain "the address of the instruction that was interrupted" / not-executed.

So in the first case we have a SYNC exception, but in the second case we have an ASYNC exception, right?

In both cases the "interrupt" bit of mcause is cleared. So this bit does not give any information about sync./async. exceptions, right?

Thanks again for all your help! 😉

@aswaterman
Member

No, the second case is still synchronous.

It sounds like you are conflating the value that populates mepc with whether the event is synchronous. Synchronous events are those that occur in a consistent fashion with respect to the instruction stream. Obviously, access faults fall into that classification.

mepc always points to the place that execution should resume after the trap is processed, provided the trap is resumable. For most fetch-related exceptions, that means it contains the address of the instruction being fetched.

Misaligned fetch exceptions are never resumable—they signify an irrecoverable error in the program. They are the special case here. We report the exception on the earlier PC because it helps with debugging and because it’s easy to implement.

Access exceptions fall into the standard bucket: mepc points to where execution should resume.
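
The two cases can be put side by side in a small Python sketch. This is an illustrative model only: the function name `mepc_for_fetch_trap`, its parameters, and the string labels for the trap kinds are hypothetical, not spec terminology.

```python
def mepc_for_fetch_trap(kind, transfer_pc, fetch_pc):
    """Pick the mepc value for a fetch-related synchronous exception.

    kind        -- hypothetical label for the exception type
    transfer_pc -- address of the jump/branch that produced the target
    fetch_pc    -- address of the instruction being fetched
    """
    if kind == "instruction_address_misaligned":
        # Special case: reported on the earlier PC (the jump or branch).
        return transfer_pc
    if kind in ("instruction_access_fault", "instruction_page_fault"):
        # Standard bucket: mepc points to where execution should resume.
        return fetch_pc
    raise ValueError(f"not a fetch-related exception: {kind}")


# Access fault while fetching 0x204 after executing 0x200.
print(hex(mepc_for_fetch_trap("instruction_access_fault", 0x200, 0x204)))
# Jump at 0x200 to a target that is misaligned on rv32i (0x206).
print(hex(mepc_for_fetch_trap("instruction_address_misaligned", 0x200, 0x206)))
```

Both traps are synchronous. Only the choice of address differs.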

@allenjbaum

allenjbaum commented Dec 17, 2022 via email

@allenjbaum

allenjbaum commented Dec 17, 2022 via email

@aswaterman
Member

I think OP's question has been answered, so I'll close the issue. Please reopen if I'm wrong.

@stnolting
Contributor Author

@allenjbaum @scottj97 @aswaterman

Thank you all for the detailed explanations!

For the record

I should have read the priv. spec. more carefully... This is what I just found: 😅

3.1.16 Machine Trap Value Register (mtval)
[...]
If mtval is written with a nonzero value when an instruction access-fault or page-fault exception occurs on a system with variable-length instructions, then mtval will contain the virtual address of the portion of the instruction that caused the fault, while mepc will point to the beginning of the instruction.
