-
Notifications
You must be signed in to change notification settings - Fork 238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible issue with FIRQ pending interrupt clearing #818
Comments
I suspect this code, in neorv32_cpu_control.vhd inside csr_write_data, may not be doing what was intended:
It seems csr.rdata is set to 0 on cycles when no CSR read operation is in progress, so it's possible that it does not have the right data when this is executing. The symptoms would match a value of 0 being used rather than the actual MIP register content, causing the entire register to be zeroed. |
It looks like doing a csr_write for the MIP register with the bit for the FIRQ being cleared as 0 and all other bits 1, rather than using csr_clear, can work around the problem in software, since writing 1 to bits in MIP has no effect. With this change I am only seeing the mip_firq_nclr bit for the desired interrupt going to 0 rather than all of them. |
Hey @robhancocksed
That sounds great - thank you very much!
I will have a closer look at this, but I think you might have found a severe bug here 🙈 |
This is really a mean bug... Thanks for identifying this! |
This issue should be fixed now. Thanks again for finding this bug! |
@stnolting It looks like the problem is not solved unfortunately - if I remove my previous workaround to use csr_write rather than csr_clear, and update the core, the same XIRQ hang eventually occurred, whereas with the workaround the system survived a test run for a significant period of time. And the change would actually break the workaround I was using as well, since writing 1 bits into MIP would now actually trigger those interrupts.. |
I've just tested the read/write/clear/set capabilities of Maybe this is just a software issue? CSR writeneorv32_cpu_csr_write(CSR_MIP, data):
mip <= data; This will override all writable bits of CSR clearneorv32_cpu_csr_clr(CSR_MIP, data):
mip <= mip & (~data); This will clear bit i of CSR setneorv32_cpu_csr_set(CSR_MIP, data):
mip <= mip | data; This will set bit i of
|
The code was using the csr_clear function in Zephyr which generates the assembly: 35774: 010007b7 lui a5,0x1000 which should be correct in terms of using an atomic clear instruction. It seems like the problem might be fairly timing dependent however, so a basic sequential test might not pick up whatever is happening? |
This seems fine as this is an atomic CSR access. Even if the according FIRQ source is firing right in that moment where neorv32/rtl/core/neorv32_cpu_control.vhd Lines 1455 to 1457 in 4155b72
What is the actual "symptom" of your use case? Does the CPU miss an incoming FIRQ trigger? |
I made some more tests and now I can confirm that a FIRQ trigger gets lost when it arrives exactly 1 or 2 cycles right before doing any I'll try to come up with another (better) fix 😅 I'll also check all further CSRw that have implicit updates (by the hardware). |
@stnolting Basically yes, it appears that an XIRQ interrupt gets raised very close to the moment that a CSR clear to clear a different interrupt is happening, and that write incorrectly clears the XIRQ interrupt as well. That means the XIRQ ISR never runs, and so because the first interrupt source is never cleared in XIRQ, it never raises any more interrupts. I think the latest issue you discovered likely explains what is going on there. |
This was harder to fix then I have expected... 🙈 But I think -> #829 |
My stress test ran over the weekend with no errors without the software workaround in place, so I think it's safe to say this is now resolved. |
Thanks for the feedback! I'm very glad to hear that 😅 |
Describe the bug
I am working on some changes to add support for the NEORV32 XIRQ interrupt controller to Zephyr, and have a setup that is mostly working. (I am planning to submit it upstream once it's debugged.) However, I'm running into an issue where XIRQ interrupts seem to stop being processed, and have narrowed it down what I think is an issue with how the CSRC instruction is being handled. I added some debug probes into the synthesized design in Vivado to try to see what is going on, and I suspect the CPU is not handling the FIRQ pending interrupt clearing properly.
The PC value shown corresponds to the following instructions which are part of the XIRQ interrupt handler:
35774: 010007b7 lui a5,0x1000
35778: 3447b073 csrc mip,a5
which should be clearing bit 24 in the MIP register, corresponding to FIRQ 8 for XIRQ, as far as I can tell. However, the logic capture seems to be showing that this results in ALL pending FIRQs being cleared out. So I suspect what is happening is that when another FIRQ handler runs, such as for the UART, it can sometimes end up clearing an XIRQ interrupt before it has a chance to be processed, resulting in an XIRQ interrupt hang.
In the Vivado ILA capture above, the neorv32_SystemTop_ax_0/U0/neorv32_top_inst/core_complex.neorv32_cpu_inst/neorv32_cpu_control_inst/csr[mip_firq_nclr][8]_i_1_n_0 signal corresponds (I think) to the state of the mip_firq_nclr value for FIRQ 8 which this instruction should be clearing, and it does, but you can see that mip_firq_nclr is also being pulsed low for the other used FIRQs (0, 2, 3, 12, 15) as well, and presumably shouldn't be.
Environment:
Hardware:
The text was updated successfully, but these errors were encountered: