Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCC optimization levels - See why the tests are not passing for -O1, -O3 #491

Open
amichai-bd opened this issue Feb 19, 2024 · 11 comments
Open
Assignees
Labels
build & env CORE_RRV documentation Improvements or additions to documentation Learning Learning
Milestone

Comments

@amichai-bd
Copy link
Collaborator

Currently in the GCC compile we are running without any optimizations when we are compiling are test.
Will note that we are adding the optimization level -O3 only in the linker and the final compilation, but the Original tests is already in assembly form that I don't know if there are other optimization that can take place when the code is already in assembly form.

See:

[COMMAND] riscv-none-embed-gcc.exe -S -ffreestanding -march=rv32i -I ../../../../../app/defines ../../../../.././verif/core_rrv/tests/hw_sw_lfsr_perf.c -o hw_sw_lfsr_perf_rv32i.c.s
[COMMAND] riscv-none-embed-gcc.exe -O3 -march=rv32i -T ../../../../../app/link.common.ld -I ../../../../../app/defines -Wl,--defsym=I_MEM_OFFSET=0 -Wl,--defsym=I_MEM_LENGTH=65536 -Wl,--defsym=D_MEM_OFFSET=65536 -Wl,--defsym=D_MEM_LENGTH=61440 -nostartfiles -D__riscv__ -Wl,-Map=hw_sw_lfsr_perf.map ../../../../../app/crt0/crt0_default.S hw_sw_lfsr_perf_rv32i.c.s -o hw_sw_lfsr_perf_rv32i.elf

We need to modify the build to compile the "-S" with optimization.
Test it with -O1, -O2 and -O3.
i think -O1 should be a good starting point
See descutions To read about it.

@amichai-bd amichai-bd added documentation Improvements or additions to documentation Learning Learning build & env CORE_RRV labels Feb 19, 2024
@amichai-bd amichai-bd added this to the MAFIA 0.5 milestone Feb 19, 2024
@amichai-bd amichai-bd self-assigned this Feb 19, 2024
@amichai-bd
Copy link
Collaborator Author

@roman012285 - just FYI

@amichai-bd
Copy link
Collaborator Author

@roman012285 - FYI
I ran this test before and after updating the build optimization flag:

python build.py -dut core_rrv -test alive -app -hw -sim

See the example of the elf without any optimization vs -O1 (minimal optimization)

[COMMAND] riscv-none-embed-gcc.exe -S -ffreestanding -march=rv32i -I ../../../../../app/defines ../../../../.././verif/core_rrv/tests/alive.c -o alive_rv32i.c.s

vs

[COMMAND] riscv-none-embed-gcc.exe -O3 -S -ffreestanding -march=rv32i -I ../../../../../app/defines ../../../../.././verif/core_rrv/tests/alive.c -o alive_rv32i.c.s

2 interesting points:

  1. In the "main" -> there is no use of the z so the compiler optimized away all of the logic!
  2. in the "sum" -> the compiler found that there is no need for a stack update and simply used the "add" and "ret"
    image

@amichai-bd
Copy link
Collaborator Author

@roman012285 Another FYI - seems like test that use are graphic libs are the one that are having issues.
seems like graphic libs have some issues when trying to run them with GCC optimizations.
image

I will need to debug this - and i would like to make the timeout shorter - ~50 sec is "too long" to figure out we have a timeout... If we have a specific test that needs a longer time we can override the TIMEOUT parameter using the "-g" parameter override.

@roman012285
Copy link
Collaborator

Crazy difference in sum

@amichai-bd
Copy link
Collaborator Author

I gave this a try and debug.. and things are confusing. the optimization makes it harder to understand.

First i tried comparing with the reference model.
But then i found out that the reference model is not able to execute the optimized program!
this is the test that i will debug & enable in the reference model. (not the rrv_core).
Once i have that up and running, i will be able to understand whats wrong with the rrv_core by comparing it to the reference model.
./build.py -dut rv32i_ref -test alive_vga -app -hw -sim

And this is the test:
image

Super simple - but its failing so its great to use for debug

@amichai-bd amichai-bd mentioned this issue Feb 20, 2024
@amichai-bd
Copy link
Collaborator Author

Found the issue!
The linker has a read-only section that for unknown reason was generated outside the "dmem" region! (0xd14 is outside the DMEM!!)
image

We explicitly wrote that the rodata section should be part of the "global_data"
image

But for some unknown reason, the -O1 flag moved some of it outside.
ChatGPT suggested fixing the linker SECTIONS a little:
image

And now it all works!

image

@roman012285
Copy link
Collaborator

wow!!! very nice

@amichai-bd
Copy link
Collaborator Author

Seems like its failing now the mini_core..

I really tried to understand what happened there..
But this time seems like the compiler just added random 4 reads to regions and location that i definitely did not want him to do..
So i added the cfg option to add the -O1, so currently its enabled in specific cfg..

@amichai-bd
Copy link
Collaborator Author

I added the optimization only to the "-cfg core_rrv_boot_trap"
See how the run time was cut in half!!!

image

vs this:
image

@amichai-bd
Copy link
Collaborator Author

Still need to figure out if we can enable the compile optimization also for the mini_core and big_core..

I don't like it that the compile added 3 reads that are not used.. but i don't know what we can do about it..

@roman012285
Copy link
Collaborator

nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build & env CORE_RRV documentation Improvements or additions to documentation Learning Learning
Projects
None yet
Development

No branches or pull requests

2 participants