Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vanadis] Implement the missing Zicntr extension instructions. #2372

Open
wants to merge 1 commit into
base: devel
Choose a base branch
from

Conversation

ldalessa
Copy link

@ldalessa ldalessa commented Jun 29, 2024

Chapter 8 in
https://github.com/riscv/riscv-isa-manual/releases/download/riscv-isa-release-f797123-2024-06-27/riscv-unprivileged.pdf details the RISC-V counter interface.

There are 3 bespoke counters detailed in the Zicntr extensions in 8.1, and up to 29 more user-programmable counters detailed in the Zihpm extenstion in 8.2. This commit provides space for all 32 counters in the register file, but only currently implements the three Zicntr counters.

The implementation consists of a few changes.

  1. I have extended the register file structure with 32, 64 bit counters, and increment and get members in order to update and read those counters.
  2. I have added a new instruction, Zicntr::VanadisReadCounterInstruction in order to read those counters. This instruction can be tailored for any of the 3 Zicntrs (implemented as tag types in inst/zicntr.h), XLEN=64 or 32, and with or without the [H] extension.
  3. I have added decoding to decoder/vriscv64decoder.h to handle RDCYCLE, RDTIME, and RDINSTRET. Because the [H] and XLEN=32 versions are only available in riscv32, the riscv64 decoder simply injects a decoding failure if those occur.
  4. I have extended the core's tick to update the three Zicntrs.
  5. I have added a small misc test example to demonstrate usage.

I have not implemented any functionality for programming the 29 Zihpm counters, nor have I added any special decoding to read these registers.

I have decided that the Zicntr instructions will be processed by the arithmetic functional unit. I don't know if this is appropriate, but it should manage any register port contention properly and seemed like the most expedient solution.

I have removed the cycle_count and the getCycleCount from the decoder base class, as it is no longer being used. The current cycle is still passed to the decoder tick() even though it is also still unused.

Chapter 8 in
https://github.com/riscv/riscv-isa-manual/releases/download/riscv-isa-release-f797123-2024-06-27/riscv-unprivileged.pdf
details the RISC-V counter interface.

There are 3 bespoke counters detailed in the Zicntr extensions in 8.1, and up to
29 more user-programmable counters detailed in the Zihpm extenstion in 8.2. This
commit provides space for all 32 counters in the register file, but only
currently implements the three Zicntr counters.

The implementation consists of a few changes.

1. I have extended the register file structure with 32, 64 bit counters, and
   `increment` and `get` members in order to update and read those counters.
2. I have added a new instruction, `Zicntr::VanadisReadCounterInstruction` in
   order to read those counters. This instruction can be tailored for any of the
   3 `Zicntr`s (implemented as tag types in `inst/zicntr.h`), XLEN=64 or 32, and
   with or without the `[H]` extension.
3. I have added decoding to `decoder/vriscv64decoder.h` to handle `RDCYCLE`,
   `RDTIME`, and `RDINSTRET`. Because the `[H]` and XLEN=32 versions are only
   available in `riscv32`, the `riscv64` decoder simply injects a decoding
   failure if those occur.
4. I have extended the core's `tick` to update the three `Zicntr`s.
5. I have added a small misc test example to demonstrate usage.

I have not implemented any functionality for programming the 29 `Zihpm`
counters, nor have I added any special decoding to read these registers.

I have decided that the `Zicntr` instructions will be processed by the
arithmetic functional unit. I don't know if this is appropriate, but it should
manage any register port contention properly and seemed like the most expedient
solution.

I have removed the `cycle_count` and the `getCycleCount` from the decoder base
class, as it is no longer being used. The current cycle is still passed to the
decoder `tick()` even though it is _also_ still unused.
@sst-autotester
Copy link
Contributor

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.

@ldalessa
Copy link
Author

ldalessa commented Jul 8, 2024

Is there someone that can approve this workflow?

@gvoskuilen
Copy link
Contributor

@ldalessa We'll get this reviewed. Our PR testing infrastructure is in the midst of upgrades and so test/merge will be delayed until it is back up.

@gvoskuilen gvoskuilen self-assigned this Jul 9, 2024
@hughes-c hughes-c self-assigned this Jul 10, 2024
@hughes-c hughes-c added this to the SST v14.1.0 milestone Jul 10, 2024
Copy link
Member

@hughes-c hughes-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to construct another test case for RV64 and do some more digging on the 'arbitrary' piece of the standard.

The biggest holdup is on the MIPS32 side since you've removed the cycle count. We need to see if there's a way to keep this.

src/sst/elements/vanadis/decoder/vdecoder.h Show resolved Hide resolved
src/sst/elements/vanadis/inst/regfile.h Show resolved Hide resolved
src/sst/elements/vanadis/inst/vzicntr.h Show resolved Hide resolved
src/sst/elements/vanadis/inst/vzicntr.h Show resolved Hide resolved
@ldalessa
Copy link
Author

I need to construct another test case for RV64 and do some more digging on the 'arbitrary' piece of the standard.

The biggest holdup is on the MIPS32 side since you've removed the cycle count. We need to see if there's a way to keep this.

Thanks for looking at this. I'm pretty sure that the mips decoder did nothing with the cycle count. getCycleCount wasn't used there, and all it did was assign cycle_count = cycle but cycle_count was unused. This was what I saw.

ldalessa@sst:~/sandia/sst-elements/src/sst/elements/vanadis$ grep -r getCycleCount
decoder/vdecoder.h:    uint64_t getCycleCount() const { return cycle_count; }
decoder/vriscv64decoder.h:                                    auto thread_call = std::bind(&VanadisRISCV64Decoder::getCycleCount, this);
ldalessa@sst:~/sandia/sst-elements/src/sst/elements/vanadis$ grep -r cycle_count
decoder/vdecoder.h:    uint64_t getCycleCount() const { return cycle_count; }
decoder/vdecoder.h:    uint64_t cycle_count;
decoder/vmipsdecoder.h:        cycle_count = cycle;
decoder/vriscv64decoder.h:        cycle_count = cycle;

If you want to retain it in the vmipsdecoder I think it makes sense to make cycle_count a member of that class rather than the base class.

@sst-autotester
Copy link
Contributor

Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED by label AT: PRE-TEST INSPECTED! Autotester is Removing Label; this inspection will remain valid until a new commit to source branch is performed.

@sst-autotester
Copy link
Contributor

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements

  • Build Num: 1659
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_Make-Dist

  • Build Num: 1069
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MT-2

  • Build Num: 1617
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MR-2

  • Build Num: 1617
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_OSX-14-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-elements

  • Build Num: 189
  • Status: STARTED

Using Repos:

Repo: ELEMENTS (ldalessa/sst-elements)
  • Branch: zicntr
  • SHA: f0eea4a
  • Mode: TEST_REPO
Repo: SQE (sstsimulator/sst-sqe)
  • Branch: devel
  • SHA: 2574c98896598820227190149834172b962dc3fc
  • Mode: SUPPORT_REPO
Repo: CORE (sstsimulator/sst-core)
  • Branch: devel
  • SHA: 4478448d4ba7e6c8f1ca598f1537071b15ffbd6d
  • Mode: SUPPORT_REPO
Repo: MACRO (sstsimulator/sst-macro)
  • Branch: devel
  • SHA: 50a62170b3681ea20cc2f56abd2eb3911053f1fc
  • Mode: SUPPORT_REPO

Pull Request Author: ldalessa

@sst-autotester
Copy link
Contributor

Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED

Note: Testing will normally be attempted again in approx. 4 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.

Pull Request Auto Testing has FAILED (click to expand)

Job: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements

  • Result: FAILED
  • Build #: 1659
  • URL: Jenkins server at https://sst-jenkins.sandia.gov/view/SST/job/SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements/1659/consoleFull
  • Job: - Status: FAILURE

Job: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_Make-Dist

  • Result: FAILED
  • Build #: 1069
  • URL: Jenkins server at https://sst-jenkins.sandia.gov/view/SST/job/SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_Make-Dist/1069/consoleFull
  • Job: - Status: FAILURE

Job: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MT-2

  • Result: FAILED
  • Build #: 1617
  • URL: Jenkins server at https://sst-jenkins.sandia.gov/view/SST/job/SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MT-2/1617/consoleFull
  • Job: - Status: FAILURE

Job: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MR-2

  • Result: FAILED
  • Build #: 1617
  • URL: Jenkins server at https://sst-jenkins.sandia.gov/view/SST/job/SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MR-2/1617/consoleFull
  • Job: - Status: FAILURE

Job: SST__AutotestGen2_NewFW_OSX-14-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-elements

  • Result: FAILED
  • Build #: 189
  • URL: Jenkins server at https://sst-jenkins.sandia.gov/view/SST/job/SST__AutotestGen2_NewFW_OSX-14-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-elements/189/consoleFull
  • Job: - Status: FAILURE

Test Results

Test Name Status
test_memHSieve SKIPPED
test_merlin_polarfly_455 SKIPPED
test_merlin_polarstar_504 SKIPPED
test_prospero_binary_using_PIN_traces SKIPPED
test_prospero_binary_withtimingdram_using_PIN_traces SKIPPED
test_prospero_text_using_PIN_traces SKIPPED
test_prospero_text_withtimingdram_using_PIN_traces SKIPPED
test_rdmaNic_short_tests_001_app_rdma_msg SKIPPED
test_rdmaNic_short_tests_002_app_mpi_IMB_MPI1 SKIPPED
test_sstinfo_interactive SKIPPED
test_vanadis_short_tests_038_small_misc_fork_mipsel_gold1 FAILED
test_vanadis_short_tests_039_small_misc_fork_mipsel_gold2 FAILED
test_vanadis_short_tests_040_small_misc_fork_riscv64_gold1 FAILED
test_vanadis_short_tests_041_small_misc_fork_riscv64_gold2 FAILED
test_vanadis_short_tests_042_small_misc_clone_mipsel_gold1 FAILED
test_vanadis_short_tests_043_small_misc_clone_mipsel_gold2 FAILED
test_vanadis_short_tests_044_small_misc_clone_riscv64_gold1 FAILED
test_vanadis_short_tests_045_small_misc_clone_riscv64_gold2 FAILED
test_vanadis_short_tests_046_small_misc_pthread_mipsel_gold1 FAILED
test_vanadis_short_tests_047_small_misc_pthread_mipsel_gold2 FAILED
test_vanadis_short_tests_048_small_misc_pthread_riscv64_gold1 FAILED
test_vanadis_short_tests_049_small_misc_pthread_riscv64_gold2 FAILED
test_vanadis_short_tests_050_small_misc_openmp_mipsel_4core FAILED
test_vanadis_short_tests_051_small_misc_openmp_mipsel_4thread FAILED
test_vanadis_short_tests_052_small_misc_openmp_mipsel_2core_2thread FAILED
test_vanadis_short_tests_053_small_misc_openmp_riscv64_4core FAILED
test_vanadis_short_tests_054_small_misc_openmp_riscv64_4thread FAILED
test_vanadis_short_tests_055_small_misc_openmp_riscv64_2core_2thread FAILED
test_vanadis_short_tests_056_small_misc_openmp2_riscv64_16core FAILED
test_vanadis_short_tests_057_small_misc_openmp2_riscv64_32thread FAILED
test_vanadis_short_tests_058_small_misc_openmp2_riscv64_4core_8thread FAILED

@hughes-c hughes-c added the AT: WIP Mark PR as a Work in Progress (No Autotesting Performed) label Oct 7, 2024
@gvoskuilen
Copy link
Contributor

@ldalessa This PR fails to compile in our tests.
On Rocky/gcc 8:

./decoder/vriscv64decoder.h: In member function 'void SST::Vanadis::VanadisRISCV64Decoder::decode(SST::Output*, uint64_t, uint32_t, SST::Vanadis::VanadisInstructionBundle*)':
./decoder/vriscv64decoder.h:1112:132: error: class template argument deduction failed:
                                 bundle->addInstruction( new VanadisReadCounterInstruction( CYCLE, ins_address, hw_thr, options, rd ) );
                                                                                                                                    ^
./decoder/vriscv64decoder.h:1112:132: error: no matching function for call to 'VanadisReadCounterInstruction()'
In file included from ./inst/vinstall.h:125,
                 from ./decoder/vriscv64decoder.h:20,
                 from decoder/vriscv64decoder.cc:19:
./inst/vzicntr_readcounter.h:20:13: note: candidate: 'template<unsigned int id, long unsigned int XLEN, bool H> VanadisReadCounterInstruction(std::integral_constant<unsigned int, id>, uint64_t, uint32_t, const SST::Vanadis::VanadisDecoderOptions*, uint16_t)-> SST::Vanadis::Zicntr::VanadisReadCounterInstruction<id, XLEN, H>'
             VanadisReadCounterInstruction(
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./inst/vzicntr_readcounter.h:20:13: note:   template argument deduction/substitution failed:
In file included from decoder/vriscv64decoder.cc:19:
./decoder/vriscv64decoder.h:1112:132: note:   candidate expects 5 arguments, 0 provided
                                 bundle->addInstruction( new VanadisReadCounterInstruction( CYCLE, ins_address, hw_thr, options, rd ) );
                                                                                                                                    ^
./decoder/vriscv64decoder.h:1117:131: error: class template argument deduction failed:
                                 bundle->addInstruction( new VanadisReadCounterInstruction( TIME, ins_address, hw_thr, options, rd ) );
                                                                                                                                   ^
./decoder/vriscv64decoder.h:1117:131: error: no matching function for call to 'VanadisReadCounterInstruction()'
In file included from ./inst/vinstall.h:125,
                 from ./decoder/vriscv64decoder.h:20,
                 from decoder/vriscv64decoder.cc:19:
./inst/vzicntr_readcounter.h:20:13: note: candidate: 'template<unsigned int id, long unsigned int XLEN, bool H> VanadisReadCounterInstruction(std::integral_constant<unsigned int, id>, uint64_t, uint32_t, const SST::Vanadis::VanadisDecoderOptions*, uint16_t)-> SST::Vanadis::Zicntr::VanadisReadCounterInstruction<id, XLEN, H>'
             VanadisReadCounterInstruction(
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./inst/vzicntr_readcounter.h:20:13: note:   template argument deduction/substitution failed:
In file included from decoder/vriscv64decoder.cc:19:
./decoder/vriscv64decoder.h:1117:131: note:   candidate expects 5 arguments, 0 provided
                                 bundle->addInstruction( new VanadisReadCounterInstruction( TIME, ins_address, hw_thr, options, rd ) );
                                                                                                                                   ^
./decoder/vriscv64decoder.h:1122:134: error: class template argument deduction failed:
                                 bundle->addInstruction( new VanadisReadCounterInstruction( INSTRET, ins_address, hw_thr, options, rd ) );
                                                                                                                                      ^
./decoder/vriscv64decoder.h:1122:134: error: no matching function for call to 'VanadisReadCounterInstruction()'
In file included from ./inst/vinstall.h:125,
                 from ./decoder/vriscv64decoder.h:20,
                 from decoder/vriscv64decoder.cc:19:
./inst/vzicntr_readcounter.h:20:13: note: candidate: 'template<unsigned int id, long unsigned int XLEN, bool H> VanadisReadCounterInstruction(std::integral_constant<unsigned int, id>, uint64_t, uint32_t, const SST::Vanadis::VanadisDecoderOptions*, uint16_t)-> SST::Vanadis::Zicntr::VanadisReadCounterInstruction<id, XLEN, H>'
             VanadisReadCounterInstruction(
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./inst/vzicntr_readcounter.h:20:13: note:   template argument deduction/substitution failed:
In file included from decoder/vriscv64decoder.cc:19:
./decoder/vriscv64decoder.h:1122:134: note:   candidate expects 5 arguments, 0 provided
                                 bundle->addInstruction( new VanadisReadCounterInstruction( INSTRET, ins_address, hw_thr, options, rd ) );
                                                                                                                                      ^

@hughes-c
Copy link
Member

hughes-c commented Oct 7, 2024

@ldalessa
In file included from decoder/vriscv64decoder.cc:19: ./decoder/vriscv64decoder.h: In member function 'void SST::Vanadis::VanadisRISCV64Decoder::decode(SST::Output*, uint64_t, uint32_t, SST::Vanadis::VanadisInstructionBundle*)': ./decoder/vriscv64decoder.h:1112:132: error: class template argument deduction failed: bundle->addInstruction( new VanadisReadCounterInstruction( CYCLE, ins_address, hw_thr, options, rd ) ); ^ ./decoder/vriscv64decoder.h:1112:132: error: no matching function for call to 'VanadisReadCounterInstruction()' In file included from ./inst/vinstall.h:125, from ./decoder/vriscv64decoder.h:20, from decoder/vriscv64decoder.cc:19: ./inst/vzicntr_readcounter.h:20:13: note: candidate: 'template<unsigned int id, long unsigned int XLEN, bool H> VanadisReadCounterInstruction(std::integral_constant<unsigned int, id>, uint64_t, uint32_t, const SST::Vanadis::VanadisDecoderOptions*, uint16_t)-> SST::Vanadis::Zicntr::VanadisReadCounterInstruction<id, XLEN, H>' VanadisReadCounterInstruction( ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ./inst/vzicntr_readcounter.h:20:13: note: template argument deduction/substitution failed: In file included from decoder/vriscv64decoder.cc:19: ./decoder/vriscv64decoder.h:1112:132: note: candidate expects 5 arguments, 0 provided bundle->addInstruction( new VanadisReadCounterInstruction( CYCLE, ins_address, hw_thr, options, rd ) ); ^ ./decoder/vriscv64decoder.h:1117:131: error: class template argument deduction failed: bundle->addInstruction( new VanadisReadCounterInstruction( TIME, ins_address, hw_thr, options, rd ) ); ^ ./decoder/vriscv64decoder.h:1117:131: error: no matching function for call to 'VanadisReadCounterInstruction()'

@ldalessa
Copy link
Author

ldalessa commented Oct 7, 2024

Yeah. I’ll take a look at it. It’s likely to be a simple-but-annoying fix. I didn’t realize there was support for such old toolchains.

@hughes-c
Copy link
Member

hughes-c commented Oct 7, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AT: WIP Mark PR as a Work in Progress (No Autotesting Performed) Enhancement SST-vanadis
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants