Update RISCV ISA by parsing binutils #44

Draft
wants to merge 8 commits into
base: master

patrick-rivos
Contributor

The RISCV ISA definition here is roughly 5 years out of date at this point. Existing instructions still work great, but new extensions have been released in the meantime. One of those extensions is the Vector extension, which encodes a fair amount of complexity with its variable length instruction encoding (see the Vector spec).

This PR adds a parser for the RISCV instruction set and references binutils 2.40 as a source of truth. This PR is a draft and there is still work to be done (memory reads/writes using vector, actually encoding the binary format, etc.).

@patrick-rivos patrick-rivos marked this pull request as draft July 24, 2023 23:04
@patrick-rivos patrick-rivos changed the title Update riscv isa Update RISCV ISA by parsing binutils Jul 24, 2023
.gitmodules (outdated, resolved)
dev_tools/binutils-gdb (outdated, resolved)
Collaborator

I know this is not documented anywhere, but the conventions for the YAML instruction entries are the following:

- Name: "C.SD_V0"  -> Instruction name in capital letters. Usually it is the assembly mnemonic
                      plus "_V#", which indicates the instruction variant, starting with 0.
                      Sometimes the same instruction/mnemonic is implemented differently
                      in the microarchitecture depending on the input parameters
                      (e.g. add immediate can have two implementations: the regular
                      one, and the one used when the immediate is 0, which is
                      essentially a no-operation).
  Mnemonic: "C.SD" -> Mnemonic (the string that will be generated) in capital letters.
  Opcode: "0" -> Opcode.
  Format: "cl_d" -> Instruction format.
  Operands:                        -> Default operands are set from the instruction format, but if you
     funct3: ['7', 'funct3', '?']     need to override them you can do so with these entries. Check the
                                      existing RISCV definitions or the POWER backend for examples.
  MemoryOperands:  -> Memory operands are defined here.
    ....
  ImplicitOperands:

Make sure your parser generates entries that follow this format and these conventions.
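
The conventions above could be checked mechanically on each generated entry. The following is only a sketch under stated assumptions: `check_entry` is a hypothetical helper that is not part of the project, it takes an entry as a plain dict rather than reading the YAML file, and it treats the "usually mnemonic plus _V#" rule as strict for simplicity.

```python
import re

# Hypothetical pattern for the "<MNEMONIC>_V<number>" naming convention.
NAME_RE = re.compile(r"^(?P<mnemonic>.+)_V(?P<variant>\d+)$")


def check_entry(entry):
    """Return a list of convention problems found in one instruction entry."""
    problems = []
    name = entry.get("Name", "")
    mnemonic = entry.get("Mnemonic", "")

    match = NAME_RE.match(name)
    if match is None:
        problems.append("Name does not end in _V<number>")
    elif match.group("mnemonic") != mnemonic:
        # Assumption: treated as an error here, although the convention
        # only says the name is "usually" the mnemonic plus the variant.
        problems.append("Name prefix differs from Mnemonic")

    if name != name.upper():
        problems.append("Name is not in capital letters")
    if mnemonic != mnemonic.upper():
        problems.append("Mnemonic is not in capital letters")

    for key in ("Opcode", "Format"):
        if key not in entry:
            problems.append("missing " + key)
    return problems


if __name__ == "__main__":
    good = {"Name": "C.SD_V0", "Mnemonic": "C.SD", "Opcode": "0", "Format": "cl_d"}
    print(check_entry(good))
```

Running a check like this over the parser output before committing would catch naming drift early, but the existing RISCV definitions remain the reference for what is actually accepted.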

Contributor Author

This is super helpful - Thank you!

@rbertran
Collaborator

Thanks, I added some feedback on some files. I assume that this is still a WIP PR; once it is more mature and passes the CI, I'll integrate it. From the CI output, it looks like something broke the existing code generation for RISCV.

If you hit a roadblock, let me know and I'll try to debug what can be going on.

@patrick-rivos
Contributor Author

patrick-rivos commented Jul 25, 2023

Thanks! I'm pretty sure the CI is failing since the new instructions don't have their binary representations correctly specified yet.

And yes - it's an early WIP PR to make sure my general approach will be mergeable once it's ready :)

@patrick-rivos
Contributor Author

This is still a draft since I haven't had the time to dig into the actual opcodes, but it's functional for generating assembly.

I updated the format/naming conventions to match existing instructions (V0 is unmasked operations or masked with no unmasked equivalent, V1 is masked). Also removed the submodule dependency on binutils. I'm not sure when I'll have the time to get the vector opcodes complete, but when I do I'll chip away at it and keep this PR up-to-date with my progress.
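
As a rough illustration of that V0/V1 rule (a sketch only: `variant_name` is a hypothetical helper, not code from this PR, and the mnemonics in the usage lines are just examples):

```python
def variant_name(mnemonic, masked, has_unmasked_form=True):
    """Build an entry name following the V0/V1 convention described above.

    V0: unmasked operation, or a masked operation with no unmasked equivalent.
    V1: masked operation that also has an unmasked equivalent.
    """
    if masked and has_unmasked_form:
        suffix = "_V1"
    else:
        suffix = "_V0"
    return mnemonic.upper() + suffix


if __name__ == "__main__":
    print(variant_name("vadd.vv", masked=False))
    print(variant_name("vadd.vv", masked=True))
    # Illustrative masked-only case: no unmasked equivalent, so it stays V0.
    print(variant_name("vmerge.vvm", masked=True, has_unmasked_form=False))
```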

Signed-off-by: Patrick O'Neill <[email protected]>
I'm not sure if this is the right approach since mp_seq is a generic tool.

Signed-Off-By: Patrick O'Neill <[email protected]>
@patrick-rivos
Contributor Author

Rebased and added some asm-level features on top.
At a high level it's still where it was before: it generates valid assembly, but the opcodes are TODO.

I do have a question about target-specific flags in generic tools:
In the commit "RISC-V: Expose sew and lmul to mp_seq.py command line" I make -lmul and -sew settable on the command line, but they have no meaning for the POWERPC target. Should I think of another way to expose those toggles, or is this approach OK?

@rbertran
Collaborator

Thanks @patrick-rivos for your contributions. Regarding your question, there is no way to show arguments only when a particular target is selected. For now, I think the best option for handling target-specific command line arguments is to clearly name the target in the argument description. I.e. "Selected element width for vector insns" becomes "Selected element width for vector insns (only valid for RISCV backend)". Also, I'd group such options within a separate target-specific group. Use cmdline.add_group to add another group with the description "RISCV Specific options" or similar.
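
A minimal sketch of that grouping idea, using Python's standard `argparse` as a stand-in because the signature of `cmdline.add_group` is not shown in this thread. The `build_parser` function, the `prog` name, and the `-lmul` help text are assumptions made for illustration; only the `-sew` description and the group title come from the comment above.

```python
import argparse


def build_parser():
    """Build a parser with target-specific options in their own group."""
    parser = argparse.ArgumentParser(prog="mp_seq_sketch")

    # Stand-in for cmdline.add_group(...): a separately titled option group.
    riscv = parser.add_argument_group("RISCV Specific options")
    riscv.add_argument(
        "-sew",
        dest="sew",
        default=None,
        help="Selected element width for vector insns "
             "(only valid for RISCV backend)",
    )
    riscv.add_argument(
        "-lmul",
        dest="lmul",
        default=None,
        # Assumed wording; the real description is not given in the thread.
        help="Selected register group multiplier for vector insns "
             "(only valid for RISCV backend)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-sew", "32", "-lmul", "2"])
    print(args.sew, args.lmul)
```

With this layout the options still parse for every target, but the help output lists them under their own heading, so users of other backends can see at a glance that they do not apply.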
