Disassembler #77

Open
emoon opened this issue Mar 12, 2016 · 27 comments

Comments

@emoon
Contributor

emoon commented Mar 12, 2016

Something that is very useful to have in an emulator core is the ability to disassemble instructions, for various reasons.

Currently r68k doesn't have one. I have implemented one (in C here) https://github.com/aquynh/capstone/blob/next/arch/M68K/M68KDisassembler.c

This was also based on Musashi, but with a fair number of bugs fixed. Also, this version doesn't just do instruction printing but allows you to see which registers, addressing modes, etc. are being used for an instruction.

Rewriting this code in Rust is possible for sure but a bunch of work. An alternative would be to rewrite this C code a bit and have a Rust wrapper around it so the user of r68k would only 'see' the Rust part.

Just wanted to hear your thoughts about it.

@marhel
Owner

marhel commented Mar 12, 2016

I think an integrated disassembler/debugger would absolutely be useful!

I haven't thought much about it in the design of r68k, though, and I'm going to focus on getting the CPU part usable first. But I am led to believe that you know a little something about both disassemblers and debuggers, so you are welcome to come up with some designs for how that might work in/with r68k!

@emoon
Contributor Author

emoon commented Mar 12, 2016

Sounds good :) I will try to think of something.

@marhel
Owner

marhel commented Apr 28, 2016

If we're going to implement a disassembler in Rust at some point, it would be a requirement, in my opinion, to be able to QC it against a known-good implementation, much like we did with the CPU. Can't imagine trying without it, in fact.

@emoon
Contributor Author

emoon commented Apr 29, 2016

Yeah that would be good. Not really exactly sure how to do it though.

@marhel
Owner

marhel commented Apr 29, 2016

Would it be possible to create something like libdisassembler.a based on a working program, set up a memory buffer with some bytes corresponding to some instruction, ask both for a disassembly of that buffer, and check that they generate the same output?

@emoon
Contributor Author

emoon commented Apr 29, 2016

Sure. Or actually generate a huge program from the QC tests that we already have here for valid instructions.

@emoon
Contributor Author

emoon commented Apr 29, 2016

I can likely add Capstone (slimmed down to only use the m68k backend) and add a basic Rust interface for it so it can be called from QC tests. Also, Capstone supports several instances which can run in parallel, so that can be used to compare with.

@marhel
Owner

marhel commented Apr 29, 2016

Yes, the optable contains useful data for the disassembler! It would be able to find the matching entry for the instruction it was looking at, but there's not enough information about how to interpret the "holes" in the mask, such as X and Y (whether they represent data or address registers, or something else), and it also doesn't know the addressing mode, apart from the hints usually present in the function name. So more information would be needed.

Not having to use semaphores to enforce single threaded access would also be great!
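
A minimal sketch of what that extra information could look like, assuming a mask/match table in the style described above. The `OpEntry` struct, the `Field` enum, the two table rows and the output syntax are hypothetical illustrations, not r68k's actual optable:

```rust
/// What a "hole" in the opcode mask encodes (hypothetical extra metadata).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Field {
    DataReg,
    AddrReg,
}

/// One table row: the fixed bits plus a description of the X and Y holes.
#[derive(Debug)]
struct OpEntry {
    mask: u16,     // which bits are fixed for this instruction
    matching: u16, // the value of those fixed bits
    mnemonic: &'static str,
    x: Field, // meaning of bits 11..9 (destination here)
    y: Field, // meaning of bits 2..0 (source here)
}

/// Two illustrative rows only; a real table would cover the whole opcode space.
const TABLE: &[OpEntry] = &[
    // ADD.B Dy,Dx : 1101 xxx 000 000 yyy
    OpEntry { mask: 0xF1F8, matching: 0xD000, mnemonic: "ADD.B", x: Field::DataReg, y: Field::DataReg },
    // ADD.W Ay,Dx : 1101 xxx 001 001 yyy
    OpEntry { mask: 0xF1F8, matching: 0xD048, mnemonic: "ADD.W", x: Field::DataReg, y: Field::AddrReg },
];

/// Render a register number according to the kind of field it came from.
fn operand(field: Field, reg: u16) -> String {
    match field {
        Field::DataReg => format!("D{}", reg),
        Field::AddrReg => format!("A{}", reg),
    }
}

/// Find the matching row, then use its field metadata to interpret the holes.
fn disassemble(opcode: u16) -> Option<String> {
    let entry = TABLE.iter().find(|e| opcode & e.mask == e.matching)?;
    let x = (opcode >> 9) & 7;
    let y = opcode & 7;
    Some(format!(
        "{} {},{}",
        entry.mnemonic,
        operand(entry.y, y),
        operand(entry.x, x)
    ))
}
```

With these two rows, `disassemble(0xD049)` gives `ADD.W A1,D0` and `disassemble(0xD401)` gives `ADD.B D1,D2`, while an opcode without a row (such as `0x4AFC`) gives `None`. The point is only that the per-hole metadata lives next to the mask, so the lookup that already finds the entry can also format the operands.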

@emoon
Contributor Author

emoon commented Apr 29, 2016

True. I guess it may actually be possible to just try all 65536 combos (0 - 65535). There will be a bunch of illegals in there, but that would be good for validating that it all works anyway (there might be bugs on both sides).
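
One way to sketch that exhaustive comparison, assuming both disassemblers can be wrapped as functions from the first opcode word to an optional string (`None` meaning illegal). `compare_all` and the two toy decoders are hypothetical stand-ins; a real harness would call the Rust disassembler and Capstone, and would also have to supply extension words for instructions that need them:

```rust
/// Run two disassemblers over every 16-bit opcode word and collect the
/// opcodes where their output differs.
fn compare_all<A, B>(ours: A, theirs: B) -> Vec<(u16, Option<String>, Option<String>)>
where
    A: Fn(u16) -> Option<String>,
    B: Fn(u16) -> Option<String>,
{
    (0..=u16::MAX)
        .filter_map(|op| {
            let (a, b) = (ours(op), theirs(op));
            if a == b {
                None // both agree, including agreeing that it is illegal
            } else {
                Some((op, a, b))
            }
        })
        .collect()
}

/// Toy "known good" decoder that only knows NOP and RTS.
fn reference(op: u16) -> Option<String> {
    match op {
        0x4E71 => Some("NOP".to_string()),
        0x4E75 => Some("RTS".to_string()),
        _ => None,
    }
}

/// Toy decoder under test, with a deliberate gap: it does not know RTS.
fn buggy(op: u16) -> Option<String> {
    match op {
        0x4E71 => Some("NOP".to_string()),
        _ => None,
    }
}
```

Calling `compare_all(buggy, reference)` reports exactly one mismatch, for opcode `0x4E75`. Treating "illegal" as an ordinary result means disagreements about which opcodes are illegal show up the same way as formatting differences.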

@emoon
Contributor Author

emoon commented Apr 29, 2016

I can try to get a basic version of Capstone (68k disassembler part) in over the weekend and send a PR.

@marhel
Owner

marhel commented Apr 29, 2016

Ok, I'm thinking we should do that work in a dev-branch for now, I just created the "disassembler"-branch for this.

@emoon
Contributor Author

emoon commented Apr 29, 2016

Sure!

@marhel
Owner

marhel commented May 3, 2016

I took a shot at an initial implementation yesterday, and got something I was not entirely unhappy with, by adding a disassembly module alongside the cpu module. But I was really bugged by the fact that any trivial change there resulted in a minute's wait to recompile 12K lines of unrelated stuff (which after macro expansion seems to be more like 50K lines). I guess this is the non-incremental compilation rearing its ugly head.

It made me want to rip out a few constants and other stuff to depend on, and work in an unrelated project, but I hope there's some better way.

You seem to have a much better grasp of cargo and crates than I have, so I wondered if there was some smarter way to divide stuff into crates or submodules that would allow us to work on the disassembler, and let it use constants/enums/structs/traits that we've already defined, without needing to recompile everything every time.

@marhel
Owner

marhel commented May 3, 2016

Also, I could push my WIP to the disassembler branch if you want to have a peek.

@emoon
Contributor Author

emoon commented May 3, 2016

Sure!

@marhel
Owner

marhel commented May 3, 2016

Pushed now. I made a few constants and other stuff public in the old stuff, in order to be able to reuse it here. Also, I reused the LoggingMem to read ops out of "memory", but I guess that interface is not really useful if you are not disassembling a current r68k session with in-memory code.

Feel free to change any and all things as well, this was just to get this part going somewhere :)

@emoon
Contributor Author

emoon commented May 3, 2016

What you could do is split it up into three separate crates:

  1. Shared (constants/enums/etc)
  2. Emulator
  3. Disassembler

Now it would be possible to work "inside" the Disassembler crate only, running cargo test. This crate would still be a lib, so there wouldn't be a main function to run stuff in.

In that case it's possible to add things under the examples directory inside the Disassembler crate and run them as cargo run --example some_example. If you would like a real main, one can make a crate outside called disassembler_test or something that only depends on the Disassembler crate.

@marhel
Owner

marhel commented May 3, 2016

Useful command to run/visualize the test I wrote: cargo test -- disassembler --nocapture

@marhel
Owner

marhel commented May 3, 2016

Yeah, I saw the --example param somewhere a while ago and immediately thought it would be a good match for r68k! It really should be a library crate, I guess.

@emoon
Contributor Author

emoon commented May 3, 2016

Before release it should be a library for sure (that is the way people would use it anyway)

@emoon
Contributor Author

emoon commented May 3, 2016

Also, I'm not sure if you have pushed the disassembler.rs file.

@marhel
Owner

marhel commented May 3, 2016

Oops, you are right. It'll be pushed shortly!

@marhel
Owner

marhel commented May 3, 2016

Now I pushed myself to push the missing file... ;)

@emoon
Contributor Author

emoon commented May 3, 2016

👍

@marhel
Owner

marhel commented Oct 1, 2016

Also got some time to update the disassembler/assembler to a state where I'm happy with the design. If you're interested, take a look at either the disassembler branch, or the new library branch. I've yet to actually use capstone, but it was quite fun to get the disassembler/assembler working in concert (anything that can be disassembled should also assemble back to the starting opcode).

The disassembler/assembler just knows a subset of the ADD opcodes at this point. Adding more of the same kind of instructions (formats) with already implemented encodings should be trivial. Other instruction formats will need new decode/encode fn support.

The assembler is quite primitive, and extremely picky about syntax at the moment - it will basically only accept exactly the syntax that the disassembler generates. The parser is also completely regex based, which is probably not that efficient (saw extreme speedups when I started compiling complex regexes once, instead of once per opcode)
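
The round-trip property mentioned above (anything that can be disassembled should assemble back to the starting opcode) can be sketched as follows. This is a toy under stated assumptions: it only handles the register-to-register form `ADD.{B,W,L} Dy,Dx`, the function names and the text syntax are hypothetical, and it uses plain string splitting rather than regexes. Like the assembler described above, it only accepts exactly what the disassembler prints:

```rust
const SIZES: [&str; 3] = ["B", "W", "L"];

/// Decode 1101 xxx 0ss 000 yyy into "ADD.<size> Dy,Dx".
fn disassemble(op: u16) -> Option<String> {
    if op & 0xF138 != 0xD000 {
        return None;
    }
    // Size bits 7..6; the value 3 belongs to ADDA and is not handled here.
    let size = SIZES.get(((op >> 6) & 3) as usize)?;
    Some(format!("ADD.{} D{},D{}", size, op & 7, (op >> 9) & 7))
}

/// Parse a data register name such as "D3" into its number.
fn data_reg(s: &str) -> Option<u16> {
    let n: u16 = s.strip_prefix('D')?.parse().ok()?;
    if n < 8 { Some(n) } else { None }
}

/// Encode exactly the syntax that `disassemble` produces, nothing more.
fn assemble(text: &str) -> Option<u16> {
    let rest = text.strip_prefix("ADD.")?;
    let (size, operands) = rest.split_once(' ')?;
    let size = SIZES.iter().position(|s| *s == size)? as u16;
    let (src, dst) = operands.split_once(',')?;
    let y = data_reg(src)?;
    let x = data_reg(dst)?;
    Some(0xD000 | (x << 9) | (size << 6) | y)
}

/// Check the round trip for every opcode the disassembler accepts and
/// return how many opcodes were checked.
fn check_round_trip() -> usize {
    let mut checked = 0;
    for op in 0..=u16::MAX {
        if let Some(text) = disassemble(op) {
            assert_eq!(assemble(&text), Some(op), "round trip failed for {}", text);
            checked += 1;
        }
    }
    checked
}
```

Here `check_round_trip()` covers 192 opcodes (3 sizes, 8 source and 8 destination registers). Because the check walks the whole 16-bit space, adding a new instruction format to both sides automatically extends the test.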

@emoon
Contributor Author

emoon commented Oct 2, 2016

Cool. I would suggest looking at https://github.com/Geal/nom for parsing in the assembler.

@marhel
Owner

marhel commented Jan 9, 2017

Updated the assembler parser to be based on pest, and it now parses the 10K lines of this basic interpreter in 68k assembly successfully(*). This is a big step forward: the old regex-based parser was very limited, but the new parser accepts actual code. Note that while the parser is now good, the assembler itself still supports just a handful of opcodes.

I looked at nom, but found pest to be much more approachable.

*) Well, almost anyway. I decided to only support semicolon comments at the moment, so I edited the file slightly first, and it doesn't recognize the register lists of the movem instruction, nor the IF-statements (conditional assembly) yet. Movem needs to be supported when I get around to actually implementing movem support in the disassembler/assembler, but conditional assembly is not a big priority at this point.
