Add support for instructions and source ranges. #15368
    instruction["arguments"] = arguments;
}

solAssert(assembly->codeSections().size() == 1);
Only EOF can have more than one code section.
But this should be solUnimplemented, not an assert. Or rather, it can be an assert, but only if you can assume it will never fail. For that, every output exposed to the user should have solUnimplemented checks that EOF is not enabled (note that existing outputs already have such checks).
The difference is that the user will get a nice message saying this is not supported on EOF, rather than an ugly ICE that they will report as a bug. On top of that, with solUnimplemented you can (and should) have a test covering that corner case.
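To make that distinction concrete, here is a minimal sketch. The exception types and the helpers `internalAssert` and `unimplementedAssert` are hypothetical stand-ins, not the real `solAssert` / `solUnimplementedAssert` macros, and `describeCodeSections` is an invented function used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the two error categories discussed above.
struct InternalCompilerError: std::logic_error { using std::logic_error::logic_error; };
struct UnimplementedFeatureError: std::runtime_error { using std::runtime_error::runtime_error; };

// Invariant check: a failure here means a bug in the compiler itself (an "ICE").
inline void internalAssert(bool _condition, std::string const& _message)
{
    if (!_condition)
        throw InternalCompilerError(_message);
}

// Feature check: a failure here means valid input hit an unsupported combination.
inline void unimplementedAssert(bool _condition, std::string const& _message)
{
    if (!_condition)
        throw UnimplementedFeatureError(_message);
}

// Sketch of an output generator that only handles a single code section.
inline std::string describeCodeSections(std::size_t _codeSectionCount, bool _eofEnabled)
{
    // User-reachable limitation: report it as unsupported rather than as a crash.
    unimplementedAssert(!_eofEnabled, "This output is not supported on EOF yet.");
    // With EOF ruled out, this can only fail if there is a compiler bug.
    internalAssert(_codeSectionCount == 1, "Expected exactly one code section.");
    return "one code section";
}
```

Because the EOF check comes first, the invariant below it can stay a plain assertion, and the unsupported case is something a test can trigger on purpose.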
I just changed that to a solUnimplementedAssert. I still need to add a test.
OK, a simple test has been added now.
I am still not convinced having 28k lines of test output is a good idea. There is no real way of assessing whether the test result is correct and in case of a test expectation mismatch I imagine one can only really see that "something" changed. Is there a way to make this more concise?
Are there issues tracking ethdebug support? Could we create an issue for each part of the spec, so we can track what is supported and what is in development (like: source locations, local variables, etc.) @ekpyron
clonker left a comment
First impressions :) I'll have another go at this tomorrow
cameel left a comment
I skimmed through the PR and found some things that look like bugs and discrepancies with the spec. Is this tested properly?
test/cmdlineTests.sh
Outdated
if grep -q "ethdebug" "$stdout_path";
then
  jq --indent 4 '(. | .. | objects | select(has("ethdebug"))) |= (.ethdebug = "<ETHDEBUG DEBUG DATA REMOVED>")' "$stdout_path" > tmp && mv tmp "$stdout_path"
fi
Don't assume you can create random junk files in the test dir. It may not even be writable. Use a proper temporary file or capture it in a variable.
And please don't randomly switch to 2-space indents in the middle of the file.
Also, why is grep needed here? If you're trying to ensure that the .ethdebug key exists, this will have false-positives. You should check that using jq.
OK, I will use a proper temporary file for this.
However, I used grep because our pretty-json sadly produces different formatting than jq, so I decided to do this reformatting only if ethdebug is found in the output. I know it's not very elegant. Maybe we should always use jq for JSON reformatting, but that should be a different PR.
I meant why use grep if you can check that more reliably with jq. See #15368 (comment).
> I did that grep stuff because our pretty-json is sadly creating a different formatting as jq
Ah, right, I did not realize it was messing up the formatting. You should really add comments for such things. But ok, I guess, as ugly as this is, it is unavoidable.
We should at some point extract all this logic for cleaning output files into a separate function or even script. That would at least keep the runner script more compact. But not in this PR in any case.
> however, maybe we should always use jq for json reformatting - but this should be a different PR.
Well, I used to not like that idea, but considering the awful regexes we need to cut out the other things, maybe this would actually be the lesser evil here.
The annoying thing is that this would reformat all our tests and create a massive PR.
{
    "definition",
    {{
        "sources",
Assuming this is from Source range schema, shouldn't this be singular?
- "sources",
+ "source",
Also, what's up with the double braces? Doesn't this create a one-element array or something?
You also have a weird double indent under "contract".
With nlohmann json, double braces are needed to ensure that an object is created. Otherwise, depending on the structure, it may become an array.
Hmm... is this why you now replaced it all with assignments and manually created objects? :)
uint8_t dataRefPush = static_cast<uint8_t>(pushInstruction(bytesPerDataRef));

size_t instructionIndex = 0;
for (AssemblyItem const& item: items)
A more modern version of manually incrementing the instruction index would be to use ranges::views::enumerate(items) :)
Amounts to something like for (auto const& [instructionIndex, item]: enumerate(items)) { ... }
At first I changed that to ranges::views::enumerate, but then I thought it would be more readable to introduce the addInstructionOffset lambda.
I think it would be more readable if you made it side-effect free and instead explicitly passed ret and instructionIndex into the function. The code is easier to understand when you see explicitly what goes in and out.
libevmasm/LinkerObject.h
Outdated
/// Vector that stores bytecode offsets & instruction index per instruction.
/// offset: (first) points to the beginning of each instruction (including the opcode itself).
/// instruction index: (second) instruction index.
/// the instruction index can be used to find the related instruction, if a single "complex instruction"
/// generated multiple instructions during assembly. This is the case with e.g. AssignImmutable.
/// It generates multiple instruction from a single instruction. To be able to keep track of
/// the original instruction indices the original instruction index is stored.
Most of that doc can go to the struct
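As a rough illustration of moving the documentation onto the struct, here is a minimal sketch. The field names follow the `InstructionLocation` snippet quoted later in this thread; treating `end` as exclusive and the lookup helper `assemblyItemAt` are assumptions made only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

/// Byte range of one emitted instruction plus the assembly item it came from.
struct InstructionLocation
{
    /// Bytecode offset of the opcode byte.
    std::size_t start = 0;
    /// Offset one past the last byte of the instruction (assumed exclusive).
    std::size_t end = 0;
    /// Index of the originating assembly item. Several locations may share it
    /// when one item (e.g. AssignImmutable) expands to multiple instructions.
    std::size_t assemblyItemIndex = 0;
};

/// Hypothetical helper: returns the assembly item index whose instruction
/// covers @a _offset, or nothing if no recorded instruction covers it.
inline std::optional<std::size_t> assemblyItemAt(
    std::vector<InstructionLocation> const& _locations,
    std::size_t _offset
)
{
    for (InstructionLocation const& location: _locations)
        if (location.start <= _offset && _offset < location.end)
            return location.assemblyItemIndex;
    return std::nullopt;
}
```

With the per-field documentation on the struct, the comment on the vector member can shrink to a single line.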
I'd actually have a few suggestions for simplifying this in the hot path during assembling, but we can leave that for the second iteration of this. Also a few minor comments in that path, but none crucial.
Other than that, this won't affect regular compilation, so as far as I'm concerned it's good to merge as a first iteration of this.
cameel left a comment
Since you're still working on this, I decided I'd add my suggestions as comments after all rather than pushing fixups.
The biggest one is really the one about the helper. I'd really prefer to keep the Assembly changes sane, since it's already such a mess.
But overall, nothing critical. In terms of functionality the PR seems fine. Though TBH I only did a light review and I didn't really review the testing part.
auto assembleInstruction = [&](auto&& _addInstruction) {
    size_t start = ret.bytecode.size();
    _addInstruction();
    size_t end = ret.bytecode.size();
    codeSectionLocation.instructionLocations.emplace_back(
        LinkerObject::InstructionLocation{
            .start = start,
            .end = end,
            .assemblyItemIndex = assemblyItemIndex
        }
    );
};
Regarding the helper, I had something like this in mind:
- auto assembleInstruction = [&](auto&& _addInstruction) {
-     size_t start = ret.bytecode.size();
-     _addInstruction();
-     size_t end = ret.bytecode.size();
-     codeSectionLocation.instructionLocations.emplace_back(
-         LinkerObject::InstructionLocation{
-             .start = start,
-             .end = end,
-             .assemblyItemIndex = assemblyItemIndex
-         }
-     );
- };
+ auto assembleInstruction = [](LinkerObject& _linkerObject, Instruction _opcode, bytes const& _immediates, size_t _assemblyItemIndex) {
+     size_t const start = _linkerObject.bytecode.size();
+     _linkerObject.bytecode += static_cast<uint8_t>(_opcode);
+     _linkerObject.bytecode += _immediates;
+     _linkerObject.codeSectionLocations[0].instructionLocations.push_back(
+         LinkerObject::InstructionLocation{
+             .start = start,
+             .end = start + 1 + _immediates.size(),
+             .assemblyItemIndex = _assemblyItemIndex
+         }
+     );
+     _linkerObject.codeSectionLocations[0].end += 1 + _immediates.size();
+ };
I.e. not depending on any external state, just arguments and clearly separating the bytecode from other stuff.
I'd not even make it a lambda - we'll need it in assembleEOF() later, so it should be a normal function, maybe even a class member.
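For illustration, a minimal standalone sketch of that shape as a normal function. `LinkerObjectSketch` is a hypothetical, reduced stand-in for `LinkerObject` with a single code section, and `appendInstruction` is an invented name; only the idea (take the opcode and immediates as arguments, record the byte range) comes from the suggestion above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using bytes = std::vector<std::uint8_t>;

struct InstructionLocation
{
    std::size_t start;
    std::size_t end;
    std::size_t assemblyItemIndex;
};

// Hypothetical, reduced stand-in for LinkerObject with one code section.
struct LinkerObjectSketch
{
    bytes bytecode;
    std::vector<InstructionLocation> instructionLocations;
};

// Appends one opcode and its immediates, then records where they landed.
// Everything the function touches is passed in explicitly.
inline void appendInstruction(
    LinkerObjectSketch& _object,
    std::uint8_t _opcode,
    bytes const& _immediates,
    std::size_t _assemblyItemIndex
)
{
    std::size_t const start = _object.bytecode.size(); // taken before appending
    _object.bytecode.push_back(_opcode);
    _object.bytecode.insert(_object.bytecode.end(), _immediates.begin(), _immediates.end());
    _object.instructionLocations.push_back({start, _object.bytecode.size(), _assemblyItemIndex});
    // One opcode byte plus the immediates must have been added.
    assert(_object.bytecode.size() - start == 1 + _immediates.size());
}
```

A call such as `appendInstruction(object, 0x60, bytes{0x2a}, 0)` would append two bytes and record the range 0 to 2 for assembly item 0.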
I'd even consider turning it into a class that does what it currently does as a side effect :-). But I'd say we can also wait until we inevitably port it to assembleEOF() before cleaning it up.
Well, it would be good to at least make it take bytecode, not a callback.
But yeah, it's not like it's a blocker here. It would be very nice to have, but I approved already because at this point these are not things critical to the functionality.
But yeah, for the record: this is how I'd have done this: develop...ethdebug_instructions_and_source_ranges_refactor
Which includes doing it for EOF (which is just two lines in that version).
Just to keep anything ethdebug-related that's non-critical as separate from the rest as possible (I'd move the helper classes out of the file).
The only ugliness there is the AssignImmutable case, which unfortunately is irregular compared to all other cases, but it concentrates that ugliness there. But yeah, we can see if we do something like that or some other solution after the release, maybe there's something even better.
cameel left a comment
Actually, since all of this is non-critical I'm going to approve already. But it would still be nice if you managed to address these previous comments.
ekpyron left a comment
Also reapproving regardless of the remaining comments.