Skip to content

Add support for instructions and source ranges.#15368

Merged
ekpyron merged 1 commit intodevelopfrom
ethdebug_instructions_and_source_ranges
Mar 10, 2025
Merged

Add support for instructions and source ranges.#15368
ekpyron merged 1 commit intodevelopfrom
ethdebug_instructions_and_source_ranges

Conversation

@aarlt
Copy link
Contributor

@aarlt aarlt commented Aug 28, 2024

  • to reduce test-size ethdebug data will now be removed from standard-json based cmdline tests
  • simple smoke test that is just checking the basic structure of instructions
  • ethdebug related command line tests that are not using standard-json got removed - removal of ethdebug data is easier with standard-json - correct behaviour of the command line is still checked in soltest test suit

@aarlt aarlt added the ethdebug label Aug 28, 2024
@aarlt aarlt force-pushed the ethdebug_instructions_and_source_ranges branch 2 times, most recently from 82e0668 to 51b69fc Compare August 29, 2024 00:50
@argotorg argotorg deleted a comment from stackenbotten Aug 29, 2024
@aarlt aarlt force-pushed the ethdebug_instructions_and_source_ranges branch 2 times, most recently from ce130c1 to feda85c Compare August 30, 2024 12:59
@aarlt aarlt force-pushed the enable_debug_info_ethdebug branch from 93749ff to 0dd51e5 Compare August 30, 2024 14:54
@aarlt aarlt force-pushed the ethdebug_instructions_and_source_ranges branch from feda85c to fd1bb3a Compare August 30, 2024 15:06
@aarlt aarlt force-pushed the enable_debug_info_ethdebug branch from 0dd51e5 to 55503ce Compare September 9, 2024 11:28
@aarlt aarlt force-pushed the enable_debug_info_ethdebug branch 6 times, most recently from 32b21f6 to fe6a4ca Compare September 24, 2024 15:51
@aarlt aarlt force-pushed the enable_debug_info_ethdebug branch from fe6a4ca to 7917ca3 Compare September 30, 2024 17:50
@aarlt aarlt force-pushed the enable_debug_info_ethdebug branch 6 times, most recently from c885a10 to c4faae4 Compare October 7, 2024 10:56
@aarlt aarlt force-pushed the ethdebug_instructions_and_source_ranges branch from fd1bb3a to f3205fa Compare October 7, 2024 20:32
@aarlt aarlt force-pushed the enable_debug_info_ethdebug branch from c4faae4 to c9417ed Compare October 8, 2024 10:32
@aarlt aarlt force-pushed the ethdebug_instructions_and_source_ranges branch from f3205fa to 3cfe207 Compare October 8, 2024 10:51
@aarlt aarlt force-pushed the enable_debug_info_ethdebug branch 3 times, most recently from 863ffb2 to 4d01075 Compare October 15, 2024 11:29
@aarlt aarlt force-pushed the enable_debug_info_ethdebug branch from 0b702ac to 23d9607 Compare January 27, 2025 14:47
@aarlt aarlt force-pushed the ethdebug_instructions_and_source_ranges branch from df8488a to 104f6a8 Compare January 27, 2025 14:47
@aarlt aarlt requested a review from clonker January 27, 2025 15:31
@aarlt aarlt force-pushed the enable_debug_info_ethdebug branch from 23d9607 to 7ea985d Compare January 27, 2025 15:35
Base automatically changed from enable_debug_info_ethdebug to develop January 27, 2025 17:36
@aarlt aarlt force-pushed the ethdebug_instructions_and_source_ranges branch from 104f6a8 to 8ace2ba Compare January 27, 2025 17:38
instruction["arguments"] = arguments;
}

solAssert(assembly->codeSections().size() == 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can there only be one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only eof can have more

Copy link
Collaborator

@cameel cameel Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this should be solUnimplemented not an assert. Or rather, it can be an assert but only if you can assume it will never fail. And for this in all outputs exposed to the user you should have solUnimplemented checks that EOF is not enabled (note that existing outputs already have such checks).

The difference is that the user will get a nice message that this is not supported on EOF rather than an ugly ICE that he will report as a bug. On top of that, with solUnimplemented you can (and should) have a test covering that corner case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just changed that to an solUnimplementedAssert. need to add a test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok.. simple test added now

@aarlt aarlt force-pushed the ethdebug_instructions_and_source_ranges branch 2 times, most recently from 1f48910 to f129ed1 Compare January 31, 2025 19:19
@clonker
Copy link
Member

clonker commented Feb 7, 2025

I am still not convinced having 28k lines of test output is a good idea. There is no real way of assessing whether the test result is correct and in case of a test expectation mismatch I imagine one can only really see that "something" changed. Is there a way to make this more concise?

@veniger
Copy link
Contributor

veniger commented Feb 10, 2025

Are there issues tracking ethdebug support? Could we create an issue for each part of the spec, so we can track what is supported and what is in development (like: source locations, local variables, etc.) @ekpyron

@ekpyron ekpyron added the 🟡 PR review label label Feb 24, 2025
@aarlt aarlt force-pushed the ethdebug_instructions_and_source_ranges branch 2 times, most recently from 46ca422 to ae4aa88 Compare February 26, 2025 23:30
Copy link
Member

@clonker clonker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First impressions :) I'll have another go at this tomorrow

clonker
clonker previously approved these changes Feb 28, 2025
Copy link
Collaborator

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skimmed through the PR and found some things that looks like bugs and discrepancies with the spec. Is this tested properly?

Comment on lines +223 to +242
if grep -q "ethdebug" "$stdout_path";
then
jq --indent 4 '(. | .. | objects | select(has("ethdebug"))) |= (.ethdebug = "<ETHDEBUG DEBUG DATA REMOVED>")' "$stdout_path" > tmp && mv tmp "$stdout_path"
fi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't assume you can create random junk files in the test dir. It may not even be writable. Use a proper temporary file or capture it in a variable.

And please don't randomly switch to 2-space indents in the middle of the file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why is grep needed here? If you're trying to ensure that the .ethdebug key exists, this will have false-positives. You should check that using jq.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I will use a proper termporary file for this.

however, I did that grepstuff because our pretty-json is sadly creating a different formatting as jq - so I decided to only do this reformatting if ethdebug is found in the output - I know its not very ellegant. however, maybe we should always use jq for json reformatting - but this should be a different PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant why use grep if you can check that more reliably with jq. See #15368 (comment).

I did that grepstuff because our pretty-json is sadly creating a different formatting as jq

Ah, right, I did not realize it was messing up the formatting. You should really add comments for such things. But ok, I guess, as ugly as this is, it is unavoidable.

We should at some point extract all this logic for cleaning output files into a separate function or even script. That would at least keep the runner script more compact. But not in this PR in any case.

however, maybe we should always use jq for json reformatting - but this should be a different PR.

Well, I used to not like that idea, but considering the awful regexes we need to cut out the other things, maybe this would actually be lesser evil here.

The annoying thing is that this would reformat all our tests and create a massive PR.

{
"definition",
{{
"sources",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming this is from Source range schema, shouldn't this be singular?

Suggested change
"sources",
"source",

Copy link
Collaborator

@cameel cameel Mar 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what's up with the double braces? Doesn't this create a one-element array or something?

You also have a weird double indent under "contract".

Copy link
Contributor Author

@aarlt aarlt Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with nlohmann json double braces are needed to ensure that an object is created - if not - depending on the structure - it may become an array

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... is this why you now replaced it all with assignments and manually created objects? :)

uint8_t dataRefPush = static_cast<uint8_t>(pushInstruction(bytesPerDataRef));

size_t instructionIndex = 0;
for (AssemblyItem const& item: items)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more modern version of manually incrementing the instruction index would be to use ranges::views::enumerate(items) :)

Amounts to something like for (auto const& [instructionIndex, item]: enumerate(items)) { ... }

Copy link
Contributor Author

@aarlt aarlt Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I changed that to ranges::views::enumerate but then I thought it would be better readable to introduce that addInstructionOffset lambda.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more readable if you made it side-effect free and instead explicitly passed ret and instructionIndex into the function. The code is easier to understand when you see explicitly what goes in and out.

Comment on lines +47 to +53
/// Vector that stores bytecode offsets & instruction index per instruction.
/// offset: (first) points to the beginning of each instruction (including the opcode itself).
/// instruction index: (second) instruction index.
/// the instruction index can be used to find the related instruction, if a single "complex instruction"
/// generated multiple instructions during assembly. This is the case with e.g. AssignImmutable.
/// It generates multiple instruction from a single instruction. To be able to keep track of
/// the original instruction indices the original instruction index is stored.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of that doc can go to the struct

ekpyron
ekpyron previously approved these changes Mar 10, 2025
Copy link
Collaborator

@ekpyron ekpyron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd actually have a few suggestions for simplifying this in the hot path during assembling, but we can leave that for the second iteration of this. Also a few minor comments in that path, but none crucial.

Other than that, this won't affect regular compilation, so as far as I'm concerned it's good to merge as a first iteration of this.

ekpyron
ekpyron previously approved these changes Mar 10, 2025
Copy link
Collaborator

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're still working on this, I decided I'd add my suggestions as comments after all rather than pushing fixups.

The biggest one is really the one about the helper. I'd really prefer to keep the Assembly changes sane, since it's already such a mess.

But overall, nothing critical. In terms of functionality the PR seems fine. Though TBH I only did a light review and I didn't really review the testing part.

Comment on lines +1288 to +1298
auto assembleInstruction = [&](auto&& _addInstruction) {
size_t start = ret.bytecode.size();
_addInstruction();
size_t end = ret.bytecode.size();
codeSectionLocation.instructionLocations.emplace_back(
LinkerObject::InstructionLocation{
.start = start,
.end = end,
.assemblyItemIndex = assemblyItemIndex
}
);
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the helper, I had something like this in mind:

Suggested change
auto assembleInstruction = [&](auto&& _addInstruction) {
size_t start = ret.bytecode.size();
_addInstruction();
size_t end = ret.bytecode.size();
codeSectionLocation.instructionLocations.emplace_back(
LinkerObject::InstructionLocation{
.start = start,
.end = end,
.assemblyItemIndex = assemblyItemIndex
}
);
};
auto assembleInstruction = [](LinkerObject const& _linkerObject, Instruction _opcode, bytes const& _immediates, size_t assemblyItemIndex) {
_linkerObject.bytecode += static_cast<uint8_t>(_opcode);
_linkerObject.bytecode += _immediates;
_linkerObject.codeSectionLocations[0].instructionLocations.push_back(
LinkerObject::InstructionLocation{
.start = ret.bytecode.size(),
.end = ret.bytecode.size() 1 + _immediates.size(),
.assemblyItemIndex = assemblyItemIndex
}
);
_linkerObject.codeSectionLocations[0].end += 1 + _immediates.size();
};

I.e. not depending on any external state, just arguments and clearly separating the bytecode from other stuff.

I'd not even make it a lambda - we'll need it in assembleEOF() later, so it should be a normal function, maybe even a class member.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd even consider turning it into a class doing what it does currently as side-effect :-). But I'd say we can also wait until we inevitably port it to assemblyEOF to nicen it up.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it would be good to at least make it take bytecode, not a callback.

But yeah, it's not like it's a blocker here. It would be very nice to have, but I approved already because at this point these are not things critical to the functionality.

Copy link
Collaborator

@ekpyron ekpyron Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But yeah, for the record: this is how I'd have done this: develop...ethdebug_instructions_and_source_ranges_refactor
Which includes doing it for EOF (which is just two lines in that version).
Just to keep anything ethdebug-related that's non-critical as separate from the rest as possible (I'd move the helper classes out of the file).
The only uglyiness there is the AssignImmutable case, which unfortunately is irregular compared to all other cases, but it concentrates that uglyness there. But yeah, we can see if we do something like that or some other solution after the release, maybe there's something even better.

cameel
cameel previously approved these changes Mar 10, 2025
Copy link
Collaborator

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, since all of this is non-critical I'm going to approve already. But it would still be nice if you managed to address these previous comments.

ekpyron
ekpyron previously approved these changes Mar 10, 2025
Copy link
Collaborator

@ekpyron ekpyron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also reapproving regardless of the remaining comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants