Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-98831: "Generate" the interpreter #98830

Merged
merged 89 commits into from
Nov 3, 2022

Conversation

gvanrossum
Copy link
Member

@gvanrossum gvanrossum commented Oct 28, 2022

Overview

This is the first stage of a project to generate the interpreter, in particular the cases of the main switch. The references below explain the why. This PR introduces the following:

  • Two new files in the Python subdirectory, bytecodes.c and generated_cases.c.h. (UPDATE: cases.h -> generated_cases.c.h.)

    • bytecodes.c contains instruction definitions. These almost look like the old switch cases, but instead of TARGET(INAME) { ... } they have inst(INAME, stack_effect) { ... }, and the DISPATCH() macro call at the end of the block can be omitted. (The stack effect is not yet used by the generator, and it is not always correct.) At the top and bottom of bytecodes.c is some dummy C code that makes look like valid C code to editors like VS Code.
    • generated_cases.c.h is a generated file containing the switch cases. It is currently identical to the switch cases in ceval.c.
  • A new bunch of tooling, in Tools/cases_generator. This includes:

    • A temporary script, extract_cases.py, that extracts the cases from ceval.c and writes them, in the form of instruction definitions, to bytecodes.c.
    • A script, generate_cases.py, that reads the instruction definitions from bytecodes.c and regenerates the cases, writing them to generated_cases.c.h.
    • A parser for a subset of C that turns blocks of C code into an AST, so we can do transformations at the AST level. (There is currently one example of this -- the function always_exits() determines recursively whether a block is guaranteed to exit, using a custom definition of "exit" that includes macro calls like DISPATCH() and GO_TO_INSTRUCTION().)
  • Changes to ceval.c so that it uses #include "generated_cases.c.h". For now I've left the original switch cases in ceval.c, inside #else ... #endif, so that we can compare the original switch cases to the generated switch cases to validate the toolchain. However, I plan to rip these out either once the PR is (nearly) approved, or perhaps in a quick follow-up PR.

  • A new target in Makefile.pre.in to auto-generate generated_cases.c.h from bytecodes.c, using make regen-cases.

    • Currently missing is similar tooling on Windows. For now, you can only regenerate the cases on UNIX. They are checked into the repo, so Windows should still compile.

References

Metadata

This doesn't work yet, just a checkpoint. I'm going to bed now.

It's going to be a long slog.
(Don't do anything with them yet.)
(This is a checkpoint, doesn't work yet.)
- Change family syntax to `family(NAME) = {OP1, OP2, ...};`
- Some cleanups in parser.py and generate_cases.py
- Fixes to bytecodes.c to reduce red wiggles in VS Code
Copy link
Member Author

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pushing the new version in a few minutes, after merging two tough commits from main. The pushed version will once again have the cases in ceval.c, I'll remove those later (right before merging).

#define SET_TOP(v) (stack_pointer[-1] = (v))
#define GETLOCAL(i) (frame->localsplus[i])

#define inst(name, stack_effect) case name:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I added some more dummy arguments and labels, fixed the inst() macro, and I'm just copying all #includes from ceval.c...

#define GETLOCAL(i) (frame->localsplus[i])

#define inst(name, stack_effect) case name:
#define family(name) static int family_##name
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that was a last-minute change, I had a + separator there first. I'll put it inside curlies and make the macro define an unsized array.

Tools/cases_generator/parser.py Show resolved Hide resolved
Tools/cases_generator/generate_cases.py Outdated Show resolved Hide resolved
@gvanrossum
Copy link
Member Author

I have made the requested changes; please review again.

@bedevere-bot
Copy link

Thanks for making the requested changes!

@brandtbucher: please review the changes made to this pull request.

@gvanrossum
Copy link
Member Author

(Wait, I haven't pushed yet. In a minute.)

@gvanrossum
Copy link
Member Author

Now it's ready.

@gvanrossum gvanrossum added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Nov 3, 2022
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @gvanrossum for commit 7a327aa 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Nov 3, 2022
If you get a merge conflict, talk to Guido
@gvanrossum
Copy link
Member Author

I have one more commit (removing the cases from ceval.c) and then I'll merge. If you get a merge conflict after that, re-apply your changes to the instructions in ceval.c to bytecodes.c, and re-run make regen-cases.

@gvanrossum
Copy link
Member Author

All buildbots passed except the three s390 ones. I'm not going to wait for those, I'm pushing the final commit (fddc08c) and once the regular CI passes I'll merge. Hopefully before midnight.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants