-
-
Notifications
You must be signed in to change notification settings - Fork 560
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Acorn-based parsing #1820
Acorn-based parsing #1820
Conversation
BenchmarkDotNet v0.13.12, Windows 10 (10.0.19045.4170/22H2/2022Update) Before (Esprima)
** After (Acornima) **
|
799eb8a
to
1639545
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes are surprisingly small, this is great!
Directory.Packages.props
Outdated
@@ -4,9 +4,9 @@ | |||
<CentralPackageTransitivePinningEnabled>false</CentralPackageTransitivePinningEnabled> | |||
</PropertyGroup> | |||
<ItemGroup> | |||
<PackageVersion Include="Acornima" Version="0.9.1" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahah! nobody will install Jint now because "it's not a major version, blablabla" ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've got a point there, probably we can't expect them to read the release notes. 😄 Luckily, though, this can be easily solved. I plan to reach the v1.0 milestone in the upcoming weeks.
@@ -1,4 +1,4 @@ | |||
using Esprima; | |||
using Acornima; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would have been easier to just rename Acornina to Esprima ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely, just the name wouldn't really reflect the content then... 😄
Jint/Engine.Ast.cs
Outdated
@@ -22,12 +28,13 @@ public partial class Engine | |||
public static Script PrepareScript(string script, string? source = null, bool strict = false) | |||
{ | |||
var astAnalyzer = new AstAnalyzer(new ScriptPreparationOptions()); | |||
var options = ParserOptions.Default with | |||
var options = s_defaultParserOptions with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this name so ugly? Or do you fancy this style?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it came from Jint's .editorconfig:
Line 114 in 23a2a25
# Static fields are camelCase and start with s_ |
From what I see when looking around the codebase, this rule is not really followed. :) So, I'll change it to be consistent with the rest.
Would this give Jint better error messages with line and column information? |
The current parser, Esprima already provides this information (at least for syntax errors). So, if it's not reported, then it's because of Jint... As for the new parser, I paid careful attention to error reporting. It pretty closely replicates V8's error reporting behavior, plus it detects a lot of syntax errors which Esprima is currently unable to, so it would be nice if Jint were propagate those errors properly. I may look into this in a future PR - if this one gets accepted, of course. :) |
Awesome work @adams85 as always! We are very interested in this option and would offer better long time support as acorn-based solution is better maintained w.r.t. features. Another question indeed is when we could make the switch, probably would require v4 and v3 only took seven years to "complete" 😉 |
Jint has extensive test cases for error reporting on line/column level. There are still a lot of type errors thrown with lacking exception messages, but we take pull requests from people who need them to be improved. |
This comment was marked as resolved.
This comment was marked as resolved.
❤️
Even seven years sounds better than never... 😄 It would be nice to merge this PR, so V4 features can be created based on this, but it's nearly irrelevant to me when it gets released. So, absolutely no hurry. I'd just like to avoid constantly rebasing this one... So it would definitely be nice to find a clever solution to reduce friction with other open PRs. Two options come to my mind:
Option 1 (or some combination of the two) seems more flexible to me. But this is up to you.
AFAICS, this is done deal :D I like the idea very much. Probably we can improve this even further. What about (temporary) keeping the original names ( Also, we can add further using aliases:
|
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
508bac9
to
e265dcc
Compare
Phew, this was no fun but finally I managed to rebase the PR. It's much much cleaner this way, no doubt about it. So it was worth it in the end :) However, while doing this, I spotted something pretty problematic. Acornima's tolerant parsing mode differs from Esprima's, plus the latter has tolerant parsing turned on while the former has it off by default. Jint used the defaults of Esprima, so I enabled tolerant mode for internal parsing (which I don't think is a good default - but that's another story). The main problem here is that obviously I can't do anything about pre-parsed scripts - which is a pretty sneaky BC... On top of this, things are really starting to fall apart when Jint needs to parse some code dynamically (e.g. some eval code). At that point we don't have the information whether the containing script was parsed in tolerant or non-tolerant mode. So far Jint's just used the default parser settings. Because of the differences in the tolerant mode behavior I could make tests pass only if I disabled tolerant mode. So, it kinda works but meh... This is, of course, not something Acornima-specific. The current Esprima version has this problem as well. (For example, end user passes a pre-parsed script to Jint which has been parsed in non-tolerant mode, but Jint will parse dynamic parts in tolerant mode...) With Acornima this issue gets even worse, as it has support for e.g. setting the language version compatibility (EcmaVersion) and fine tuning which experimental features are enabled (ExperimentalESFeatures). So the current situation is pretty messy. It could be improved easily if Jint handled the parsing internally in all circumstances. However, since it allows end user to pass pre-parsed scripts, this can't be solved within Jint. The only correct solution to this issue I can think of (and doesn't involve big BCs) is to modify the parser to include the What do you think? It wouldn't be a big deal to add this to Acornima, I just doesn't feel like re-introducing |
After familiarizing myself a bit more with Jint's public API, I may see another way that doesn't involve changes to the parser: The root cause of the problem is that end users are allowed to pass pre-parsed code to Jint via However, from what I understood, the way of using pre-parsed scripts with Jint is to call So, if my understanding is correct, we could get away with much smaller changes: + public readonly record struct PrepareResult<T>(T Program, ParserOptions ParserOptions) where T : Program;
- public static Script PrepareScript(string script, string? source = null, bool strict = false)
+ public static PrepareResult<Script> PrepareScript(string script, string? source = null, bool strict = false, bool tolerant = true)
- public JsValue Evaluate(Script script)
+ public JsValue Evaluate(PrepareResult<Script> script) etc. This way Jint could retain full control over what parser settings are used for parsing the code to execute and it can store the settings for later use when dynamic code, code of referenced modules, etc. need to be parsed. Can this work in your opinion? |
This was exactly my initial thought that we could enforce input to be something that PrepareScript produces 👍🏻 Traveling today so mostly short replies and tinkering as background process 🙂 |
Okay, that's enough for now, I can start working with this. Just one more question: how about changing tolerant mode to off by default? I can only speak for myself, but I'd be surprised if I fed some script to Jint which is invalid per spec and get some results instead of Of course, this would be another potentially impactful BC, but we'll have a bunch apart from this anyway... |
Probably would be OK to make rules stricter. Maybe return outside of functions is something needed for REPL and environments where people want script like if they were inside function context. |
No problem with that as it's controlled by another switch than tolerant mode. |
Had to dive much deeper into Jint's internals than I had hoped, but I think finally I was able to clean up the issues around parser options and dynamic code parsing. But there are still a few questions I'm unsure about:
One more thing worth mentioning: I included this workaround in the regexp parser, so it's not needed here any more. |
It's always good to get a new fresh pair of eyes to look at years of hacks and horror 🙂
Well you're plain evil. This is the thing with JS, there's so many ways to get things in unusable state, great work on spotting this! This whole evaluation context going missing has bitten so many times already that I'm already questioning to whole idea of having it. I think the fix is fine and the regular band-aid that is being used elsewhere too (other places don't have try-finally IIRC).
Just to avoid answering the actual question... Should Jint actually take
Ah so basically a flag for Jint whether is can even try the compilation. I see that Acornima limits to |
Yep, JS is the language of surprises, so minor glitches like this are indeed hard to spot. They align so smoothly with the JS philosophy :D
The reason why I asked is that in other cases I've only seen this pattern: var ownsContext = _engine._activeEvaluationContext is null;
_engine. _activeEvaluationContext ??= new EvaluationContext(_engine);
try
{
/* ... */
}
finally
{
if (ownsContext)
{
_engine._activeEvaluationContext = null!;
}
} This is a bit different from what I did because it kind of "joins" the evaluation context if it's present already. So, don't we actually need this also in the case in question? (Unfortunately, I can't really tell because of my limited understanding of the concept.)
Yeah, this is the other solution that I referred to as a "massive BC". At the same time this is the clean solution, so I'd prefer it too if BCs are not a concern. However, I think we should introduce this change in a subsequent PR because this one is starting to lose focus... I'll be happy to do the change after merging this.
Indeed, I missed that the .NET runtime folks fixed this here. So we can assume that the fix will be included in .NET 9. I'll update the regexp parser accordingly. |
833ba09
to
e581df5
Compare
As far as I'm concerned, the PR is complete. Also, v1.0.0 of the parser has been released. So what do you think about merging this into some v4 branch so further development can be done in that? (Probably other ongoing PRs which are to be released in v4 should be rebased on that branch too.) For example, I plan to address some minor todos and issues and I'll also want to look into error reporting because syntax errors reported by the parser are not wrapped in |
My attack would probably be as follows
|
That can certainly work as well! :)
I'll do this soon, then leave the rest up to you. ;) |
Latest benchmark result with main branch and this PR updated to Acornima v1.0.0: Before (Esprima)
** After (Acornima) **
|
I made a few other runs and the regression in the case of Jint_ParsedScript + droma(...)ase64 [21] looks like just the inaccuracy of the benchmarks:
Probably there's some GC activity behind the differences... |
Compared to where we started from, the perf. characteristics virtually haven't changed. Looks like the new parser may bring a tiny perf. improvement in cases where parsing the input takes up a relatively larger part of the execution. |
This is quite expected. While Acornima is faster, the majority of time in Jint's use case is always spent in actual script execution/interpretation (performance PRs welcome! 😉). The major improvement is to have a clear(er) path to new ECMAScript features with robust parser infrastructure. Version 3.1 is being out for a moment and some have already auto-updated like docfx without problems. I've tested against other code bases and only small tweaks needed for consumers like RavenDB. We've broken advanced APIs mostly so people don't notice.. Thanks for championing this! |
😢 |
First tears of joy given! |
I have memories of the summer I spent migrating the Esprima codebase to dotnet. |
I also have memories of lost summer vacations building/improving these things. "Unfortunately" it seems that Acorn won the battle maintenance-wise and is a much safer bet. Maybe we can be happy about the things we learned along the way, whether it's spec, performance or other things 🙂
It's an option. Esprima is a good solution which can handle many things - it's been great for Jint, it's just harder to get new features and compliance with newer features without an upstream to follow. |
No reason to be sad, we can now waste many more summer vacations on keeping Acornima up to date :D Every Esprima maintainers are invited. @lahma is already a collaborator ;) If anyone else wants to join, just drop me a message! Besides that, a big part of Esprima will live on in Acornima. Many of the awesome performance tweaks have been applied to translated Acorn implementation. A lot of add-on features, source gen, tests also come from Esprima. So best to think about Acornima as the love child of two great parents :D |
Nicely summed up, thank you! As this has been merged, |
FWIW, I put together a list of the most impactful breaking changes: https://github.com/adams85/acornima?tab=readme-ov-file#migration-from-esprimanet Most surprises are to be expected when users are feeding ASTs which were built manually, not by the parser. Not sure how typical a use case this is, just saying that some extra attention is required in such cases. Apart from that, parsing has become stricter because it's become more compliant with the spec, plus because tolerant mode has been changed to off by default. (Acornima's tolerant mode is different and not as permissive as Esprima's, so it wouldn't have made much sense to keep that default. (Users who relied on tolerant mode would mostly be broken anyway. Top level returns in scripts are still allowed. That behavior hasn't been changed.) Finally, there are some changes to be aware of in error reporting too. For one, the format of syntax errors has changed significantly. (There are some issues around converting parser syntax errors to Jint |
Because adding notes to a closed PR is definitely a best practice, some things:
|
It was unavoidable, I had to see how my new acorn-based JS parser works with Jint... 😄
It seems that pretty well: it resolves most of the parser-related issues in test262 and brings a tiny perf. improvement in overall (see benchmark results below).
So, I'm curious to know what do you think about an acorn-based Jint? Of course, absolutely no pressure, plus this would mean some potentially annoying breaking changes, so if you'd rather stick with Esprima, I'm totally fine with that. (Even then this work won't have been done completely in vain as this is a great integration test for my parser, which uncovered some minor bugs and showed me where the AST has shortcomings currently.)
However, if you'd like to do the switch, that would make my very proud and gladly do the minor adjustments which are needed to finalize this PR!