Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Local Eval #1515

Open
wants to merge 1 commit into
base: static_h
Choose a base branch
from
Open

Conversation

robik
Copy link
Contributor

@robik robik commented Sep 11, 2024

Summary

This PR supports local eval in StaticHermes. This is done by persistently filling Frame's Environment value.

I decided to use this approach as it seems to be most stable in the current state. In static hermes environments are serialized into bytecode (it wasn't that easy in previous hermes with ScopeDescs).

I've seen that there is an EvalDataInstruction that serializes VariableScope into bytecode, however, I could not manage to leverage it to build Environment during eval call. I could not access local variables by indices.

This is my first contribution in Hermes ByteCode/interpreter area, so I am very open to be corrected, there might be some side effects I that am not yet aware of.

NOTE: I am yet to ensure it is spec compliant, opening this PR for early verification of solution

Fixes

TODO

  • Ensure all paths work
    • Static Hermes (or at least informs it is not supported)
    • Bytecode translation (or at least informs it is not supported)
  • Check for potential pitfalls
  • Update docs

Is there anything else to be done?

Test Plan

Before:

> let f = 1; eval('f = 5'); print(f)
1
undefined

After:

> let f = 1; eval('f = 5'); print(f)
5
undefined

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Sep 11, 2024
@tmikov
Copy link
Contributor

tmikov commented Sep 12, 2024

@robik thanks for this PR! We discussed it at length in our daily meeting today. Adding such fundamental functionality is very challenging and we love to see in-depth PRs like this.

There are two main problems with this approach.

The first problem, which is shared with the debugger, is that it breaks when block scoping is enabled, because there can be multiple environments in one function. Thus storing the environment in the well known slot is not enough. We are aware of this and we need to solve this in order to enable debugging with block scoping. The solution will apply to local eval() as well. It will be some kind of metadata mapping instruction offsets to active environment.

Anyway, this problem would not block this PR, but I thought you should know the associated complexity.

The second problem is, unfortunately, much more serious. This approach only works when running from source. If you try your example by first compiling to bytecode and then running the bytecode, it will probably throw a JS error (or crash?). In any case, it won't work.

The reason is that all of the metadata mapping variables to environments, etc, is not being serialized to bytecode. It is simply kept in memory by the compiler. When we run directly from bytecode, it is gone.

Again, we are well aware of this limitation. It means, for example, that we can't debug bytecode files. Our debugger only works when running from source. We have long term plans to add serialization.

The question is, are you up to the challenge of implementing this yourself? Serializing all the necessary data structures to bytecode, then de-serializing them for eval() when needed, possibly with some caching? This would be a fundamental improvement of Hermes.

@robik
Copy link
Contributor Author

robik commented Sep 12, 2024

@robik thanks for this PR! We discussed it at length in our daily meeting today. Adding such fundamental functionality is very challenging and we love to see in-depth PRs like this.

There are two main problems with this approach.

The first problem, which is shared with the debugger, is that it breaks when block scoping is enabled, because there can be multiple environments in one function. Thus storing the environment in the well known slot is not enough. We are aware of this and we need to solve this in order to enable debugging with block scoping. The solution will apply to local eval() as well. It will be some kind of metadata mapping instruction offsets to active environment.

Oh, I missed that. Thanks for the heads up! I think I have few ideas how to tackle this. Are there any unstable areas related to block scoping? Or is it reasonably stable and just hidden behind a flag?

The second problem is, unfortunately, much more serious. This approach only works when running from source. If you try your example by first compiling to bytecode and then running the bytecode, it will probably throw a JS error (or crash?). In any case, it won't work.

Oh right, this is true.

The reason is that all of the metadata mapping variables to environments, etc, is not being serialized to bytecode. It is simply kept in memory by the compiler. When we run directly from bytecode, it is gone.

How would you go about this? Should I base my work on EvalDataInstruction or serialize environments for both eval and debugger? Are there any gotchas with IRGen/BCGen/lowering/optimizations here? If I make it all environment based, should I remove the EvalDataInstruction? Are there any plans for it from your side?

Again, we are well aware of this limitation. It means, for example, that we can't debug bytecode files. Our debugger only works when running from source. We have long term plans to add serialization.

The question is, are you up to the challenge of implementing this yourself? Serializing all the necessary data structures to bytecode, then de-serializing them for eval() when needed, possibly with some caching? This would be a fundamental improvement of Hermes.

I believe I can do this 😁. I will try to work on it next week after solving some ReactNative issues.

What about Static Hermes and Bytecode Translation? Is there anything I should be aware of here?

Should I add the fixes in this PR or as a separate one? Are there any other blockers to get it merged as a partially working solution? (I ask because I'd rather avoid a big PR with all those features)

@tmikov
Copy link
Contributor

tmikov commented Sep 14, 2024

[...breaks with block scoping...]

Oh, I missed that. Thanks for the heads up! I think I have few ideas how to tackle this. Are there any unstable areas related to block scoping? Or is it reasonably stable and just hidden behind a flag?

Block scoping is an extremely complicated change to the compiler, as it modifies some basic assumptions, like the one-scope per function, and turns scopes into regular values. It is not stable yet and has half-landed. As I mentioned, it doesn't work with the debugger, but we need to tackle that problem, as it will require changing the IR.

The reason is that all of the metadata mapping variables to environments, etc, is not being serialized to bytecode. It is simply kept in memory by the compiler. When we run directly from bytecode, it is gone.

How would you go about this? Should I base my work on EvalDataInstruction or serialize environments for both eval and debugger? Are there any gotchas with IRGen/BCGen/lowering/optimizations here? If I make it all environment based, should I remove the EvalDataInstruction? Are there any plans for it from your side?

Removing EvalDataInstruction is unlikely to be a good approach, since since it is required in order to preserve various data. I don't want to scare you, but I want to emphasize how complicated this undertaking is - it involves understanding all of the compiler data structures, the bytecode file format, how the bytecode file is generated, figuring out how serialize these structures in the file, how to de-serialize them on demand, etc. The reason we haven't done it is that it is that it is hard and likely time consuming.

@avp is working on design documentation for lazy compilation, which is closely related to this. He will probably publish it next week. I recommend waiting for that and getting closely familiar with it before proceeding or making design decisions.

Should I add the fixes in this PR or as a separate one? Are there any other blockers to get it merged as a partially working solution? (I ask because I'd rather avoid a big PR with all those features)

I wouldn't qualify the necessary changes as "fixes" - they will be pretty significant changes to the compiler and runtime, and likely multiple times larger than this PR. I suspect it will be a series of many smaller PRs. So, definitely separate. Please, treat every stacked commit as a separate PR.

We can't merge this one - it will cause people to file bugs that eval() is broken, because it will work in their dev builds, which run from source, but not in production builds.

I want to warn you again, that if you want to do this, it will likely be a lengthy and difficult process. At the end you will emerge as an expert on the Hermes compiler, but it will not be easy. There is no embarrassment and no guilt if you decide to not do this for now. It is also OK if you decide to take it slow.

(Meanwhile, I promise, we haven't forgotten your array diffs!)

@tmikov
Copy link
Contributor

tmikov commented Sep 17, 2024

Here is the documentation: https://github.com/facebook/hermes/blob/static_h/doc/LazyEvalCompilation.md

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed Do not delete this pull request or issue due to inactivity.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants