Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MetaVM: initial base class for an extensible EVM #441

Closed
wants to merge 11 commits into from
Closed

MetaVM: initial base class for an extensible EVM #441

wants to merge 11 commits into from

Conversation

pinkiebell
Copy link
Contributor

Introducing the VM.MetaVM class to support developers implementing
special logic (override specific opcode handlers) along other things
for state manipulation.

Related #410

@axic
Copy link
Member

axic commented Feb 18, 2019

I think this is somewhat conflicting/replicating #424. If it does, it would be better to join efforts. Can you have a look?

@holgerd77
Copy link
Member

@pinkiebell yes, you should definitely coordinate with @s1na. It is generally risky to start such heavy rewrites/refactoring work without (deeper) prior discussion, though this looks promising. Nevertheless this can lead to a PR being rejected at the end (due to other work conflicting/too heavy API changes/introduction of new language features which can't be supported yet/...).

@pinkiebell
Copy link
Contributor Author

@axic @holgerd77
Thanks for the pointer!
I looked at the EEI PR, I guess we can merge the functionality somehow, @s1na do you have an update of the PR status?

My primary intention with this PR is the ability to overwrite opcode handlers, in a way that is most flexible for other developers to take advantage of that. This is something the EEI does not cover.

Would be nice to get a discussion/decision about this feature, as the solEVM-enforcer depends on this functionality for example.
I will fix the remaining tests and rebase the PR, as I need to use it anyway 😁

@holgerd77
Copy link
Member

Haven't had a deeper look, but I have a first impression that this is all functionality that might be better settled directly on runCode or I am wrong here?

Especially since there is a tendency (see comment) to separate this inner VM (and maybe rather call it EVM or something) and the outer processing engine.

//cc @s1na Thoughts?

@pinkiebell
Copy link
Contributor Author

Haven't had a deeper look, but I have a first impression that this is all functionality that might be better settled directly on runCode or I am wrong here?

runCode would be quite overloaded if we add more callbacks, state manipulation functions and so on.
As a developer who worked with the ethereumjs-vm, I can say that the extensible class-based system is really easy to work with.

Especially since there is a tendency (see comment) to separate this inner VM (and maybe rather call it EVM or something) and the outer processing engine.

//cc @s1na Thoughts?

The MetaVM could be further split-off. I also want to refactor the other runCall etc functions but did not in this PR, because that would be too big to review at once.

for use-cases like this are becoming pretty straightforward:

class MyVM extends MetaVM {
  async handleLOG(runState) {
    await super.handleLOG(runState)

    // do some custom magic with runState.logs
  }

  async handleMUL(runState) {
    const x = runState.stack.pop()
    const y = runState.stack.pop()

    // custom logic
    runState.stack.push(xy)
  }

  async runNextStep (runState) {
    // I can also modify runState for each execution step
    // And run the thing...
    await super.runNextStep(runState)
    // custom logic here? :)
  }
}

Introducing the `VM.MetaVM` class to support developers implementing
special logic (override specific opcode handlers) along other things
for state manipulation.

Related #410
dist-files are now using some core-js polyfills
@s1na
Copy link
Contributor

s1na commented Feb 19, 2019

Thanks for the PR. It really helps to hear pain points from users of the library. I'm sure we can join forces to satisfy you requirements somehow. In the meanwhile, a few points come to mind:

I see the VM class as a connector for the high-level pieces (runBlockchain, stateManager, etc.) (despite the name it's not only about evm) and personally don't think it's a good place to have low-level evm-related code.

My current thinking on the design is to have an evm module which has an Interpreter class (which would encompass most of the current runCode), as well as one for the opcode handlers (opFns), a class for the environment and interface (EEI and stateManager), etc. Somewhat similar to geth. Do you think something like that could work?

@pinkiebell
Copy link
Contributor Author

Thanks for the PR. It really helps to hear pain points from users of the library. I'm sure we can join forces to satisfy you requirements somehow. In the meanwhile, a few points come to mind:

I see the VM class as a connector for the high-level pieces (runBlockchain, stateManager, etc.) (despite the name it's not only about evm) and personally don't think it's a good place to have low-level evm-related code.

My current thinking on the design is to have an evm module which has an Interpreter class (which would encompass most of the current runCode), as well as one for the opcode handlers (opFns), a class for the environment and interface (EEI and stateManager), etc. Somewhat similar to geth. Do you think something like that could work?

Sounds great and is probably the same I intended for the MetaVM, except the EEI, stateManager and the high-level connector stuff.
You can think of the MetaVM as the Interpreter class (in this PR) and I also planned to move opFns directly into MetaVM, right now I have this little prototype wrapper (https://github.com/ethereumjs/ethereumjs-vm/pull/441/files#diff-7225c1b45dd3c5509c3db9d4aa729535R268)

@holgerd77
Copy link
Member

@s1na Not sure, haven't looked too deeply into it. Is it now possible to leave out the eWASM VM implementation stuff from #431 and just extract the agnostic outer structure where both eWASM and EVM can be plugged in and use this to continue here?

@holgerd77
Copy link
Member

@s1na @pinkiebell I would agree with Sina and see the VM as a high-level connector class and rather not go up to some meta VM but down to runCode for the refactoring work.

@holgerd77
Copy link
Member

How can we proceed here? This is a great and work-intensive PR and we should spent some time to think about possible integration paths. @s1na can you identify parts of the code here which can be integrated within the structure you have got in mind? Is it possible that you do some structural preparation, e.g. by opening a PR just introducing the relevant structural updates, and then you and @pinkiebell can work together on the PR?

@holgerd77
Copy link
Member

Side notes for @pinkiebell:

  1. Sina will be on leave until early next week, so there probably won't be an imminent answer
  2. If you both are going to EthCC (Sina does, I am not), it would probably make very much sense to meet! 😄

@s1na
Copy link
Contributor

s1na commented Mar 3, 2019

Sorry for being inactive here for some time. This topic is definitely on my mind and I'm thinking about how best we can proceed here. It would be much easier if we could meet at ethcc to discuss in person, @pinkiebell let me know if you'll be there.

Regarding the VM and higher-level stuff: In the current design the library has a single entry point (VM), and all the exposed code is exported as methods of VM. This is something that I believe can be improved in the modernisation effort (probably longer-term though), by exporting a variety of classes instead of only one. This would be much more flexible and modular. I'm not sure yet what these classes could be, and any ideas are welcome.

I think we can discuss the design in #456, then use this PR as inspiration, and introduce multiple small subsequent PRs to merge the logic gradually. This is also what I've been doing with #424, I only started drafting a structure there, and have been trying to gradually merge ideas from that in smaller PRs (e.g. #442, #425). I'll start working on a rough sketch for the structure that we can use as a starting point for discussion.

@pinkiebell
Copy link
Contributor Author

Sorry for being inactive here for some time. This topic is definitely on my mind and I'm thinking about how best we can proceed here. It would be much easier if we could meet at ethcc to discuss in person, @pinkiebell let me know if you'll be there.

I'm not attending ethcc, but @johannbarbie and maybe others from leapDAO going to be there ;).

Regarding the VM and higher-level stuff: In the current design the library has a single entry point (VM), and all the exposed code is exported as methods of VM. This is something that I believe can be improved in the modernisation effort (probably longer-term though), by exporting a variety of classes instead of only one. This would be much more flexible and modular. I'm not sure yet what these classes could be, and any ideas are welcome.

Yes, if we could have something like a Memory, Stack etc classes to extend would be nice.
I think it makes sense to move that to a new 'umbrella' class though, in a sense that we don't change/introduce something new to the 'old' interface (VM).

I think we can discuss the design in #456, then use this PR as inspiration, and introduce multiple small subsequent PRs to merge the logic gradually. This is also what I've been doing with #424, I only started drafting a structure there, and have been trying to gradually merge ideas from that in smaller PRs (e.g. #442, #425). I'll start working on a rough sketch for the structure that we can use as a starting point for discussion.

👍

@holgerd77
Copy link
Member

Hi @krzkaczor if you could join the discussion here with all your experience on VM prototyping and refactoring that would be really valuable! 😃

@s1na
Copy link
Contributor

s1na commented Mar 11, 2019

I went through your code in more detail and here's how I suggest we proceed here (as a first step):

  1. Leave VM mostly untouched for now.
  2. Move MetaVM to lib/vm and rename to Interpreter (won't be a base class for VM anymore).
  3. runCode only instantiates an Interpreter and runs it and parses the results (parseVmResults). For emitting events, Interpreter should accept a stepHook function and call it in runStep.
  4. Interpreter.runStep calls a method getOpHandler (or something like execOp), which gets the handler from opFns.js instead of adding handler functions directly to the Interpreter prototype. For custom handlers users can extend the Interpreter and override this function. This leaves us somewhat more flexible for coming up with designs in future.
  5. Make preprocessValidJumps a private method of Interpreter.
  6. Refactor stack manipulation in evm #460 has changed how the stack operates in the vm, but I think after rebasing only a few minor modifications would be required.

Please feel free to discuss any of them, if you disagree.

@pinkiebell
Copy link
Contributor Author

I went through your code in more detail and here's how I suggest we proceed here (as a first step):

  1. Leave VM mostly untouched for now.
  2. Move MetaVM to lib/vm and rename to Interpreter (won't be a base class for VM anymore).
  3. runCode only instantiates an Interpreter and runs it and parses the results (parseVmResults). For emitting events, Interpreter should accept a stepHook function and call it in runStep.
  4. Interpreter.runStep calls a method getOpHandler (or something like execOp), which gets the handler from opFns.js instead of adding handler functions directly to the Interpreter prototype. For custom handlers users can extend the Interpreter and override this function. This leaves us somewhat more flexible for coming up with designs in future.
  5. Make preprocessValidJumps a private method of Interpreter.
  6. Refactor stack manipulation in evm #460 has changed how the stack operates in the vm, but I think after rebasing only a few minor modifications would be required.

Please feel free to discuss any of them, if you disagree.

Sounds good. The only thing we have to consider is the event emitting stuff like in runCall.
As of now it expects a (this = VM).emit function.
If we change the design to the Interpreter class and VM does not extend Interpreter, we run into problems.

We can bubble-up events but that is ugly by design, we can introduce a Intepreter-delegate class,
or just class VM extends Interpreter. Thoughts? 🙂

@s1na
Copy link
Contributor

s1na commented Mar 12, 2019

Hmm, good point 🤔

I think we eventually have to pass the VM object to Interpreter, e.g. for initializing runState (hopefully we can improve on this later), and runState has a _vm property which can be used to call runCall (in opFns). I agree it's not the cleanest design, but we'll iterate on this 😃

One reason I'm resisting a direct coupling between VM (maybe a better name for VM would be state execution engine), is that we soon want to integrate ewasm, which will have an interpreter of its own.

@holgerd77
Copy link
Member

If I'm not mistaken in my thinking/analysis I think it should be the other way around, the Interpreter + execution engine should be passed as an option to the outer processing engine (atm called: the VM), similar to the state manager.

If someone wants to use events/do other modifications one is passing ones own inner VM, otherwise it gets instantiated with the EVM interpreter as default. Does this make sense?

@s1na
Copy link
Contributor

s1na commented Mar 12, 2019

Maybe an Interpreter can be passed to runCode as an optional opt. If it's undefined, runCode would instantiate a new one. But the way things are currently, Interpreter.initRunState would also need access to the VM object. Maybe runCode could pass the vm to initRunState (technically pass self, since runCode is defined on VM). @holgerd77 what do you think?

I'll be thinking about how we can refactor runState.

@s1na
Copy link
Contributor

s1na commented Mar 12, 2019

Initial impression (these can be done in later PRs):

  • Limit runState to internal evm state:

    • pc, gasLeft, code, etc.
    • stack and memory objects
    • a nested results object which includes returnValue, stopped, logs, `gasRefund, etc.
  • In addition to this runState, opcode handlers should only have access to a restricted API (EEI) for fetching message, tx and block context, reading/writing state, etc. However, CALL opcodes change the message context, so the Interpreter should take care of updating the EEI before calls.

I'm now thinking whether it makes sense to have Interpreter only deal with a new object calledMessage, and move the actual opcode handling loop, stack, memory to Message. Initially and on calls, Interpreter would create a new Message object, call execute on it, and pass the results to parent message. In other words, Interpreter is for the transaction level (in this case maybe another name would be clearar), Message is for the message level. Does that make sense?

@holgerd77
Copy link
Member

This limitation of runState to just the internal EVM state makes super-much sense together with some well-defined and restricted access ways to the outer environment/processing through the EEI. Also already had some look couple of days ago what actually went inside runState over time and how it is logically clustered.

@holgerd77
Copy link
Member

Can you try some concept-grasping outline what the different tasks of the Interpreter class are respectively what they could be, to get a bit clearer picture of the different responsibilities?

I have a rough feeling that this Message concept goes in a good direction but also can't fully grasp the different module responsibilities and interplay between them, maybe we want to elaborate on this a bit more, at some point it might also be helpful to sketch something on paper.

@s1na
Copy link
Contributor

s1na commented Mar 19, 2019

Hey @pinkiebell, just wanted to clarify that the last few comments are not directly for this PR and were for the future (will move the discussion to #456). Please let me know if any problem arose, or you can't work on this in the near-term future, because my work depends on this.

@pinkiebell
Copy link
Contributor Author

Hey @pinkiebell, just wanted to clarify that the last few comments are not directly for this PR and were for the future (will move the discussion to #456). Please let me know if any problem arose, or you can't work on this in the near-term future, because my work depends on this.

@s1na
👋 Sorry, I'm loaded with other stuff, so I can't really turn-around the changes in the near-term
but I hope that some bits of this PR are at least a bit useful 🙂

@holgerd77
Copy link
Member

@pinkiebell thanks for the work already done! 😀 So is it ok if we directly copy over parts of your code? Sorry that you don't get direct attribution then. 😕

@pinkiebell
Copy link
Contributor Author

@pinkiebell thanks for the work already done! 😀 So is it ok if we directly copy over parts of your code? Sorry that you don't get direct attribution then. 😕

@holgerd77
👋No problem 🙂. One thing to note tho: The babel changes I introduced needs some more investigation because I just stumbled yesterday on a missing regenerator-runtime on a fresh project.
Just a dependency issue I guess, if you use the async/await stuff.

@s1na
Copy link
Contributor

s1na commented Mar 19, 2019

@pinkiebell as Holger said, your work is definitely valuable and I'm really grateful :) Sorry it turned out this way, as you might have noticed we're in the middle of an overhaul...

I hope I can still ping you for comments, specially for the evm debugging and state manipulation parts.

missing regenerator-runtime

Yeah I've also faced that in another library, I think adding @babel/runtime to dependencies would solve this.

@pinkiebell
Copy link
Contributor Author

I hope I can still ping you for comments, specially for the evm debugging and state manipulation parts.

👍 Of course you can 😄

@s1na s1na mentioned this pull request Apr 1, 2019
@holgerd77
Copy link
Member

Superseded by the refactoring work by @s1na in #424 and following work, will close.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants