Simple Subroutines for the EVM #2315
Comments
After EIP615 died, here is another chance to improve this archaic-ish VM architecture. I myself wholeheartedly wish this can be accepted. I work on compilers and virtual machines. Here is my personal evaluation:
I also wish |
This is one of the reasons EIP-615 ran into issues.
I'm really in favor of having multi-byte instructions, but that requires EVM versioning. An example why doing it without versioning is risky is demonstrated here: https://ethereum-magicians.org/t/eip-663-unlimited-swap-and-dup-instructions/3346/11 |
@axic That specific problem exists already because we have variable opcodes ( And what is the best way to avoid that? Make dynamic jumps obsolete -- that was in the EIP615 proposal! I also think EVM versioning will happen sooner or later, unless we want to forever keep EVM as is. |
As you say that opcode is there since the beginning, which means every VM and tool is prepared for it. |
Also, the PUSH opcodes specify how much immediate data to push, whereas BEGINSUB needs a variable amount of immediate data because different subroutines can take different numbers of arguments. |
Edited for longer names, and to give |
Nope, can't use RETURNSUB semantics without BEGINSUB. Sigh. Upside is you can write functions that take and return a variable number of arguments. |
I think only the 2-byte jump target address as immediate operand is good enough, or we can have an additional "number of arguments pushed on stack" byte in Having static subroutine jumps is big deal, the good thing is that if we do it now, it will not break the existing convention. |
So another opcode for tagging the target address of subroutines. This will deliberately disallow the previous internal calling convention (using We know that we definitely don't want this kind of backward compatibility, but I feel that it will create a little bit of inconsistency. |
@gcolvin can you clarify a few things? |
@lialan Thanks. |
@MrChico Thanks. |
The PR mentioned two comments above is now #2348 |
I'm confused. The issue still refers to
and your claim that
If this was a PR instead of an issue it would be easier to review and track changes to this proposal |
You are right on all counts @MrChico. I thought I had removed the SP reference, but it was still there--it's gone now. My comment on GOSUB was wrong, it's only RETSUB that leaves SP unchanged--I think the semantics section is right. And a PR is needed, but earning a living is taking priority, and editing an issue is easier than dealing with Git. I appreciate your patience. |
There is some inconsistency between the test case and the spec in EIP-2315. In particular, the spec says that BEGINSUB is not supposed to be executed and that executing it causes an error, so JUMPSUB lands on the instruction after BEGINSUB. However, the test case (and the current open-ethereum implementation) assumes BEGINSUB can be executed as a no-op, so JUMPSUB lands on the BEGINSUB itself instead of the next instruction. Note that I believe the spec makes more sense from the security perspective. It prevents unintended control flow in the EVM from crossing routine boundaries. |
Can you please comment this on https://ethereum-magicians.org/t/eip-2315-simple-subroutines-for-the-evm/3941 which is the discussion url? (This issue here should be closed.) |
Reopened as working draft. Will push to EIP repo at stable points. I'm coming to see that more can be achieved by way of safety and ease of static analysis with this proposal than I even realized, and am working it out here. |
A good summary of the history of subroutines, from Lovelace and Turing onward and the subroutine facilities provided by many CPUs including the following:
|
eip: 2315
title: Simple Subroutines for the EVM
description: Introduces two opcodes to support simple subroutines
status: Draft
type: Standards Track
category: Core
author: Greg Colvin (@gcolvin), Martin Holst Swende (@holiman), Brooklyn Zelenka (@expede)
discussions-to: https://ethereum-magicians.org/t/eip-2315-simple-subroutines-for-the-evm/3941
created: 2019-10-17
requires: 3540, 3670, 3779, 4200
Abstract
This proposal introduces two opcodes to support simple subroutines: RJUMPSUB and RETURNSUB.
Taken together with other recent proposals it provides an efficient, static, and safe control-flow facility.
Motivation
The EVM does not provide subroutines as primitives.
Instead, calls can be synthesized by fetching and pushing the return address and subroutine address on the data stack and executing a JUMP to the subroutine; returns can be synthesized by getting the return address to the top of the stack and jumping back to it. These conventions cost gas and use stack slots unnecessarily. And they create unnecessary complexity that is borne by the humans and programs writing, reading, and analyzing EVM code.
Subroutines
As an alternative, we propose to provide for subroutines with two simple operations: RJUMPSUB to call a subroutine and RETURNSUB to return from it.
Taken together with other required proposals they provide an efficient, static, complete, and safe control-flow facility.
Efficient. Substantial reductions in the complexity and the gas costs of calling and optimizing simple subroutines -- as much as 56% savings in gas in the analysis below. Substantial performance advantages for static jumps are reported elsewhere.
Static. All possible jumps are known at contract creation time.
Complete. Together with EIP-4200 it provides a minimal set of control structures for implementing programs -- jumps, conditional jumps, and subroutines.
Safe. Valid programs will not halt with an exception unless they run out of gas or recursively overflow stack.
Prior Art
Facilities to directly support subroutines are provided by all but one of the real and virtual machines we have programmed. This includes physical machines like the Burroughs 5000, CDC 7600, IBM 360, DEC PDP 11 and VAX, Motorola 68000, Sun SPARC, ARM, and a few generations of Intel x86; as well as virtual machines for Lisp, Pascal, Java, Wasm, and the sole exception -- the EVM. The details and complexity vary, but in whatever form these facilities provide for
Over the years, for the machines we have programmed, native subroutine operations have proven their value for efficient implementation and simple coding of subroutines.
Original Design
The first specification of subroutines for a real machine goes back to Turing's Automatic Computing Engine of 1946:
Turing's machine held data in delay lines made of mercury-filled crystals. We have better hardware now, but the concept is simple and by now familiar. In less archaic language, Turing's full specification is that:
This design still sees use by virtual machines for Lisp, Forth, Java, Wasm and others.
Specification
We propose to follow Turing's original specification, as applied to the EVM architecture. We support calls and returns with a return stack of return addresses. The EVM return stack, like the EVM data stack, is limited to 1024 items. This return stack is accessed via two new subroutine instructions.
Instructions
RJUMPSUB (0x5e) jmpdest
The cost is low.
RETURNSUB (0x5f)
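As a rough illustration of the return-stack discipline, the following Python sketch models only STOP, RJUMPSUB and RETURNSUB. It is a toy under stated assumptions, not a client implementation. The names run and ReturnStackError are hypothetical. Reading the two-byte immediate as a signed big-endian offset relative to the instruction that follows it is an assumption made here for illustration. The opcode values and the 1024-item limit are taken from the text above.

```python
# Toy model of the return stack. Assumptions (not from the spec text):
# the immediate is a signed 16-bit big-endian offset relative to the next
# instruction, and the return address is that next instruction.
STOP, RJUMPSUB, RETURNSUB = 0x00, 0x5E, 0x5F
RETURN_STACK_LIMIT = 1024


class ReturnStackError(Exception):
    """Raised on return-stack overflow or underflow (hypothetical name)."""


def run(code: bytes) -> list:
    """Execute the toy program and return the list of PCs visited."""
    pc, return_stack, trace = 0, [], []
    while pc < len(code):  # running past the end acts as an implicit STOP
        trace.append(pc)
        op = code[pc]
        if op == STOP:
            break
        if op == RJUMPSUB:
            offset = int.from_bytes(code[pc + 1:pc + 3], "big", signed=True)
            next_pc = pc + 3  # opcode byte plus two immediate bytes
            if len(return_stack) >= RETURN_STACK_LIMIT:
                raise ReturnStackError("return stack overflow")
            return_stack.append(next_pc)
            pc = next_pc + offset
        elif op == RETURNSUB:
            if not return_stack:
                raise ReturnStackError("return stack underflow")
            pc = return_stack.pop()
        else:
            raise ValueError(f"opcode 0x{op:02x} is not modelled in this sketch")
    return trace


# Call the subroutine at offset 4, then return to the STOP at offset 3.
print(run(bytes([0x5E, 0x00, 0x01, 0x00, 0x5F])))
```

In this model the data stack is never touched by a call or a return, which is the property the proposal relies on for cheaper subroutine calls.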
Notes:
If the PC to be executed is beyond the last instruction then the opcode is implicitly a STOP, which is not an error.
Values on the return stack do not need to be validated, since they are alterable only by RJUMPSUB and RETURNSUB.
The semantics above are described in terms of a return stack. But the actual state of the return stack is not observable by EVM code or consensus-critical to the protocol. (For example, a node implementer may code RJUMPSUB to unobservably push PC on the return stack rather than PC + 1, which is allowed so long as RETURNSUB observably returns control to the PC + 1 location.)
The return stack is the functional equivalent of Turing's "delay line".
The low cost of RJUMPSUB versus JUMP is justified by needing only to add the immediate two byte destination to the PC and push the return address on the return stack, all using native arithmetic, versus using the data stack with emulated 256-bit instructions.
The verylow cost of RETURNSUB is justified by needing only to pop the return stack into the PC. Benchmarking will be needed to tell if the costs are well-balanced.
Safety
We define safety here as avoiding exceptional halting states, as defined in the Yellow Paper. A validator can always detect three of these states at validation time:
A validator can detect stack overflow only for non-recursive programs, so two states will still require tests at runtime:
EIP-3779: Safer Control Flow for the EVM specifies initialization-time validation to detect invalid contracts.
Rationale
Below we explore the advantages of having subroutines as primitives, design alternatives, and the reasons for our choice of specification.
Efficiency Analysis
We show here how subroutine instructions can be used to reduce the complexity and gas costs of both ordinary and optimized subroutine calls compared to using JUMP.
Simple Subroutine Call
Consider this example of calling a fairly minimal subroutine.
Subroutine call, using JUMP:
Total: 50 gas
Subroutine call, using RJUMPSUB:
Total: 22 gas
Using RJUMPSUB versus JUMP saves 50 - 22 = 28 gas -- a 56% improvement.
Tail Call Optimization
Of course in cases like this one we can optimize the tail call, so that the return from SQUARE actually returns from TEST_SQUARE.
Tail call optimization, using JUMP:
Total: 33 gas
Tail call optimization, using RJUMP and RETURNSUB:
Total: 17 gas
Using RJUMPSUB versus JUMP saves 33 - 17 = 16 gas -- a 48% improvement.
Call Using Data Stack
We can also consider an alternative call mechanism -- call it DATASUB -- that pushes its return address on the data_stack:
Total: 26 gas
Using DATASUB versus JUMP saves 50 - 26 = 24 gas -- a 48% improvement.
By comparison, the RJUMPSUB version saves 28 gas -- a 56% improvement.
Note: The gas cost for DATASUB is 6 here versus 5 for RJUMPSUB because using the wide-integer data_stack is less efficient than using a stack of native integers.
Conclusion
We can see that these instructions provide a simpler and more gas-efficient subroutine mechanism than using JUMP.
Clearly, the benefits of these efficiencies are greater for programs that have been factored into smaller subroutines, but a routine could use 200 more gas than our first example and RJUMPSUB would still use better than 10% less gas than JUMP.
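The percentages quoted in this analysis can be rechecked with a few lines of Python. This is only a worked-arithmetic sketch: the gas totals (50, 22, 33, 17, 26) are the ones stated in the analysis above, while the helper name savings and the case labels are made up for illustration. The last case models the "200 more gas" scenario by adding 200 to both totals of the first example.

```python
def savings(jump_gas, alternative_gas):
    """Return (gas saved, percent saved) relative to the JUMP-based total."""
    saved = jump_gas - alternative_gas
    return saved, 100 * saved / jump_gas


# Totals taken from the analysis above; labels are illustrative only.
CASES = {
    "simple call, RJUMPSUB": (50, 22),
    "tail call, RJUMPSUB": (33, 17),
    "simple call, DATASUB": (50, 26),
    "simple call plus 200 gas of body, RJUMPSUB": (50 + 200, 22 + 200),
}

for name, (jump_gas, alternative_gas) in CASES.items():
    saved, percent = savings(jump_gas, alternative_gas)
    print(f"{name}: saves {saved} gas ({percent:.1f}%)")
```

The tail-call figure works out to 16/33, roughly 48.5%, and the padded routine to 28/250 = 11.2%, consistent with the "better than 10%" claim.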
Note: A stack rotation operator to move items on the stack and implicitly shift the intervening items could simplify code using JUMP. It would be a potentially expensive operation with a dynamic gas cost.
Design Alternatives
There are at least two designs for a subroutine facility.
Turing's design keeps return addresses on a dedicated stack.
As specified above, the instruction to call a subroutine will
And the instruction to return from a subroutine will
We have chosen Turing's design, as have Forth, Java, Wasm, and others. On a stack machine almost all computation is done on the stack, and on these machines the return information is not conveniently
For these machines, an advantage of keeping a separate return stack is that it leaves to the implementation the representation of the data stack, which can remain opaque to the user. All that matters is that calls and returns work.
An alternative design is to keep return addresses on the data stack.
The instruction to call a subroutine will:
push the return address on the data stack.
The instruction to return from a subroutine will:
pop the return address from the data stack.
This design became popular on silicon in the 1970's and remains so. Examples include the PDP 11, VAX, M68000, SPARC, and x86. These are all register machines, not stack machines. The registers are used for computation, and the stack is used by subroutine calls for return addresses and call frames.
For all of these machines instruction addresses fit efficiently into machine words on the data stack.
We give an example of this alternative design above.
We prefer Turing's design for a few reasons.
data stack slot for the return address,
Keeping code simple is good. And keeping control flow opaque has clear safety advantages -- the state of the VM -- the stack, frame, and instruction pointers -- is not made visible or mutable as data.
Finally, given that the EVM is a stack machine with very wide words, the performance advantages and the decades of successful use of Turing's design by similar machines weighed heavily in our decision.
Backwards Compatibility
These changes affect the semantics of existing EVM code: bytes that would have been interpreted as valid jump destinations may now be interpreted as immediate data. Since this proposal depends on the Ethereum Object Format to signal the change this is not a practical issue.
Test Cases
Simple routine
This should jump into a subroutine, back out and stop.
Bytecode: 0x60045e005b5d (PUSH1 0x04, JUMPSUB, STOP, JUMPDEST, RETURNSUB)
Output: 0x
Consumed gas: 10
Two levels of subroutines
This should execute fine, going two levels deep into subroutines.
Bytecode: 0x6800000000000000000c5e005b60115e5d5b5d (PUSH9 0x00000000000000000c, JUMPSUB, STOP, JUMPDEST, PUSH1 0x11, JUMPSUB, RETURNSUB, JUMPDEST, RETURNSUB)
Consumed gas: 20
Failure 1: invalid jump
This should fail, since the given location is outside of the code-range. The code is the same as the previous example, except that the pushed location is 0x01000000000000000c instead of 0x0c.
Bytecode: 0x6801000000000000000c5e005b60115e5d5b5d (PUSH9 0x01000000000000000c, JUMPSUB, STOP, JUMPDEST, PUSH1 0x11, JUMPSUB, RETURNSUB, JUMPDEST, RETURNSUB)
Failure 2: shallow return stack
This should fail at first opcode, due to shallow return_stack.
Bytecode: 0x5d5858 (RETURNSUB, PC, PC)
Subroutine at end of code
In this example, the JUMPSUB is on the last byte of code. When the subroutine returns, it should hit the 'virtual stop' after the bytecode, and not exit with error.
Bytecode: 0x6005565b5d5b60035e (PUSH1 0x05, JUMP, JUMPDEST, RETURNSUB, JUMPDEST, PUSH1 0x03, JUMPSUB)
Consumed gas: 30
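To cross-check the bytecode strings in these test cases against their mnemonic listings, a small disassembler is enough. The sketch below is illustrative only: the opcode table is inferred from the listings in the test cases themselves (where JUMPSUB is 0x5e and RETURNSUB is 0x5d, which differs from the values in the Specification section), and the function name disassemble is hypothetical.

```python
# Opcode values inferred from the test-case listings above (assumption),
# not from the Specification section.
OPCODES = {
    0x00: "STOP",
    0x56: "JUMP",
    0x58: "PC",
    0x5B: "JUMPDEST",
    0x5D: "RETURNSUB",
    0x5E: "JUMPSUB",
}


def disassemble(hex_code: str) -> list:
    """Turn a hex string such as '0x60045e005b5d' into a list of mnemonics."""
    if hex_code.startswith("0x"):
        hex_code = hex_code[2:]
    code = bytes.fromhex(hex_code)
    listing, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry 1..32 immediate bytes
            size = op - 0x5F
            data = code[pc + 1:pc + 1 + size]
            listing.append(f"PUSH{size} 0x{data.hex()}")
            pc += 1 + size
        else:
            listing.append(OPCODES.get(op, f"UNKNOWN(0x{op:02x})"))
            pc += 1
    return listing


print(", ".join(disassemble("0x60045e005b5d")))
print(", ".join(disassemble("0x6801000000000000000c5e005b60115e5d5b5d")))
```

Running it on the "Simple routine" and "Failure 1" bytecodes reproduces the mnemonic lists given with those cases.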
Security Considerations
These changes introduce new flow control instructions. They do not introduce any new security considerations. In concert with EIP-3779: Safer Control Flow for the EVM they will increase security by providing for validated control flow.
References
A.M. Turing, "Proposals for the development in the Mathematics Division of an Automatic Computing Engine (ACE)." Report E882, Executive Committee, NPL February 1946. Available: http://www.alanturing.net/turing_archive/archive/p/p01/P01-001.htm
B.E. Carpenter, R.W. Doran, "The other Turing machine." The Computer Journal, Volume 20, Issue 3, January 1977. Available: https://doi.org/10.1093/comjnl/20.3.269
Copyright
Copyright and related rights waived via CC0.