ABI for syscall and call instructions #971

Merged: 9 commits into next from andrew-abi on Nov 26, 2024

Contributor

@Fumuran Fumuran commented Nov 14, 2024

This PR updates the ABI of the assembly procedures, making their usage with call and syscall safe.
Corresponding issue: #685

Contributor Author

Fumuran commented Nov 14, 2024

During the refactoring I also updated the doc comments, making them consistent and uniform. I think this is a good opportunity to create some kind of format guidebook (or stylebook) to have some rules and templates of how doc comments should be formatted. It is still a matter of discussion, but here is what I used during this refactoring.

Format guidebook

A book defining the format for documentation comments and regular comments for masm procedures.

General

The entire doc comment of a procedure should be written using the #! symbol pair as the comment prefix.
The doc comment of a procedure should consist of these blocks:

  • Procedure description.
  • Inputs and outputs.
  • Description of the values used in the "Inputs and outputs" block (optional).
  • Panic block (optional).
  • Annotation hint (optional; it will become redundant once procedure annotations are implemented).

Each block should be separated from the others with a blank line.

Example:

#! Sets the code of the account the transaction is being executed against.
#!
#! Inputs:  [CODE_COMMITMENT]
#! Outputs: []
#!
#! Where:
#! - CODE_COMMITMENT is the hash of the code to set.
#!
#! Panics if:
#! - this procedure is executed on the account which type differs from the `basic mutable`.
#!
#! Annotation hint: is used only with `exec`
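
As a rough illustration of the block structure (not existing tooling), here is a minimal Python sketch that splits a doc comment into blocks on empty #! lines. The helper name split_doc_blocks is hypothetical, and the sketch assumes every line of the comment starts with #!.

```python
def split_doc_blocks(doc_comment: str) -> list[list[str]]:
    """Split a `#!` doc comment into blocks separated by empty `#!` lines.

    Hypothetical helper, for illustration only.
    """
    blocks: list[list[str]] = []
    current: list[str] = []
    for raw_line in doc_comment.splitlines():
        line = raw_line.strip()
        if not line.startswith("#!"):
            raise ValueError(f"not a doc comment line: {raw_line!r}")
        text = line[2:].strip()
        if text:
            current.append(text)
        elif current:
            # An empty `#!` line closes the current block.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


EXAMPLE = """\
#! Sets the code of the account the transaction is being executed against.
#!
#! Inputs:  [CODE_COMMITMENT]
#! Outputs: []
#!
#! Where:
#! - CODE_COMMITMENT is the hash of the code to set.
#!
#! Panics if:
#! - this procedure is executed on the account which type differs from the `basic mutable`.
#!
#! Annotation hint: is used only with `exec`
"""

blocks = split_doc_blocks(EXAMPLE)
# Print the first line of every block to see the overall layout.
print([block[0] for block in blocks])
```

A checker built on top of this could then verify that the blocks appear in the order listed above.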

Procedure description

Contains the general information about the purpose of the procedure and the way it works. May contain any other valuable information.

If some list is used for description, it should be formatted like so:

  • The description of the list should not have a blank line between it and the list.
  • The description of the list should have a colon at the end.
  • Depending on the kind of sentences that form the list, items should either start with a capital letter and end with a dot, or start with a lowercase letter and have no dot at the end.
  • A list should use the - symbol for unordered lists and Arabic numerals for ordered ones (for example, for the description of execution steps).
  • Nested lists should follow the same format.

Some data could be formatted as a subparagraph, in that case a blank line should be used to separate them (note: not sure about that, wouldn't it be confusing since we are using a blank line to separate different blocks?)

Example:

#! Transaction kernel program.
#!
#! This is the entry point of the transaction kernel, the program will perform the following
#! operations:
#! 1. Run the prologue to prepare the transaction's root context.
#! 2. Run all the notes' scripts.
#! 3. Run the transaction script.
#! 4. Run the epilogue to compute and validate the final state.
#!
#! See `prologue::prepare_transaction` for additional details on the VM's initial state, including 
#! the advice provider.

Inputs and outputs

Each variable can represent either a single value or a sequence of four values (a Word). A variable representing a single value should be written in lowercase, and a variable representing a word should be written in uppercase.

Example:

#! Inputs: [single_value, SOME_WORD]

Single-letter variable names are strongly discouraged, with the sole exception of loop indices (e.g. i). So, for example, instead of H, a proper HASH or an even more descriptive name like INIT_HASH should be used.
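
A check for these naming rules could look roughly like the Python sketch below. It is only an illustration: check_name and the ALLOWED_SINGLE_LETTER_NAMES set are hypothetical, and treating any all-uppercase name as a Word is an assumption based on the convention above.

```python
# Assumed exception list: loop indices only.
ALLOWED_SINGLE_LETTER_NAMES = {"i"}


def check_name(name: str) -> str:
    """Classify a stack variable name as 'word' or 'element'.

    Raises ValueError for discouraged single-letter or mixed-case names.
    Hypothetical helper, for illustration only.
    """
    if len(name) == 1 and name not in ALLOWED_SINGLE_LETTER_NAMES:
        raise ValueError(f"single-letter name {name!r} is discouraged")
    if name.isupper():
        return "word"  # uppercase: a sequence of four values
    if name.islower():
        return "element"  # lowercase: a single value
    raise ValueError(f"mixed-case name {name!r} is ambiguous")


for candidate in ["single_value", "SOME_WORD", "INIT_HASH", "i"]:
    print(candidate, "->", check_name(candidate))
```

With this sketch, check_name("H") raises a ValueError, while HASH and INIT_HASH are accepted as Words.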

Inputs

The Inputs block can contain three containers: operand stack, advice stack and advice map.
The description of each container should be indented by two spaces relative to the start of the word Inputs.
The name of each container should be separated from its value by a colon (e.g. Operand stack: [value_1]).

Operand stack and advice stack should be presented as an array containing some data.
// Not sure about the next section, I have two different options
// Option 1:
If the input line exceeds 100 characters, it should be broken, and the rest of the line should be moved to a new line, indented such that the first character of the first element on the second line is directly below the first character of the first element on the first line (see the value of FOREIGN_ACCOUNT_ID in the example in the Formats section).
// Option 2:
If the input line exceeds 100 characters, the array should be formatted as a column, with one Word per line. Each word should be indented by two spaces relative to the name of the container.
Example:

#! Inputs:
#!   Operand stack: []
#!   Advice stack: [
#!     account_id, 0, 0, account_nonce, 
#!     ACCOUNT_VAULT_ROOT, 
#!     ACCOUNT_STORAGE_COMMITMENT, 
#!     ACCOUNT_CODE_COMMITMENT
#!   ]

// end of the option 2
Edit: probably we could use the second option for the input stacks, and the first one for the stacks state for the comments inside the procedure.

To show that an inner array of values can have a dynamic length, additional brackets should be used (see [VALUE_B] in the advice stack in the example in the Formats section). (note: not sure that this is a good idea)

If some inputs are present on the stack only when a certain condition is satisfied, such inputs should be placed in an "optional" box: inside parentheses with a question mark after the closing parenthesis. The opening and closing parentheses should each be placed on a new line with the same indentation as the other inputs, and the values inside the parentheses should be indented by two more spaces.

Example:

#!   ...
#!   Advice stack: [
#!      NOTE_METADATA,
#!      assets_count,
#!      (
#!        block_num,
#!        BLOCK_SUB_HASH,
#!        NOTE_ROOT,
#!      )?
#!   ]
#!   ...

The advice map should be presented as a sequence of key-value pairs in curly brackets. The opening bracket should stay on the same line, and the closing bracket should be placed on the line after the last key-value pair, with the same indentation as the Advice map phrase.
Each pair should start on a new line, indented by two additional spaces relative to the start of the Advice map phrase. Pairs should be separated by commas. The formatting rules for the operand and advice stacks also apply to each key-value pair.

Outputs

Outputs should show the final state of each container used in the inputs, except for the advice map: the final state of the advice map is almost always unimportant (since it is always the same as at the beginning of the procedure execution).

Formats

Full version

If values are provided not only through the operand stack but also through another container, the full version of the inputs should be used.
Note that the operand stack should always be presented, even if it is empty. Other containers should be presented only if they hold values used by the procedure being described.

Example:

#! Inputs:
#!   Operand stack: []
#!   Advice stack: [VALUE_A, [VALUE_B]]
#!   Advice map: {
#!     FOREIGN_ACCOUNT_ID: [[foreign_account_id, 0, 0, account_nonce], VAULT_ROOT, STORAGE_ROOT, 
#!                          CODE_ROOT],
#!     STORAGE_ROOT: [[STORAGE_SLOT_DATA]],
#!     CODE_ROOT: [num_procs, [ACCOUNT_PROCEDURE_DATA]]
#!   }
#! Outputs:
#!   Operand stack: [value]
#!   Advice stack: []

Short version

If values are provided only through the operand stack, the short version of the inputs and outputs should be used. In that case only the Inputs and Outputs lines are used, representing the values on the operand stack.

The input values array should be offset by one extra space so that it lines up with the output values array (see the example).

Example:

#! Inputs:  [single_value, WORD_1]
#! Outputs: [WORD_2] 

Description of the used values

If a value is used in the inputs and outputs block and its meaning is not obvious, it should be described.

The values description block should start with the word Where followed by a colon.
Definitions should be represented as an unordered list constructed with - symbols, without any space offset.
Each definition should start with the name of the variable followed by the is/are phrase (note: not sure about that), after which the definition should be placed. At the end of each definition should be a dot.

Example:

#! Where:
#! - tag is the tag to be included in the note.
#! - aux is the auxiliary metadata to be included in the note.
#! - note_type is the storage type of the note.
#! - execution_hint is the note's execution hint.
#! - RECIPIENT is the recipient of the note.
#! - note_idx is the index of the created note.

Panic block

If the procedure being described can panic, a panic block should be specified.

The panic block should start with the Panics if phrase followed by a colon.
Panic cases should be represented as an unordered list constructed with - symbols, without any space offset. Each case should start with a lowercase letter, except for cases that introduce a nested list (see the example). Each case should end with a dot.

Example:

#! Panics if:
#! - the transaction is not being executed against a faucet.
#! - the invocation of this procedure does not originate from the native account.
#! - the asset being burned is not associated with the faucet the transaction is being executed
#!   against.
#! - the asset is not well formed.
#! - For fungible faucets:
#!   - the amount being burned is greater than the total input to the transaction.
#! - For non-fungible faucets:
#!   - the non-fungible asset being burned does not exist or was not provided as input to the
#!     transaction via a note or the accounts vault.

Annotation hint

A temporary comment showing how the procedure is used. It will help to implement procedure annotations in the future.
The hint can show how the procedure is invoked:

  • only with exec
  • only with call/syscall
  • with both exec and call
  • is not used anywhere

Contributor Author

Fumuran commented Nov 14, 2024

Regarding the procedures ABI, the only thing left is to handle the auth_tx_rpo_falcon512 and end_foreign_context procedures. I left them because they don't require anything to be on the stack and return nothing, so each call looks like this:

# pad the stack before call
padw padw padw padw

call.<...>

# clean the stack 
dropw dropw dropw dropw 

This looks quite strange to me, but that's how the call should be handled.
So I'm not sure whether it is worth applying the ABI rules to these procedures, since we will end up with a lot of these redundant pads and drops.

cc @bobbinth

Contributor

@PhilippGackstatter PhilippGackstatter left a comment

Looks great! I really like the standardization and consistency that brings.

Some comments and thoughts.

Regarding the formatting book:

Description should have a colon at the end.

Did you mean "period" instead of "colon"?

Edit: probably we could use the second option for the input stacks, and the first one for the stacks state for the comments inside the procedure.

Sounds good! I would probably allow formatting according to how the elements are related to each other. What I mean is: if we have a double word representing one logical value (like, potentially, non-fungible assets in the future), or two elements that belong together as input (like a u64 split into hi and lo), then it seems to me it would make sense to format them that way:

#!   Advice stack: [
#!     u64_value_hi, u64_value_lo,
#!     ACCOUNT_VAULT_ROOT,
#!     NON_FUNGIBLE_ASSET_HI, NON_FUNGIBLE_ASSET_LO
#!   ]

(not sure if double words use "hi" and "lo", but I hope you get my point).

Some data could be formatted as a subparagraph, in that case a blank line should be used to separate them (note: not sure about that, wouldn't it be confusing since we are using a blank line to separate different blocks?)

Another option would be to include the Notes: section we sometimes have in the guideline? It often states assumptions of procedures. This could contain everything that does not fit elsewhere like:

#! Requires that the account exposes:
#! - miden::contracts::wallets::basic::receive_asset procedure.

or

#! Note inputs are assumed to be as follows:
#! - target_account_id is the ID of the account for which the note is intended.

Having some structure may be beneficial for later automatic operations on the docs. I imagine we'll later have some Rust wrappers for these procedures and we'll want to document those. Having the docs be structured would be helpful so they can be automatically copied and converted. In the same vein I'm also wondering if we should perhaps try to use Markdown for these doc comments to align even more with Rust doc comments. But, as long as we have some standardization like these guidelines then we could probably write some conversion script fairly easily. In any case, consistency is king 😁️. But I guess one question is: Would having a Notes section be beneficial over standalone paragraphs?
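
As a rough idea of how small such a conversion script could start out, here is a Python sketch that rewrites #! doc comment lines as Rust /// doc comment lines. The function name masm_doc_to_rust_doc is hypothetical, and the sketch only swaps the prefix; it does not attempt any Markdown conversion.

```python
def masm_doc_to_rust_doc(doc_comment: str) -> str:
    """Rewrite `#!` doc comment lines as Rust `///` doc comment lines.

    Hypothetical helper, for illustration only. Indentation after the
    prefix is preserved so nested lists keep their shape.
    """
    rust_lines = []
    for raw_line in doc_comment.splitlines():
        line = raw_line.strip()
        if not line.startswith("#!"):
            raise ValueError(f"not a doc comment line: {raw_line!r}")
        rust_lines.append("///" + line[2:].rstrip())
    return "\n".join(rust_lines)


MASM_DOC = """\
#! Requires that the account exposes:
#! - miden::contracts::wallets::basic::receive_asset procedure.
"""

print(masm_doc_to_rust_doc(MASM_DOC))
```

The more uniform the block structure of the doc comments, the less special-casing a script like this would need.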

We might want to extend the guidelines with how to document executable procedures which don't have an ABI per se, but sometimes modify the stack beyond its actual inputs (see the finalize_transaction inline comment for details).

You mostly replaced [...] with pad or nothing, and I think that's fine. There are still some occurrences of this string in the codebase, and we should probably also update those. If I understood correctly, what was previously [x, y, ...] is now [x, y] and as mentioned for the executable procedures which truncate the stack, those should use pad. Which is all to ask that we no longer need ... anywhere, right?

A meta comment about the way the guidelines are written. My assumption is that we'll add this guideline to some documentation somewhere so it can be easily found. So from time to time we might look it up to find out how to format a procedure (for less common cases). In that case, having the guidelines described as a set of fairly exhaustive examples would be great. Examples are easier to read and extrapolate from than a textual description like:

Definitions should be represented as an unordered list constructed with - symbols, without any space offset.
Each definition should start with the name of the variable followed by the is/are phrase (note: not sure about that), after which the definition should be placed. At the end of each definition should be a dot.

My motivation here is to make the guidelines easy to parse at a glance and find the formatting rule I'm looking for. So I'm just saying: Having fairly exhaustive examples in the guidelines that cover many possible cases would be great, and you've already done that for many parts, and I would go further in that direction.

Comment on lines 109 to 121
#! Inputs: [pad(16)]
#! Outputs: [H, pad(12)]
#!
#! Where:
#! - H is the initial account hash.
export.get_initial_account_hash
Contributor

Should we perhaps have a rule for a minimum length of a value name or alternatively just ban single-letter names for non-trivial things*? I think we should avoid those because they increase cognitive load and HASH is just more readable than H and still short enough to write easily.
Below in get_current_account_hash we're using an even longer name so INIT_HASH might be even better.
*The common exception being loop indices with names like i, and things like that, where it's fine.

Contributor Author

@Fumuran Fumuran Nov 18, 2024

I totally agree! My initial thought was to move the name updates to another PR, since this one is already huge and I wanted to keep it format+ABI only (but I already failed to do so :) ). But it is probably not a common problem, so I can update the names here.

Edit: I also think that we should add this to the guidebook.

Comment on lines 138 to 147
- #! Stack: [value]
- #! Output: [0]
+ #! Inputs: [value, pad(15)]
+ #! Outputs: [pad(16)]
#!
#! Where:
#! - value is the value to increment the nonce by.
Contributor

Nit: I think increment would be a more precise name here than value.

Comment on lines 239 to 252
- #! Output: [H]
+ #! Inputs: []
+ #! Outputs: [H]
#!
#! Where:
#! - H is the initial account hash.
export.memory::get_init_acct_hash->get_initial_hash
Contributor

Nit: Perhaps we can also use INIT_HASH here instead of H.

#! - CODE_COMMITMENT is the hash of the code to set.
#!
#! Panics if:
#! - this procedure is executed on the account which type differs from the `basic mutable`.
Contributor

Suggested change
- #! - this procedure is executed on the account which type differs from the `basic mutable`.
+ #! - this procedure is executed on an account whose type differs from `regular mutable`.

Feels like we're not using consistent language for account types. Sometimes it's "regular updatable", sometimes "regular mutable". Personally, I would also go with "mutable" as the counterpart to "immutable" rather than "updatable".

Contributor Author

I used the notation we're using in the docs here, but I'm fine with other options. I think it is more important to make the types consistent than to use some specific term.

- #! Stack: []
- #! Output: []
#! - All storage offsets and sizes are in bounds.
#! - All storage offsets adhere to account type specific rules.
Contributor

I think we can remove this line actually. It's not very helpful for the reader because it doesn't state a precise condition and we're spelling out the precise conditions in the other bullet points.
The only account-type specific thing that is being checked is that slot 0 is not accessed on faucet accounts, which we already mention.

#! - acct_id is the account id.
#!
#! Annotation hint: is used only with `exec`
Contributor

Suggested change
- #! Annotation hint: is used only with `exec`
+ #! Annotations: exec

Maybe a shorter version of this is sufficient for now? I would exhaustively describe what it can be used with and the unmentioned variants should not be used. Not sure about the backticks. Since we're not using markdown and this is temporary anyway I think we can omit them.
For example, the following can only be call-ed or syscall-ed but not used with exec.

#! Annotations: call + syscall

- #! Stack: []
- #! Output: [H]
+ #! Inputs: []
+ #! Outputs: [H]
Contributor

Suggested change
- #! Outputs: [H]
+ #! Outputs: [INIT_HASH]

- #! Stack: [index, V']
- #! Output: [R', V]
+ #! Inputs: [index, V']
+ #! Outputs: [R', V]
Contributor

Suggested change
- #! Outputs: [R', V]
+ #! Outputs: [ROOT', VALUE]

Somewhat related, but doesn't have to be done in this PR or at all: Another thing for consistency is that we're sometimes using "root" and sometimes "commitment" for the vault's "root". There is AssetVault::commitment, but we're calling it the vault_root in many places. I think root is more prevalent right now, so renaming commitment to root might be best?

Contributor Author

Yep, I agree, we should use one thing. We should revisit it during the work on the #858 issue.

#! - index is the index of the map where the KEY VALUE should be set.
#! - KEY is the key to set at VALUE.
#! - VALUE is the value to set at KEY.
#! - OLD_MAP_ROOT is the old map root.
#! - OLD_MAP_VALUE is the old value at KEY.
#!
#! Panics if:
#! - the index for the map is out of bounds, means >255.
Contributor

Suggested change
- #! - the index for the map is out of bounds, means >255.
+ #! - the index for the map is out of bounds, meaning > 255.

- (ERR_FAUCET_INVALID_STORAGE_OFFSET, "Storage offset is invalid for a faucet account (0 is prohibited as it is the reserved data slot for faucets)"),
+ (ERR_FAUCET_INVALID_STORAGE_OFFSET, "for faucets)"),
Contributor

Currently, only the line directly above error messages is parsed as the error, so we have two options:

  • Allow error messages to go beyond 100 characters,
  • or implement multi-line comment parsing in the build script. I briefly tried when I implemented it but wasn't able to come up with a regex that worked, but it surely is possible to do this.
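
For the second option, one possible approach (sketched in Python rather than in the Rust build script, and not the actual implementation) is to skip the regex and scan line by line, joining all consecutive # comment lines directly above an error constant. The const.ERR_...= line shape, the 0x0 value, and the name collect_error_messages are assumptions made for illustration.

```python
import re

# Assumed shape of an error constant definition: `const.ERR_NAME=<value>`.
CONST_RE = re.compile(r"^const\.(ERR_[A-Z0-9_]+)=")


def collect_error_messages(masm_source: str) -> dict[str, str]:
    """Map each error constant to the comment lines directly above it.

    Consecutive `#` comment lines are joined with a single space, so a
    message may span several lines. Hypothetical helper, illustration only.
    """
    messages: dict[str, str] = {}
    comment_lines: list[str] = []
    for raw_line in masm_source.splitlines():
        line = raw_line.strip()
        match = CONST_RE.match(line)
        if match:
            messages[match.group(1)] = " ".join(comment_lines)
            comment_lines = []
        elif line.startswith("#") and not line.startswith("#!"):
            comment_lines.append(line[1:].strip())
        else:
            # Blank lines, code, and doc comments reset the pending message.
            comment_lines = []
    return messages


# The constant value below is a placeholder, not the real error code.
SAMPLE = """\
# Storage offset is invalid for a faucet account (0 is prohibited as it is the reserved data slot
# for faucets)
const.ERR_FAUCET_INVALID_STORAGE_OFFSET=0x0
"""

print(collect_error_messages(SAMPLE))
```

One caveat of this approach: any unrelated comment placed directly above the message without a blank line in between would be merged into the message.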

Contributor Author

Ah, my bad, I forgot that we parse just one line for the error message. For now I'll roll back these changes, but probably it makes sense to create a new issue for this.

Contributor Author

Fumuran commented Nov 18, 2024

@PhilippGackstatter Thank you for the detailed review!

Description should have a colon at the end.

Did you mean "period" instead of "colon"?

Sorry, I wasn't specific enough, I phrased it in a confusing way. By the "colon at the end" I meant only the list description, not the overall procedure description. I'll update this line.

Note section sounds good to me, but I'm not sure what is better in terms of standardization: if I remember correctly, we don't use a dedicated note section in miden-vm, just a new paragraph.

You mostly replaced [...] with pad or nothing, and I think that's fine. There are still some occurrences of this string in the codebase, and we should probably also update those. If I understood correctly, what was previously [x, y, ...] is now [x, y] and as mentioned for the executable procedures which truncate the stack, those should use pad. Which is all to ask that we no longer need ... anywhere, right?

Right. Previously we used ... to show that there could be other elements on the stack, so the rest of the stack should not be modified (or should be modified carefully); this mostly applied to the call instructions. Now, for procedures which are invoked with exec, this is true by default: such procedures should not modify the rest of the stack. For the call instructions, we should always explicitly pad the stack before the call, and we show this with pad(x), which should guarantee that nothing valuable is deeper on the stack.

Examples are easier to read and extrapolate from than a textual description <...> Having fairly exhaustive examples in the guidelines that cover many possible cases would be great,<...> and I would go further in that direction.

I agree, it's much easier for me to look at the example than to read the definition. We should definitely improve the current version, for now it is just a raw and relatively poorly structured set of my thoughts.

@bobbinth
Contributor

Not a review yet, but a couple of comments:

Regarding the procedures ABI, the only thing left is to handle the auth_tx_rpo_falcon512 and end_foreign_context procedures. I left them because they don't require anything to be on the stack and return nothing, so each call looks like this:

# pad the stack before call
padw padw padw padw

call.<...>

# clean the stack 
dropw dropw dropw dropw 

This looks quite strange to me, but that's how the call should be handled. So I'm not sure whether it is worth applying the ABI rules to these procedures, since we will end up with a lot of these redundant pads and drops.

As discussed offline, I would go for consistency here even if it ends up costing an extra 32 cycles.

You mostly replaced [...] with pad or nothing, and I think that's fine. There are still some occurrences of this string in the codebase, and we should probably also update those. If I understood correctly, what was previously [x, y, ...] is now [x, y] and as mentioned for the executable procedures which truncate the stack, those should use pad. Which is all to ask that we no longer need ... anywhere, right?

I would make a distinction here as follows:

[a, b, c] - means that c is at the bottom of the stack (i.e., there are no elements after it).
[a, b, c, ...] - means that there may be other values on the stack that a procedure doesn't care about
                 and does not modify.

So, for something like truncate_stack, we'd use:

#! Inputs:  [pad(16)]
#! Outputs: [OUTPUT_NOTES_COMMITMENT, FINAL_ACCOUNT_HASH, tx_expiration_block_num, pad(7)]

because we expect that the depth of the stack is exactly 16 when the procedure is entered.
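
The depth arithmetic can be checked mechanically. The Python sketch below is illustrative only: stack_depth is a hypothetical helper, it assumes uppercase names are Words (4 elements), lowercase names are single elements, and pad(n) stands for n elements, and it does not handle nested brackets.

```python
import re


def stack_depth(stack: str) -> int:
    """Count the elements in a flat stack description like `[value, pad(15)]`.

    Hypothetical helper, for illustration only.
    """
    text = stack.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a stack description: {stack!r}")
    depth = 0
    for item in text[1:-1].split(","):
        name = item.strip()
        if not name:
            continue  # empty stack or trailing comma
        pad = re.fullmatch(r"pad\((\d+)\)", name)
        if pad:
            depth += int(pad.group(1))
        elif name == "...":
            raise ValueError("an open-ended stack has no fixed depth")
        elif name.isupper():
            depth += 4  # a Word occupies four elements
        else:
            depth += 1  # a single element
    return depth


inputs = "[pad(16)]"
outputs = "[OUTPUT_NOTES_COMMITMENT, FINAL_ACCOUNT_HASH, tx_expiration_block_num, pad(7)]"
print(stack_depth(inputs), stack_depth(outputs))
```

Here both lines describe 16 elements: two Words (8), one single element (1), and pad(7) add up to 16 on the output side.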

@Fumuran Fumuran marked this pull request as ready for review November 19, 2024 22:07
Contributor

@bobbinth bobbinth left a comment

Great job! Thank you! I left some comments inline - but most are pretty small.

miden-lib/asm/kernels/transaction/api.masm
miden-lib/asm/kernels/transaction/lib/epilogue.masm (outdated)
miden-lib/asm/note_scripts/P2ID.masm
miden-lib/asm/note_scripts/P2ID.masm (outdated)
miden-lib/asm/note_scripts/P2IDR.masm
miden-lib/asm/note_scripts/P2IDR.masm (outdated)
miden-lib/asm/note_scripts/SWAP.masm (outdated)
Contributor

@bobbinth bobbinth left a comment

Looks good! Thank you! I left a few more comments inline. Once these are addressed, we are good to merge.

miden-lib/asm/note_scripts/P2ID.masm (outdated)
miden-lib/asm/note_scripts/P2IDR.masm (outdated)
miden-lib/asm/note_scripts/SWAP.masm (outdated)
miden-lib/asm/kernels/transaction/api.masm
Contributor

@bobbinth bobbinth left a comment

All looks good! Thank you!

@bobbinth bobbinth merged commit 1e597ca into next Nov 26, 2024
9 checks passed
@bobbinth bobbinth deleted the andrew-abi branch November 26, 2024 22:44
@bobbinth
Contributor

cc @igamigo - changes in this PR may have small effects on the client (e.g., in how we call note scripts). Could you check if everything still works fine there?

@bobbinth
Contributor

@Fumuran - could you extract the Format Guidebook from #971 (comment) and add it as a standalone Markdown file in the miden-lib crate?
