Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ssz: byte type and canonical JSON mapping #3506

Merged
merged 4 commits into from
Jan 11, 2024

Conversation

arnetheduck
Copy link
Contributor

@arnetheduck arnetheduck commented Sep 19, 2023

This PR introduces a new byte type equivalent in all aspects to uint8 except that it has additional intent and display semantics attached.

On top of this, the PR adds a canonical JSON mapping to the SSZ specification, documenting current usage of JSON in tests, API:s and simplifying future interop work between clients and adjacent specifications such as the Beacon API. The encoding is appropriate to use with YAML as well.

As an important property, this mapping contains a 1:1 mapping of SSZ type to JSON encoding - this allows round-tripping any object between JSON and SSZ based on the SSZ schema and usage of the core SSZ types alone.

The encoding presented in this PR is used in tests and API:s with one exception: the ParticipationFlags type from the Altair spec - it is recommended we switch encoding in tests and eventually the beacon API to address this irregularity, so as to avoid a proliferation "special" primitive types in the SSZ spec that only appear in particular schemas (and thus making validating general-purpose SSZ/JSON parsers more complex) as well as differences in encoding between fields of the same SSZ type.

Edit: the change in encoding has been removed from this PR (could be done separately down the line if desired)

The PR also clarifies that the introduction of new aliases does not lead to changes in their canonical JSON specification - this allows building general SSZ/JSON libraries that do not further depend on open-ended knowledge about aliases.

This PR should be seen as an alternative to
#2983.

This PR introduces a new `byte` type equivalent in all aspects to
`uint8` except that it has additional intent and display semantics
attached.

On top of this, the PR adds a canonical JSON mapping to the SSZ
specification, documenting current usage of JSON in tests, API:s and
simplifying future interop work between clients and adjacent
specifications such as the Beacon API. The encoding is appropriate to
use with YAML as well.

As an important property, this mapping contains a 1:1 mapping of SSZ
type to JSON encoding - this allows round-tripping any object between
JSON and SSZ based on the SSZ schema and usage of the core SSZ types
alone.

The encoding presented in this PR is used in tests and API:s with one
exception: the `ParticipationFlags` type from the Altair spec - it is
recommended we switch encoding in tests and eventually the beacon API to
address this irregularity, so as to avoid a proliferation "special"
primitive types in the SSZ spec that only appear in particular schemas
(and thus making validating general-purpose `SSZ/JSON` parsers more
complex) as well as differences in encoding between fields of the same
SSZ type.

The PR also clarifies that the introduction of new aliases does not lead
to changes in their canonical JSON specification - this allows building
general SSZ/JSON libraries that do not further depend on open-ended
knowledge about aliases.

This PR should be seen as an alternative to
ethereum#2983.
@arnetheduck
Copy link
Contributor Author

arnetheduck commented Sep 19, 2023

cc @ajsutton @dapplion @mcdee

Comment on lines 288 to 298
Integers are encoded as strings to avoid loss of precision in 64-bit values.

Aliases are encoded as their underlying type.

`hex-byte-string` is a `0x`-prefixed hex encoding of byte data, as it would appear in an SSZ stream.

`List` and `Vector` of `byte` (and aliases thereof) are encoded as `hex-byte-string`. `Bitlist` and `Bitvector` similarly map their SSZ-byte encodings to a `hex-byte-string`.

`Union` is encoded as an object with a `selector` and `data` field, where the contents of `data` change according to the selector.

> This encoding is used in [beacon-APIs](https://github.com/ethereum/beacon-APIs) with one exception: the `ParticipationFlags` type for the `getStateV2` response, although it is an alias of `uint8`, is encoded as a list of numbers. Future versions of the beacon API may address this incompatibility.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it not previously agreed to encode uint8, uint16, uint32 as numbers (because they fit in the JS size), and only encode uint64 as string? I believe that's why ParticipationFlags is the way it is.
AFAIK the specs do not encode uint8, uint16, uint32 anywhere else, so changing it wouldn't be too difficult, but I still prefer to avoid unnecessary number stringification.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was it not previously agreed to encode uint8, uint16, uint32 as numbers

no idea and no strong opinion either way.

AFAIK the specs do not encode uint8

https://github.com/ethereum/consensus-specs/blob/dev/specs/deneb/beacon-chain.md#blob is the only (other) offender I can find that potentially also should be byte-ified.

I believe that's why ParticipationFlags is the way it is.

I'd pin it on lack of interest since the two types were the same in this repo ;) ie the python code liberally uses uint8 where byte is intended for simplicity since the two are after all... the same. Flags are not numbers in most CS-inspired interpretations of numbers that I can think of so byte it becomes. We'll have to deal with the fallout at some point maybe, but this way we start with a clean slate at spec level.

@hwwhww hwwhww added scope:SSZ Simple Serialize Altair aka HF1 labels Sep 21, 2023
specs/altair/beacon-chain.md Outdated Show resolved Hide resolved
@mcdee
Copy link
Contributor

mcdee commented Sep 21, 2023

Was it not previously agreed to encode uint8, uint16, uint32 as numbers (because they fit in the JS size), and only encode uint64 as string?

No, all integers are meant to be encoded as quoted decimal values.

I'm fine with the change, except retrospectively altering the spec which will have downstream effects and I don't see bringing any value.

@dapplion
Copy link
Member

I'm fine with the change, except retrospectively altering the spec which will have downstream effects and I don't see bringing any value.

It's very useful to have a canonical representation of SSZ as JSON/YAML. All spec tests render participation flags as byte strings, so naming adjustment aligns with current usage.

@mcdee
Copy link
Contributor

mcdee commented Sep 25, 2023

All spec tests render participation flags as byte strings, so naming adjustment aligns with current usage.

I don't see this in values within the consensus spec tests repo, for example:

grep participation tests/mainnet/altair/ssz_static/BeaconState/ssz_random/case_0/value.yaml
previous_epoch_participation: [93, 102]
current_epoch_participation: [126, 191, 230]

And querying /eth/v2/debug/beacon/states/finalized on beacon nodes also returns lists of (quoted) integers.

This can be done separately in the future, if wanted
Copy link
Contributor

@djrtwo djrtwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No fundamental opposition from be, barring any significant downstream affects in tools, apis, etc.

@dapplion
Copy link
Member

dapplion commented Nov 8, 2023

barring any significant downstream affects in tools, apis, etc.

PR's content just document existing practices so should not cause any practical change

ssz/simple-serialize.md Outdated Show resolved Hide resolved
@rolfyone
Copy link
Contributor

rolfyone commented Nov 9, 2023

No fundamental opposition from be, barring any significant downstream affects in tools, apis, etc.

This exactly. I don't have a strong objection with the assumption its not going to need a ton of downstream changes.

@dapplion
Copy link
Member

dapplion commented Jan 9, 2024

To further motivate this PR

@djrtwo
Copy link
Contributor

djrtwo commented Jan 11, 2024

to be merged after ACDC 2024/1/11

@djrtwo djrtwo merged commit 9f533cf into ethereum:dev Jan 11, 2024
13 checks passed
@djrtwo djrtwo mentioned this pull request Jan 16, 2024
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Altair aka HF1 scope:SSZ Simple Serialize
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants