feat(tests): ckzg functions for peerdas tests#1614
Conversation
Blobs are not entirely opaque sequence of bytes. You can think of a blob as being 32 byte chunks where each 32 byte must represent an integer less than q, where q is the bls12-381 scalar field. The "easier" way to always have it pass is to generate a 31 byte integer and then serialize that into 32 bytes. This is because every 31 byte integer will be less than q since q is roughly a 32 byte integer.
Yeah, in the original API it was deduplicated, but then you need another index to index the deduplicated blob because the verify function can take in multiple commitments -- I found this to be more confusing though |
jtraglia
left a comment
There was a problem hiding this comment.
Hey @felix314159, nice work. I would suggest only loading the trusted setup once and removing the wrapper functions. I don't believe these really add anything, right? Ckzg will do these checks too.
tests/osaka/eip7594_peerdas/spec.py
Outdated
| # below are mine, not sure if they are defined in specs somewhere | ||
| KZG_PROOF_LENGTH = 48 | ||
| KZG_COMMITMENT_LENGTH = 48 | ||
| CELL_LENGTH = 2048 |
There was a problem hiding this comment.
There are constants for these in the specs! Please see:
These are not exposed to the bindings, but maybe they should be. I can do this later if you want.
| if int(hex_byte, 16) > 115: | ||
| # TODO: figure out why this happens | ||
| print("For some reason bytes larger than this will lead to errors later in compute_cells(), so it's not allowed for now.") # noqa: E501 | ||
| return None | ||
|
|
||
| hex_byte = hex_byte.lower() |
There was a problem hiding this comment.
Yes, this is because of BLS_MODULUS. The uint256 value of each field element must be less than this. Given that Python supports large integers, I think it would be best if this function took a seed value, then generated FIELD_ELEMENTS_PER_BLOB random field elements and joined them together to make the blob. Something like:
import random
seed = 42
random.seed(seed)
BLS_MODULUS = 0x73eda753299d7d483339d80809a1d80553bda402fffe5bfeffffffff00000001
ints = [random.randrange(BLS_MODULUS) for _ in range(FIELD_ELEMENTS_PER_BLOB)]
encoded = [i.to_bytes(BYTES_PER_FIELD_ELEMENT, "big") for i in ints]
blob = b"".join(encoded)There was a problem hiding this comment.
Agreed, great suggestion, and the seed can be an int to replace the hex_byte: str in the original function.
| expected_blob_length = 2**int(math.log2(Spec.BYTES_PER_BLOB)) | ||
| assert len(blob) == expected_blob_length, f"Expected blob of length {expected_blob_length} but got blob of length {len(blob)}" # noqa: E501 |
There was a problem hiding this comment.
This feels a strange. Why don't we just do assert len(blob) == Spec.BYTES_PER_BLOB?
| assert len(blob) == expected_blob_length, f"Expected blob of length {expected_blob_length} but got blob of length {len(blob)}" # noqa: E501 | ||
|
|
||
| # calculate commitment | ||
| ts = ckzg.load_trusted_setup("trusted_setup.txt", 0) |
There was a problem hiding this comment.
This is a slow operation. It would be best if this were only done once. Maybe in some setup function?
There was a problem hiding this comment.
We should add the trusted setup as a module constant at the beginning of the file:
from os.path import realpath
from pathlib import Path
TRUSTED_SETUP_FILE_NAME = "trusted_setup.txt"
TRUSTED_SETUP_PATH = Path(realpath(__file__)).parent / TRUSTED_SETUP_FILE_NAME
TRUSTED_SETUP = ckzg.load_trusted_setup(str(TRUSTED_SETUP_PATH), 0)| def eest_compute_cells(blob: bytes) -> list[int]: | ||
| """Take a blob and returns a list of cells that are derived from this blob.""" | ||
| ts = ckzg.load_trusted_setup("trusted_setup.txt", 0) | ||
|
|
||
| cells = ckzg.compute_cells(blob, ts) | ||
|
|
||
| assert len(cells) == 128 | ||
|
|
||
| return cells |
There was a problem hiding this comment.
Several of these wrapper functions don't appear to be very useful. Could we not just call ckzg directly?
| return commitment | ||
|
|
||
|
|
||
| def eest_compute_cells(blob: bytes) -> list[int]: |
There was a problem hiding this comment.
I believe this return type isn't correct. It doesn't return a list of ints. It returns a list of bytes.
marioevz
left a comment
There was a problem hiding this comment.
This is excellent, thanks for putting this together!
Just a couple of comments to make some parts of it more pythonic, just overall we should try to avoid to use strings when other more appropriate primitive python types can be used.
| if int(hex_byte, 16) > 115: | ||
| # TODO: figure out why this happens | ||
| print("For some reason bytes larger than this will lead to errors later in compute_cells(), so it's not allowed for now.") # noqa: E501 | ||
| return None | ||
|
|
||
| hex_byte = hex_byte.lower() |
There was a problem hiding this comment.
Agreed, great suggestion, and the seed can be an int to replace the hex_byte: str in the original function.
| class PersistentBlobGenerator: | ||
| # PersistentBlobGenerator("4a") creates blob_4a.json in cwd and contains blob, commitment, cells and proofs | ||
| encoding: str = "utf-8" | ||
|
|
There was a problem hiding this comment.
Pydantic can be of excellent help here:
from typing import List
from pydantic import BaseModel
from ethereum_test_base_types import Bytes
class Blob(BaseModel):
name: str
blob: Bytes
commitments: List[Bytes]
cells: List[Bytes]
proofs: List[Bytes]And this can be then converted to json using model_dump or model_dump_json, and the to_json and from_json methods can be removed.
| def __init__(self, singular_byte: str): | ||
| # safely derive blob from input | ||
| blob: bytes | None = generate_blob_from_hex_byte(singular_byte) | ||
| assert blob is not None, f"PersistentBlobGenerator received invalid input: {singular_byte}" | ||
|
|
||
| if "0x" in singular_byte: | ||
| singular_byte = singular_byte.replace("0x", "", 1) | ||
|
|
||
| commitment: bytes = eest_blob_to_kzg_commitment(blob) | ||
| cells, proofs = eest_compute_cells_and_kzg_proofs(blob) | ||
|
|
||
| # populate instance | ||
| self.name: str = "blob_" + singular_byte | ||
| self.blob: bytes = blob | ||
| self.commitments: list[bytes] = [commitment] * len(cells) | ||
| self.cells: list[bytes] = cells | ||
| self.proofs: list[bytes] = proofs |
There was a problem hiding this comment.
If we change this class to pydantic, the only other change is that this init function needs to be a class method instead.
@classmethod
def from_index(cls, index: int):
...
return cls(blob, proofs, ...)or similar.
| @@ -1,10 +1,6 @@ | |||
| version = 1 | |||
There was a problem hiding this comment.
Hey @felix314159, as we discussed, can you please revert the updates to all packages and only modify the required packages.
Instructions available here:
https://eest.ethereum.org/main/dev/deps_and_packaging/
d33b90d to
ccfedf8
Compare
marioevz
left a comment
There was a problem hiding this comment.
Left a lot of comments but I think these are necessary to future-proof the new code introduced 👍
Thanks!
src/ethereum_test_types/blob.py
Outdated
|
|
||
| TRUSTED_SETUP_FILE_NAME = "blob_trusted_setup.txt" | ||
| TRUSTED_SETUP_PATH = Path(realpath(__file__)).parent / TRUSTED_SETUP_FILE_NAME | ||
| TRUSTED_SETUP = ckzg.load_trusted_setup(str(TRUSTED_SETUP_PATH), 0) |
There was a problem hiding this comment.
Convert to ClassVar inside of Blob that is loaded upon first use:
class Blob(CamelModel):
...
_trusted_setup: ClassVar[Any | None] = None
...
@classmethod
def trusted_setup(cls):
...
if cls._trusted_setup is None:
# load trusted setup
return cls._trusted_setupThere was a problem hiding this comment.
Good idea but I will make this function void, no need to return the trusted setup if it is already accessible
src/ethereum_test_types/blob.py
Outdated
| return would_be_static_blob_path | ||
|
|
||
| @staticmethod | ||
| def NewBlob(fork: Fork, seed: int = 0, timestamp: int = 0) -> "Blob": # noqa: N802 |
There was a problem hiding this comment.
| def NewBlob(fork: Fork, seed: int = 0, timestamp: int = 0) -> "Blob": # noqa: N802 | |
| def from_seed(fork: Fork, seed: int = 0, timestamp: int = 0) -> "Blob": |
I think this name is a bit more accurate and follows python method naming better.
There was a problem hiding this comment.
Yes we should use snake case I agree. However, I strongly disagree calling it from_seed as the seed is just an optional parameter. You probably wanted to name it from_fork.
src/ethereum_test_types/blob.py
Outdated
| fork_str: str = fork.name().lower() | ||
|
|
||
| # if this blob already exists then load from file | ||
| blob_location: Path = Blob.get_filepath(fork_str, seed, timestamp) |
There was a problem hiding this comment.
Passing the timestamp is a good idea, but we could improve it a bit further here since I think using the timestamp to get the filename could result in too many duplicates:
| fork_str: str = fork.name().lower() | |
| # if this blob already exists then load from file | |
| blob_location: Path = Blob.get_filepath(fork_str, seed, timestamp) | |
| fork = fork.fork_at(timestamp=timestamp) # This resolves to the correct fork in fork transition tests, but it's no-op in normal forks | |
| fork_str: str = fork.name().lower() | |
| # if this blob already exists then load from file | |
| blob_location: Path = Blob.get_filepath(fork_str, seed) |
And then we just remove the timestamp parameter from get_filepath.
src/ethereum_test_types/blob.py
Outdated
| def generate_blob_data(rng_seed: int = 0) -> Bytes: | ||
| """Calculate blob data deterministically via provided seed.""" | ||
| # apply RNG seed | ||
| random.seed(rng_seed) |
There was a problem hiding this comment.
We can change this to use a local thread-safe instance:
| random.seed(rng_seed) | |
| rng = random.Random(rng_seed) |
There was a problem hiding this comment.
Very good catch, thanks
src/ethereum_test_types/blob.py
Outdated
| assert fork.engine_get_blobs_version() is not None, ( | ||
| f"You provided fork {fork.name()} but it does not support blobs!" | ||
| ) # TODO: why does mypy say '"type[BaseFork]" has no attribute "engine_get_blobs_version"'? |
There was a problem hiding this comment.
Can instead use supports_blobs here:
| assert fork.engine_get_blobs_version() is not None, ( | |
| f"You provided fork {fork.name()} but it does not support blobs!" | |
| ) # TODO: why does mypy say '"type[BaseFork]" has no attribute "engine_get_blobs_version"'? | |
| assert fork.supports_blobs(), ( | |
| f"Provided fork {fork.name()} but it does not support blobs!" | |
| ) |
src/ethereum_test_types/blob.py
Outdated
| detected_fork = parts[1] | ||
| detected_seed = parts[2] | ||
| detected_timestamp = parts[3].removesuffix(".json") | ||
| assert detected_seed.isdigit(), ( | ||
| f"Failed to extract seed from filename. Ended up with seed {detected_seed} given filename {file_path}" # noqa: E501 | ||
| ) | ||
| assert detected_timestamp.isdigit(), ( | ||
| f"Failed to extract timestamp from filename. Ended up with timestamp {detected_timestamp} given filename {file_path}" # noqa: E501 | ||
| ) | ||
| file_path = Blob.get_filepath( | ||
| detected_fork, int(detected_seed), int(detected_timestamp) | ||
| ) |
There was a problem hiding this comment.
Feels unnecessary, is it just for debugging purposes?
There was a problem hiding this comment.
If the test writer tries to import an existing blob via filename this code ensures that the name of the blob file is valid (as in matches the expected format). I would prefer to keep this, if someone tries to import an invalidly named blob this way we at least get a descriptive error message
Edit: it was removed because entire function was reworked due to suggested filename change
src/ethereum_test_types/blob.py
Outdated
| assert self.fork in ["osaka"], ( | ||
| f"verify_cell_kzg_proof_batch() is not available for fork: {self.fork}" | ||
| ) |
There was a problem hiding this comment.
| assert self.fork in ["osaka"], ( | |
| f"verify_cell_kzg_proof_batch() is not available for fork: {self.fork}" | |
| ) | |
| assert self.fork.blob_kzg_cell_proofs() > 0, ( | |
| f"verify_cell_kzg_proof_batch() is not available for fork: {self.fork}" | |
| ) |
src/ethereum_test_types/blob.py
Outdated
| assert len(self.cells) == 128, ( | ||
| f"You are supposed to pass a full cell list with 128 elements to this function, but got list of length {len(self.cells)}" # noqa: E501 | ||
| ) |
There was a problem hiding this comment.
| assert len(self.cells) == 128, ( | |
| f"You are supposed to pass a full cell list with 128 elements to this function, but got list of length {len(self.cells)}" # noqa: E501 | |
| ) | |
| assert len(self.cells) == self.fork.blob_kzg_cell_proofs(), ( | |
| f"You are supposed to pass a full cell list with {self.fork.blob_kzg_cell_proofs()} elements to this function, but got list of length {len(self.cells)}" # noqa: E501 | |
| ) |
Same for the rest of the function, it can be adapted in the same way.
There was a problem hiding this comment.
I really don't think this rename is necessary, were you getting an error somehow with the original name? If so, could you share so we can help debug pls?
There was a problem hiding this comment.
We are shadowing the stdlib "types" when we have types.py. This means any import that implictly requires stdlib types will fail with ImportError: cannot import name 'GenericAlias' from partially initialized module 'types' (most likely due to a circular import). Try the following once with types.py and then again with eest_types.py:
echo 'from enum import Enum' > ./src/ethereum_test_types/abcd.py
uv run python ./src/ethereum_test_types/abcd.py
There was a problem hiding this comment.
Good catch :D I think this is particular to using uv run python ./src/... but still I think it should be ok to finally split types.py into different files.
But I think that one deserves its own PR, wdyt?
There was a problem hiding this comment.
When we take the local cache approach, we should remove these static blob files from the branch.
There was a problem hiding this comment.
Yes, let's remove them
a7861a5 to
35ffb69
Compare
marioevz
left a comment
There was a problem hiding this comment.
This is taking shape and I think it's closer to being done! 🚀
I've left a couple more minor comments.
Thanks for putting the effort into this so far! Let's try to get it through the finish line :)
src/ethereum_test_forks/__init__.py
Outdated
| from typing import Literal | ||
|
|
There was a problem hiding this comment.
Seems unnecessary.
| from typing import Literal |
| TransactionException, | ||
| ) | ||
| from ethereum_test_types import Blob | ||
| from tests.cancun.eip4844_blobs.common import blobs_to_transaction_input |
There was a problem hiding this comment.
| from tests.cancun.eip4844_blobs.common import blobs_to_transaction_input | |
| from .common import blobs_to_transaction_input |
There was a problem hiding this comment.
I think this file got sorted at some point? Could you try to reset the file to the version in the base branch (should be main) and then just add the lines that are required for this PR?
| kzg_proof=INF_POINT, | ||
| ), | ||
| # you can provide the blob via file name | ||
| Blob.from_file("blob_cancun_0_0"), |
There was a problem hiding this comment.
We should only use the more-abstract from_fork method in tests.
There was a problem hiding this comment.
Minor nit-pick but could we rename this file to "kzg_trusted_setup.txt"?
|
|
||
| versioned_hash: Hash | ||
| name: str # blob_<fork>_<seed> | ||
| fork: str |
There was a problem hiding this comment.
| fork: str | |
| fork: Fork |
After #1686, and can also delete all str<->Fork conversions everywhere in this file.
There was a problem hiding this comment.
A side-effect of this change is that pydantic now struggles to serialize and deserialize Blob object (when we sticked to basic python types like fork: str sth like Blob.model_validate_json(json_str) and self.model_dump_json() 'just worked' but now we have to add hard-to-read logic to tell pydantic how to handle the Fork type at serialization and deserialization:
@field_validator("fork", mode="before")
@classmethod
def validate_fork(cls, v):
"""
When reading JSON file and trying to go back from fork string to fork object we must
tell pydantic how to do this.
"""
if isinstance(v, str):
return fork_string_to_object(v)
return v
@field_serializer("fork")
def serialize_fork(self, fork: Fork) -> str:
"""
When trying to serialize a Blob object into a JSON file we must
tell pydantic how to do this.
"""
return fork.name()
|
@marioevz Please look at the latest commit, I added Edit: I think everything is working now, can be reviewed & merged |
4bf0f25 to
f07306e
Compare
marioevz
left a comment
There was a problem hiding this comment.
LGTM!
Just a couple of comments but we can merge after those 👍
| kzg_commitment=INF_POINT, | ||
| kzg_proof=INF_POINT, | ||
| ), | ||
| Blob.from_fork(Cancun), |
There was a problem hiding this comment.
| Blob.from_fork(Cancun), | |
| Blob.from_fork(fork), |
We can just pass the fork here to avoid constraining the test to a hard-coded fork.
| kzg_commitment=INF_POINT, | ||
| kzg_proof=INF_POINT, | ||
| ) | ||
| Blob.from_fork(Cancun), |
There was a problem hiding this comment.
| Blob.from_fork(Cancun), | |
| Blob.from_fork(fork), |
Same here.
| kzg_commitment=INF_POINT, | ||
| kzg_proof=INF_POINT, | ||
| ) | ||
| Blob.from_fork(Cancun), |
There was a problem hiding this comment.
| Blob.from_fork(Cancun), | |
| Blob.from_fork(fork), |
|
|
||
| from ethereum_test_base_types.base_types import Hash | ||
| from ethereum_test_forks import Fork | ||
| from ethereum_test_forks.forks.forks import Cancun |
There was a problem hiding this comment.
| from ethereum_test_forks.forks.forks import Cancun |
If the rest of the comments are applied.
96fb60d to
31a541c
Compare
841673b to
329e19f
Compare
18393da to
abab88c
Compare
marioevz
left a comment
There was a problem hiding this comment.
More comments but hopefully this is one of the last iterations, thanks again for putting this together!
src/ethereum_test_rpc/rpc.py
Outdated
| shorten_errors: bool = False, | ||
| log_rlp_data: bool = False, |
There was a problem hiding this comment.
I like this idea, and I think it'll be extremely useful for consume, just have a couple of suggestions:
- Break this out from this PR into a dedicated PR that enhances RPC logging
- Instead of granular
shorten_errosandlog_rlp_data, we can tie these two to a logging level: e.g. have the RLP data only printed when the log level is Debug or higher, and print the full (shorten_errors = False) rlp or error message only on Trace debug level or higher.
src/ethereum_test_rpc/rpc.py
Outdated
| timestamp = time.time_ns() | ||
| file_name = rlp_logs_folder / f"{timestamp}_rlp_data.log" | ||
| temp_logger = logging.getLogger(f"rlp_{timestamp}") | ||
| temp_logger.setLevel(logging.ERROR) |
There was a problem hiding this comment.
I think modifying the logging level based on internal logic instead of user flags/options is confusing and could lead to printing of errors when the user did not request any of them.
It's better for it to be the other way around and adjust behavior depending on the log level.
src/ethereum_test_rpc/rpc.py
Outdated
| f"I am about to send a POST request of approximated size: {payload_size_mb:.2f} MB " | ||
| f"({payload_size_bytes} bytes)" |
There was a problem hiding this comment.
This is for the follow up logging-only PR:
| f"I am about to send a POST request of approximated size: {payload_size_mb:.2f} MB " | |
| f"({payload_size_bytes} bytes)" | |
| f"POST request of approximated size: {payload_size_mb:.2f} MB " | |
| f"({payload_size_bytes} bytes)" |
| CACHED_BLOBS_DIRECTORY: Path = ( | ||
| Path(platformdirs.user_cache_dir("ethereum-execution-spec-tests")) / "cached_blobs" | ||
| ) |
There was a problem hiding this comment.
We can simply import this constant from the blob_types module.
| CACHED_BLOBS_DIRECTORY: Path = ( | |
| Path(platformdirs.user_cache_dir("ethereum-execution-spec-tests")) / "cached_blobs" | |
| ) | |
| from ..blob_types import CACHED_BLOBS_DIRECTORY |
fa99ce5 to
74d49a4
Compare
…erwise it cant be used as input to function that requires trusted setup to be passed
|
Only thing left to do is splitting the logging changes into a separate PR, will do this tomorrow |
|
It can be merged now (surely merging a cryptography related PR on a friday afternoon is not a bad idea:) |
* added kzg functions for peerdas tests * fix: use proofs only instead of full cells * tests passing except for third one, too much gas (nethermind) or too many blobs (7vs9, geth) * implemented feedback * function that loads trusted setup now has to return trusted setup otherwise it cant be used as input to function that requires trusted setup to be passed * removed debug print * added filelock for r/w blobs * removed logging changes from this PR * removed changes to conversions.py * fixes * fix * fixes --------- Co-authored-by: Mario Vega <marioevz@gmail.com>
🗒️ Description
Adds an eest wrapper for ckzg functions we will use in our tests. This PR allows us to send valid blobs.
🔗 Related Issues
✅ Checklist
mkdocs servelocally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.