Skip to content

Conversation

@tersec
Copy link
Contributor

@tersec tersec commented Feb 16, 2023

https://github.com/ethereum/execution-apis/blob/main/src/engine/experimental/blob-extension.md#blobsbundlev1 notes

BlobsBundleV1

The fields are encoded as follows:

  • blockHash: DATA, 32 Bytes
  • kzgs: Array of DATA - Array of KZGCommitment as defined in EIP-4844, 48 bytes each (DATA).
  • blobs: Array of DATA - Array of blobs, each blob is FIELD_ELEMENTS_PER_BLOB * BYTES_PER_FIELD_ELEMENT = 4096 * 32 = 131072 bytes (DATA) representing a SSZ-encoded Blob as defined in EIP-4844

This corresponds with the mainnet preset, but not the minimal preset. This renders the minimal preset Cancun-engine-API incompatible.

@mkalinin mkalinin requested a review from hwwhww February 16, 2023 15:30
@hwwhww
Copy link
Contributor

hwwhww commented Feb 16, 2023

The CI situation

  • Although the CI machines are not the same, we can still see test-deneb (it runs all test cases with Deneb version) getting ~3 times slower.
  • We already run test-deneb with 16 processes in parallel.
  • Currently, we only have ~10 test cases that touch blob verification.

Before

image

With this PR

image

😭

Also, looks like it found a bug in polynomial-commitments.md. /cc @asn-d6


Potential solutions to improve it

1. Implement low-level field element computation and pairing helpers in milagro-bls-binding

@dankrad once suggested it.

Pros: Probably can accurate pyspec.
Cons:

  • Adding an extra project to maintain and debug.
  • It seems hard to make the binding support Python class features. Our current G1 point and G2 point are PyECC-defined types, I'm not sure if milagro-bls-binding can provide the same interface and readability.

/cc @ChihChengLiang the milagro-bls-binding maintainer. Any thoughts about this?

2. Use c-kzg-4844 or other KZG library python binding in state tests

Pros: it can accelerate the state tests part
Cons:

  • polynomial-commitments.md tests are still slow. Less coverage in polynomial-commitments.md.
    • Side issue: @dankrad @asn-d6 @kevaundray requested for polynomial-commitments.md test vectors. @dankrad wants consensus-specs repo to generate the test vectors together but I am kinda reluctant due to the speed. :p Using FIELD_ELEMENTS_PER_BLOB := 4096 in minimal preset will also make it more painful to add more tests.
  • [probably short-term issue?] polynomial-commitments.md is still in active development recently. Pyspec needs to wait for c-kzg-4844 implementation.

minimal preset exists for making testing experiences more efficient. I'm still very against this change.

@mkalinin
Copy link
Contributor

I am wondering if there is an option to run tests on CI with 4 field elements e.g. by overriding the parameter while use minimal presets for test vectors generation?

@ethereum ethereum deleted a comment from KuliHaHa Feb 16, 2023
@ethereum ethereum deleted a comment from KuliHaHa Feb 16, 2023
@ethereum ethereum deleted a comment from KuliHaHa Feb 16, 2023
@hwwhww
Copy link
Contributor

hwwhww commented Feb 16, 2023

@mkalinin

I am wondering if there is an option to run tests on CI with 4 field elements e.g. by overriding the parameter while use minimal presets for test vectors generation?

We don't have such an option. I think it requires at least:

  • Move the FIELD_ELEMENTS_PER_BLOB related preset values to configs/minimal.yaml (including the trusted setups)
  • Store FIELD_ELEMENTS_PER_BLOB := 4 trusted setup file somewhere.
  • Update the with_config_overrides decorator to make it do different overriding when a flag is set. All the blob test cases should use this decorator.
  • Probably need to hard-code the field names as a special case in context.py and try to override the trusted setups case-by-case.

@hwwhww
Copy link
Contributor

hwwhww commented Feb 16, 2023

looks like it found a bug in polynomial-commitments.md.

I cherry-picked #3256 to fix this branch. After 30min, we got:

make: *** [Makefile:117: citest] Terminated
Too long with no output (exceeded 10m0s): context deadline exceeded

CI job link: https://app.circleci.com/pipelines/github/ethereum/consensus-specs/8626/workflows/9c1e0e2a-ede4-4671-91dc-0830d072adb7/jobs/61123

test_polynomial_commitments.py::test_compute_kzg_proof_within_domain is an expensive unit test to verify the cryptography assumption and implementation with all 4096 ROOT_OF_UNITY.


I am wondering if there is an option to run tests on CI with 4 field elements e.g. by overriding the parameter while use minimal presets for test vectors generation?

I want to add that, we've also skipped more than 100 mainnet preset test cases in test vectors because they are too slow. We may want to write a test but sink into unbearable speed in both mainnet and minimal presets when FIELD_ELEMENTS_PER_BLOB := 4096.

@ethereum ethereum deleted a comment from KuliHaHa Feb 16, 2023
@ChihChengLiang
Copy link
Contributor

ChihChengLiang commented Feb 17, 2023

Hi @hwwhww, I think supporting G1 and G2 operations from milagro-bls-binding is impossible. We need to write a new binding library. I was talking with @dankrad on this the other day. I wrote a proposal for some background context for people interested in the implementation.

@dankrad
Copy link
Contributor

dankrad commented Feb 22, 2023

Kev has now implemented an interface for arkworks: https://pypi.org/project/py-arkworks-bls12381/

@mkalinin
Copy link
Contributor

mkalinin commented Feb 23, 2023

From a discussion with @hwwhww another option to consider is defining minimal-el presets which would be a copy of minimal with FIELD_ELEMENTS_PER_BLOB == 4096. The minimal-el preset could be used to run devnets with minimal configuration and real execution layer client while minimal would still be used to generate consensus test vectors and run tests on CI.

The downside of this approach is an overhead of keeping minimal and minimal-el presets consistent with each other, but this seems to be the least invasive solution of this problem.

@tersec
Copy link
Contributor Author

tersec commented Feb 23, 2023

Before resorting to separate minimal and minimal-el presets, would be useful to see benchmarks from https://pypi.org/project/py-arkworks-bls12381/

If this is an optimization issue which can be solved by optimizing the test generation sufficiently, then that's cleaner in general, rather than allowing sort of incidental complexity to ramify out into the specs.

@hwwhww
Copy link
Contributor

hwwhww commented Mar 10, 2023

I merged the dev branch to use --bls-type=fastest option. However, the CI job still got a timeout with test_compute_kzg_proof_within_domain test case.

@tersec
Copy link
Contributor Author

tersec commented Jun 16, 2023

Deneb/Cancun is still on track to launch with a broken minimal preset -- if the only reason for this is a CI timeout, that's ... quite an imposition of some random temporary technical artifact into the spec and across the ecosystem indefinitely. Because the default action, if people do nothing, is there will be "minimal" which works with the consensus test vectors, and a "minimal" which can actually be run, but which does not and cannot align with the test vectors.

If it's truly the case that generating test vectors for this is that much of an imposition, it should perhaps be cause to reflect on the protocol more broadly; test vectors would in that situation be a downstream effect, not cause.

@potuz
Copy link
Contributor

potuz commented Oct 9, 2023

I support this PR, we rely on minimal spec presets for a large part of our testing, and this issue is blocking our entire testing pipeline for deneb

@hwwhww
Copy link
Contributor

hwwhww commented Oct 10, 2023

  1. The main bottleneck is test_compute_kzg_proof_within_domain polynomial test. So we can either consider deleting it or only test it in c-kzg or other libraries. /cc @asn-d6 what do you think?
  2. If we remove test_compute_kzg_proof_within_domain, for the test-deneb CI job (the job that runs tests with Deneb fork), it increases ~26% of the testing time (from 9m14s -> 11m43s). It will only increase the time for testgen (e.g., random tests are not in CI). It is acceptable now, but my primary concern is the potential cumulative overhead this could introduce for all future forks in our tests.

@asn-d6
Copy link
Contributor

asn-d6 commented Oct 10, 2023

  1. The main bottleneck is test_compute_kzg_proof_within_domain polynomial test. So we can either consider deleting it or only test it in c-kzg or other libraries. /cc @asn-d6 what do you think?
  2. If we remove test_compute_kzg_proof_within_domain, for the test-deneb CI job (the job that runs tests with Deneb fork), it increases ~26% of the testing time (from 9m14s -> 11m43s). It will only increase the time for testgen (e.g., random tests are not in CI). It is acceptable now, but my primary concern is the potential cumulative overhead this could introduce for all future forks in our tests.

Hey @hwwhww . Thanks for looking into this.

Before we commit to removing test_compute_kzg_proof_within_domain (which is generally a useful test), I wanted to ask if the times above are with py-ecc or with py-arkworks-bls. It seems like using py-arkworks-bls for the tests in polynomial_commitments would be a reasonable way forward here?

EDIT: For some context, running the test_polynomial_commitments.py unittests using py-ecc takes 4.11 seconds on my machine, whereas it takes 0.26 seconds with py-arkworks-bls.

@hwwhww
Copy link
Contributor

hwwhww commented Oct 10, 2023

Hey @asn-d6

It's py-arkworks-bls. We set --bls-type=fastest in

python3 -m pytest -n 16 --bls-type=fastest --preset=$(TEST_PRESET_TYPE) --fork=$(fork) --junitxml=test-reports/test_results.xml eth2spec

You can try to benchmark it with --preset=mainnet:

pytest --bls-type=fastest eth2spec/test/deneb/unittests/polynomial_commitments/test_polynomial_commitments.py --fork=deneb --preset=mainnet

@asn-d6
Copy link
Contributor

asn-d6 commented Oct 10, 2023

Hey @asn-d6

It's py-arkworks-bls. We set --bls-type=fastest in

python3 -m pytest -n 16 --bls-type=fastest --preset=$(TEST_PRESET_TYPE) --fork=$(fork) --junitxml=test-reports/test_results.xml eth2spec

You can try to benchmark it with --preset=mainnet:

pytest --bls-type=fastest eth2spec/test/deneb/unittests/polynomial_commitments/test_polynomial_commitments.py --fork=deneb --preset=mainnet

Thanks for the explanation!

I added a commit to this PR which revives test_compute_kzg_proof_within_domain() but makes it non-exhaustive and hence faster to run. By non-exhaustive I mean that it tests 6 random elements instead of all 4096. I did the same for test_barycentric_within_domain().

Hoping that performance is looking better now.

Copy link
Contributor

@hwwhww hwwhww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed that testgen works fine with this PR. I hope to get approvals from other maintainers too, before merging it.

In the meantime, I will go add the official trusted setups based on this PR so we don't need to create a minimal preset version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants