-
Notifications
You must be signed in to change notification settings - Fork 372
fix: ensure reproducibility of the ledger_suite_orchestrator_canister #5347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: ensure reproducibility of the ledger_suite_orchestrator_canister #5347
Conversation
gregorydemay
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @basvandijk for getting to the bottom of this. Just some understanding questions from my side.
| @@ -1,3 +1,4 @@ | |||
| load("@rules_rust//cargo:defs.bzl", "cargo_build_script") | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understanding question: we use askama for other canisters, are they also not a problem for the reproducibility issue?
- ic-btc-checker
- ic-cketh-minter
- ic-ckbtc-kyt (this one we plan on deleting)
| format!( | ||
| r#" | ||
| #[derive(Template)] | ||
| #[template(escape = "html", source = {:?}, ext = "html")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
understanding question: why was this line added?
| ], | ||
| ) | ||
|
|
||
| cargo_build_script( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@basvandijk : If I understand that failure correctly, this means that it doesn't solve the problem or there is another source of non-determinism?
//rs/ethereum/ledger-suite-orchestrator:_ledger_suite_orchestrator_canister.wasm.gz_finalize bazel-out/k8-opt/bin/rs/ethereum/ledger-suite-orchestrator/ledger_suite_orchestrator_canister.wasm.gz: 46b544a2e01428a5cf29384e2858f0bed73c98340b563a8d4157133a64ecb6b3
!= 5379b1dc3c8cea64d032c5d6ade6a4069a234aaadc945184fe6d122de69daf65
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean? That this PR is insufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mraszyk The description of the PR states
it's causing the Build Reproducibility check to fail in: https://github.com/dfinity/ic/actions/runs/15296835171/job/43032042139?pr=5194.
However, that PR seems to suffer from the same problem
https://github.com/dfinity/ic/actions/runs/15303873653/job/43056720920?pr=5348
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think I was too soon turning this from draft into a PR. There seems to be another source of non-determinism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the logs it seems that there is a non-determinism with the archive. This would definitely produce non-determinism in the ledger (because it embeds its wasm) and in the ledger suite orchestrator (which embeds among other also the archive wasm)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gregorydemay we're embedding other canisters into the ledger_suite_orchestrator_canister right? Are those other canisters also using askama? That could be one explanation while there's still non-determinism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're embedding other canisters into the ledger_suite_orchestrator_canister right?
yes that's correct, they are defined here
Are those other canisters also using askama?
I don't think so, none of the canisters in the ledger suite have a dependency on askama
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@basvandijk To me the root cause of the non-determinism seen here could be explained by the archive
//rs/ledger_suite/icrc1/archive:_wasm_archive_canister_u256 bazel-out/k8-opt-ST-42b6d6ef7a37/bin/rs/ledger_suite/icrc1/archive/_wasm_archive_canister_u256.wasm: 4977a63fc611b224d47a4c0004139dda86f6ef69563870011ac6e3ddbbf99b17
since this would trigger non-determinism for ledger_u256 and the ledger suite orchestrator.
It's interesting that only the u256 variant seems to have that problem (and not the u64 variant)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right the archive canister is the source.
I think I jumped to conclusions too soon since we have been suffering from non-determinism from a very similar use of askama recently which we fixed using #5282. But maybe the symbol name generator of the rustc WASM backend is not sensitive to a changing metadata hash and so canisters remain reproducible while on x86_64 things blow up.
The
ledger_suite_orchestrator_canisteris using a proc-macro from the askama crate:This is known to cause non-determinism (askama-rs/askama#461) and it's causing the Build Reproducibility check to fail in: https://github.com/dfinity/ic/actions/runs/15296835171/job/43032042139?pr=5194. It's unclear why it didn't fail earlier. #5348 is testing whether cherry-picking this fix into #5194 fixes the Build Determinism job.
We therefor apply the same work-around as in:
rs/http_endpoints/public/build.rsandrs/ic_os/config/build.rsby loading the template inbuild.rsfrom a string instead of from a path.