Skip to content

feat!: CLAUDE-GENERATED use class ID to deduplicate bytecode hashing and class ID derivation, use address for address derivation#15683

Closed
dbanks12 wants to merge 14 commits intonextfrom
db/dedup
Closed

feat!: CLAUDE-GENERATED use class ID to deduplicate bytecode hashing and class ID derivation, use address for address derivation#15683
dbanks12 wants to merge 14 commits intonextfrom
db/dedup

Conversation

@dbanks12
Copy link
Contributor

@dbanks12 dbanks12 commented Jul 13, 2025

Changes bytecode ID to just be the same as class ID, unless bytecode retrieval fails in which case execution just uses bytecode ID of 0.

Adds deduplication to address derivation, class ID derivation and bytecode hashing.

Copy link
Contributor Author

dbanks12 commented Jul 13, 2025

Copy link
Contributor

@fcarreiro fcarreiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, this is why I was asking if deduplication was done in your PR, I thought you would: https://docs.google.com/document/d/1CNHZrtBukR4CMRxF2vKnYiC77C-6J6ChOrqO9Gu_xr8/edit?disco=AAABnM22Gmw

void ClassIdDerivation::assert_derivation(const ContractClassId& class_id, const ContractClass& klass)
{
// TODO: Cache and deduplicate.
if (derived_class_ids.contains(class_id)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do you add the id?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here you could just make the event emitter a deduplicated event emitter and avoid all this. In non-assert mode, they will be equivalent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes yeah I didn't actually insert anywhere

FF BytecodeHasher::compute_public_bytecode_commitment([[maybe_unused]] const BytecodeId bytecode_id,
const std::vector<uint8_t>& bytecode)
{
if (bytecode_commitments.contains(bytecode_id)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, when do you add to the map?


void AddressDerivation::assert_derivation(const AztecAddress& address, const ContractInstance& instance)
{
if (derived_addresses.contains(address)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one has a more expensive computation, so you could chose to add a set (but you still need to insert at some point!). However, it might be simpler to just do it with a deduplicating event emitter.

}

ContractClassId current_class_id = maybe_instance.value().current_class_id;
BytecodeId bytecode_id = BytecodeId(current_class_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have to sit down quite some time to convince myself that using bytecode id works... in PIL. A few concerns
(1) in any given table that has bytecodeid how would you constrain (if needed) that there are no duplicates. Observe that even if two identical rows are not bad, you could claim 2 different commitments for the same bytecode id and that would be bad. With bytecode ids (and not class ids) you can ask for monotonically increasing numbers. Can we/should we do it for class id and how?
(2) It feels bytecode ids and class ids are being mixed. If you think that class id can be used, then maybe just keep/rename things to class id where it is what it represents.

I'd like to be sure that you think things will work before I spend a few hours myself looking in detail at all the connections. LMK!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't really thought 1 all the way through. I guess this was handled for bc_retrieval.pil just by enforcing that bytecode_id is doing ++ on each row.... Which I didn't realize and didn't unconstrain in this PR 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes, bytecode and class IDs are being mixed. Or well.... they're == except when it doesn't exist. I agree that maybe it makes sense to just use "class ID".

Copy link
Contributor Author

@dbanks12 dbanks12 Jul 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Observe that even if two identical rows are not bad, you could claim 2 different commitments for the same bytecode id and that would be bad.

I'm not convinced that claiming 2 different commitments for a bytecode ID would actually be bad if the bytecode ID == class ID. Because you'd have a constraint failure when one of your 2 different commitments doesn't match the commitment used when deriving the class ID. So I don't think we actually need to constrain that bytecode ID is deduplicated.

Although as discussed last week, we do need to constrain that bytecode ID is right

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll write up a document with my thoughts and send it your way.

Base automatically changed from db/emitnullifier to merge-train/avm July 22, 2025 15:22
@dbanks12 dbanks12 force-pushed the db/dedup branch 2 times, most recently from 2f9bbfa to 8689ea0 Compare July 22, 2025 21:12
Base automatically changed from merge-train/avm to next July 23, 2025 09:07
@dbanks12 dbanks12 changed the base branch from next to graphite-base/15683 July 24, 2025 15:14
@dbanks12 dbanks12 force-pushed the graphite-base/15683 branch from f52457d to 5f4f579 Compare July 24, 2025 15:14
@dbanks12 dbanks12 changed the base branch from graphite-base/15683 to merge-train/avm July 24, 2025 15:14
@dbanks12 dbanks12 changed the title feat!: deduplicate bytecode hashing, class ID derivation, address derivation feat!: CLAUDE-GENERATED deduplicate bytecode hashing, class ID derivation, address derivation Jul 24, 2025
@dbanks12 dbanks12 changed the title feat!: CLAUDE-GENERATED deduplicate bytecode hashing, class ID derivation, address derivation feat!: CLAUDE-GENERATED use class ID to deduplicate bytecode hashing and class ID derivation, use address for address derivation Jul 25, 2025
Base automatically changed from merge-train/avm to next July 25, 2025 18:50
@dbanks12
Copy link
Contributor Author

dbanks12 commented Aug 5, 2025

Replaced with #15977

@dbanks12 dbanks12 closed this Aug 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants