Skip to content

feat!: use bytecode hash as bytecode ID to deduplicate bytecode hashing, instruction fetching, bc decomp#15977

Merged
dbanks12 merged 1 commit intonextfrom
db/dedup-bc-commit
Aug 8, 2025
Merged

feat!: use bytecode hash as bytecode ID to deduplicate bytecode hashing, instruction fetching, bc decomp#15977
dbanks12 merged 1 commit intonextfrom
db/dedup-bc-commit

Conversation

@dbanks12
Copy link
Contributor

@dbanks12 dbanks12 commented Jul 25, 2025

Discussion, reasoning and alternatives approaches can be found here.

What's done

  1. Bytecode ID is now the bytecode commitment.
  2. Bytecode retrieval uses Bytecode ID of 0 and a failure flag when the bytecode does not exist.
  3. Bytecode hashing and decomposition are deduplicated by bytecode ID.
  4. Instruction fetch is deduplicated by ( bytecode_id, pc ) via the event.get_key() mechanism.
  5. Address derivation is deduplicated by address.
  6. Class ID derivation is deduplicated by class ID.
  7. Bytecode ID is now in the context & context stack and is propagated to the next row except on context change.

Questions

  1. Should I have instead used the event.get_key() mechanism to deduplicate more (bytecode decomp, hashing, addr deriv, class ID deriv)?
  2. If instruction fetching deduplication relies on event.get_key(), does that mean that we are not deduplicating the PC range check where we could be? (screenshotted)
image

Base automatically changed from merge-train/avm to next July 25, 2025 18:51
@dbanks12 dbanks12 changed the base branch from next to graphite-base/15977 July 28, 2025 14:03
@dbanks12 dbanks12 force-pushed the db/dedup-bc-commit branch from 2f45e07 to f931ff6 Compare July 28, 2025 14:03
@dbanks12 dbanks12 force-pushed the graphite-base/15977 branch from a6fa349 to 50e553c Compare July 28, 2025 14:03
@dbanks12 dbanks12 changed the base branch from graphite-base/15977 to merge-train/avm July 28, 2025 14:03
@dbanks12 dbanks12 force-pushed the db/dedup-bc-commit branch from f931ff6 to d7c280c Compare July 28, 2025 16:07
Base automatically changed from merge-train/avm to next July 28, 2025 19:52
@dbanks12 dbanks12 changed the base branch from next to graphite-base/15977 August 5, 2025 17:20
@dbanks12 dbanks12 force-pushed the db/dedup-bc-commit branch from d7c280c to 4493428 Compare August 5, 2025 17:20
@dbanks12 dbanks12 force-pushed the graphite-base/15977 branch from 0bae325 to 50e553c Compare August 5, 2025 17:20
@dbanks12 dbanks12 changed the base branch from graphite-base/15977 to merge-train/avm August 5, 2025 17:20
@dbanks12 dbanks12 force-pushed the db/dedup-bc-commit branch from 4493428 to d8c3996 Compare August 5, 2025 17:25
@dbanks12 dbanks12 changed the base branch from merge-train/avm to graphite-base/15977 August 5, 2025 17:28
@dbanks12 dbanks12 force-pushed the db/dedup-bc-commit branch from d8c3996 to 75e3431 Compare August 5, 2025 17:29
@dbanks12 dbanks12 force-pushed the graphite-base/15977 branch from c227e71 to 836cfef Compare August 5, 2025 17:29
@dbanks12 dbanks12 changed the base branch from graphite-base/15977 to next August 5, 2025 17:29
@dbanks12 dbanks12 changed the title feat!: CLAUDE-GENERATED use bytecode hash as bytecode ID to deduplicate bytecode hashing, instruction fetching, bc decomp feat!: use bytecode hash as bytecode ID to deduplicate bytecode hashing, instruction fetching, bc decomp Aug 5, 2025
@dbanks12 dbanks12 force-pushed the db/dedup-bc-commit branch 5 times, most recently from ca79592 to 399afd7 Compare August 6, 2025 16:24
@dbanks12 dbanks12 changed the base branch from next to graphite-base/15977 August 6, 2025 20:39
@dbanks12 dbanks12 force-pushed the db/dedup-bc-commit branch from 399afd7 to baee102 Compare August 6, 2025 20:39
@dbanks12 dbanks12 force-pushed the graphite-base/15977 branch from fd36eb9 to 1fb7a5f Compare August 6, 2025 20:39
@dbanks12 dbanks12 changed the base branch from graphite-base/15977 to db/fix-context August 6, 2025 20:39
@dbanks12 dbanks12 force-pushed the db/dedup-bc-commit branch from 323c66e to 1ff8911 Compare August 7, 2025 15:11
@dbanks12 dbanks12 changed the base branch from graphite-base/15977 to next August 7, 2025 15:11
@dbanks12 dbanks12 marked this pull request as ready for review August 7, 2025 15:13
Comment on lines +105 to +113
// If class ID derivation is disabled (instance does not exist), force all members to 0.
#[CURRENT_CLASS_ID_IS_ZERO_IF_INSTANCE_DOES_NOT_EXIST]
sel * (1 - instance_exists) * (current_class_id - 0) = 0;
#[ARTIFACT_HASH_IS_ZERO_IF_INSTANCE_DOES_NOT_EXIST]
sel * (1 - instance_exists) * (artifact_hash - 0) = 0;
#[PRIVATE_FUNCTION_ROOT_IS_ZERO_IF_INSTANCE_DOES_NOT_EXIST]
sel * (1 - instance_exists) * (private_function_root - 0) = 0;
#[BYTECODE_ID_IS_ZERO_IF_INSTANCE_DOES_NOT_EXIST]
sel * (1 - instance_exists) * (bytecode_id - 0) = 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these were necessary before but were missing

Comment on lines -33 to -42
TestTraceContainer trace({ {
{ C::execution_next_context_id, 0 },
{ C::precomputed_first_row, 1 },
// Context Stack Rows
{ C::context_stack_sel, 1 },
{ C::context_stack_entered_context_id, 2 },
{ C::context_stack_context_id, 1 },
{ C::context_stack_parent_id, 0 },
{ C::context_stack_next_pc, 2 },
{ C::context_stack_msg_sender, 0 },
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This big test diff is formatting. I just added bytecode id.

Comment on lines +45 to +64
// Check if we've already processed this bytecode. If so, don't do hashing and decomposition again!
auto existing_bytecode = bytecodes.find(bytecode_id);
if (existing_bytecode != bytecodes.end()) {
// Already processed this bytecode - just emit retrieval event and return
auto tree_states = merkle_db.get_tree_state();
retrieval_events.emit({
.bytecode_id = bytecode_id,
.address = address,
.current_class_id = current_class_id,
.contract_class = klass,
.nullifier_root = tree_states.nullifierTree.tree.root,
.public_data_tree_root = tree_states.publicDataTree.tree.root,
});
return bytecode_id;
}

// First time seeing this bytecode - do hashing and decomposition
FF computed_commitment = bytecode_hasher.compute_public_bytecode_commitment(bytecode_id, klass.packed_bytecode);
(void)computed_commitment; // Avoid GCC unused parameter warning when asserts are disabled.
assert(computed_commitment == klass.public_bytecode_commitment);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could deduplicate bytecode hashing inside of the hasher. But we already need to deduplicate decomposition event-generation here in the bytecode manager, so I thought it was natural to also dedup the call to bytecode hashing here.

Comment on lines 26 to 31
struct BytecodeNotFoundError : public std::runtime_error {
BytecodeNotFoundError(BytecodeId id, const std::string& message)
BytecodeNotFoundError(const std::string& message)
: std::runtime_error(message)
, bytecode_id(id)
{}

BytecodeId bytecode_id;
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cannot include the ID in the error if it is an FF or we get some annoying compilation error about alignment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah I remember Ilyas got that as well; I guess it's ok you can stringify it

Comment on lines 15 to 17

using BytecodeId = uint8_t;
using BytecodeId = FF;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opted to leave this alias, but I'm happy to remove if someone feels strongly otherwise

Copy link
Contributor

@fcarreiro fcarreiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Comment on lines 26 to 31
struct BytecodeNotFoundError : public std::runtime_error {
BytecodeNotFoundError(BytecodeId id, const std::string& message)
BytecodeNotFoundError(const std::string& message)
: std::runtime_error(message)
, bytecode_id(id)
{}

BytecodeId bytecode_id;
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah I remember Ilyas got that as well; I guess it's ok you can stringify it

// Throws BytecodeNotFoundError if contract not found (for use in temporality group 1)
virtual BytecodeId get_bytecode_id() = 0;

// Returns the id of the current bytecode, or nullopt if contract not found.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: "if contract not found."

Does this try to fetch the contract? Or "contract not found" means "nobody tried to fetch yet"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It tries to fetch the contract. "Not found" means "contract does not exist". I'll clarify in the comments.


StrictMock<MockContractDB> contract_db;
StrictMock<MockHighLevelMerkleDB> merkle_db;
StrictMock<MockPoseidon2> poseidon2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to use a FakePoseidon2 if it makes life easier/code cleaner

Copy link
Contributor Author

@dbanks12 dbanks12 Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did it for class id and address deriv, but opted not to here for the bytecode manager tests because I'd end up needing to replicate too much of the algorithm in the test.

…te bytecode hashing, instruction fetching, bc decomp
@dbanks12 dbanks12 force-pushed the db/dedup-bc-commit branch from 32b01b3 to 7ff82e3 Compare August 8, 2025 18:57
@dbanks12 dbanks12 enabled auto-merge August 8, 2025 19:05
@dbanks12 dbanks12 added this pull request to the merge queue Aug 8, 2025
Merged via the queue into next with commit 3e97323 Aug 8, 2025
5 checks passed
@dbanks12 dbanks12 deleted the db/dedup-bc-commit branch August 8, 2025 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants