Add agave-scheduler-bindings crate #7823
Force-pushed from f8a3896 to ec77626.
| pub transactions: [SharableTransaction; MAX_TRANSACTIONS_PER_PACK_MESSAGE], | ||
| } | ||
| pub mod pack_message_flags { |
Flags for supporting bundles (DROP_ON_FAILURE, ALL_OR_NONE) will be added here once the functionality is supported.
Additional flags for:
- checking balance
- checking status (already processed or not)
outside of any slot (similar to RESOLVE) will also likely be added
Granularity is great, but in general I think any scheduler is interested in "is this still valid", which includes basically everything in recv_and_buffer. If we can enumerate everything separately, great; the flags might never be used individually though.
yeah, i'm planning to add the balance and status checks here as mentioned above. but it'll likely support having all 3 flags set to check/perform all of them.
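As a rough illustration of the flag scheme being discussed: only RESOLVE is mentioned as existing in the PR, and the other names and bit values below (CHECK_BALANCE, CHECK_STATUS, DROP_ON_FAILURE) are assumptions based on this thread, not the crate's actual constants.

```rust
/// Hypothetical sketch of the bit-flag layout discussed above.
/// Names and bit positions other than RESOLVE are assumed.
pub mod pack_message_flags {
    /// Resolve address-lookup tables for the transactions.
    pub const RESOLVE: u8 = 1 << 0;
    /// Check the fee-payer balance outside of any slot (future, assumed).
    pub const CHECK_BALANCE: u8 = 1 << 1;
    /// Check whether the transaction was already processed (future, assumed).
    pub const CHECK_STATUS: u8 = 1 << 2;
    /// Bundle semantics: drop remaining txs if one fails (future, assumed).
    pub const DROP_ON_FAILURE: u8 = 1 << 3;
}

/// All three validity checks can be requested at once, as noted in the thread.
pub fn wants_all_checks(flags: u8) -> bool {
    use pack_message_flags::*;
    let all = RESOLVE | CHECK_BALANCE | CHECK_STATUS;
    flags & all == all
}

fn main() {
    let flags = pack_message_flags::RESOLVE
        | pack_message_flags::CHECK_BALANCE
        | pack_message_flags::CHECK_STATUS;
    // All three checks requested together.
    println!("{}", wants_all_checks(flags)); // true
}
```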
Force-pushed from ec77626 to 8f8a871.
| pub num_transactions: u8, | ||
| /// Transactions in the message. Only the first `num_transactions` | ||
| /// entries are valid to read. | ||
| pub transactions: [SharableTransaction; MAX_TRANSACTIONS_PER_PACK_MESSAGE], |
Thought about making this an allocation from the external pack, which would allow an arbitrary number of txs per message.
But I don't think there's enough value in that vs the static-sized array to justify the additional indirection.
We could start with variable-length txs if we're unsure whether MAX_TRANSACTIONS_PER_PACK_MESSAGE will change often in the future.
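For reference, a minimal sketch of the fixed-size layout this thread is weighing against an external allocation. The constant's value and the `SharableTransaction` fields are placeholders; only the `num_transactions` plus fixed-array shape mirrors the PR.

```rust
// Sketch only: the real constant value and SharableTransaction body differ.
pub const MAX_TRANSACTIONS_PER_PACK_MESSAGE: usize = 4; // assumed value

#[repr(C)]
#[derive(Clone, Copy, Default)]
pub struct SharableTransaction {
    // Placeholder fields; the real type references shared memory.
    pub offset: u64,
    pub length: u64,
}

#[repr(C)]
pub struct PackToWorkerMessage {
    /// MUST be in the range [1, MAX_TRANSACTIONS_PER_PACK_MESSAGE].
    pub num_transactions: u8,
    /// Only the first `num_transactions` entries are valid to read.
    pub transactions: [SharableTransaction; MAX_TRANSACTIONS_PER_PACK_MESSAGE],
}

impl PackToWorkerMessage {
    /// Readers iterate only the valid prefix; no pointer chasing is needed,
    /// which is the indirection-avoidance argument made above.
    pub fn valid_transactions(&self) -> &[SharableTransaction] {
        &self.transactions[..self.num_transactions as usize]
    }
}

fn main() {
    let msg = PackToWorkerMessage {
        num_transactions: 2,
        transactions: [SharableTransaction::default(); MAX_TRANSACTIONS_PER_PACK_MESSAGE],
    };
    println!("{}", msg.valid_transactions().len()); // 2
}
```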
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##           master   #7823    +/- ##
=====================================
- Coverage    83.0%   83.0%    -0.1%
=====================================
  Files         808     808
  Lines      356265  356265
=====================================
- Hits       296043  296006      -37
- Misses      60222   60259      +37
| /// begins. | ||
| pub progress: i16, | ||
| /// The number of compute units packed in the current block, if leader. | ||
| pub total_compute_units: u64, |
Should this be total_compute_units_left? It signals remaining capacity (easier forward compatibility for CU limit increases).
made it remaining_cost_units since these are actually cost units, not compute units 😬
| /// Message: [Worker -> Pack] | ||
| /// Message from worker threads in response to a [`PackToWorkerMessage`]. | ||
| /// [`PackToWorkerMessage`] may have multiple response messages that | ||
| /// will follow the order of transactions in the original message. |
Might be easier to reason about things if the response message contains results for everything in one 'batch'
The actual message types here are somewhat large; if we had a static array of the tagged unions, the total batch size would be 2048 bytes. With small batch sizes of, say, 3 messages (bundles), the messages would be 90% wasted space. Effectively that would make it necessary to increase the size of the queue to accommodate smaller batches.
While the responses for each tx are in different messages, agave will always commit them as a batch. i.e. if you sent a batch of 10 txs, you will never see 8 of the responses without the last 2; it's always all 10 or none that are visible (via the shared queue writer position) to the external scheduler.
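The all-or-nothing visibility described here can be sketched with an atomic writer position published with Release ordering. This is a minimal in-process stand-in, not the crate's actual shared-memory queue; all names are illustrative assumptions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Sketch: write every response in a batch first, then publish the writer
// position once, so a reader sees either the whole batch or none of it.
struct ResponseQueue {
    slots: Vec<AtomicUsize>, // stand-in for response messages
    writer_pos: AtomicUsize, // published position; reader never reads past it
}

impl ResponseQueue {
    fn new(capacity: usize) -> Self {
        Self {
            slots: (0..capacity).map(|_| AtomicUsize::new(0)).collect(),
            writer_pos: AtomicUsize::new(0),
        }
    }

    /// Write all responses, then publish with Release so the reader's
    /// Acquire load observes fully written slots only.
    fn push_batch(&self, responses: &[usize]) {
        let start = self.writer_pos.load(Ordering::Relaxed);
        for (i, &r) in responses.iter().enumerate() {
            self.slots[start + i].store(r, Ordering::Relaxed);
        }
        self.writer_pos
            .store(start + responses.len(), Ordering::Release);
    }

    /// Reader side: number of responses currently visible.
    fn visible(&self) -> usize {
        self.writer_pos.load(Ordering::Acquire)
    }
}

fn main() {
    let q = ResponseQueue::new(16);
    q.push_batch(&[1, 2, 3]);
    println!("{}", q.visible()); // 3: all three visible at once, never a partial batch
}
```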
If at all possible, it would be great to stick with one response per request. Speaking of which, is the Packer expected to have a timer or something while waiting for a response from Workers, or will it hold on to the original message/txs until the response is received?
if at all possible, would be great to stick with one response per request

See comment above - it's possible, but it wastes an incredible amount of space or causes additional indirection. It's a loop either way; this is a simpler approach than creating a pointer which then needs to be freed.
expected to have a timer
Not sure what value having a timer would have, wdym? Pack sends messages with transactions, worker will respond in order.
or it will hold on to original message/txs until response is received?
It doesn't need to hold onto anything from the original message, except the ALT pubkey ptrs if it doesn't want to re-resolve for unlocking the keys in its internal lock-tracking (if any).
what should the packer do in case a worker never responds (due to some unexpected cause)?
If it never responds or responds in a different order, that indicates a bug in the worker. So it'd probably be reasonable to shut down the external scheduler to let agave fall back to the internal scheduler, or to make that switch manually.
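Because workers reply strictly in send order, the pack-side bookkeeping hinted at in this thread can be a plain FIFO of in-flight batches. This is an illustrative sketch; the type names, batch ids, and the unlock handling are assumptions, not the PR's API.

```rust
use std::collections::VecDeque;

// Sketch: the packer queues metadata for each sent batch and pops it when
// the next in-order response arrives.
struct InFlightBatch {
    batch_id: u64,
    locked_keys: Vec<[u8; 32]>, // e.g. resolved ALT pubkeys kept for unlocking
}

struct Packer {
    in_flight: VecDeque<InFlightBatch>,
}

impl Packer {
    /// Called when the next worker response arrives; returns the keys to
    /// release in the scheduler's internal lock tracking. An unknown or
    /// out-of-order id would indicate a worker bug (see discussion above).
    fn on_response(&mut self, batch_id: u64) -> Option<Vec<[u8; 32]>> {
        let front = self.in_flight.pop_front()?;
        if front.batch_id != batch_id {
            return None;
        }
        Some(front.locked_keys)
    }
}

fn main() {
    let mut p = Packer { in_flight: VecDeque::new() };
    p.in_flight.push_back(InFlightBatch {
        batch_id: 7,
        locked_keys: vec![[0u8; 32]],
    });
    println!("{}", p.on_response(7).is_some()); // true
}
```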
| /// If [`pack_message_flags::RESOLVE`] flag is not set, this is the slot | ||
| /// the transactions can be processed in. If the flag is set to | ||
| /// [`ANY_SLOT`], the transactions can be processed in any slot in agave. | ||
| pub slot: u64, |
🤔 yeah that seems reasonable. then I don't even need to declare ANY_SLOT as a special flag...that's implied!
| /// The transaction that was included. | ||
| pub transaction: SharableTransaction, | ||
| /// Compute units used by the transaction. | ||
| pub compute_units: u64, |
I think we need account_data_size_loaded too for proper CostModel updates
yeah this should be cost-units not compute-units
| /// Reference to a transaction that can be shared safely across processes. | ||
| #[repr(C)] | ||
| pub struct SharableTransaction { |
nit: I found calling it "...Transaction" a bit misleading, and we have so many "transaction" types already. What do you think about calling it "SharableTransactionRegion" or "SharableTransactionDescriptor"?
| /// the transaction will not be processed. | ||
| pub max_execution_slot: u64, | ||
| /// The number of transactions in this message. | ||
| /// MUST be in the range [1, [`MAX_TRANSACTIONS_PER_PACK_MESSAGE`]]. |
What do you think about implementing basic validation alongside the message type definitions?
"num_transactions MUST be in the range [1, [MAX_TRANSACTIONS_PER_PACK_MESSAGE]]" is a good example for impl MessageValidation for PackToWorkerMessage {...}
I'm really not sure what you mean. It'll be an if statement lol
Oh, I see what you mean - you mean having validation inside this file.
I'm against that. I've intentionally kept any functions out of this file; it is intended only as type definitions.
edit: This file is intended to be a definition of the binary interface. Any functionality should not be in here imo.
yea, I see your point. Ideally the message type doesn't have such subtle inner logic. Maybe that's a reason to remove MAX_TRANSACTIONS_PER_PACK_MESSAGE and have the Packer dynamically reserve transactions?
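A sketch of the "it'll be an if statement" approach discussed here: the bindings file stays pure type definitions, and the range check lives in consuming code. The constant's value and the struct body are simplified stand-ins, not the PR's actual definitions.

```rust
// --- bindings side: type definitions only, no functions ---
pub const MAX_TRANSACTIONS_PER_PACK_MESSAGE: usize = 4; // assumed value

#[repr(C)]
pub struct PackToWorkerMessage {
    /// MUST be in the range [1, MAX_TRANSACTIONS_PER_PACK_MESSAGE].
    pub num_transactions: u8,
    // ...remaining fields elided in this sketch...
}

// --- consuming side: validation kept out of the bindings file ---
pub fn is_valid(msg: &PackToWorkerMessage) -> bool {
    let n = msg.num_transactions as usize;
    n >= 1 && n <= MAX_TRANSACTIONS_PER_PACK_MESSAGE
}

fn main() {
    let ok = PackToWorkerMessage { num_transactions: 2 };
    let bad = PackToWorkerMessage { num_transactions: 0 };
    println!("{} {}", is_valid(&ok), is_valid(&bad)); // true false
}
```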
Force-pushed from 7696b96 to 8125c7c.
tao-stones left a comment:
Thanks for explaining and updating, it looks so much better 🔥
Wondering what you think on this: if for some reason the Workers get backed up, the Packer can't release transactions, which can then cascade into the TPU being unable to reserve space for new transactions. Adding a TTL field to the Pack->Worker message could help in that particular case. wdyt?
Yeah there will be some logic inside of agave to fall back to an internal scheduler.
Force-pushed from 520ae84 to 1be3ecf.
tao-stones left a comment:
thanks for documenting the workflows 👍🏼
| /// this offset. The type of each inner message is indicated by `tag`. | ||
| /// There are `num_transaction_responses` inner messages. | ||
| /// See [`worker_message_types`] for details on the inner message types. | ||
| pub transaction_responses_offset: u32, |
nit: to keep the existing pattern elsewhere, these two fields can be defined as a "region"?
struct TransactionResponseRegion {
transaction_responses_offset: u32,
num_transaction_responses: u8,
}
tao-stones left a comment:
The Packer ↔ Worker flow is much clearer now. Let me try to summarize it:

- Packer → Worker: sends a TransactionBatchRegion, with process instructions such as vanilla execution (None) or resolve.
- Expectation: Packer expects a single response message from Worker.
- Worker → Packer: replies with the same TransactionBatchRegion (so Packer doesn't need to retain a local copy) along with a ResponseRegion. The reply indicates whether the original Packer2Worker message was invalid, or, if valid, provides per-transaction processing results.

Hope that's accurate?
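A sketch of the packer-side dispatch this summary implies. The tag values, field names, and the second tag constant are assumptions modeled on the discussion, not the crate's actual ABI (only INVALID_MESSAGE = 0 appears in the PR).

```rust
// Sketch only: names and values other than INVALID_MESSAGE are assumed.
const INVALID_MESSAGE: u8 = 0; // tag from the PR: no inner messages present
const TRANSACTION_RESPONSES: u8 = 1; // hypothetical tag for valid replies

struct WorkerToPackMessage {
    tag: u8,
    num_transaction_responses: u8,
}

/// Returns how many per-transaction results to read, or None if the
/// original Packer -> Worker message was rejected as invalid.
fn responses_to_read(msg: &WorkerToPackMessage) -> Option<usize> {
    match msg.tag {
        INVALID_MESSAGE => None, // no inner messages to read
        _ => Some(msg.num_transaction_responses as usize),
    }
}

fn main() {
    let reply = WorkerToPackMessage {
        tag: TRANSACTION_RESPONSES,
        num_transaction_responses: 3,
    };
    println!("{:?}", responses_to_read(&reply)); // Some(3)
}
```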
| /// Tag indicating the type of message. | ||
| /// See [`worker_message_types`] for details. | ||
| /// If the tag is [`worker_message_types::INVALID_MESSAGE`], | ||
| /// no inner messages will be present and `num_transaction_responses` |
note: would it be useful to provide the reason why the original Packer2Worker message was invalid?
| /// Tag indicating [`InvalidMessage`] inner message. | ||
| /// Tag indicating the sent message was invalid. | ||
| /// No inner messages will be present. | ||
| pub const INVALID_MESSAGE: u8 = 0; |
per earlier suggestion, this can be removed
| pub mod not_included_reasons { | ||
| /// The transaction was included in the block - i.e. no reason it was not included. | ||
| pub const NONE: u8 = 0; |
nerdy nit: the double negative ("no reason it was not included" == "included; cost_units and fee_payer_balance can be looked at"). I don't know a good way to straighten it out without adding an additional field.
Force-pushed from def658d to 7cb9c3a.
tao-stones left a comment:
thanks for addressing all the comments. it looks great