Apply transaction batches in periodic intervals. #4504
Roughly how much slower is it for the client if they submit with |
Each batch takes place 100ms after the preceding one. So the average additional wait time should be about 50ms.
Yeah, duh. I should have realized that. But anyway, the reason I ask is that I suspect most clients aren't going to notice an extra 50ms (or maybe less because they're already paying the overhead of preparing the open ledger). Why not make the default behavior to And while I'm talking about ideas for the future, it might be worthwhile to make the default configurable, and even to disallow certain modes for non-admin connections. I'm thinking about public nodes who might want to reduce their load by not letting anybody submit with |
I want to keep "sync" the default for now to not surprise anybody. The need to phase this out in favor of either "wait" or "async" will arise as volumes increase to well over 2000/s, based on the testing I've done. Consider that if a current tx submission takes 10ms from the client's perspective, then adding 50ms to that will decrease throughput from 100/s to about 17/s. That's for somebody doing a lot of transaction submission. I think allowing administrators to disable certain modes is a good idea--I can see somebody like XRPLF wanting to do that, as well as Ripple. But in practice that would also mean updating clients to be wise to this change. Maybe refinements such as this can be something the broad community can debate also?
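The figures in this exchange can be checked with a short sketch. It assumes submissions arrive at a uniformly random point within the 100ms batch interval and that a single client submits strictly one request at a time; the function names are hypothetical and are not part of rippled.

```cpp
#include <cassert>

// Mean extra wait for a request that arrives at a uniformly random
// point within a fixed batch interval (assumption: uniform arrivals).
inline double averageAddedWaitMs(double batchIntervalMs)
{
    assert(batchIntervalMs >= 0.0);
    return batchIntervalMs / 2.0;
}

// Submissions per second for one client that issues requests strictly
// one after another, each taking latencyMs to return.
inline double serialThroughputPerSecond(double latencyMs)
{
    assert(latencyMs > 0.0);
    return 1000.0 / latencyMs;
}
```

With a 10ms baseline, `serialThroughputPerSecond(10.0)` gives 100/s, and adding `averageAddedWaitMs(100.0)` to the latency gives 1000 / 60, or roughly 16.7/s, which matches the "about 17/s" above.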
src/ripple/protocol/jss.h
@@ -562,6 +562,7 @@ JSS(sub_index); // in: LedgerEntry
JSS(subcommand); // in: PathFind
JSS(success); // rpc
JSS(supported); // out: AmendmentTableImpl
JSS(sync); // in: Submit
If I am correct that this becomes part of our client-facing API, we should make sure this is the name we want. I find myself wondering if another name, such as `submit` or maybe `submit_mode`, would be better. I'm not the right person to answer this question. I'm simply trying to make this decision more visible. If `sync` is the best name, I'm fine with that.
@intelliot what's the best way to define this new API field? I like it as is, since the behavior has to do with being synchronous. But I also don't care enough either way. How should we handle this?
- Can you (re)explain the purpose and meaning of the field?
- Why does it need to be user-facing?
- How should users decide what value to set in the field?
Once we have simplified and accurate answers to the above, then we should collect feedback from API users like @justinr1234, @mvadari, @mDuo13 to get a recommendation for how to define and introduce the field.
@mDuo13 can you please review the proposed API change, described at the top of this PR?
@mvadari ^
@ximinez what about "sync_mode"?
Sync mode sounds confusing to me. I prefer mode or submit_mode.
I could see the value in `sync_mode`. Like I said earlier, we already know that it's related to submit, but what happens if down the road there's some other processing option that needs a separate mode? I have no idea what that would be, but you never know. If we do `sync_mode` now, then it'll be less confusing to add `foo_mode` later.
"sync_mode" means that the particular mode defines the behavior having to do with synchronicity.
"mode" can be anything and "submit_mode" is redundant. On the other hand, the interface is something that strikes everybody differently. What do you guys think at this point? @justinr1234 @ximinez
`sync_mode` works for me.
Left a few comments, but no blockers.
I left several suggestions, but I like the changes overall.
I like `submit_mode` or maybe just `mode` (since it would be a param to a "submit" command, it would be a "submit mode" either way).
Is it possible to asynchronously send an error message to the websocket connection that the client program can still handle later?
I created an issue for this so it can be considered in a future project: #4587
Note: blocked until #4505 is ready to merge.
Proposed release notes blurb:
Good. Perhaps reiterate that this is only a potential problem if people use multiple rippled servers for the same sending account and rapidly send transactions. Up to you. Otherwise, thanks @mDuo13
Add new transaction submission API field, "sync", which determines behavior of the server while submitting transactions: 1) sync (default): Process transactions in a batch immediately, and return only once the transaction has been processed. 2) async: Put transaction into the batch for the next processing interval and return immediately. 3) wait: Put transaction into the batch for the next processing interval and return only after it is processed.
This reverts commit b580049.
This reverts commit 002893f.
This reverts commit 002893f. There were two files with conflicts in the automated revert: - src/ripple/rpc/impl/RPCHelpers.h and - src/test/rpc/JSONRPC_test.cpp Those files were manually resolved. There is currently no evidence that any problems were introduced by XRPLF#4504. However, something is misbehaving on the current state of develop, and pull request XRPLF#4504 was identified as a possible suspect.
Open question: How would this perform if the transaction rebroadcast interval is reduced?
…XRPLF#4852) This reverts commit 002893f. There were two files with conflicts in the automated revert: - src/ripple/rpc/impl/RPCHelpers.h and - src/test/rpc/JSONRPC_test.cpp Those files were manually resolved.
Add new transaction submission API field, "sync_mode", which determines behavior of the server while submitting transactions:
1) sync (default): Process transactions in a batch immediately, and return only once the transaction has been processed.
2) async: Put transaction into the batch for the next processing interval and return immediately.
3) wait: Put transaction into the batch for the next processing interval and return only after it is processed.
This PR is related to 2 others that, when combined, increase throughput significantly:
#4503
#4505
High Level Overview of Change
This improves transaction throughput. For background, transactions are applied to the open ledger in batches. Only one batch is applied at a time. As they are received from either a client or a peer, transactions are added to a batch. If no batches are being processed, then the current batch is processed immediately. Otherwise, it will be processed once the current batch completes. Batches are applied continuously until no more transactions are queued this way. This pattern optimizes both throughput and latency, but only if applying batches does not contend with other activities.
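The batching pattern described above can be modeled with a minimal single-threaded sketch. This is not rippled's implementation: the real code is concurrent and lock-protected, and `BatchApplier`, `submit`, and `batchSizes` are hypothetical names used only for illustration. Re-entrant calls to `submit` from the apply callback stand in for transactions that arrive while a batch is being processed.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded model of the batching loop.
class BatchApplier
{
public:
    using Tx = std::string;
    // Called once per applied transaction; may call submit() re-entrantly
    // to model transactions arriving while a batch is in progress.
    using ApplyFn = std::function<void(Tx const&)>;

    explicit BatchApplier(ApplyFn apply) : apply_(std::move(apply)) {}

    // Add a transaction to the current batch. If no batch is being
    // applied, start applying right away; otherwise it waits its turn.
    void submit(Tx tx)
    {
        pending_.push_back(std::move(tx));
        if (!applying_)
            drain();
    }

    // Size of every batch applied so far, in order.
    std::vector<std::size_t> const& batchSizes() const
    {
        return batchSizes_;
    }

private:
    // Apply batches back to back until nothing more is queued.
    void drain()
    {
        applying_ = true;
        while (!pending_.empty())
        {
            std::vector<Tx> batch;
            batch.swap(pending_);  // new arrivals go into the next batch
            batchSizes_.push_back(batch.size());
            for (auto const& tx : batch)
                apply_(tx);
        }
        applying_ = false;
    }

    ApplyFn apply_;
    std::vector<Tx> pending_;
    std::vector<std::size_t> batchSizes_;
    bool applying_ = false;
};
```

In this model, a transaction submitted while the applier is idle forms a batch of one, while transactions that arrive during that batch are grouped into the following one.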
However, the problem is that batch application contends with numerous other activities for the MasterLock and the LedgerMaster lock. As transaction volume increases, so does lock contention. Analysis under heavy transaction load shows that each transaction batch spends a large amount of time setting up the open ledger for modification, whereas each individual transaction takes a very small amount of time. More importantly, the time spent preparing to modify the ledger is not affected by the size of the batch! Instead, it is related to the number of transactions in the current open ledger--as transaction volume increases, so does the time it takes to apply each batch.
For example, assume it takes 5ms to prepare the ledger for each batch, and 50us per transaction. Minimizing wall clock time, and therefore lock contention, means minimizing the number of batches. To put this in perspective, single transactions submitted just under 5ms apart would consume all available wall clock time, with the vast majority spent simply preparing the open ledger for modification! That's not a problem if our only workload is applying transactions. But other things need to use the lock as well.
The solution implemented instead attempts to apply batches approximately every 100ms, or 10 times per second. To contrast with the above example, reducing the number of batches this way would reduce open ledger preparation time from nearly 1 full second to only 50ms, while actual transaction processing is a trivial 10ms. That's a 94% reduction in lock contention! That's a contrived example, of course, but it plays out in testing--as transaction volume increases, lock contention decreases and the server is able to process significantly higher volume.
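The arithmetic behind the contrived example can be reproduced with a short sketch. It assumes 200 transactions per second (one every 5ms), 5ms of preparation per batch, and 50us per transaction; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Total time (in microseconds) spent holding the lock in one second,
// given the number of batches and transactions applied in that second.
// Defaults reflect the assumed costs from the example: 5ms per batch
// of preparation and 50us per transaction.
inline std::int64_t busyMicros(
    std::int64_t batches,
    std::int64_t transactions,
    std::int64_t prepMicrosPerBatch = 5000,
    std::int64_t applyMicrosPerTx = 50)
{
    assert(batches >= 0 && transactions >= 0);
    return batches * prepMicrosPerBatch + transactions * applyMicrosPerTx;
}

// Percentage by which `after` is smaller than `before`.
inline double reductionPercent(std::int64_t before, std::int64_t after)
{
    assert(before > 0);
    return 100.0 * static_cast<double>(before - after) /
        static_cast<double>(before);
}
```

With one batch per transaction, `busyMicros(200, 200)` is 1,010,000us (1,000ms of preparation plus 10ms of processing). With 10 batches per second, `busyMicros(10, 200)` is 60,000us (50ms plus 10ms). `reductionPercent(1010000, 60000)` comes to just over 94%.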
Along with this fix is an enhancement to the submission API. Namely, the existing behavior is to immediately process transactions as they are submitted by the client (but not the peer). However, this tends to diminish the effectiveness of the fix as volume submitted directly to the server increases. This is because more batches are being applied. The problem exhibits itself only under very high volume, but if not addressed will cause problems as livenet volume increases. The API enhancement creates an optional new field called "sync_mode" with the following possible settings:
1) sync (default): Process transactions in a batch immediately, and return only once the transaction has been processed.
2) async: Put transaction into the batch for the next processing interval and return immediately. If successful, return a new code: terSUBMITTED.
3) wait: Put transaction into the batch for the next processing interval and return only after it is processed.
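The three settings can be summarized in a small sketch. This is an illustration only, not the code in this PR: `SyncMode`, `parseSyncMode`, `waitsForProcessing`, and `defersToNextInterval` are hypothetical names, and the only details taken from the description are the three string values and the "sync" default.

```cpp
#include <string>

// Hypothetical representation of the "sync_mode" values.
enum class SyncMode { Sync, Async, Wait };

// Default used when the optional field is absent, per the description.
const SyncMode kDefaultSyncMode = SyncMode::Sync;

// Parse the field's string value. Returns false (leaving `out`
// untouched) for anything other than the three documented settings.
inline bool parseSyncMode(std::string const& value, SyncMode& out)
{
    if (value == "sync")
        out = SyncMode::Sync;
    else if (value == "async")
        out = SyncMode::Async;
    else if (value == "wait")
        out = SyncMode::Wait;
    else
        return false;
    return true;
}

// True if the call returns only after the transaction is processed.
inline bool waitsForProcessing(SyncMode mode)
{
    return mode != SyncMode::Async;
}

// True if the transaction is queued for the next processing interval
// rather than being applied in a batch immediately.
inline bool defersToNextInterval(SyncMode mode)
{
    return mode != SyncMode::Sync;
}
```

Seen this way, "wait" combines the deferred batching of "async" with the blocking return of "sync".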
Trade-offs for each option are as follows:
Context of Change
Type of Change