RIPD-1847 fix select query condition #4955

Merged
merged 1 commit into from
Mar 21, 2024

Conversation

oleks-rip
Collaborator

@oleks-rip oleks-rip commented Mar 18, 2024

High Level Overview of Change

perf: improve account_tx SQL query:

The witness server makes heavy use of the account_tx RPC command. Perf
testing showed that the SQL query used by account_tx became unacceptably slow
when the DB was large and a marker parameter was present. The plan for the query
showed only indexed reads. This appears to be an issue with the internal SQLite
optimizer. This patch rewrites the query to use UNION instead of OR, which
significantly improves performance. See RXI-896 and RIPD-1847 for more details.

Example

was:

SELECT AccountTransactions.LedgerSeq,AccountTransactions.TxnSeq,
  Status,RawTxn,TxnMeta
  FROM AccountTransactions, Transactions WHERE
  (AccountTransactions.TransID = Transactions.TransID AND
  AccountTransactions.Account = 'rHb9CJAWyB4rj91VRWn96DkukG4bwdtyTh' AND
  AccountTransactions.LedgerSeq BETWEEN '415398' AND '415441')
  OR
  (AccountTransactions.TransID = Transactions.TransID AND
  AccountTransactions.Account = 'rHb9CJAWyB4rj91VRWn96DkukG4bwdtyTh' AND
  AccountTransactions.LedgerSeq = '415442' AND
  AccountTransactions.TxnSeq <= '152')
  ORDER BY AccountTransactions.LedgerSeq DESC,
  AccountTransactions.TxnSeq DESC
  LIMIT 400;

became:

SELECT AccountTransactions.LedgerSeq,AccountTransactions.TxnSeq,Status,RawTxn,TxnMeta
  FROM AccountTransactions, Transactions WHERE
  (AccountTransactions.TransID = Transactions.TransID AND
  AccountTransactions.Account = 'rHb9CJAWyB4rj91VRWn96DkukG4bwdtyTh' AND
  AccountTransactions.LedgerSeq BETWEEN 415398 AND 415441)
UNION
SELECT AccountTransactions.LedgerSeq,AccountTransactions.TxnSeq,Status,RawTxn,TxnMeta
  FROM AccountTransactions, Transactions WHERE
  (AccountTransactions.TransID = Transactions.TransID AND
  AccountTransactions.Account = 'rHb9CJAWyB4rj91VRWn96DkukG4bwdtyTh' AND
  AccountTransactions.LedgerSeq = 415442 AND
  AccountTransactions.TxnSeq <= 152)
  ORDER BY AccountTransactions.LedgerSeq DESC,
  AccountTransactions.TxnSeq DESC
  LIMIT 400;
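
One way to sanity-check a rewrite like this is to run both forms against a small in-memory SQLite database and confirm that they return the same rows in the same order. The sketch below uses Python's standard sqlite3 module. The two-table schema, the index name AcctTxIndex, the account name rAlice and all ledger numbers are simplified assumptions made up for illustration; they are not rippled's actual schema or data.

```python
import sqlite3

# Shared projection and join, mirroring the shape of the queries above.
SELECT_HEAD = """
SELECT AccountTransactions.LedgerSeq, AccountTransactions.TxnSeq,
       Status, RawTxn, TxnMeta
  FROM AccountTransactions, Transactions WHERE
"""
# Predicate for the ledger range that lies strictly before the marker ledger.
RANGE_PRED = """
  (AccountTransactions.TransID = Transactions.TransID AND
   AccountTransactions.Account = :account AND
   AccountTransactions.LedgerSeq BETWEEN :min_ledger AND :max_ledger)
"""
# Predicate for the partial ledger identified by the marker.
MARKER_PRED = """
  (AccountTransactions.TransID = Transactions.TransID AND
   AccountTransactions.Account = :account AND
   AccountTransactions.LedgerSeq = :marker_ledger AND
   AccountTransactions.TxnSeq <= :marker_seq)
"""
ORDER_LIMIT = """
  ORDER BY AccountTransactions.LedgerSeq DESC,
           AccountTransactions.TxnSeq DESC
  LIMIT :limit
"""

OR_QUERY = SELECT_HEAD + RANGE_PRED + " OR " + MARKER_PRED + ORDER_LIMIT
UNION_QUERY = (SELECT_HEAD + RANGE_PRED + " UNION " +
               SELECT_HEAD + MARKER_PRED + ORDER_LIMIT)


def build_db():
    """Create a toy database (assumed schema): 20 ledgers, 5 transactions each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Transactions (
            TransID TEXT PRIMARY KEY, Status TEXT, RawTxn BLOB, TxnMeta BLOB);
        CREATE TABLE AccountTransactions (
            TransID TEXT, Account TEXT, LedgerSeq INTEGER, TxnSeq INTEGER);
        CREATE INDEX AcctTxIndex
            ON AccountTransactions (Account, LedgerSeq, TxnSeq, TransID);
    """)
    for ledger in range(1, 21):
        for txn in range(5):
            trans_id = f"T{ledger:03d}{txn}"
            conn.execute("INSERT INTO Transactions VALUES (?, 'V', ?, ?)",
                         (trans_id, b"raw", b"meta"))
            conn.execute("INSERT INTO AccountTransactions VALUES (?, 'rAlice', ?, ?)",
                         (trans_id, ledger, txn))
    return conn


params = {"account": "rAlice", "min_ledger": 10, "max_ledger": 14,
          "marker_ledger": 15, "marker_seq": 2, "limit": 400}
conn = build_db()
or_rows = conn.execute(OR_QUERY, params).fetchall()
union_rows = conn.execute(UNION_QUERY, params).fetchall()
print(len(or_rows), or_rows == union_rows)
```

UNION (rather than UNION ALL) removes duplicate rows. In this sketch the two branches select disjoint ledger ranges and every (LedgerSeq, TxnSeq) pair is unique, so deduplication does not change the result.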

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

@oleks-rip oleks-rip marked this pull request as ready for review March 20, 2024 17:05
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.95%. Comparing base (69143d7) to head (9a0c4df).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #4955   +/-   ##
========================================
  Coverage    76.95%   76.95%           
========================================
  Files         1127     1127           
  Lines       131696   131695    -1     
  Branches     39578    39520   -58     
========================================
+ Hits        101341   101345    +4     
+ Misses       24440    24398   -42     
- Partials      5915     5952   +37     


@seelabs seelabs added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Mar 21, 2024
@seelabs seelabs merged commit 2e9261c into XRPLF:develop Mar 21, 2024
17 checks passed
legleux added a commit to legleux/rippled that referenced this pull request Apr 12, 2024
* Price Oracle (XLS-47d): (XRPLF#4789)

Implement native support for Price Oracles.

 A Price Oracle is used to bring real-world data, such as market prices,
 onto the blockchain, enabling dApps to access and utilize information
 that resides outside the blockchain.

 Add Price Oracle functionality:
 - OracleSet: create or update the Oracle object
 - OracleDelete: delete the Oracle object

 To support this functionality add:
 - New RPC method, `get_aggregate_price`, to calculate aggregate price for a token pair of the specified oracles
 - `ltOracle` object

 The `ltOracle` object maintains:
 - Oracle Owner's account
 - Oracle's metadata
 - Up to ten token pairs with the scaled price
 - The last update time the token pairs were updated

 Add Oracle unit-tests

* fix compile error on gcc 13: (XRPLF#4932)

The compilation fails due to an issue in the initializer list
of an optional argument, which holds a vector of pairs.
The code compiles correctly on earlier gcc versions, but fails on gcc 13.

* Set version to 2.2.0-b1

* Remove default ctors from SecretKey and PublicKey: (XRPLF#4607)

* It is now an invariant that all constructed Public Keys are valid,
  non-empty and contain 33 bytes of data.
* Additionally, the memory footprint of the PublicKey class is reduced.
  The size_ data member is declared as static.
* Distinguish and identify the PublisherList retrieved from the local
  config file, versus the ones obtained from other validators.
* Fixes XRPLF#2942

* Fast base58 codec: (XRPLF#4327)

This algorithm is about an order of magnitude faster than the existing
algorithm (about 10x faster for encoding and about 15x faster for
decoding - including the double hash for the checksum). The algorithms
use gcc's int128 (fast MS version will have to wait, in the meantime MS
falls back to the slow code).

* feat: add user version of `feature` RPC (XRPLF#4781)

* uses same formatting as admin RPC
* hides potentially sensitive data

* build: add STCurrency.h to xrpl_core to fix clio build (XRPLF#4939)

* Embed patched recipe for RocksDB 6.29.5 (XRPLF#4947)

* fix: order book update variable swap: (XRPLF#4890)

This is likely the result of a typo when the code was simplified.

* Fix workflows (XRPLF#4948)

The problem was `CONAN_USERNAME` environment variable, which Conan 1.x uses as the default user in package references.

* Upgrade to xxhash 0.8.2 as a Conan requirement, enable SIMD hashing (XRPLF#4893)

We are currently using old version 0.6.2 of `xxhash`, as a verbatim copy and paste of its header file `xxhash.h`. Switch to the more recent version 0.8.2. Since this version is in Conan Center (and properly protects its ABI by keeping the state object incomplete), add it as a Conan requirement. Switch to the SIMD instructions (in the new `XXH3` family) supported by the new version.

* Update remaining actions (XRPLF#4949)

Downgrade {upload,download}-artifact action to v3 because of unreliability with v4.

* Install more public headers (XRPLF#4940)

Fixes some mistakes in XRPLF#4885

* test: Env unit test RPC errors return a unique result: (XRPLF#4877)

* telENV_RPC_FAILED is a new code, reserved exclusively
  for unit tests when RPC fails. This will
  make those types of errors distinct and easier to test
  for when expected and/or diagnose when not.
* Output RPC command result when result is not expected.

* Fix workflows (XRPLF#4951)

- Update container for Doxygen workflow. Matches Linux workflow, with newer GLIBC version required by newer actions.
- Fixes macOS workflow to install and configure Conan correctly. Still fails on tests, but that does not seem attributable to the workflow.

* perf: improve `account_tx` SQL query: (XRPLF#4955)

The witness server makes heavy use of the `account_tx` RPC command. Perf
testing showed that the SQL query used by `account_tx` became unacceptably slow
when the DB was large and there was a `marker` parameter. The plan for the query
showed only indexed reads. This appears to be an issue with the internal SQLite
optimizer. This patch rewrote the query to use `UNION` instead of `OR` and
significantly improves performance. See RXI-896 and RIPD-1847 for more details.

* `fixEmptyDID`: fix amendment to handle empty DID edge case: (XRPLF#4950)

This amendment fixes an edge case where an empty DID object can be
created. It adds an additional check to ensure that DIDs are
non-empty when created, and returns a `tecEMPTY_DID` error if the DID
would be empty.

* Enforce no duplicate slots from incoming connections: (XRPLF#4944)

We do not currently enforce that incoming peer connection does not have
remote_endpoint which is already used (either by incoming or outgoing
connection), hence already stored in slots_. If we happen to receive a
connection from such a duplicate remote_endpoint, it will eventually result in a
crash (when disconnecting) or weird behavior (when updating slot state), as a
result of an apparently matching remote_endpoint in slots_ being used by a
different connection.

* Remove zaphod.alloy.ee hub from default server list: (XRPLF#4903)

Remove the zaphod.alloy.ee hubs from the bootstrap and default configuration after 5 years. It has been an honor to run these servers, but it is now time for another entity to step into this role.

The zaphod servers will be taken offline in a phased manner keeping all those who have peering arrangements informed.

These would be the preferred attributes of a bootstrap set of hubs:

    1. Commitment to run the hubs for a minimum of 2 years
    2. Highly available
    3. Geographically dispersed
    4. Secure and up to date
    5. Committed to ensure that peering information is kept private

* Write improved `forAllApiVersions` used in NetworkOPs (XRPLF#4833)

* Don't reach consensus as quickly if no other proposals seen: (XRPLF#4763)

This fixes a case where a peer can desync under a certain timing
circumstance--if it reaches a certain point in consensus before it receives
proposals. 

This was noticed under high transaction volumes. Namely, when we arrive at the
point of deciding whether consensus is reached after minimum establish phase
duration but before having received any proposals. This could be caused by
finishing the previous round slightly faster and/or having some delay in
receiving proposals. Existing behavior arrives at consensus immediately after
the minimum establish duration with no proposals. This causes us to desync
because we then close a non-validated ledger. The change in this PR causes us to
wait for a configured threshold before making the decision to arrive at
consensus with no proposals. This allows validators to catch up and for brief
delays in receiving proposals to be absorbed. There should be no drawback since,
with no proposals coming in, we needn't be in a huge rush to jump ahead.

* fixXChainRewardRounding: round reward shares down: (XRPLF#4933)

When calculating reward shares, the amount should always be rounded
down. If the `fixUniversalNumber` amendment is not active, this works
correctly. If it is active, then the amount is incorrectly rounded
up. This patch introduces an amendment so it will be rounded down.

* Remove unused files

* Remove packaging scripts

* Consolidate external libraries

* Simplify protobuf generation

* Rename .hpp to .h

* Format formerly .hpp files

* Rewrite includes

$ find src/ripple/ src/test/ -type f -exec sed -i 's:include\s*["<]ripple/\(.*\)\.h\(pp\)\?[">]:include <ripple/\1.h>:' {} +

* Fix source lists

* Add markers around source lists

* fix: improper handling of large synthetic AMM offers:

A large synthetic offer was not handled correctly in the payment engine.
This patch fixes that issue and introduces a new invariant check while
processing synthetic offers.

* Set version to 2.1.1

* chore: change Github Action triggers for build/test jobs (XRPLF#4956)

Github Actions for the build/test jobs (nix.yml, mac.yml, windows.yml) will only run on branches that build packages (develop, release, master), and branches with names starting with "ci/". This is intended as a compromise between disabling CI jobs on personal forks entirely, and having the jobs run as a free-for-all. Note that it will not affect PR jobs at all.

* Address compiler warnings

* Fix search for protoc

* chore: Default validator-keys-tool to master branch: (XRPLF#4943)

* master is the default branch for that project. There's no point in
  using develop.

* Remove unused lambdas from MultiApiJson_test

* fix Conan component reference typo

* Set version to 2.2.0-b2

* bump version

* 2.2.3

* 2.2.4

* 2.2.5

---------

Co-authored-by: Gregory Tsipenyuk <[email protected]>
Co-authored-by: seelabs <[email protected]>
Co-authored-by: Chenna Keshava B S <[email protected]>
Co-authored-by: Mayukha Vadari <[email protected]>
Co-authored-by: John Freeman <[email protected]>
Co-authored-by: Bronek Kozicki <[email protected]>
Co-authored-by: Ed Hennis <[email protected]>
Co-authored-by: Olek <[email protected]>
Co-authored-by: Alloy Networks <[email protected]>
Co-authored-by: Mark Travis <[email protected]>
Co-authored-by: Gregory Tsipenyuk <[email protected]>
sophiax851 pushed a commit to sophiax851/rippled that referenced this pull request Jun 12, 2024
@intelliot
Collaborator

intelliot commented Jul 8, 2024

During Sidechain performance testing, we (@sophiax851 and team) found that account_tx's response time increased from a few milliseconds to over 3 seconds when its Witness server pulled transactions from the issuing chain using account_tx for both the door_account and the Witness server's transaction submission account. Three specific examples were found.

  1. When the Witness server receives a validated ledger from the ledger subscription stream.
  2. One of the subsequent calls issued after receiving the response to the initial account_tx, which involves a marker for stepping through the transactions.
  3. A slow response time when getting transactions from a single ledger: 2982941 microseconds.

There were frequent 2-10 sec delays. We produced a stack trace of the method (Node.cpp:1185) that was causing the slowness. The root cause is the slow database read in accountTxPage (backend/detail/impl/Node.cpp line 1146).

The original query used an index in normal cases, as shown by the execution plan when the query was run manually. However, at run time the SQLite optimizer would apparently decide to do a full table scan, as explained here. Olek found that the delay could be eliminated by rewriting the query to use "UNION" instead of "OR". The change enables use of an index, which most likely improves the query complexity from O(N) to O(log N) (with some fixed constant that depends on disk I/O speed).
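
For anyone who wants to inspect plans themselves, SQLite's EXPLAIN QUERY PLAN statement reports how a query would be executed. The sketch below prints the plan for an OR form and a UNION form using Python's standard sqlite3 module. The schema, the index name AcctTxIndex, the account rAlice and the helper query_plan are illustrative assumptions, not rippled's real schema or code. The output wording depends on the SQLite version, and a plan obtained this way from an empty toy database says nothing about what the optimizer chose on the production data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified schema for illustration only.
conn.executescript("""
    CREATE TABLE Transactions (
        TransID TEXT PRIMARY KEY, Status TEXT, RawTxn BLOB, TxnMeta BLOB);
    CREATE TABLE AccountTransactions (
        TransID TEXT, Account TEXT, LedgerSeq INTEGER, TxnSeq INTEGER);
    CREATE INDEX AcctTxIndex
        ON AccountTransactions (Account, LedgerSeq, TxnSeq, TransID);
""")

HEAD = ("SELECT AccountTransactions.LedgerSeq, AccountTransactions.TxnSeq, "
        "Status, RawTxn, TxnMeta FROM AccountTransactions, Transactions WHERE ")
RANGE = ("(AccountTransactions.TransID = Transactions.TransID AND "
         "AccountTransactions.Account = 'rAlice' AND "
         "AccountTransactions.LedgerSeq BETWEEN 10 AND 14)")
MARKER = ("(AccountTransactions.TransID = Transactions.TransID AND "
          "AccountTransactions.Account = 'rAlice' AND "
          "AccountTransactions.LedgerSeq = 15 AND "
          "AccountTransactions.TxnSeq <= 2)")
TAIL = (" ORDER BY AccountTransactions.LedgerSeq DESC, "
        "AccountTransactions.TxnSeq DESC LIMIT 400")

OR_SQL = HEAD + RANGE + " OR " + MARKER + TAIL
UNION_SQL = HEAD + RANGE + " UNION " + HEAD + MARKER + TAIL


def query_plan(conn, sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    # The detail text is the last column in both old and new SQLite versions.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


or_plan = query_plan(conn, OR_SQL)
union_plan = query_plan(conn, UNION_SQL)
for label, plan in (("OR", or_plan), ("UNION", union_plan)):
    print(label)
    for step in plan:
        print("   ", step)
```

Steps beginning with SEARCH use an index or primary key, while steps beginning with SCAN walk a whole table or index.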

Fixing this bottleneck also improved the response time of account_tx in certain situations - in particular, as the difference between ledger_index_min and ledger_index_max increases (regardless of the limit used).

In May 2024, infra providers found that RPC nodes were crashing in lockstep. In particular, while running account_tx/tx, the nodes would try to find offers to AMM and continue searching for a long time. Eventually the nodes would deadlock and crash. This turned out to be the same issue that was found during Sidechain/Witness perf testing. In particular, account_tx requests with some ledger_index range, e.g.:

        "ledger_index_max" : 87890671,
        "ledger_index_min" : 87890662,

and a marker, can cause slowness.
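
As a rough illustration of the kind of request described above, the sketch below assembles a JSON-RPC account_tx payload with a ledger_index range and a marker. The helper build_account_tx_request is hypothetical, and the limit and marker values are made up. The marker is an opaque, server-defined value that a client normally copies from a previous response, so the {"ledger", "seq"} shape shown here is an assumption.

```python
import json


def build_account_tx_request(account, ledger_index_min, ledger_index_max,
                             limit, marker=None):
    """Build a JSON-RPC account_tx request body (hypothetical helper)."""
    params = {
        "account": account,
        "ledger_index_min": ledger_index_min,
        "ledger_index_max": ledger_index_max,
        "limit": limit,
    }
    if marker is not None:
        # Pass the marker back exactly as the server returned it.
        params["marker"] = marker
    return {"method": "account_tx", "params": [params]}


request = build_account_tx_request(
    account="rHb9CJAWyB4rj91VRWn96DkukG4bwdtyTh",
    ledger_index_min=87890662,
    ledger_index_max=87890671,
    limit=400,                                # assumed page size
    marker={"ledger": 87890665, "seq": 3},    # made-up marker value
)
print(json.dumps(request, indent=2))
```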

Clio was not affected.

This PR fixes the issue in rippled version 2.2.0.

In the long run, we could revisit the choice of SQL engine or how rippled uses it. It is currently single-threaded, even for reads, which caused rippled to stall while one query was doing a full table scan. In the very long run, the solution is to use Clio instead.

@oleks-rip oleks-rip deleted the fix_req_condition branch November 19, 2024 16:10