
book_changes takes a long time querying full-history servers #5036

Closed
mDuo13 opened this issue Jun 3, 2024 · 4 comments

mDuo13 (Collaborator) commented Jun 3, 2024

Issue Description

When calling the book_changes method using WebSocket or JSON-RPC, asking for data from older ledgers leads to a long delay or even a request timeout.

Steps to Reproduce

  1. Connect to a full-history rippled server via WebSocket
  2. Send the following request (a scripted version of this reproduction is sketched below):
{
  "command": "book_changes",
  "ledger_index": 72381103
}
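
For convenience, here is a minimal Python sketch of the reproduction, using the third-party websockets package (the endpoint URL is an assumption; any full-history server should do):

# repro.py -- sketch only; assumes `pip install websockets` and that
# wss://xrplcluster.com is routed to a full-history rippled server.
import asyncio
import json

import websockets

async def main():
    async with websockets.connect("wss://xrplcluster.com") as ws:
        await ws.send(json.dumps({
            "command": "book_changes",
            "ledger_index": 72381103,
        }))
        # On affected servers this recv() stalls for >30s or never returns.
        print(json.loads(await ws.recv()))

asyncio.run(main())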

Expected Result

Server should respond quickly with a value like this:

{
  "result": {
    "type": "bookChanges",
    "ledger_hash": "33C53CF838DEE3C1F74E0BA8BEDEE28D2527E11B1C53846A22E9DBB33EC048C2",
    "ledger_index": 72381103,
    "ledger_time": 708708711,
    "validated": true,
    "changes": [
      {
        "currency_a": "XRP_drops",
        "currency_b": "rLqUC2eCPohYvJCEBJ77eCCqVL2uEiczjA/5852646F67650000000000000000000000000000",
        "volume_a": "520687381",
        "volume_b": "8568683",
        "high": "60.76767128172012",
        "low": "60.76159531559149",
        "open": "60.76159531559149",
        "close": "60.76767128172012"
      },
      {
        "currency_a": "XRP_drops",
        "currency_b": "rhLr8bGvHvBgYXAHNPyXrQAcKGrQ2X5nU4/5854726976694100000000000000000000000000",
        "volume_a": "3852",
        "volume_b": "4950495.0495049",
        "high": "0.000778104000000008",
        "low": "0.000778104000000008",
        "open": "0.000778104000000008",
        "close": "0.000778104000000008"
      },
      {
        "currency_a": "rKWH2NPtXn636uodWJ3dLLM7wHmyG2qpz6/KWH",
        "currency_b": "rTGoNeK6vpu28U2oE5tiqhH3x8z2jBhxv/TGO",
        "volume_a": "0.8",
        "volume_b": "8",
        "high": "0.1",
        "low": "0.1",
        "open": "0.1",
        "close": "0.1"
      }
    ]
  },
  "status": "success",
  "type": "response"
}

Actual Result

The server takes a long time (>30s) to respond, if it responds at all. On xrplcluster.com this results in the gateway returning a 504 error.

For more recent ledgers, the server responds quickly with a valid response, so the problem is specific to fetching older data. It's unclear how old a ledger has to be before it becomes a problem.
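
To pin down roughly where the cutoff is, one could time the call while stepping backward through ledger indexes. A rough sketch (the endpoint, starting index, step size, and timeout are all assumptions, not measurements from this report):

# scan.py -- sketch to estimate how old a ledger must be to trigger the
# slowdown. Endpoint, range, step, and timeout are assumptions.
import asyncio
import json
import time

import websockets

async def time_book_changes(ledger_index: int, timeout: float = 30.0):
    start = time.monotonic()
    try:
        async with websockets.connect("wss://xrplcluster.com") as ws:
            await ws.send(json.dumps({
                "command": "book_changes",
                "ledger_index": ledger_index,
            }))
            await asyncio.wait_for(ws.recv(), timeout)
            return time.monotonic() - start
    except asyncio.TimeoutError:
        return None  # treat a timeout as "slow"

async def main():
    # Walk backward from a recent-ish ledger in 1M-ledger steps.
    for ledger_index in range(80_000_000, 70_000_000, -1_000_000):
        elapsed = await time_book_changes(ledger_index)
        label = f"{elapsed:.2f}s" if elapsed is not None else "timed out (>30s)"
        print(f"ledger {ledger_index}: {label}")

asyncio.run(main())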

Environment

Tested on xrplcluster.com, with servers running (I think) rippled 2.1.0, and confirmed by Wietse querying full-history (FH) servers directly.

Note that this issue does not seem to happen on Clio servers, so I could not reproduce it on s1.ripple.com.

mDuo13 added the Bug label Jun 3, 2024
ckeshava (Collaborator) commented
The method RPC::getLedgerByContext takes a long time to retrieve an old ledger. It is used only by the book_changes and ledger_request RPC calls.

I do not have the appropriate permissions to verify my suspicion that the ledger_request RPC also takes an abnormally long time on full-history servers. @mDuo13, can you run this RPC on an FH node?

On the other hand, RPC::lookupLedger completes its requests quickly. This method is used by RPCs such as ledger_data and ledger_header, which can fetch old ledgers in a reasonable amount of time.

The former method appears to be slower than the latter because it does not leverage context.ledgerMaster.getLedgerByHash() (which completes very quickly on an FH node), but instead acquires the specified ledger through InboundLedger.acquire() (perhaps non-full-history nodes benefit from this approach?).
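
In rough pseudocode (Python purely for illustration; the real code is C++, and the names and signatures below are simplified assumptions rather than rippled's actual API), the difference between the two paths is:

# Illustrative sketch only -- NOT rippled's actual C++ API.
def lookup_ledger(ledger_master, ledger_index):
    # Local-only lookup (memory/database), so it returns fast on an FH node.
    ledger = ledger_master.get_ledger_by_index(ledger_index)  # hypothetical name
    if ledger is None:
        raise LookupError("ledgerNotFound")  # fails fast instead of blocking
    return ledger

def get_ledger_by_context(inbound_ledgers, ledger_hash):
    # Requests (acquires) the ledger from peers; this can block for a long
    # time even when the ledger already exists in the local store.
    return inbound_ledgers.acquire(ledger_hash)  # hypothetical signature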

@mDuo13 This also explains your observations in #5033: the logs from InboundLedger.acquire() appear to leak into the output.

ckeshava (Collaborator) commented
On that note, a lot of the ledger-retrieval-related RPC commands seem very similar. I've come across ledger, ledger_request, ledger_data, and ledger_header (I know that the last one is being deprecated).

What was the rationale for introducing so many RPC commands?

Bronek (Collaborator) commented Sep 16, 2024

> The method RPC::getLedgerByContext takes a long time to retrieve an old ledger. It is used only by the book_changes and ledger_request RPC calls.

The purpose of the ADMIN method ledger_request is to request (acquire) the selected ledger from other nodes, and that is what getLedgerByContext does, which is why it's slow. I am glad to see that book_changes is now being updated by you in #5096 to use lookupLedger, which does not request ledgers from other nodes (hence it is much faster, but it will return ledgerNotFound if the requested ledger is not cached locally). However, we still need getLedgerByContext; it just should not be called from a USER method like book_changes.
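
For API consumers, the practical consequence of the #5096 change would be handling a not-found error instead of a long stall. A hedged client-side sketch (the error-code string and the fallback endpoint are assumptions):

# Sketch of client-side handling after #5096; the "lgrNotFound" error code
# and the Clio fallback endpoint are assumptions.
import asyncio
import json

import websockets

async def book_changes(url: str, ledger_index: int) -> dict:
    async with websockets.connect(url) as ws:
        await ws.send(json.dumps({
            "command": "book_changes",
            "ledger_index": ledger_index,
        }))
        return json.loads(await ws.recv())

async def main():
    resp = await book_changes("wss://xrplcluster.com", 72381103)
    if resp.get("status") == "error" and resp.get("error") == "lgrNotFound":
        # Ledger not held locally: retry against a server known to serve
        # full history without peer acquisition (e.g. a Clio node).
        resp = await book_changes("wss://s1.ripple.com", 72381103)
    print(json.dumps(resp, indent=2))

asyncio.run(main())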

> I do not have the appropriate permissions to verify my suspicion that the ledger_request RPC also takes an abnormally long time on full-history servers. @mDuo13, can you run this RPC on an FH node?

Yes, it takes a long time because it needs to. That's the cost of retrieving a ledger from other nodes.

> On the other hand, RPC::lookupLedger completes its requests quickly. This method is used by RPCs such as ledger_data and ledger_header, which can fetch old ledgers in a reasonable amount of time.

> The former method appears to be slower than the latter because it does not leverage context.ledgerMaster.getLedgerByHash() (which completes very quickly on an FH node)

The function getLedgerByHash relies only on the local cache, so if the ledger is not in memory or in the database, it will return an error. This is also why we need InboundLedger.acquire(): to acquire a ledger from other nodes and store it in the local cache.

> @mDuo13 This also explains your observations in #5033: the logs from InboundLedger.acquire() appear to leak into the output.

ckeshava (Collaborator) commented

@Bronek thanks, that explanation makes a lot of sense.
