Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Websocket should only call async_close once #4848

Merged
merged 3 commits into from
Jan 12, 2024

Conversation

ximinez
Copy link
Collaborator

@ximinez ximinez commented Dec 7, 2023

High Level Overview of Change

Prevents Websocket connections from trying to close twice.

Context of Change

While testing #4822, @lathanbritz discovered that rippled would assert under heavy websocket client load. Later testing determined that this issue was unrelated to that PR, and in fact, exists at least as far back as 1.12.0. In fact, the relevant code appears to be 7 years old. It's entirely possible that the code was fine as originally written, and changes in the underlying Boost library uncovered it.

The issue only occurs in debug builds (assertions are disabled in release builds, including published packages), and when the websocket connections are unprivileged.

More specifically, the issue seems to occur when a client drives up the resource balance enough to be forcibly disconnected while there are still messages pending to be sent. Relevant log messages will look like:

2023-Dec-07 00:02:21.624011245 UTC Resource:WRN Consumer entry 127.0.0.1 dropped with balance 16368 at or above drop threshold 15000

Type of Change

  • [X ] Bug fix (non-breaking change which fixes an issue)

Test Plan

Running the following python script several times in parallel reliably recreates the issue, usually within a few seconds on a debug build from a non-admin host.

from datetime import datetime
from websocket import create_connection
import sys
# ws = create_connection('ws://badger-w:51233')
ws = create_connection('ws://localhost:51233')
acct="<insert any r-address here>"

while True:
  ws.send('{ "command" : "path_find", "destination_account" : "' + acct + '", "destination_amount" : "10000000", "id" : "example", "source_account" : "' + acct + '", "subcommand" : "create" }')

  ws.send('{ "command" : "ledger", "transactions" : true, "expand" : true }')

  # resp=ws.recv()
  # print(resp)

  ws.send('{ "command" : "ping" }')

  resp=ws.recv()
  print(resp)

  resp=ws.recv()
  print(resp)

  ws.send('{ "command" : "path_find", "subcommand" : "close" }')

  resp=ws.recv()
  print(resp)

  resp=ws.recv()
  print(resp)

  print(datetime.now())

@ximinez ximinez added Bug Dependencies Issues associated with 3rd party dependencies (RocksDB, SQLite, etc) labels Dec 7, 2023
@ximinez ximinez added this to the 2.0.1 (Jan 2024) milestone Dec 7, 2023
@@ -348,6 +356,8 @@ BaseWSPeer<Handler, Impl>::on_write_fin(error_code const& ec)
return fail(ec, "write_fin");
wq_.pop_front();
if (do_close_)
{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should do_close_ be reset to false here? What if on_write_fin is called again?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it doesn't look like it's needed since write() is not going to be called in on_write_fin() if do_close_ is true and send() is not going to call write() because closing_ is true. Then I don't think two flags are needed. Can keep do_close_ and have the check on line 264 and remove do_close_ on line 280. The check on line 264 prevents from initiating async_close again in close(). And do_close_ set to true will allow async_close in on_write_fin() if the queue was not empty when close() was called.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question. I don't think that setting do_close_ to false is a good idea, though. Then, if wq_ is not empty we'd call on_write(). We're trying to shut things down.

If it's a problem that we need to address, and it feels like maybe it is, we could add yet another data member bool closed_ and set it true just as soon as we're inside this if. That would make this if look like...

    if (do_close_ && !closed)
    {
        closed = true;
        assert(closing_);
        ...
    }
    else if (!wq_.empty() && !closed)

But I don't like the idea of adding yet another bool. Perhaps it would work better with an enum for a state machine?

enum class Finish {
    not_closing,
    closing,
    do_close,
    closed
};

Just a thought...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. Coincidentally, I had the realization about not needing closing_ while having dinner, and came in to fix it before I even saw your comments. 😀

Copy link
Collaborator

@gregtatcam gregtatcam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 LGTM

Copy link
Collaborator

@scottschurr scottschurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@ximinez ximinez added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Dec 7, 2023
@intelliot intelliot added the Perf impact not expected Change is not expected to improve nor harm performance. label Jan 9, 2024
@intelliot intelliot self-assigned this Jan 9, 2024
@intelliot
Copy link
Collaborator

Suggested commit message:

WebSocket should only call async_close once (#4848)

Prevent WebSocket connections from trying to close twice.

The issue only occurs in debug builds (assertions are disabled in
release builds, including published packages), and when the WebSocket
connections are unprivileged. The assert (and WRN log) occurs when a
client drives up the resource balance enough to be forcibly disconnected
while there are still messages pending to be sent.

Thanks to @lathanbritz for discovering this issue in #4822.

@intelliot
Copy link
Collaborator

note: 694fc8a is mistaken and would need review

@intelliot intelliot removed the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Jan 10, 2024
* upstream/develop:
  Set version to 2.0.0
  Set version to 2.0.0-rc7
  Set version to 2.0.0-rc6
  Revert "Apply transaction batches in periodic intervals (4504)" (4852)
  Revert "Add ProtocolStart and GracefulClose P2P protocol messages (3839)" (4850)
@ximinez
Copy link
Collaborator Author

ximinez commented Jan 10, 2024

Branch is reverted back to the version that was approved by reviewers, and merged with latest develop.

@ximinez ximinez added the Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. label Jan 12, 2024
@ximinez ximinez merged commit 4308407 into XRPLF:develop Jan 12, 2024
16 checks passed
@ximinez ximinez deleted the ws-async-close branch January 12, 2024 01:11
@SaxenaKaustubh
Copy link
Collaborator

LGTM! QA Approved.

sophiax851 pushed a commit to sophiax851/rippled that referenced this pull request Jun 12, 2024
Prevent WebSocket connections from trying to close twice.

The issue only occurs in debug builds (assertions are disabled in
release builds, including published packages), and when the WebSocket
connections are unprivileged. The assert (and WRN log) occurs when a
client drives up the resource balance enough to be forcibly disconnected
while there are still messages pending to be sent.

Thanks to @lathanbritz for discovering this issue in XRPLF#4822.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dependencies Issues associated with 3rd party dependencies (RocksDB, SQLite, etc) Passed Passed code review & PR owner thinks it's ready to merge. Perf sign-off may still be required. Perf impact not expected Change is not expected to improve nor harm performance.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants