High-Intensity JSON RPC BatchRequest Causes Server Crash and Unpredictable Downtime #4520

Closed
Cupnfish opened this issue Jul 16, 2024 · 9 comments · Fixed by #4576
Labels: b:rpc (Break RPC interface), t:bug (Type: This doesn't seem right.)

Comments

@Cupnfish

Bug Report

Current Behavior

When subjected to high-intensity JSON RPC batch requests, the CKB node crashes the entire server, rendering it completely inaccessible. The crash occurs once the batch size exceeds a certain threshold, and even after the requests stop, the server still crashes at an unpredictable time later.

Expected Behavior

I expect the node to send an error message to the client when it cannot handle a large batch, rather than attempting to process the requests and crashing the server. Failing that, the node itself should shut down so that the rest of the server is not affected.

Environment

  • CKB version: ckb 0.116.1
  • Chain: testnet, mainnet (both have experienced the issue)
  • Operating system: Ubuntu 20.04
  • Arch: x64
  • Installation: GitHub Release

Additional context/Screenshots

The issue was first observed when my indexer node hit this bug and stopped working. I had sent large batch requests at high frequency for a few seconds and then stopped. Approximately two hours later, the server suddenly became unresponsive.

Cupnfish added the t:bug (Type: This doesn't seem right.) label on Jul 16, 2024
@eval-exec (Collaborator) commented Jul 16, 2024

Hello, what are the total memory, available memory, and CPU configuration of this server?

Regarding "high-intensity JSON RPC BatchRequest": which RPC causes the entire server to crash?

Regarding "the CKB node causes the entire server to crash": can you check the logs of the last crash and find out why it crashed (most likely due to OOM)? You can use sudo journalctl -b -1 to check.
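Assuming, as suggested above, that the crash was the kernel's OOM killer terminating processes, a small helper like the following could filter the previous boot's kernel log for OOM-killer lines. This is a sketch, not a CKB tool: the -k flag limits journalctl to kernel messages, and the regular expression is a heuristic because the exact OOM wording varies between kernel versions.

```python
import re
import subprocess
from typing import Iterable, List

# Heuristic pattern: kernel OOM-killer messages usually contain one of these
# phrases, but the exact wording differs between kernel versions.
OOM_PATTERN = re.compile(r"Out of memory|oom-kill|invoked oom-killer", re.IGNORECASE)


def find_oom_lines(lines: Iterable[str]) -> List[str]:
    """Return the log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def previous_boot_kernel_log() -> List[str]:
    """Read kernel messages from the previous boot via journalctl.

    Requires systemd, a journal that persists across reboots, and usually
    root privileges (e.g. running the script with sudo).
    """
    result = subprocess.run(
        ["journalctl", "-b", "-1", "-k", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Typical use is find_oom_lines(previous_boot_kernel_log()); find_oom_lines can equally be fed the lines of a saved log file.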

eval-exec added the b:rpc (Break RPC interface) label
@chenyukang (Collaborator) commented Jul 16, 2024

I think this PR may help resolve this issue:
#4459

It is included in version 0.117.

@chenyukang (Collaborator)
We have another work-in-progress PR that tries to limit the resources spent on heavy RPCs:
#4469

@Cupnfish (Author)
@eval-exec

The server's configuration is not the critical factor. Previously, a server with lower specifications crashed when the batch size reached 1,000; with the current, higher-spec server, it crashes when the batch size reaches 2,000. The RPC method causing the crash is get_block_by_number.

@Cupnfish (Author)
Thank you for your efforts, @chenyukang! We're eagerly looking forward to this feature for optimizing resource usage. I have already determined the batch size our server can tolerate, and to avoid further crashes I won't run related tests for now. I believe this improvement will have a positive impact on our system, and I look forward to the resolution of this issue.

@quake (Member) commented Jul 17, 2024

RPC batch mode executes all the requests in a batch concurrently, waits for all of them to complete, and returns the results together. Note that you are calling the get_block_by_number RPC: because it returns full block data in JSON format, each response is very large, and a single batch request can easily use up several gigabytes of memory. If there are multiple concurrent batch requests, this will result in OOM.

I suggest limiting the batch size to a smaller number and using verbosity=0, which returns the block data in Molecule-serialized format, much smaller than JSON. Ref: https://github.com/nervosnetwork/ckb/tree/develop/rpc#method-get_block_by_number
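Following this suggestion, a client could split a large block range into several small batches and request verbosity 0. The sketch below only builds the JSON-RPC 2.0 payloads; build_block_batches and chunked are hypothetical helper names, the batch size of 100 is an arbitrary assumption rather than a recommended value, and sending the payloads (with any HTTP client) is left out. CKB's RPC encodes numeric parameters as 0x-prefixed hex strings, so block_number and verbosity are passed through hex().

```python
import json
from typing import Iterator, List


def chunked(items: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("batch size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_block_batches(start: int, end: int, batch_size: int = 100) -> List[str]:
    """Build JSON-RPC 2.0 batch payloads for get_block_by_number over [start, end).

    Every request asks for verbosity 0 ("0x0"), so the node returns each block
    as Molecule-serialized bytes (a hex string) instead of a large JSON object.
    """
    payloads: List[str] = []
    request_id = 0
    for numbers in chunked(list(range(start, end)), batch_size):
        batch = []
        for number in numbers:
            batch.append({
                "jsonrpc": "2.0",
                "id": request_id,
                "method": "get_block_by_number",
                "params": [hex(number), "0x0"],  # block_number, verbosity
            })
            request_id += 1
        payloads.append(json.dumps(batch))
    return payloads
```

Sending these payloads one after another, rather than concurrently, should keep the node's memory use bounded to roughly one small batch at a time, which is the point of the suggestion above; the trade-off is more round trips.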

@eval-exec (Collaborator)
@Cupnfish Could you provide the output of free -h?

@Cupnfish (Author)
@eval-exec

free -h
               total        used        free      shared  buff/cache   available
Mem:           125Gi       7.2Gi       2.0Gi        40Mi       117Gi       118Gi
Swap:             0B          0B          0B

@chenyukang (Collaborator)
@Cupnfish In a future CKB release, you can also set this RPC configuration option to limit the batch size of RPC requests:

https://github.com/nervosnetwork/ckb/pull/4529/files#diff-d6c5e396f46525d03037cb71857d668d04802896dee4faf52caf8a1619c22b41R137
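
The expected behavior from the original report, returning an error instead of processing an oversized batch, can be sketched as a guard in front of the request dispatcher. This is an illustration only: MAX_BATCH_REQUESTS, handle_payload and dispatch are hypothetical names, the option added in the linked PR may differ in name, default and response, and using the JSON-RPC 2.0 Invalid Request code (-32600) for oversized batches is an assumption. For simplicity the sketch dispatches accepted batch entries sequentially, whereas CKB's batch mode runs them concurrently.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical limit; the real RPC option may use a different name and default.
MAX_BATCH_REQUESTS = 100

PARSE_ERROR = -32700      # JSON-RPC 2.0: invalid JSON was received
INVALID_REQUEST = -32600  # JSON-RPC 2.0: the payload is not a valid request

Request = Dict[str, Any]
Response = Dict[str, Any]


def error_response(code: int, message: str, request_id: Any = None) -> Response:
    """Build a JSON-RPC 2.0 error object (id is null when it cannot be determined)."""
    return {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}


def handle_payload(raw: str, dispatch: Callable[[Request], Response],
                   max_batch: int = MAX_BATCH_REQUESTS) -> str:
    """Handle one JSON-RPC request or batch, rejecting oversized batches up front."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return json.dumps(error_response(PARSE_ERROR, "parse error"))

    if isinstance(payload, list):
        if not payload:
            return json.dumps(error_response(INVALID_REQUEST, "empty batch"))
        if len(payload) > max_batch:
            # Fail fast: no request in the batch is executed, so no block data
            # is loaded or serialized for it.
            return json.dumps(error_response(
                INVALID_REQUEST,
                f"batch of {len(payload)} requests exceeds the limit of {max_batch}",
            ))
        # Sequential for simplicity; a real server may run these concurrently.
        return json.dumps([dispatch(request) for request in payload])

    return json.dumps(dispatch(payload))
```

Checking the batch length before dispatching anything is what keeps an oversized batch from allocating memory at all; the client gets an error it can react to by sending smaller batches.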

4 participants