High-Intensity JSON RPC BatchRequest Causes Server Crash and Unpredictable Downtime #4520

Cupnfish · 2024-07-16T03:18:12Z

Bug Report

Current Behavior

When subjected to high-intensity JSON-RPC batch requests, the CKB node causes the entire server to crash, rendering it completely inaccessible. The crash occurs once the batch size exceeds a certain threshold, and even after the requests stop, the server can still crash at an unpredictable later time.

Expected Behavior

I expect the node to send an error message to the client when it cannot handle high-intensity batch sizes, rather than attempting to process the requests and causing the server to crash. Ideally, the node should shut down to prevent affecting the entire server.

Environment

CKB version: ckb 0.116.1
Chain: testnet, mainnet (both have experienced the issue)
Operating system: Ubuntu 20.04
Arch: x64
Installation: GitHub Release

Additional context/Screenshots

The issue was first observed when my indexer node encountered the bug and stopped working. I had sent high-frequency batch requests for a few seconds before stopping them. Approximately two hours later, the server suddenly became unresponsive.

eval-exec · 2024-07-16T03:22:01Z

Hello, what are the total memory, available memory, and CPU configuration of this server?

> high-intensity JSON RPC BatchRequest

Which RPC causes the entire server to crash?

> the CKB node causes the entire server to crash

Can you check the logs of the last crash? Can you find out why it crashed (most likely due to OOM)? You can use sudo journalctl -b -1 to check.

chenyukang · 2024-07-16T03:26:22Z

I think this PR may help resolve this issue:
#4459

It's in the 0.117 release.

chenyukang · 2024-07-16T03:30:08Z

We also have another work-in-progress PR that tries to limit the resources spent on heavy RPCs:
#4469

Cupnfish · 2024-07-16T03:42:06Z

@eval-exec

The server's configuration is not critical. Previously, the server with lower specifications would crash when the batch size reached 1000. With the current server's improved specifications, it crashes when the batch size reaches 2000. The interface causing the crash is get_block_by_number.

Cupnfish · 2024-07-16T03:45:54Z

Thank you for your efforts, @chenyukang ! We're eagerly looking forward to the launch of this feature to optimize resource usage. Currently, I've already determined the batch size that our server can tolerate, and to avoid any further crashes, I won't be conducting related tests for now. However, I believe this improvement will have a positive impact on our system, and I'm looking forward to the resolution of this issue.

quake · 2024-07-17T14:00:59Z

RPC batch mode executes all the requests in a batch concurrently, waits for all of them to complete, and returns the results together. Note that you are calling the get_block_by_number RPC: because each result is the full block data in JSON format, the response is very large, and a single batch request can easily use up several gigabytes of memory. If there are multiple concurrent batch requests, this will result in OOM.
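To make the memory argument concrete, here is a rough back-of-the-envelope sketch. The average JSON size per block (`ASSUMED_AVG_BLOCK_JSON_BYTES`) is a purely hypothetical figure chosen for illustration, not a measured value, and the function name is made up for this example. It only shows how batch size and the number of concurrent batches multiply together.

```python
def estimate_batch_memory_gib(
    batch_size: int,
    avg_response_bytes: int,
    concurrent_batches: int = 1,
) -> float:
    """Estimate memory held while batch responses are buffered, in GiB.

    Assumes every response in a batch stays in memory until the whole
    batch is ready to be returned, as described above.
    """
    total_bytes = batch_size * avg_response_bytes * concurrent_batches
    return total_bytes / 2**30


# Hypothetical average size of one block rendered as JSON (assumption).
ASSUMED_AVG_BLOCK_JSON_BYTES = 1 * 2**20  # 1 MiB

if __name__ == "__main__":
    for batches in (1, 4):
        gib = estimate_batch_memory_gib(2000, ASSUMED_AVG_BLOCK_JSON_BYTES, batches)
        print(f"{batches} concurrent batch(es) of 2000 blocks: ~{gib:.1f} GiB")
```

Under that assumed block size, one batch of 2000 requests already sits near 2 GiB, and a handful of concurrent batches multiplies that figure accordingly.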

I suggest limiting the batch size to a smaller number and using verbosity=0 to return the block data in molecule format, which is much smaller than JSON. Ref: https://github.com/nervosnetwork/ckb/tree/develop/rpc#method-get_block_by_number
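A minimal client-side sketch of that suggestion: split a block range into small batches and pass verbosity 0 as the second parameter. The helper name `build_batches` and the default batch size of 50 are assumptions for illustration, and the `"0x0"` encoding of verbosity should be checked against the RPC reference linked above. The sketch only builds the payloads; sending them one at a time is left to whatever HTTP client you use.

```python
import json
from typing import Iterator


def build_batches(start: int, end: int, batch_size: int = 50) -> Iterator[list[dict]]:
    """Yield JSON-RPC batch payloads covering block numbers [start, end).

    Each payload holds at most `batch_size` get_block_by_number requests.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for chunk_start in range(start, end, batch_size):
        chunk_end = min(chunk_start + batch_size, end)
        yield [
            {
                "id": number,
                "jsonrpc": "2.0",
                "method": "get_block_by_number",
                # Block number as a hex string; "0x0" requests verbosity 0
                # (assumed encoding, see the RPC reference).
                "params": [hex(number), "0x0"],
            }
            for number in range(chunk_start, chunk_end)
        ]


if __name__ == "__main__":
    for batch in build_batches(0, 120, batch_size=50):
        body = json.dumps(batch)
        print(f"{len(batch)} requests, {len(body)} bytes of request body")
```

Sending the batches sequentially, rather than all at once, also avoids the multiple-concurrent-batches case described above.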

eval-exec · 2024-07-19T03:12:16Z

@Cupnfish Could you provide the output of free -h?

Cupnfish · 2024-07-19T09:18:17Z

@eval-exec

free -h
               total        used        free      shared  buff/cache   available
Mem:           125Gi       7.2Gi       2.0Gi        40Mi       117Gi       118Gi
Swap:             0B          0B          0B

chenyukang · 2024-08-13T12:43:46Z

@Cupnfish In a future release of CKB, you can also add this RPC configuration option to limit the batch size of RPC requests:

https://github.com/nervosnetwork/ckb/pull/4529/files#diff-d6c5e396f46525d03037cb71857d668d04802896dee4faf52caf8a1619c22b41R137
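For readers wondering what such a limit means in practice, the idea can be sketched generically as follows. This is not CKB's implementation: the function name `reject_oversized_batch`, the choice of the JSON-RPC 2.0 "Invalid Request" code (-32600), and the error message are all assumptions made for illustration. The point is only that an oversized batch gets an immediate error instead of being executed.

```python
from typing import Any, Optional

INVALID_REQUEST = -32600  # JSON-RPC 2.0 "Invalid Request" error code


def reject_oversized_batch(payload: Any, batch_limit: int) -> Optional[dict]:
    """Return a JSON-RPC error object if `payload` is a batch (a list)
    with more than `batch_limit` entries; otherwise return None.

    Hypothetical illustration only, not the node's actual behavior.
    """
    if isinstance(payload, list) and len(payload) > batch_limit:
        return {
            "jsonrpc": "2.0",
            "id": None,
            "error": {
                "code": INVALID_REQUEST,
                "message": f"batch size {len(payload)} exceeds limit {batch_limit}",
            },
        }
    return None


if __name__ == "__main__":
    oversized = [{"id": i, "jsonrpc": "2.0", "method": "get_tip_block_number", "params": []}
                 for i in range(3)]
    print(reject_oversized_batch(oversized, batch_limit=2))
```

A check like this runs before any request in the batch is processed, so memory use stays bounded regardless of what the client sends.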

Cupnfish added the t:bug Type: This doesn't seem right. label Jul 16, 2024

eval-exec added the b:rpc Break RPC interface label Jul 16, 2024

chenyukang mentioned this issue Jul 17, 2024

Add jsonrpc batch request limit #4529

Merged

driftluo mentioned this issue Aug 9, 2024

fix: add limit to get cells #4576

Merged

quake closed this as completed in #4576 Aug 13, 2024
