
chore[storage]: Limit the total size of RocksDB WAL files #1518

Merged
luislhl merged 4 commits into master from chore/storage/limit-rocksdb-wal-files
Feb 6, 2026

Conversation

luislhl (Contributor) commented Dec 16, 2025

Motivation

We have been observing cases of long-running fullnodes where the RocksDB WAL files (the .log files) grow indefinitely, often filling most of the available disk space. In one case, we found 13 GB of .log files.

Acceptance Criteria

  • We should set max_total_wal_size to 3 GB. When the .log files reach this total size, RocksDB will flush all in-memory memtable data to .sst files and clean up the .log files
  • We should have new sysctl commands:
    • storage.rocksdb.flush=: This makes RocksDB flush the memtables (and consequently clean up the .log files). It's a way to trigger the process manually.
    • storage.rocksdb.wal_stats: This shows how much data each column family is holding in WAL files. It's useful, for instance, to check whether the other command worked for all column families
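For reference, RocksDB takes this limit in bytes; a quick sanity check of the constant used in hathor/storage/rocksdb_storage.py (the variable name here is just for illustration):

```python
GIB = 1024 * 1024 * 1024

# max_total_wal_size is passed to RocksDB in bytes; 3 GiB, as in the diff below.
MAX_TOTAL_WAL_SIZE = 3 * GIB

print(MAX_TOTAL_WAL_SIZE)  # 3221225472 bytes
```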

Testing

To test this, you will need to:

  1. Apply this git diff locally, since we depend on changes in the python bindings:
diff --git a/pyproject.toml b/pyproject.toml
index 9787888b..0f0930c2 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -72,7 +72,7 @@ service_identity = "~21.1.0"
 pexpect = "~4.8.0"
 sortedcontainers = "~2.4.0"
 structlog = "~22.3.0"
-rocksdb = {git = "https://github.com/hathornetwork/python-rocksdb.git"}
+rocksdb = {git = "https://github.com/hathornetwork/python-rocksdb.git", branch = "chore/max_total_wal_size_option"}
 aiohttp = "~3.10.3"
 idna = "~3.4"
 setproctitle = "^1.3.3"
  2. Run `poetry lock && poetry update rocksdb`

  3. Change the max_total_wal_size to 3 MB, so that we can observe RocksDB's behavior with it:

diff --git a/hathor/storage/rocksdb_storage.py b/hathor/storage/rocksdb_storage.py
index 640dbb06..4ddb19b1 100644
--- a/hathor/storage/rocksdb_storage.py
+++ b/hathor/storage/rocksdb_storage.py
@@ -50,7 +50,7 @@ class RocksDBStorage:
             # This limits the total size of WAL files (the .log files) in RocksDB.
             # When reached, a flush is triggered by RocksDB to free up space.
             # This was added because we had cases where these files would accumulate and use too much disk space.
-            max_total_wal_size=3 * 1024 * 1024 * 1024,  # 3GB
+            max_total_wal_size=3 * 1024 * 1024,  # 3MB
         )
 
         cf_names: list[bytes]
  4. Run `mkdir data-testnet-india`

  5. Run this command to start the node and sync with testnet-india:

./hathor-cli run_node \
  --testnet \
  --data ./data-testnet-india \
  --listen tcp:40405 \
  --status 8081 \
  --wallet-index \
  --nc-indexes \
  --sysctl "unix:/tmp/sysctl.sock"
  6. Observe that the .log file(s) in the data folder never grow bigger than 3 MB and get rotated constantly (you can see this by the number in their names):
# Run this multiple times
ls -lah data-testnet-india/data_v2.db/*.log
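Instead of eyeballing the `ls` output, a small stdlib-only sketch can sum the WAL sizes directly (`wal_total_bytes` is a hypothetical helper for this test, not part of the codebase):

```python
import glob
import os


def wal_total_bytes(db_dir: str) -> int:
    """Sum the sizes of the RocksDB WAL (.log) files under db_dir."""
    return sum(os.path.getsize(p) for p in glob.glob(os.path.join(db_dir, "*.log")))


if __name__ == "__main__":
    # Path from the testing steps above; prints 0.00 MB if the dir doesn't exist.
    total = wal_total_bytes("data-testnet-india/data_v2.db")
    print(f"total WAL size: {total / (1024 * 1024):.2f} MB")
```

Run it repeatedly during sync; the total should stay under the configured 3 MB limit.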
  7. Stop the fullnode and undo the change we made to max_total_wal_size, so it goes back to 3 GB

  8. Start the fullnode again

  9. Run `nc -U /tmp/sysctl.sock` and send the command storage.rocksdb.flush=. Check the .log files before and after running it; they should decrease in size and get rotated. You may want to wait a little for them to grow bigger before doing so; they should grow quickly during the sync with the network.
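If you prefer scripting this step, the same command can be sent over the unix socket programmatically. This is an illustrative sketch only, assuming a simple line-based request/reply exchange (`send_sysctl` is a hypothetical helper, not part of the codebase; `nc` works just as well):

```python
import socket


def send_sysctl(sock_path: str, command: str) -> str:
    """Send one sysctl command over the unix socket and return the raw reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(2.0)
        s.connect(sock_path)
        s.sendall(command.encode() + b"\n")
        try:
            return s.recv(4096).decode()
        except socket.timeout:
            # Some commands may not reply; treat a timeout as an empty answer.
            return ""


if __name__ == "__main__":
    print(send_sysctl("/tmp/sysctl.sock", "storage.rocksdb.flush="))
```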

Checklist

  • If you are requesting a merge into master, confirm this code is production-ready and can be included in future releases as soon as it gets merged

luislhl force-pushed the chore/storage/limit-rocksdb-wal-files branch from 124a8e8 to 243c38b on February 4, 2026
luislhl moved this from In Progress (WIP) to In Progress (Done) in Hathor Network, Feb 4, 2026
msbrogli previously approved these changes Feb 5, 2026
luislhl moved this from In Progress (Done) to In Review (WIP) in Hathor Network, Feb 5, 2026
github-actions bot commented Feb 5, 2026

🐰 Bencher Report

Branch: chore/storage/limit-rocksdb-wal-files
Testbed: ubuntu-22.04

Benchmark: sync-v2 (up to 20000 blocks)
  Latency: 1.69 m (-1.17%), Baseline: 1.71 m
  Lower Boundary: 1.54 m (91.06% of limit)
  Upper Boundary: 2.06 m (82.36% of limit)

codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 77.41935% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.70%. Comparing base (77b2dbd) to head (8b6b684).
⚠️ Report is 1 commit behind head on master.

Files with missing lines             Patch %   Lines
hathor/sysctl/storage/manager.py     77.19%    10 Missing and 3 partials ⚠️
hathor/builder/sysctl_builder.py     50.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1518      +/-   ##
==========================================
- Coverage   85.76%   85.70%   -0.06%     
==========================================
  Files         439      441       +2     
  Lines       33767    33829      +62     
  Branches     5277     5285       +8     
==========================================
+ Hits        28960    28993      +33     
- Misses       3795     3817      +22     
- Partials     1012     1019       +7     


luislhl enabled auto-merge (squash) February 6, 2026 14:47
luislhl moved this from In Review (WIP) to In Progress (Done) in Hathor Network, Feb 6, 2026
luislhl moved this from In Progress (Done) to In Review (WIP) in Hathor Network, Feb 6, 2026
jansegre (Member) left a comment:

I think we only need to bump python-rocksdb's version before merging.

luislhl merged commit 582c472 into master Feb 6, 2026
14 checks passed
luislhl deleted the chore/storage/limit-rocksdb-wal-files branch February 6, 2026 16:14
github-project-automation bot moved this from In Review (WIP) to Waiting to be deployed in Hathor Network, Feb 6, 2026
r4mmer added a commit that referenced this pull request Feb 24, 2026
…print-move-1

* origin/master:
  feat: pydantic settings (#1600)
  fix[thin_wallet]: handle address history invalid tx version (#1590)
  refactor(nano): Make NCBlockExecutor a pure executor with no side effects
  fix[nginx]: Make sure we trust the GCP IPs to get the real client IP (#1595)
  refactor: Upgrade to Pydantic v2
  chore(github): Split GitHub main action into lint, test-cli, test-lib, test-other
  fix[nginx]: Use a larger buffer size for /v1a/status (#1594)
  chore: adjust testnet config for v0.69.0 release
  chore[storage]: Limit the total size of RocksDB WAL files (#1518)
  chore: adjust testnet config for v0.69.0 release
  chore: configure feature activations for v0.69.0 release
  refactor: wallet on_new_tx (#1561)
  refactor(nano): Remove dead reorg cleanup code from block executor

Labels: None yet
Project status: Waiting to be deployed
3 participants