
Reduce SSD disk storage requirements for Parity nodes to optimise costs #14

Open
medvedev1088 opened this issue May 3, 2020 · 0 comments


medvedev1088 commented May 3, 2020

We currently use the following options for Parity:

tracing = "on"
pruning = "archive"

This allows us to call the trace_block JSON-RPC API to retrieve traces. With these options, Parity consumes more than 4 TB of SSD space.
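For context, a minimal sketch of what a trace_block call looks like over JSON-RPC. This is illustrative only: it assumes the requests library and a Parity node exposing HTTP JSON-RPC at http://localhost:8545 (the default port); neither the endpoint nor the block number is specified in this issue.

```python
import requests  # assumed dependency, not part of this repo

PARITY_RPC = "http://localhost:8545"  # hypothetical local endpoint (Parity's default JSON-RPC port)

def trace_block(block_number):
    """Fetch all traces for a single block via Parity's trace_block JSON-RPC method."""
    payload = {
        "jsonrpc": "2.0",
        "method": "trace_block",
        "params": [hex(block_number)],  # block number as a hex quantity
        "id": 1,
    }
    response = requests.post(PARITY_RPC, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()["result"]

if __name__ == "__main__":
    traces = trace_block(10_000_000)  # arbitrary example block
    print(f"block 10000000 has {len(traces or [])} traces")
```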

A few optimisation options to explore here:

  1. Use tracing = "on" and pruning = "auto". Test whether this configuration still lets us call trace_block for all blocks (see the verification sketch after this list). This would presumably keep only the trace data on disk, not the trie state history.
  2. If option 1 doesn't work: the --pruning-history and --pruning-memory options control how many recent trie states are retained (older states are pruned). With these options we could run light nodes from which the Streaming component pulls the latest blocks.
    • We need to test the maximum number of trie states that Parity can keep in memory. In the Streamer we lag 18 blocks behind the tip, and we also need a substantial buffer to account for possible Streamer failures that may require pulling much older blocks.
    • Running just the light node described above will not be enough. We also need a full archive node in case we need to pull the entire history starting from block 0. The full archive node doesn't need to run 24/7 though: we can start it daily, sync the state to the latest block, take a snapshot, and delete it. Assuming historical blocks can be synced 10 times faster than new blocks are mined, the full archive node only needs to run about 2.4 hours a day (24 / 10), which could reduce its cost by roughly a factor of 10. This node will also be used for the daily scrapes in Airflow. For this scenario we can run a dedicated Kubernetes cluster, separate from the light node cluster. Explore Argo - workflow and pipeline management in Kubernetes - for running the daily jobs (spin up the node from the latest snapshot, wait until it's synced - see the sync-wait sketch after this list - stop it, take a disk snapshot, delete the node). Also consider Airflow/Composer for this purpose (some work on this has been started here).
    • A side benefit of the above is that the snapshots of the full archive node can be shared with the community.
    • The same approach with light and full nodes can be used when we migrate from Parity to Geth in the future.
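For option 1, a possible way to verify that traces survive the switch to pruning = "auto" is to sample random historical blocks and check that trace_block returns something for each of them. This is only a sketch: the endpoint, sample size, and the expectation that every block returns at least a block-reward trace are my assumptions, not part of this issue.

```python
import random
import requests  # assumed dependency

PARITY_RPC = "http://localhost:8545"  # hypothetical local endpoint

def rpc(method, params):
    """Minimal JSON-RPC helper."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params, "id": 1}
    return requests.post(PARITY_RPC, json=payload, timeout=30).json()["result"]

def find_untraceable_blocks(sample_size=100):
    """Sample random historical blocks and return those for which trace_block yields nothing."""
    head = int(rpc("eth_blockNumber", []), 16)
    missing = []
    for number in random.sample(range(1, head), sample_size):
        # Every block is expected to return at least a block-reward trace; an empty
        # result suggests traces were not kept for that block under the new pruning mode.
        if not rpc("trace_block", [hex(number)]):
            missing.append(number)
    return missing

if __name__ == "__main__":
    print("blocks without traces:", find_untraceable_blocks())
```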
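For the daily full-archive-node job in option 2, the "wait until it's synced" step can be scripted against the node itself using the standard eth_syncing JSON-RPC method. A minimal sketch, with the endpoint and polling interval as assumptions; the snapshot and node lifecycle steps would live in whatever orchestrator (Argo or Airflow/Composer) we pick.

```python
import time
import requests  # assumed dependency

PARITY_RPC = "http://localhost:8545"  # hypothetical local endpoint

def wait_until_synced(poll_interval_seconds=60):
    """Poll eth_syncing until the node reports that it is no longer syncing."""
    while True:
        payload = {"jsonrpc": "2.0", "method": "eth_syncing", "params": [], "id": 1}
        syncing = requests.post(PARITY_RPC, json=payload, timeout=30).json()["result"]
        # eth_syncing returns false once syncing has finished. It can also report false
        # before syncing has actually started, so a more robust check would additionally
        # compare eth_blockNumber against an external source.
        if syncing is False:
            return
        time.sleep(poll_interval_seconds)
```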