This repository has been archived by the owner on Sep 5, 2022. It is now read-only.

Archive nodes #32

Open
tolya-yanot opened this issue Apr 9, 2021 · 2 comments


@tolya-yanot
Member

The network must have archive nodes that have a complete history of the network.

Not all TON nodes need to do this.

We need to write instructions for starting archive nodes (including the hardware specification) and then launch a sufficient number of archive nodes (should all lite-servers be archive nodes?).

@sonofmom
Member

sonofmom commented Apr 9, 2021

The Open Network archive node HOWTO

Overview

This guide describes the requirements and the process of launching a TON archive node.

What

An archive node is a full node that contains all blocks of the network, starting with the genesis block. It is an important part of our network, as it allows the network's entire history to be recalled.

Role

Archive nodes are best paired with a LiteServer listener; ideally, all public LiteServers on our network should be archive nodes.

Requirements

Historical data is large: as of April 2021 the database has a size of about 1 terabyte and contains over 1'270'000 files. Running such a node requires more resources than a normal Full Node / Validator. You will also need to be a savvy sysadmin, as some adjustments to your system have to be made by hand.

Hardware

As of April 2021, minimum HW requirements are:

  • 4 CPU Cores
  • 32GB of Memory
  • 1.2TB of SSD disk space available to TON Database

Please note that hardware requirements will likely increase as the database grows. Be sure that you can either resize easily (cloud servers) or start with a larger configuration.

Software

You should have a configured full node. It does not need to be in sync with the network, but it should be in the process of syncing. The easiest way to launch a full node is by using mytonctrl.

Preparation Steps

Increase OS File Descriptors (open files) limits

Because the database contains so many files, validator-engine will attempt to open more than 125k files, which is more than the standard limits in most UNIX operating systems. You can check your current limits using ulimit -n (per-process limit), cat /proc/sys/fs/file-max (system-wide limit on Linux) or sysctl -a |grep files on xBSD.

You will need to increase this limit to 262144. How this is done depends on your OS.
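As a quick sanity check, the comparison against the 262144 target can be scripted. This is only a minimal sketch: the function name fd_limit_ok and the variable REQUIRED_NOFILE are made up for illustration, and the script inspects the limit of the shell it runs in, which is not necessarily the limit validator-engine runs with.

```shell
#!/usr/bin/env bash
# Target from this guide: 262144 open files.
REQUIRED_NOFILE=262144

# fd_limit_ok LIMIT [REQUIRED] (hypothetical helper):
# succeeds if LIMIT is "unlimited" or at least REQUIRED.
fd_limit_ok() {
    local limit="$1" required="${2:-$REQUIRED_NOFILE}"
    [ "$limit" = "unlimited" ] && return 0
    [ "$limit" -ge "$required" ]
}

# Soft limit of the current shell only; the service may differ.
current="$(ulimit -n)"
if fd_limit_ok "$current"; then
    echo "open files limit OK: $current"
else
    echo "open files limit too low: $current (need $REQUIRED_NOFILE)"
fi
```

On Linux, /proc/<PID>/limits shows the limits of an already running process, which is the more reliable check once validator-engine is up. For a service started by systemd, the per-process limit is typically set with the LimitNOFILE= directive in the unit file.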

Get proper TON Binaries

The source code used to compile the TON binaries should include the following commit: newton-blockchain/ton@3db52ff

The master branch includes this commit as of April 9, 2021.

Prepare file system

You will need to set up and mount a large file system where your validator-engine startup scripts expect the TON work directory to be (/var/ton-work in case of a mytonctrl-assisted installation).

IMPORTANT: make sure to back up or move the work directory of your node before you mount something over it.

If you are familiar with ZFS, I personally advise you to use it with compression enabled: it saves space (1.9x compression factor for this type of data) and improves small-IO performance. Execute the following command before you copy any data onto the file system: zfs set compression=lz4 <YOUR_TON_WORK_FS>

Make sure that the FS is remounted properly on system reboot. How this is done depends on the OS: in Ubuntu and most other Linux distributions it is done via the file /etc/fstab; in case of ZFS it is done via the mountpoint property of the ZFS filesystem (zfs set mountpoint=<YOUR_TON_WORK_DIR_LOCATION> <YOUR_TON_WORK_FS>).

I would advise rebooting your machine once you complete these adjustments, to make sure that the new FS is mounted properly on reboot.

Adjust validator-engine TTL params

validator-engine will drop data once its TTL has been exceeded. The default values can be seen by invoking validator-engine --help and are:

  -s, --state-ttl<arg>               state will be gc'd after this time (in seconds) default=3600
  -b, --block-ttl<arg>               blocks will be gc'd after this time (in seconds) default=7*86400
  -A, --archive-ttl<arg>             archived blocks will be deleted after this time (in seconds) default=365*86400
  -K, --key-proof-ttl<arg>           key blocks will be deleted after this time (in seconds) default=365*86400*10

For now we advise adding the following parameters to the validator-engine invocation: --archive-ttl 315360000 --state-ttl 315360000 --block-ttl 315360000, which corresponds to 10 years of retention.
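The 315360000 figure is simply 10 years in seconds, using the same 365*86400 convention as the defaults printed by validator-engine --help. A small sketch of the arithmetic (the variable names TTL_SECONDS and ARCHIVE_FLAGS are only illustrative):

```shell
#!/usr/bin/env bash
# 10 years * 365 days * 86400 seconds per day.
TTL_SECONDS=$((10 * 365 * 86400))
echo "$TTL_SECONDS"

# The three TTL flags named in this guide, built from the computed value.
# They are meant to be appended to your existing validator-engine invocation.
ARCHIVE_FLAGS="--archive-ttl $TTL_SECONDS --state-ttl $TTL_SECONDS --block-ttl $TTL_SECONDS"
echo "$ARCHIVE_FLAGS"
```

If you prefer a different retention period, change only the first factor and keep the three flags at the same value.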

While you are at it, I advise increasing the number of threads used by validator-engine to at least 4 by adding --threads 4.

If you used mytonctrl to install TON, this adjustment needs to be made in the /etc/systemd/system/validator.service file. Do not forget to run systemctl daemon-reload after you edit the file.
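After editing the unit file it is easy to miss one of the flags, so a small check can help. The sketch below only looks for the flag names anywhere in the file; the helper name missing_flags is made up for illustration, and the script does not validate the values or the rest of the unit.

```shell
#!/usr/bin/env bash
# Unit file path from this guide (mytonctrl installation); override via environment.
UNIT_FILE="${UNIT_FILE:-/etc/systemd/system/validator.service}"

# missing_flags TEXT (hypothetical helper):
# prints every expected flag that does not occur in TEXT, one per line.
missing_flags() {
    local text="$1" flag
    for flag in --archive-ttl --state-ttl --block-ttl --threads; do
        case "$text" in
            *"$flag"*) ;;                    # flag found, nothing to report
            *) printf '%s\n' "$flag" ;;      # flag absent
        esac
    done
}

if [ -r "$UNIT_FILE" ]; then
    missing="$(missing_flags "$(cat "$UNIT_FILE")")"
    if [ -z "$missing" ]; then
        echo "all expected flags present in $UNIT_FILE"
    else
        echo "missing flags in $UNIT_FILE:" $missing
    fi
else
    echo "unit file not readable: $UNIT_FILE"
fi
```

Remember that the edited file only takes effect after systemctl daemon-reload and a restart of the service.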

Database injection

Start with a clean TON db directory (/var/ton-work/db in case of a mytonctrl-assisted installation) and:

  1. Copy config.json, keyring and error from your node's original db directory.
  2. Copy all directories from the archival node state tar.
  3. Make sure that the TON work directory and all files under it belong to the user validator-engine is running under (validator:validator in case of a mytonctrl-assisted installation).
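The three steps above can be sketched as a small shell function. This is an illustration under assumptions, not a tested procedure: the function name inject_db and the example paths in the comments are hypothetical, and the sketch assumes the state tar holds the database directories at its top level. Stop validator-engine before touching the db directory.

```shell
#!/usr/bin/env bash
# inject_db OLD_DB NEW_DB STATE_TAR (hypothetical helper)
#   OLD_DB    backup of the node's original db directory
#   NEW_DB    clean db directory (e.g. /var/ton-work/db)
#   STATE_TAR archival node state tar
inject_db() {
    local old_db="$1" new_db="$2" state_tar="$3" item
    mkdir -p "$new_db" || return 1

    # Step 1: carry over the node's own config.json, keyring and error.
    for item in config.json keyring error; do
        if [ -e "$old_db/$item" ]; then
            cp -a "$old_db/$item" "$new_db/" || return 1
        fi
    done

    # Step 2: unpack the archival state into the new db directory.
    # Assumes the directories sit at the top level of the tar.
    tar -xf "$state_tar" -C "$new_db" || return 1
}

# Example call (paths are placeholders, run as root):
#   inject_db /root/db-backup /var/ton-work/db /path/to/state.tar
# Step 3: fix ownership (mytonctrl installation):
#   chown -R validator:validator /var/ton-work
```

The copy runs before the extraction, so if the tar happens to contain its own config.json or keyring they would overwrite yours; inspect the archive with tar -tf first if in doubt.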

Start

You are basically done. Try to start the node and analyze the logs. Note that it will take quite some time for the node to open all files.

You can check the number of open files on your node by executing the following command: ls -1 /proc/<PID>/fd | wc -l, where <PID> is the PID of your validator-engine process.
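The same check wrapped into a tiny script, in case you want to run it repeatedly while the node starts up. It is a sketch for Linux only; the function name count_open_fds is made up, and the PID has to be supplied by you (the script falls back to its own shell purely for demonstration).

```shell
#!/usr/bin/env bash
# count_open_fds PID (hypothetical helper):
# prints the number of entries in /proc/PID/fd, or 0 if it cannot be read.
count_open_fds() {
    ls -1 "/proc/$1/fd" 2>/dev/null | wc -l
}

# Pass the validator-engine PID as the first argument.
pid="${1:-$$}"
echo "open files for PID $pid: $(count_open_fds "$pid")"
```

Reading /proc/<PID>/fd of a process owned by another user requires root, so run it as root or as the user validator-engine runs under.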

Troubleshooting

As long as your validator-engine process is running, you are usually fine. It will take quite some time to sync the node from the state archive tar (a day or more). Keep checking the logs for the last_masterchain_block_ago string: once the value goes down to the ms range, your node is in sync.

You might see typical initialization messages such as "no nodes" or "adnl query timeout". They are normal during new node startup and are usually caused by a missing local adnl/dht database. Your node will start syncing once this DB has been built, which can take up to 3 hours. Have patience.

If the validator-engine process crashes for some reason, look at the main log file; it usually explains what happened. The most common mistakes are wrong permissions and open file limits that are too low.

@tolya-yanot
Member Author

For the future: research why we need to keep all files open and why a large state-ttl is needed.
