Skip to content

Commit

Permalink
docs: Updates for Indexer 3.0.0 (#1562)
Browse files Browse the repository at this point in the history
Co-authored-by: algoanne <[email protected]>
Co-authored-by: Zeph Grunschlag <[email protected]>
  • Loading branch information
3 people authored Aug 3, 2023
1 parent efe92f5 commit a1b2e3e
Show file tree
Hide file tree
Showing 24 changed files with 16 additions and 1,325 deletions.
72 changes: 14 additions & 58 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@

# Algorand Indexer

The Indexer is a standalone service that reads committed blocks from the Algorand blockchain and maintains a database of transactions and accounts that are searchable and indexed.
The Indexer is an API that provides search capabilities for Algorand on-chain data. Data is written by [Conduit](https://github.com/algorand/conduit/blob/master/docs/tutorials/IndexerWriter.md) to populate a PostgreSQL compatible database.

## Building from source ##

Expand All @@ -27,17 +27,18 @@ All recommendations here should be be used as a starting point. Further benchmar

## Versions

* Database: [Postgres 13](https://www.postgresql.org/download/)
* Database: [Postgres 14](https://www.postgresql.org/download/)

## System

For a simple deployment the following configuration works well:
* Network: Indexer, Algod and PostgreSQL should all be on the same network.
* Network: Indexer, Conduit, Algod and PostgreSQL should all be on the same network.
* Indexer: 2 CPU and 8 GB of ram.
* Conduit: 2 CPU and 8 GB of ram.
* Database: When hosted on AWS a `db.r5.xlarge` instance works well.
* Storage: 20 GiB

A database with replication can be used to scale read volume. Configure multiple Indexer daemons with a single writer and multiple readers.
A database with replication can be used to scale read volume. Configure a single Conduit writer with multiple Indexer readers.

# Quickstart

Expand All @@ -63,41 +64,9 @@ Contributions welcome! Please refer to our [CONTRIBUTING](https://github.com/alg

# Usage

The most common usage of the Indexer is expecting to be getting validated blocks from a local `algod` Algorand node, adding them to a [PostgreSQL](https://www.postgresql.org/) database, and serving an API to make available a variety of prepared queries. Some users may wish to directly write SQL queries of the database.

Indexer works by fetching blocks one at a time, processing the block data, and loading it into a traditional database. There is a database abstraction layer to support different database implementations. In normal operation, the service will run as a daemon and always requires access to a database.

## Modes
There are two primary modes of operation:
* [Database updater](#database-updater)
* [Read only](#read-only)

In both configurations, a postgres connection string is required. Both DSN and URL formats are supported, [details are available here](https://pkg.go.dev/github.com/jackc/pgx/v4/[email protected]#ParseConfig).

In addition, the indexer uses a data directory that stores data needed for runtime operation and configuration.
See the [Data Directory documentation](docs/DataDirectory.md) for how to (re-)initialize this directory in case it is lost or needs to be re-created.

### Database updater
In this mode, the database will be populated with data fetched from an [Algorand archival node](https://developer.algorand.org/docs/run-a-node/setup/types/#archival-mode). Because every block must be fetched to bootstrap the database, the initial import for a ledger with a long history will take a while. If the daemon is terminated, it will resume processing wherever it left off.

Keeping the indexer daemon as close as possible to the database helps minimize latency. For example, if using AWS EC2 and RDS, we suggest putting EC2 in the same VPC, Region, and even Availability Zone.

You should use a process manager, like systemd, to ensure the daemon is always running. Indexer will continue to update the database as new blocks are created.

To start indexer as a daemon in update mode, provide the required fields:
Configure Indexer with a PostgreSQL compatible database. Conduit should be used to populate the database and one or more Indexer daemons can be run to serve the API. For further isolation, a `readonly` user can be created for the database.
```
~$ algorand-indexer daemon --data-dir /tmp --algod-net yournode.com:1234 --algod-token token --genesis ~/path/to/genesis.json --postgres "user=readonly password=YourPasswordHere {other connection string options for your database}"
```

Alternatively if indexer is running on the same host as the archival node, a simplified command may be used:
```
~$ algorand-indexer daemon --data-dir /tmp --algod-net yournode.com:1234 -d /path/to/algod/data/dir --postgres "user=readonly password=YourPasswordHere {other connection string options for your database}"
```

### Read only
It is possible to set up one daemon as a writer and one or more readers. The Indexer pulling new data from algod can be started as above. Starting the indexer daemon without $ALGORAND_DATA or -d/--algod/--algod-net/--algod-token will start it without writing new data to the database. For further isolation, a `readonly` user can be created for the database.
```
~$ algorand-indexer daemon --data-dir /tmp --no-algod --postgres "user=readonly password=YourPasswordHere {other connection string options for your database}"
~$ algorand-indexer daemon --data-dir /tmp --postgres "user=readonly password=YourPasswordHere {other connection string options for your database}"
```

The Postgres backend does specifically note the username "readonly" and changes behavior to avoid writing to the database. But the primary benefit is that Postgres can enforce restricted access to this user. This can be configured with:
Expand Down Expand Up @@ -189,11 +158,7 @@ Settings can be provided from the command line, a configuration file, or an envi
| postgres | P | postgres-connection-string | INDEXER_POSTGRES_CONNECTION_STRING |
| data-dir | i | data | INDEXER_DATA |
| pidfile | | pidfile | INDEXER_PIDFILE |
| algod | d | algod-data-dir | INDEXER_ALGOD_DATA_DIR / ALGORAND_DATA|
| algod-net | | algod-address | INDEXER_ALGOD_ADDRESS |
| algod-token | | algod-token | INDEXER_ALGOD_TOKEN |
| server | S | server-address | INDEXER_SERVER_ADDRESS |
| no-algod | | no-algod | INDEXER_NO_ALGOD |
| token | t | api-token | INDEXER_API_TOKEN |
| metrics-mode | | metrics-mode | INDEXER_METRICS_MODE |
| logfile | f | logfile | INDEXER_LOGFILE |
Expand All @@ -213,24 +178,21 @@ Settings can be provided from the command line, a configuration file, or an envi
| max-applications-limit | | max-applications-limit | INDEXER_MAX_APPLICATIONS_LIMIT |
| default-applications-limit | | default-applications-limit | INDEXER_DEFAULT_APPLICATIONS_LIMIT |
| enable-all-parameters | | enable-all-parameters | INDEXER_ENABLE_ALL_PARAMETERS |
| catchpoint | | catchpoint | INDEXER_CATCHPOINT |

## Command line

The command line arguments always take priority over the config file and environment variables.

```
~$ ./algorand-indexer daemon --data-dir /tmp --pidfile /var/lib/algorand/algorand-indexer.pid --algod /var/lib/algorand --postgres "host=mydb.mycloud.com user=postgres password=password dbname=mainnet"`
~$ ./algorand-indexer daemon --data-dir /tmp --pidfile /var/lib/algorand/algorand-indexer.pid --postgres "host=mydb.mycloud.com user=postgres password=password dbname=mainnet"`
```

## Data Directory

The Indexer data directory is the location where the Indexer can store and/or load data needed for runtime operation and configuration.
The Indexer data directory is the location where the Indexer can store and/or load data needed for configuration. The data directory is effectively stateless, so it does not need to be persisted across deployments as long as the configuration is supplied.

**It is a required argument for Indexer daemon operation. Supply it to the Indexer via the `--data-dir`/`-i` flag.**

**It is HIGHLY recommended placing the data directory in a separate, stateful directory for production usage of the Indexer.**

For more information on the data directory see [Indexer Data Directory](docs/DataDirectory.md).

### Auto-Loading Configuration
Expand All @@ -252,7 +214,6 @@ The same indexer configuration from earlier can be made in bash with the followi
```
~$ export INDEXER_POSTGRES_CONNECTION_STRING="host=mydb.mycloud.com user=postgres password=password dbname=mainnet"
~$ export INDEXER_PIDFILE="/var/lib/algorand/algorand-indexer.pid"
~$ export INDEXER_ALGOD_DATA_DIR="/var/lib/algorand"
~$ export INDEXER_DATA="/tmp"
~$ ./algorand-indexer daemon
```
Expand All @@ -267,7 +228,6 @@ Here is an example **indexer.yml** file:
```
postgres-connection-string: "host=mydb.mycloud.com user=postgres password=password dbname=mainnet"
pidfile: "/var/lib/algorand/algorand-indexer.pid"
algod-data-dir: "/var/lib/algorand"
```

Place this file in the data directory (`/tmp/data-dir` in this example) and supply it to the Indexer daemon:
Expand All @@ -283,7 +243,7 @@ Place this file in the data directory (`/tmp/data-dir` in this example) and supp
```
[Service]
ExecStart=
ExecStart=/usr/bin/algorand-indexer daemon --data-dir /tmp --pidfile /var/lib/algorand/algorand-indexer.pid --algod /var/lib/algorand --postgres "host=mydb.mycloud.com user=postgres password=password dbname=mainnet"
ExecStart=/usr/bin/algorand-indexer daemon --data-dir /tmp --pidfile /var/lib/algorand/algorand-indexer.pid --postgres "host=mydb.mycloud.com user=postgres password=password dbname=mainnet"
PIDFile=/var/lib/algorand/algorand-indexer.pid
```
Expand All @@ -308,7 +268,7 @@ If you wish to run multiple indexers on one server under systemd, see the commen
# Unique Database Configurations

## Load balancing
If indexer is deployed with a clustered database using multiple readers behind a load balancer, query discrepancies are possible due to database replication lag. Users should check the `current-round` response field and be prepared to retry queries when stale data is detected.
If Conduit is deployed with a clustered database using multiple readers behind a load balancer, query discrepancies are possible due to database replication lag. Users should check the `current-round` response field and be prepared to retry queries when stale data is detected.

## Custom indices
Different application workloads will require different custom indices in order to make queries perform well. More information is available in [PostgresqlIndexes.md](docs/PostgresqlIndexes.md).
Expand All @@ -323,12 +283,8 @@ For all other transaction queries, results are returned oldest first. This is be

<!-- USAGE_END_MARKER_LINE -->

# Migrating from Indexer v1

Indexer v1 was built into the algod v1 REST API. It has been removed with the algod v2 REST API, all of the old functionality is now part of this project. The API endpoints, parameters, and response objects have all been modified and extended. Any projects depending on the old endpoints will need to be updated accordingly.

# Building
# Migrating from Indexer v2

Indexer is built using an in-house task framework called [`mule`](https://pypi.org/project/mulecli/) (it has since been open-sourced).
If you were previously using Indexer 2.x you will need to reconfigure your deployment to include [Conduit](https://github.com/algorand/conduit). The data loading component has moved from Indexer 2.x to Conduit.

Please refer to the [build docs](mule/README.md) in the `mule/` directory.
[Additional details found here.](https://github.com/algorand/conduit#migrating-from-indexer-2x)
69 changes: 0 additions & 69 deletions docs/Conduit.md

This file was deleted.

35 changes: 2 additions & 33 deletions docs/DataDirectory.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,37 +12,6 @@ The data directory is the first place to check for different configuration files
- `indexer.yml` - Indexer Configuration File
- `api_config.yml` - API Parameter Enable/Disable Configuration File

# Account Cache
## Read-Only

Indexer writers maintain an account cache in the data directory. This cache is used during block processing to compute things like the new account balances after processing transactions. Prior to this local cache, the database was queried on each round to fetch the initial account states.

The following files are created:
- ledger.block.sqlite
- ledger.block.sqlite-shm
- ledger.block.sqlite-wal
- ledger.tracker.sqlite
- ledger.tracker.sqlite-shm
- ledger.tracker.sqlite-wal


## Read-Only Mode

The account cache is not required when in read-only mode. While the data directory is still required, it will only be used for configuration.

# Initialization

If a new data directory must be created, the following process should be used:
1. Review the Indexer log to find the most recent round that was processed. For example, `22212765` in the following line:
```
{"level":"info","msg":"round r=22212765 (49 txn) imported in 139.782694ms","time":"2022-07-18T19:23:13Z"}
```
2. Lookup the most recent catchpoint for your network **without going over the indexer's current round** from the following links. For example, with `22212765` from step 1, on mainnet you would choose `22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A` from the Mainnet consolidated catchpoint list.
Full list of available catchpoints for each network can be found at the following links:
- [Mainnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/mainnet_catchpoints.txt)
- [Testnet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/testnet_catchpoints.txt)
- [Betanet](https://algorand-catchpoints.s3.us-east-2.amazonaws.com/consolidated/betanet_catchpoints.txt)
3. Supply the catchpoint label when starting Indexer using the command line setting `--catchpoint 22210000#MZZIOYXYPPGNYRQHROXCPILIWIMQQRN7ZNLQJVM2QVSKT3QX6O4A`, setting `catchpoint` in `indexer.yml`, or setting the `INDEXER_CATCHPOINT` environment variable.

While Indexer starts, you can see progress information printed periodically in the log file.

Note: You are not required to unset the catchpoint label after initialization. During startup, if Indexer is ahead of the supplied catchpoint label, it is ignored.
Indexer is effectively a read-only client to the PostgreSQL database. While the configuration must be provided in a data directory, it is only used for configuration.
Loading

0 comments on commit a1b2e3e

Please sign in to comment.