
ElasticSearch Plugin

Alfredo Garcia edited this page Oct 12, 2019 · 22 revisions

ElasticSearch Plugins Suite

Store full account and object data in an indexed Elasticsearch database.

Motivation

There are two main problems this plugin tries to solve:

  • The huge amount of RAM needed to run a full node that keeps all the account history.
  • Fast search inside operation fields by querying the ES database directly.

The database selection

Elasticsearch was selected for the following main reasons:

  • Open source.
  • Fast.
  • Index oriented.
  • Easy to install and start using.
  • Data can be sent from C++ using curl.
  • Scalable, with the possibility of decentralized data nodes.

Technical

When active, the elasticsearch plugin is connected to each block the node receives. Operations are extracted with logic similar to the classic account_history_plugin, but they are sent to the ES database instead of being stored internally. All fields of the operation are indexed for fast search.

When active, the es-objects plugin is connected to the object types specified in the config (limit order objects, asset objects, etc.).

Both plugins work in a similar way: data is collected in the plugin's internal storage until a configurable amount is available, and then it is sent to ES as a _bulk operation. The _bulk needs to be big when replaying, but much smaller when the node is in sync, so that real-time data reaches end users quickly.

Optimal numbers for speed/performance depend on the hardware; default values are provided.
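The batching described above can be pictured with a small Python sketch. It is not the plugin's C++ code: the class and callback names are hypothetical, and the thresholds simply mirror the documented elasticsearch-bulk-replay and elasticsearch-bulk-sync defaults, which count bulk lines (two per operation).

```python
import json

# Thresholds in bulk *lines*, mirroring the documented defaults
# (elasticsearch-bulk-replay / elasticsearch-bulk-sync). Each operation
# contributes two lines: an action line and a document line.
BULK_REPLAY_LINES = 10000
BULK_SYNC_LINES = 100


class BulkBuffer:
    """Hypothetical helper: collect bulk lines, flush once a limit is hit."""

    def __init__(self, send):
        self.send = send  # callable that receives the NDJSON request body
        self.lines = []

    def add(self, action, document, in_sync):
        self.lines.append(json.dumps(action))
        self.lines.append(json.dumps(document))
        # Small batches when in sync (fresh data reaches users quickly),
        # large batches during replay (fewer, bigger requests).
        limit = BULK_SYNC_LINES if in_sync else BULK_REPLAY_LINES
        if len(self.lines) >= limit:
            self.flush()

    def flush(self):
        if self.lines:
            # The _bulk endpoint expects newline-delimited JSON ending in "\n".
            self.send("\n".join(self.lines) + "\n")
            self.lines = []
```

With in_sync=True, every 50 operations (100 lines) trigger one request; during replay the same buffer waits for 5000 operations.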

Hardware needed

It is strongly recommended to use SSD disks in your node if you are trying to synchronize the BitShares mainnet. It will make the task a lot faster. Even so, synchronizing the mainnet can take a few days.

You need 1TB of disk space to be safe for a while; 32GB or more of RAM is recommended.

After Elasticsearch is installed, increase the heap size depending on your RAM:

$ vi config/jvm.options

...
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms12g
-Xmx12g
...
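A rough way to pick the numbers, as a Python sketch. The helper is hypothetical, and the rule it encodes (no more than half of the RAM, stay below roughly 32GB, Xms equal to Xmx) is Elastic's general heap advice rather than anything specific to this plugin; the remaining RAM is left to the operating system's file cache, which ES also relies on. The 12g in the example above fits within this rule on a 32GB machine.

```python
from typing import List


def jvm_heap_lines(total_ram_gb: int, cap_gb: int = 31) -> List[str]:
    """Suggest -Xms/-Xmx lines for config/jvm.options.

    Rule of thumb (assumed from Elastic's general heap guidance):
    at most half of physical RAM, below the ~32GB compressed-oops
    threshold, and the same value for Xms and Xmx.
    """
    heap_gb = max(1, min(total_ram_gb // 2, cap_gb))
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]


print("\n".join(jvm_heap_lines(32)))
```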

Installation

You need to have bitshares-core and its dependencies installed (see https://github.com/bitshares/bitshares-core#getting-started).

On Ubuntu 18.04 all the dependencies for the Elasticsearch database are installed by default. Just get the latest version (or the desired version) at:

https://www.elastic.co/downloads/elasticsearch

$ tar xvzf elasticsearch-7.4.0-linux-x86_64.tar.gz
$ cd elasticsearch-7.4.0/
$ ./bin/elasticsearch

ES will listen on 127.0.0.1:9200. Open http://127.0.0.1:9200/ in your browser; if the service started correctly, you should see some information about the database.
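The same check can be done without a browser. A minimal standard-library Python sketch (the helper names are hypothetical); it relies on the root endpoint returning JSON with a `version.number` field, which is worth looking at because some tools and endpoints mentioned on this page depend on the ES major version.

```python
import json
import urllib.request


def fetch_info(url: str = "http://127.0.0.1:9200/") -> dict:
    """GET the ES root endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def es_version(info: dict) -> str:
    """Read the server version, e.g. '7.4.0', from the root endpoint JSON."""
    return info["version"]["number"]


# Usage (requires ES to be running):
# print(es_version(fetch_info()))
```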

You can run the binary as a service, use its --daemonize option, run it inside screen, or use any other option that suits you to keep the database running.

Please note that ES does not run as root; create a normal user account and proceed as that user:

adduser elastic

Running

Clone the bitshares-core repository and build it:

git clone https://github.com/bitshares/bitshares-core
cd bitshares-core
git checkout -t origin/develop
git submodule update --init --recursive
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo .
make

Start the node with the elasticsearch plugins enabled, using the default options:

./programs/witness_node/witness_node --plugins "elasticsearch es-objects"

Arguments

The ES plugin has the following command-line parameters:

  • elasticsearch-node-url - The URL of Elasticsearch - default: http://localhost:9200/
  • elasticsearch-bulk-replay - The number of lines (ops * 2) to send to the database in replay state - default: 10000
  • elasticsearch-bulk-sync - The number of lines (ops * 2) to send to the database in synchronized state - default: 100
  • elasticsearch-visitor - Use the visitor to index additional data from inside the op - default: false
  • elasticsearch-basic-auth - Send auth data in the form "username:password" - default: no auth ""
  • elasticsearch-index-prefix - A prefix for your indexes - default: "bitshares-"
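When scripting node start-up, it can help to build the argument list programmatically. A hypothetical Python helper that turns keyword arguments into the options listed above; it also spells out the lines-to-operations relation, so the default elasticsearch-bulk-replay of 10000 lines means 5000 operations per request.

```python
import shlex


def ops_per_bulk(lines: int) -> int:
    """Each operation occupies two bulk lines: an action line and a document line."""
    return lines // 2


def witness_node_command(binary="programs/witness_node/witness_node",
                         plugins=("witness", "elasticsearch"), **options):
    """Build an argv list; keyword names use underscores, e.g.
    elasticsearch_bulk_replay=20000 becomes --elasticsearch-bulk-replay 20000."""
    argv = [binary, "--plugins", " ".join(plugins)]
    for name, value in options.items():
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


cmd = witness_node_command(elasticsearch_node_url="http://localhost:9200/",
                           elasticsearch_bulk_replay=20000)
print(shlex.join(cmd))  # shell-quoted, ready to paste
```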

Starting node

The ES plugin is not active by default; it must be enabled with the --plugins parameter. An example of starting a node with the ES plugin in its simplest form, with all the default options, is:

programs/witness_node/witness_node --plugins "witness elasticsearch"

Note that the elasticsearch plugin and the account_history plugin cannot run at the same time.

Checking if it is working

A few minutes after the node starts, the first batch of 5000 ops (the default elasticsearch-bulk-replay of 10000 lines) will be inserted into the database. If you are on a desktop Linux, you may want to install https://github.com/mobz/elasticsearch-head (only works with Elasticsearch 5) and browse the database from a web browser to make sure it is working. This is optional.

If you only have the command line available, you can query the database directly through curl:

root@NC-PH-1346-07:~/bitshares/elastic/bitshares-core# curl -X GET 'http://localhost:9200/bitshares-*/data/_count?pretty=true' -H 'Content-Type: application/json' -d '
{
    "query" : {
        "bool" : { "must" : [{"match_all": {}}] }
    }
}
'
{
  "count" : 360000,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  }
}
root@NC-PH-1346-07:~/bitshares/elastic/bitshares-core# 

360000 records have been inserted into ES at this point of the replay, which means it is working.
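The same check can be scripted. A standard-library Python sketch that builds the count request and applies a simple sanity test to the response; it sends the query with POST (the `_count` endpoint accepts a body on both GET and POST), keeps the `/data/` type path used throughout this page, and the helper names are hypothetical.

```python
import json
import urllib.request


def count_request(base_url: str = "http://localhost:9200/",
                  index: str = "bitshares-*") -> urllib.request.Request:
    """Build the same _count query as the curl example above."""
    body = {"query": {"bool": {"must": [{"match_all": {}}]}}}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{index}/data/_count",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def looks_healthy(response: dict) -> bool:
    """Some documents indexed and no shard failures."""
    return response["count"] > 0 and response["_shards"]["failed"] == 0


# Usage (requires a running node and ES):
# with urllib.request.urlopen(count_request(), timeout=10) as resp:
#     print(looks_healthy(json.load(resp)))
```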

Important: Replaying with the ES plugin will always be slower than with the "save to RAM" account_history_plugin, so expect to wait much longer than usual to be in sync. With the recommended hardware the synchronization can take 30 hours.

A synchronized node looks like this (screen capture 02/08/2018):

root@NC-PH-1346-07:~# curl -X GET 'http://localhost:9200/bitshares-*/data/_count?pretty=true' -H 'Content-Type: application/json' -d '
{
    "query" : {
        "bool" : { "must" : [{"match_all": {}}] }
    }
}
'
{
  "count" : 391390823,
  "_shards" : {
    "total" : 175,
    "successful" : 175,
    "skipped" : 0,
    "failed" : 0
  }
}
root@NC-PH-1346-07:~# 

Important: We have reports that more than 250GB of disk space was needed as of 2018-08-02 to save all the history and its logs. Please make sure you have enough disk space before synchronizing.

Indexes

The plugin creates monthly indexes in the ES database. Index names consist of the configured prefix followed by the year and month, such as bitshares-2016-05, and each index contains all the operations made during that month.
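The index name can be derived from a timestamp. A Python sketch, assuming the month is taken from the block time and the default elasticsearch-index-prefix; the helper names are hypothetical. Listing only the months a query needs, instead of bitshares-*, narrows a search to fewer indexes (ES accepts a comma-separated list of index names in the URL).

```python
from datetime import datetime
from typing import List


def index_name(block_time: datetime, prefix: str = "bitshares-") -> str:
    """Monthly index for a block time, e.g. bitshares-2016-05 (assumed scheme)."""
    return f"{prefix}{block_time:%Y-%m}"


def indexes_between(start: datetime, end: datetime,
                    prefix: str = "bitshares-") -> List[str]:
    """All monthly index names from start's month to end's month, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}{year:04d}-{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return names


# Index part of a search URL covering late October 2015 to early January 2016:
print(",".join(indexes_between(datetime(2015, 10, 26), datetime(2016, 1, 3))))
```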

List your indexes as:

root@NC-PH-1346-07:~# curl -X GET 'http://localhost:9200/_cat/indices'
yellow open bitshares-2018-02 voS1uchzSxqxqkiaKNrEYg 5 1  18984984 0  10.8gb  10.8gb
yellow open bitshares-2018-06 D6wyX58lRyG3QOflmPwJZw 5 1  28514130 0  15.6gb  15.6gb
yellow open bitshares-2017-10 73xRTA-fSTm479H4kOENuw 5 1   9326346 0   5.2gb   5.2gb
yellow open bitshares-2016-08 -MMp3VGGRZqG2YL1LQunbg 5 1    551835 0 270.1mb 270.1mb
yellow open bitshares-2016-07 Ao56gO9LQ-asMhX50rbcCg 5 1    609087 0 303.2mb 303.2mb
yellow open bitshares-2018-05 9xuof-PiRQWburpW8ZXHVg 5 1  29665610 0  17.3gb  17.3gb
yellow open bitshares-2017-01 SpfwEzGcSoy9Hd6c6fzv2g 5 1   1197124 0   659mb   659mb
yellow open bitshares-2017-12 tF5af4OvTLqcx3IYUJSQig 5 1  13244366 0   7.5gb   7.5gb
yellow open bitshares-2016-03 yy91IvyATOCEoFHjgDbalg 5 1    597461 0 297.4mb 297.4mb
yellow open bitshares-2015-12 z-ZAZqHsQL2EDNpf3_ghGA 5 1    349985 0 151.3mb 151.3mb
yellow open bitshares-2017-07 OOr_xW4STsCm3sev1xtTRQ 5 1  17890903 0   9.6gb   9.6gb
yellow open bitshares-2016-04 jt9q50ADQuylV4l25zGAaw 5 1    413798 0 205.6mb 205.6mb
yellow open bitshares-2016-11 mWz7DpjSQyqJ_rL8gtMqWw 5 1    495556 0   260mb   260mb
yellow open bitshares-2016-12 2qht_wrXTUmNqDvczpHYzw 5 1    917034 0 506.6mb 506.6mb
yellow open bitshares-2016-10 vAMb0kW6Stqz6CNbuu7PEQ 5 1    416570 0 208.8mb 208.8mb
yellow open bitshares-2015-11 ETNFuF3sTPe-gTSzX3bdIg 5 1    301079 0 131.9mb 131.9mb
yellow open bitshares-2017-08 73Q2Asw-Rf228oQLoSCLGw 5 1   9916248 0   5.6gb   5.6gb
yellow open bitshares-2016-05 3c95AvKcQk2puBwVt_HIqQ 5 1    498493 0   246mb   246mb
yellow open bitshares-2017-02 lsiiz7PmS2q9_P2BQpNkNQ 5 1   1104282 0 586.7mb 586.7mb
yellow open bitshares-2017-11 4pqwIRdWSwSe5198YNz-Nw 5 1  14107174 0     8gb     8gb
yellow open bitshares-2018-07 fdmfXLqSTESODyLI_7cjXg 5 1 133879948 0  51.3gb  51.3gb
yellow open bitshares-2016-06 Is11IdcnT8mfBPpoLUjJyw 5 1    656358 0 330.3mb 330.3mb
yellow open bitshares-2018-04 MEA8fCsgSbOVXa0Z05cfsA 5 1  20940461 0  11.9gb  11.9gb
yellow open bitshares-2018-03 fMjxhFwHSP-6ewrl0Ns6ZQ 5 1  20335546 0    12gb    12gb
yellow open bitshares-2017-09 o-b2Bf3LR0-J1kUiv4FpHA 5 1  11075939 0   6.3gb   6.3gb
yellow open bitshares-2018-01 jw9rYlmTSvuLC1hHcYyU4Q 5 1  19396703 0  11.2gb  11.2gb
yellow open bitshares-2018-08 EDRxQvxhQJe3Vam_FxZMWg 5 1   8038498 0     3gb     3gb
yellow open bitshares-2016-09 fo2AL0y7T_q_HtXEYCv35Q 5 1    409164 0 203.4mb 203.4mb
yellow open bitshares-2016-01 3sjjs-4oQMm5HG-vUTuyoA 5 1    372772 0 168.7mb 168.7mb
yellow open bitshares-2017-03 ZxjWksRyTaGstm6T2Kxl9A 5 1   2167788 0   1.1gb   1.1gb
yellow open bitshares-2016-02 toWbFwI-RB2wEGrR8873rQ 5 1    468174 0 222.7mb 222.7mb
yellow open bitshares-2017-05 IEZQ-rtmQU2kKNcRb58Egg 5 1  10278394 0   5.6gb   5.6gb
yellow open bitshares-2017-04 S1h2eBGiS3quNJU7CqPR7Q 5 1   3316120 0   1.8gb   1.8gb
yellow open bitshares-2017-06 0HYkECRbSwGDrmDFof8nqA 5 1  10795239 0     6gb     6gb
yellow open bitshares-2015-10 XyKOlrTWSK6vQgdXm8SAtQ 5 1    161004 0  84.5mb  84.5mb
root@NC-PH-1346-07:~# 

If you don't see any indexes here, something is wrong with the setup of the bitshares-core node with the elasticsearch plugin.
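To check totals from a script, a small Python sketch can sum the rows. It assumes the default column layout shown above (no header row), and the helper names are hypothetical; appending `?format=json` to the `_cat/indices` URL returns the same columns as JSON if you prefer not to parse text.

```python
from typing import Tuple

# Multipliers for the size suffixes used by _cat/indices.
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4}


def parse_size(text: str) -> float:
    """Convert sizes such as '270.1mb' or '10.8gb' to bytes."""
    # Check longer suffixes first, since every suffix ends in "b".
    for unit in ("tb", "gb", "mb", "kb", "b"):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * UNITS[unit]
    raise ValueError(f"unknown size: {text!r}")


def summarize(cat_output: str) -> Tuple[int, float]:
    """Total docs.count and store.size (bytes) over the rows of _cat/indices.

    Assumes the default column order: health status index uuid pri rep
    docs.count docs.deleted store.size pri.store.size.
    """
    docs, size = 0, 0.0
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) != 10:
            continue  # skip prompts, blank lines and unexpected rows
        docs += int(fields[6])
        size += parse_size(fields[8])
    return docs, size
```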

Pre-define settings

By default, data indexes are created with the default Elasticsearch settings. The node owner can tweak the default settings for all the bitshares-* indexes before any data is added.

An example of a good index configuration is yet to be written (todo).
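Until that example is written, here is a hedged starting point in Python. A legacy index template (the `PUT _template/<name>` API of ES 6.x and 7.x, superseded by composable templates from 7.8) applies its settings to every new index whose name matches `bitshares-*`, and only to indexes created after it exists. The shard and replica counts are illustrative, not tuned recommendations: on a single-node setup replicas can never be allocated, which is why the indexes listed earlier are yellow, so `number_of_replicas: 0` lets new indexes turn green. The template name `bitshares` and the helper names are hypothetical.

```python
import json
import urllib.request


def template_body(pattern: str = "bitshares-*", shards: int = 1,
                  replicas: int = 0) -> dict:
    """Example values only; choose shard/replica counts for your hardware."""
    return {
        "index_patterns": [pattern],
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
    }


def template_request(body: dict, base_url: str = "http://localhost:9200/",
                     name: str = "bitshares") -> urllib.request.Request:
    """PUT the template; run it before the node creates its first index."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/_template/{name}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Usage (requires a running ES):
# urllib.request.urlopen(template_request(template_body()), timeout=10)
```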

Usage

After your node is in sync, you have a full node without the RAM issues. A synchronized witness_node with ES uses less than 10GB of RAM:

root@NC-PH-1346-07:~# pmap 2183
...
 total          8604280K

What client-side apps can do with this new data is limited only by the developer's imagination, but let's check some real-world examples to see the benefits of this new feature.

Get operations by account, time and operation type

References: https://github.com/bitshares/bitshares-core/issues/358 https://github.com/bitshares/bitshares-core/issues/413 https://github.com/bitshares/bitshares-core/pull/405 https://github.com/bitshares/bitshares-core/pull/379 https://github.com/bitshares/bitshares-core/pull/430 https://github.com/bitshares/bitshares-ui/issues/68

This is one of the most frequently requested features. It can easily be queried with the ES plugin by calling the _search endpoint:

curl -X GET 'http://localhost:9200/bitshares-*/data/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
    "query" : {
        "bool" : { "must" : [{"term": { "account_history.account.keyword": "1.2.282"}}, {"range": {"block_data.block_time": {"gte": "2015-10-26T00:00:00", "lte": "2015-10-29T23:59:59"}}}] }
    }
}
'

Note: responses are removed from the samples to save space in this document. Run the queries against your own node to see them.
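The account and time search from Python, as a sketch using only the standard library; the function names are hypothetical. Two things are added to the original query and are optional: an explicit `size` (ES returns only 10 hits by default) and a sort on `block_data.block_time`, which assumes that field is mapped as a date.

```python
import json
import urllib.request


def account_ops_query(account_id: str, start: str, end: str, size: int = 100) -> dict:
    """Same filters as the curl example, plus explicit paging and ordering."""
    return {
        "size": size,  # ES returns only 10 hits unless told otherwise
        "sort": [{"block_data.block_time": {"order": "desc"}}],
        "query": {"bool": {"must": [
            {"term": {"account_history.account.keyword": account_id}},
            {"range": {"block_data.block_time": {"gte": start, "lte": end}}},
        ]}},
    }


def search(body: dict, base_url: str = "http://localhost:9200/",
           index: str = "bitshares-*") -> dict:
    """POST a query to the _search endpoint and return the decoded JSON."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/{index}/data/_search",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


# Usage (requires a synchronized node):
# body = account_ops_query("1.2.282", "2015-10-26T00:00:00", "2015-10-29T23:59:59")
# hits = search(body)["hits"]["hits"]
```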

Filter based on block number or block range

https://github.com/bitshares/bitshares-core/issues/61

curl -X GET 'http://localhost:9200/bitshares-*/data/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
    "query" : {
        "bool" : { "must" : [{"term": { "account_history.account.keyword": "1.2.356589"}}, {"range": {"block_data.block_num": {"gte": "17824289", "lte": "17824290"}}}] }
    }
}
'
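Integers work as range bounds too, which reads more naturally for block numbers. A Python sketch that builds the query above and pulls the block numbers out of the hits; the function names are hypothetical, and `hits.hits[]._source` is the standard layout of an ES search response.

```python
from typing import List


def block_range_query(account_id: str, first_block: int, last_block: int) -> dict:
    """Operations of one account within an inclusive block range."""
    return {
        "query": {"bool": {"must": [
            {"term": {"account_history.account.keyword": account_id}},
            {"range": {"block_data.block_num": {"gte": first_block, "lte": last_block}}},
        ]}},
    }


def block_numbers(response: dict) -> List[int]:
    """Pull block_data.block_num out of each hit of a _search response."""
    return [hit["_source"]["block_data"]["block_num"]
            for hit in response["hits"]["hits"]]
```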

Get operations by transaction hash

Refs: https://github.com/bitshares/bitshares-core/pull/373

The equivalent of get_transaction_id can be done as:

curl -X GET 'http://localhost:9200/bitshares-*/data/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
    "query" : {
        "bool" : { "must" : [{"term": { "block_data.block_num": 19421114}},{"term": { "operation_history.trx_in_block": 0}}] }
    }
}
'

The above returns all ops inside the transaction. If you only need the trx_id field, you can add _source to return just the fields you need:

curl -X GET 'http://localhost:9200/bitshares-*/data/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
    "_source": ["block_data.trx_id"],
    "query" : {
        "bool" : { "must" : [{"term": { "block_data.block_num": 19421114}},{"term": { "operation_history.trx_in_block": 0}}] }
    }
}
'
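Because every operation in a transaction carries the same `trx_id`, one hit is enough. A Python sketch that requests a single hit and reads the id from it; with `_source` filtering, the returned `_source` keeps only the nested `block_data.trx_id` field. The helper names are hypothetical.

```python
from typing import Optional


def trx_id_query(block_num: int, trx_in_block: int) -> dict:
    return {
        "_source": ["block_data.trx_id"],
        "size": 1,  # all ops of the transaction share the same trx_id
        "query": {"bool": {"must": [
            {"term": {"block_data.block_num": block_num}},
            {"term": {"operation_history.trx_in_block": trx_in_block}},
        ]}},
    }


def trx_id_from_response(response: dict) -> Optional[str]:
    """Return the trx_id of the first hit, or None if nothing matched."""
    hits = response["hits"]["hits"]
    if not hits:
        return None
    return hits[0]["_source"]["block_data"]["trx_id"]
```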

The equivalent of get_transaction_from_id is very easy:

curl -X GET 'http://localhost:9200/bitshares-*/data/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
    "query" : {
        "bool" : { "must" : [{"term": { "block_data.trx_id": "6f2d5064637391089127aa9feb36e2092347466c"}}] }
    }
}
'
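A transaction can contain several operations, and search hits are not guaranteed to come back in transaction order. A Python sketch that restores the order and returns the raw `op` strings; it assumes each document also has an `operation_history.op_in_trx` field (only `trx_in_block` and `op` are mentioned on this page), so check your mapping before relying on it. Add a `size` larger than the default 10 to the query if a transaction may hold more operations. The function name is hypothetical.

```python
from typing import List


def ops_in_trx_order(response: dict) -> List[str]:
    """Return operation_history.op strings sorted by op_in_trx (assumed field)."""
    sources = [hit["_source"] for hit in response["hits"]["hits"]]
    sources.sort(key=lambda src: src["operation_history"]["op_in_trx"])
    return [src["operation_history"]["op"] for src in sources]
```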

Going forward

The reader will need to learn more about Elasticsearch and the Lucene query language to make more complex queries.

Everything needed can be found at https://www.elastic.co/guide/en/elasticsearch/reference/6.2/index.html

The Elasticsearch team also provides a front end named Kibana (https://www.elastic.co/products/kibana). It is very easy to install and can do useful things such as producing very detailed stats of the blockchain network.

More visitor code = more indexed data = more filters to use

Just as an example, it would be easy to index the assets of trading operations by extending their visitor code. Point 3 of https://github.com/bitshares/bitshares-core/issues/358 requests filtering by trading pair, which can be solved by indexing the assets of the trading ops as mentioned.

Remember that ES already has all the needed information in the op text field of the operation_history object. A client can get all the ops of an account, loop through them, and convert each op string into JSON, which makes it possible to filter by asset or any other needed field. There is no need to index everything, but it is possible.
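That client-side filtering could look like the Python sketch below. It decodes each `op` string (BitShares serializes an operation as `[type, {fields}]`) and keeps the ops whose amounts involve all the requested assets, which covers the trading-pair case. Skipping the `fee` field is a design choice of this sketch so that the fee asset does not create false matches; the helper names are hypothetical.

```python
import json
from typing import Iterable, List, Set


def asset_ids(value) -> Set[str]:
    """Recursively collect every "asset_id" in a decoded operation, ignoring fees."""
    found = set()
    if isinstance(value, dict):
        for key, item in value.items():
            if key == "fee":
                continue  # the asset used to pay the fee is not part of the trade
            if key == "asset_id" and isinstance(item, str):
                found.add(item)
            else:
                found |= asset_ids(item)
    elif isinstance(value, list):
        for item in value:
            found |= asset_ids(item)
    return found


def ops_touching_assets(hits: Iterable[dict], wanted: Iterable[str]) -> List[list]:
    """Decoded ops from search hits that involve every asset in `wanted`."""
    wanted = set(wanted)
    result = []
    for hit in hits:
        op = json.loads(hit["_source"]["operation_history"]["op"])
        if wanted <= asset_ids(op):
            result.append(op)
    return result
```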

Note on Duplicates

By using op_type = create on each bulk line we send to the database, together with a unique ID (the ath id, 2.9.X), the plugin never indexes any operation twice. If the node is replaying, the plugin only starts adding to the database when it finds a new record.
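The effect of `create` can be illustrated with a Python sketch of a bulk body. It uses the standard ES bulk format (one action line plus one document line per operation, which is why the bulk options count ops * 2 lines) and the `create` bulk action, which has the same semantics as op_type = create: ES rejects it with a conflict error when a document with the same `_id` already exists. The function name is hypothetical, and the plugin's real action lines may carry extra metadata such as a type.

```python
import json
from typing import Iterable, Tuple


def bulk_body(index: str, docs: Iterable[Tuple[str, dict]]) -> str:
    """NDJSON for the _bulk endpoint; docs yields (ath_id, document) pairs."""
    lines = []
    for ath_id, doc in docs:
        # "create" fails for an existing _id, so a replayed op is not duplicated.
        lines.append(json.dumps({"create": {"_index": index, "_id": ath_id}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline
```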

Wrapper

It is not recommended to expose the Elasticsearch API fully to the internet. Instead, applications should connect to a wrapper for data:

https://github.com/oxarbitrage/bitshares-es-wrapper

The Elasticsearch database listens on localhost, and the wrapper on the same machine exposes a reduced set of API calls to the internet.
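The linked project is the wrapper to use. Purely to illustrate the idea of a reduced API, here is a standard-library Python sketch: it accepts one hypothetical endpoint (`/account_history`), turns its parameters into a fixed ES query, caps the page size, and rejects every other path, so clients never reach ES directly. The endpoint, parameters, cap and port are assumptions, not the wrapper's actual API.

```python
import json
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# ES stays bound to localhost; only this wrapper is exposed.
ES_SEARCH_URL = "http://localhost:9200/bitshares-*/data/_search"
MAX_SIZE = 100  # hypothetical cap on hits per request


def build_query(path: str, params: dict) -> dict:
    """Translate an allowed public endpoint into a fixed ES query.

    Raises KeyError for unknown endpoints and ValueError for bad parameters.
    """
    if path != "/account_history":
        raise KeyError(path)
    account = params.get("account_id", [""])[0]
    if not account.startswith("1.2."):
        raise ValueError("account_id must look like 1.2.X")
    size = max(1, min(int(params.get("size", ["10"])[0]), MAX_SIZE))
    return {
        "size": size,
        "query": {"bool": {"must": [
            {"term": {"account_history.account.keyword": account}},
        ]}},
    }


class WrapperHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        try:
            body = build_query(url.path, parse_qs(url.query))
        except KeyError:
            self.send_error(404, "unknown endpoint")
            return
        except ValueError as exc:
            self.send_error(400, str(exc))
            return
        request = urllib.request.Request(
            ES_SEARCH_URL,
            data=json.dumps(body).encode(),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(request, timeout=30) as resp:
                payload = resp.read()
        except urllib.error.URLError:
            self.send_error(502, "search backend unavailable")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


# To serve (port is an arbitrary example):
# HTTPServer(("0.0.0.0", 8080), WrapperHandler).serve_forever()
```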
