v0.1.0
This release brings significant changes to how Substreams are developed and consumed, and to their speed of execution. Note that there are no breaking changes related to your Substreams' Rust code; the only breaking changes concern how Substreams are run and which features/flags are available.
Here are the highlights of this release:
- Production vs Development Mode
- Single Output Module
- Output Module must be of type map
- InitialSnapshots is now a development mode feature only
- Enhanced Parallel Execution
Warning: Operators, refer to the Operators Notes section for specific instructions on deploying this new version.
Production vs development mode
We introduce an execution mode when running Substreams, either production mode or development mode. The execution mode impacts how the Substreams get executed, specifically:
- The time to first byte
- The module logs and outputs sent back to the client
- How parallel execution is applied through the requested range
The differences between the modes are:
- In development mode, the client receives all the logs of the executed modules. In production mode, logs are not available at all.
- In development mode, modules are always re-executed from the request's start block, meaning that logs will always be visible to the user. In production mode, if a module's output is found in cache, module execution is skipped completely and the data is returned directly.
- In development mode, only backward parallel execution can be effective. In production mode, both backward parallel execution and forward parallel execution can be effective. See the Enhanced parallel execution section for further details about parallel execution.
- In development mode, every module's output is returned in the response, but only the root module's output is displayed by default in the substreams CLI (configurable via a flag). In production mode, only the root module's output is returned.
- In development mode, you may request specific store snapshots that are in the execution tree via the substreams CLI --debug-modules-initial-snapshots flag. In production mode, this feature is not available.
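The differences listed above can be condensed into a small lookup. The following is only an illustrative sketch: the ModeCapabilities class, its field names and the capabilities function are hypothetical and are not part of any Substreams API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeCapabilities:
    # All field names are hypothetical; they mirror the bullets above.
    logs_available: bool              # module logs are sent back to the client
    skips_cached_modules: bool        # cached output is returned without re-execution
    forward_parallel_execution: bool  # parallel execution within the requested range
    returns_all_module_outputs: bool  # every executed module vs. the root module only
    initial_snapshots_allowed: bool   # store snapshots can be requested for debugging


def capabilities(production_mode: bool) -> ModeCapabilities:
    """Summarize what each execution mode offers, as described in this release."""
    return ModeCapabilities(
        logs_available=not production_mode,
        skips_cached_modules=production_mode,
        forward_parallel_execution=production_mode,
        returns_all_module_outputs=not production_mode,
        initial_snapshots_allowed=not production_mode,
    )
```

Backward parallel execution is left out of the sketch because it can be effective in both modes.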
The execution mode is specified at the gRPC request level, and the default mode is development. The substreams CLI tool being a development tool first and foremost, we do not expect people to activate production mode (-p) when using it, except perhaps for testing purposes.
If you currently have sink code that makes the gRPC request itself and you use it for production consumption, ensure that the field production_mode in your Substreams request is set to true. StreamingFast-provided sinks like substreams-sink-postgres, substreams-sink-files and others have already been updated to use production_mode by default.
As a final note, we recommend running production mode against a compiled .spkg file that should ideally be released and versioned. This ensures stable module hashes and proper use of cached output.
Single module output
We now support only 1 output module when running a Substreams, whereas prior to this release it was possible to have multiple ones.
- Only a single module can now be requested; previous versions allowed requesting N modules.
- Only a map module can now be requested; previous versions allowed both map and store modules to be requested.
- InitialSnapshots is now forbidden in production mode and still allowed in development mode.
- In development mode, the server sends back output for all executed modules (by default the CLI displays only the requested module's output).
Note: We added output_module to the Substreams request and kept output_modules to remain backwards compatible for a while. If an output_module is specified, we will honor that module. If not, we will check output_modules to ensure there is only 1 output module. In a future release, we are going to remove output_modules altogether.
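The fallback described in this note can be sketched as follows. This is a minimal illustration and not the server's actual code: the function name resolve_output_module and the error message are hypothetical, and only the two field names come from this release.

```python
from typing import Optional, Sequence


def resolve_output_module(
    output_module: Optional[str],
    output_modules: Sequence[str],
) -> str:
    """Pick the single output module of a request (illustrative only)."""
    # The new field wins whenever it is set.
    if output_module:
        return output_module
    # Otherwise fall back to the legacy field, which must name exactly one module.
    if len(output_modules) != 1:
        raise ValueError(
            f"expected exactly 1 output module, got {len(output_modules)}"
        )
    return output_modules[0]
```

A request that sets only the legacy field with a single entry keeps working, while a request that lists several modules there is rejected.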
With the introduction of development vs production mode, we changed some behavior to reduce the friction this change causes for debugging. In development mode, the output of all executed modules is sent back to the user. This includes the requested output module as well as all its dependencies. The substreams CLI has been adjusted to show only the output of the requested output module by default. The new substreams CLI flag --debug-modules-output can be used to control which modules' output is actually displayed by the CLI.
Migration Path: If you are currently requesting more than one module, refactor your Substreams code so that a single map module aggregates all the required information from your different dependencies in one output.
Output module must be of type map
It is now forbidden to request a store module as the output module of the Substreams request; the requested output module must now be of kind map. Different factors have motivated this change:
- Recently we have seen incorrect usage of store modules. A store module was not intended to be used as persistent long-term storage; store modules were conceived as a place to aggregate data for later steps in computation. Using one as persistent storage makes the store unmanageable.
- We had always expected users to consume a map module which would return data formatted according to a final sink spec, which would then permanently store the extracted data. We never envisioned store to act as long-term storage.
- Forward parallel execution does not support a store as its last step.
Migration Path: If you are currently using a store module as your output module, you will need to create a map module that takes the deltas of said store module as input and returns those deltas.
Examples
Let's assume a Substreams with these dependencies: [block] --> [map_pools] --> [store_pools] --> [map_transfers]
- Running substreams run substreams.yaml map_transfers will only print the outputs and logs from the map_transfers module.
- Running substreams run substreams.yaml map_transfers --debug-modules-output=map_pools,map_transfers,store_pools will print the outputs of those 3 modules.
InitialSnapshots is now a development mode feature only
Now that a store cannot be requested as the output module, it no longer made sense for InitialSnapshots to be available. Moreover, we have seen people using it to retrieve the initial state and then continue syncing. While it's a fair use case, we always wanted people to perform the synchronization using the streaming primitive and not by using store as long-term storage.
However, InitialSnapshots is a useful tool for debugging what a store contains at a given block. So we decided to keep it in development mode only, where you can request the snapshot of a store module when making your request. In the Substreams request/response, initial_store_snapshot_for_modules has been renamed to debug_initial_store_snapshot_for_modules, snapshot_data to debug_snapshot_data and snapshot_complete to debug_snapshot_complete.
Migration Path: If you were relying on the InitialSnapshots feature in production, you will need to create a map module that takes the deltas of said store module as input, and then synchronize the full state on the consuming side.
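Synchronizing the full state on the consuming side amounts to replaying the deltas in order. The sketch below assumes a simplified, hypothetical delta shape (an operation name, a key and a new value); the real messages emitted by your map module may carry more fields, and the Delta class and apply_deltas function are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Delta:
    # Simplified, hypothetical shape of one store delta.
    operation: str           # "CREATE", "UPDATE" or "DELETE"
    key: str
    new_value: bytes = b""


def apply_deltas(state: Dict[str, bytes], deltas: Iterable[Delta]) -> None:
    """Replay deltas, in order, onto a local copy of the store's state."""
    for delta in deltas:
        if delta.operation in ("CREATE", "UPDATE"):
            state[delta.key] = delta.new_value
        elif delta.operation == "DELETE":
            state.pop(delta.key, None)
        else:
            raise ValueError(f"unknown operation: {delta.operation}")
```

Starting from an empty dictionary at the store's first block and applying every block's deltas keeps the local copy in step with the store, without relying on a snapshot.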
Examples
Let's assume a Substreams with these dependencies: [block] --> [map_pools] --> [store_pools] --> [map_transfers]
- Running substreams run substreams.yaml map_transfers -s 1000 -t +5 --debug-modules-initial-snapshot=store_pools will print all the entries in store_pools at block 999, then continue with outputs and logs from map_transfers in blocks 1000 to 1004.
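The block arithmetic in this example can be checked with a short sketch. It assumes, consistently with the numbers above, that a stop value starting with + is relative to the start block and that the stop block itself is excluded; the resolve_block_range helper is hypothetical and is not part of the CLI.

```python
from typing import Tuple


def resolve_block_range(start_block: int, stop: str) -> Tuple[int, range]:
    """Return (snapshot_block, streamed_blocks) for a start block and a stop value."""
    # A leading "+" makes the stop value relative to the start block.
    stop_block = start_block + int(stop[1:]) if stop.startswith("+") else int(stop)
    # The snapshot reflects the store just before the first streamed block;
    # the stop block itself is excluded from the streamed range.
    return start_block - 1, range(start_block, stop_block)
```

With a start block of 1000 and a stop value of +5, the snapshot block is 999 and the streamed blocks are 1000 through 1004.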
Enhanced parallel execution
Parallel execution can happen in 2 ways: backward or forward.
Backward parallel execution consists of executing in parallel block ranges from the module's start block up to the start block of the request. If the start block of the request matches the module's start block, there is no backward parallel execution to perform. Also, this happens only for dependencies of type store, which means that if you depend only on other map modules, no backward parallel execution happens.
Forward parallel execution consists of executing in parallel block ranges from the start block of the request up to the last known final block (a.k.a. the irreversible block) or the stop block of the request, whichever is smaller. Forward parallel execution significantly improves the performance of the Substreams because we execute your module in advance, in parallel, through the chain history. What we stream back is the cached output of your module's execution, which essentially means that we stream back data written in flat files. This gives a major performance boost because in almost all cases the data will already be available for you to consume.
Forward parallel execution happens only in production mode and is always disabled in development mode. Moreover, since we read data back from cache, the logs of your modules will never be accessible, as we do not store them.
Backward parallel execution still occurs in both development and production mode. The diagram below gives details about when parallel execution happens.
You can see that in production mode, parallel execution happens before the Substreams request range as well as within the requested range. In development mode, parallel execution happens only before the Substreams request range, that is, between the module's start block and the start block of the requested range (backward parallel execution only).
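The rules above can be turned into a small range calculation. This is a sketch under stated assumptions rather than the engine's scheduler: ranges are treated as half-open intervals, and the function plan_parallel_execution and all of its parameter names are hypothetical.

```python
from typing import Optional, Tuple

BlockRange = Tuple[int, int]  # half-open: [start, end)


def plan_parallel_execution(
    module_start: int,
    request_start: int,
    stop_block: Optional[int],
    final_block: int,
    production_mode: bool,
    depends_on_store: bool,
) -> Tuple[Optional[BlockRange], Optional[BlockRange]]:
    """Return (backward_range, forward_range); None means no parallel work."""
    # Backward: only for store dependencies, and only when the request
    # starts after the module's start block. Applies in both modes.
    backward = None
    if depends_on_store and request_start > module_start:
        backward = (module_start, request_start)

    # Forward: production mode only, up to the smaller of the last final
    # block and the request's stop block.
    forward = None
    if production_mode:
        end = final_block if stop_block is None else min(final_block, stop_block)
        if end > request_start:
            forward = (request_start, end)
    return backward, forward
```

For a module starting at block 100, a request for blocks 1000 to 5000 and a last final block of 3000, production mode yields a backward range of 100 to 1000 and a forward range of 1000 to 3000, while development mode yields only the backward range.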
Operators Notes
The state output format for map and store modules has changed internally to be more compact in Protobuf format. When deploying this new version, previously existing state files should be deleted, or the deployment updated to point to a new store location. The state output store is defined by the --substreams-state-store-url flag on the chain-specific binary (e.g. fireeth).
Library
- Added production_mode to Substreams Request
- Added output_module to Substreams Request
CLI
- Fixed Ctrl-C not working directly when in TUI mode.
- Added Trace ID printing once available.
- Added command substreams tools analytics store-stats to get statistics for a given store.
- Added --debug-modules-output (comma-separated module names) (unavailable in production mode).
- Breaking: Renamed flag --initial-snapshots to --debug-modules-initial-snapshots (comma-separated module names) (unavailable in production mode).