GITBOOK-368: change request with no subject merged in GitBook
abourget authored and gitbook-bot committed Jul 3, 2023
1 parent 05c8d0b commit 8128ee3
Showing 9 changed files with 161 additions and 285 deletions.
6 changes: 3 additions & 3 deletions SUMMARY.md
@@ -35,10 +35,10 @@

## Integrate New Chains

* [New Blockchains](integrate-new-chains/new-blockchains.md)
* [Firehose Acme](integrate-new-chains/firehose-starter.md)
* [Benefits](integrate-new-chains/benefits.md)
* [Integration overview](integrate-new-chains/integration-overview.md)
* [Design Principles](integrate-new-chains/design-principles.md)
* [Why Integrate the Firehose](integrate-new-chains/why-integrate-the-firehose.md)
* [Firehose Acme](integrate-new-chains/firehose-starter.md)

## References

7 changes: 7 additions & 0 deletions architecture/data-flow.md
@@ -54,6 +54,12 @@ The instrumentation itself is called Firehose Instrumentation and generates Fireh

The Firehose logs output small chunks of processed data using a simple text-based protocol over the operating system's standard output pipe.

{% hint style="danger" %}
Note: this describes the **current method** of instrumentation for Ethereum. Our goal, though, is to simplify this integration point so that a single, well-formed protobuf message arrives for each block, on any given chain.

Contact the team if you're about to do a new chain integration. Also read the [integration overview](../integrate-new-chains/integration-overview.md).
{% endhint %}

### Firehose Logs Messages

The Firehose logs are specific to each blockchain, although quite similar from one chain to another. There is no standardized format today; each chain implements its own. The Firehose logs are usually modeled using "events", for example:
@@ -94,6 +100,7 @@ FIRE END_BLOCK 33 717 {"header":{"parentHash":"0x538473df2d1a762473cf9f8f6c69e65
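As a minimal Go sketch, a line like the `END_BLOCK` example above could be split apart as follows. The `FIRE <EVENT> <args...> <json>` shape is an assumption taken from that one example; real formats are chain-specific:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// parseFireLine splits a hypothetical "FIRE <EVENT> <args...> <json>" log
// line. The exact fields are chain-specific; this sketch assumes the
// END_BLOCK shape shown above: "FIRE END_BLOCK <num> <size> <json payload>".
func parseFireLine(line string) (event string, fields []string, payload map[string]any, err error) {
	parts := strings.SplitN(line, " ", 5)
	if len(parts) < 3 || parts[0] != "FIRE" {
		return "", nil, nil, fmt.Errorf("not a Firehose log line: %q", line)
	}
	event = parts[1]
	fields = parts[2 : len(parts)-1]
	if err := json.Unmarshal([]byte(parts[len(parts)-1]), &payload); err != nil {
		return "", nil, nil, err
	}
	return event, fields, payload, nil
}

func main() {
	line := `FIRE END_BLOCK 33 717 {"header":{"parentHash":"0x5384"}}`
	event, fields, payload, err := parseFireLine(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(event, fields, payload["header"])
}
```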
The block data event messages provided by the Firehose instrumentation are read by the reader component.

The `reader` component deals with:

* Launches the instrumented native node process and manages its lifecycle (start/stop/monitor).
* Connects to the native node process' standard output pipe.
* Reads the Firehose log event messages and assembles a chain-specific protobuf Block model.
2 changes: 1 addition & 1 deletion firehose-setup/system-requirements.md
@@ -39,5 +39,5 @@ The CPU/RAM requirements will depend on these factors:
Firehose requires native instrumentation of the binary of the chain you want to sync with; a vanilla binary of the chain, without Firehose support built in, is not going to work. Today, you can therefore only synchronize with Firehose-enabled chains.

{% hint style="success" %}
Other blockchains beyond the ones currently supported can be used with Firehose through the process of instrumentation. Information is provided for the instrumentation process of [_new blockchains_](../integrate-new-chains/new-blockchains.md)
Other blockchains beyond the ones currently supported can be used with Firehose through the process of instrumentation. Information is provided for the instrumentation process of [_new blockchains_](../integrate-new-chains/integration-overview.md)
{% endhint %}
File renamed without changes.
101 changes: 13 additions & 88 deletions integrate-new-chains/design-principles.md
@@ -6,15 +6,11 @@ description: StreamingFast Firehose design principles

# Design Principles

## Firehose Design Principles

Firehose was heavily inspired by large-scale data science machinery and other processes previously developed by the StreamingFast team.

## The Firehose "North Star"

### Truths & Assumptions

Firehose was designed with the following truths and assumptions taken into excruciatingly careful consideration.
Principles and assumptions:

* Flat files are more efficient than live, CPU- and RAM-intensive running processes.
* Fast iteration is preferred for data processes because data is inherently messy.
@@ -30,95 +26,24 @@

## Extraction

### Minding Deterministic Block Execution

StreamingFast strives to create the shortest path from the deterministic execution of blocks and transactions down into a flat file. High-level goals for the extraction process include:

* Develop simple, robust, laser-focused processes.
* Create core system [components](../architecture/components/) including the Reader, Merger, Relayer, and gRPC Server.
* Avoid coupling extraction with indexing or any other services.
* Guarantee maximum node performance during data extraction for instrumented nodes, for all protocols.

## Data Completeness

### Full Data Extraction

Firehose achieves data completeness through the extraction of all available data from instrumented nodes.

Revisiting instrumented nodes is avoided by Firehose due to the complete, rich, verifiable data collected during the extraction process.

### Finite Data Tracking

During a transaction, the balance for an address could change from `100` to `200`. Firehose will save the storage key that was altered, along with the previous and next values.

### Integrity & Fidelity

Forward and backward atomic updates and integrity checks are made possible due to the fidelity of data being tracked by Firehose.

In the example above, `200` should be the next changed value for the `previous_data` key. If a discrepancy is encountered it means there is an issue with the extraction mechanism and data quality will be negatively impacted.

### Complete Data in Detail

Complete data means accounting for:

* the relationships between a transaction,
* the transaction's block’s schedule,
* transaction execution,
* transaction expiration,
* events produced by any transaction side effects,
* the transaction call tree, and each call’s state transition and effects.

### Transaction Relationships & Data

Detailed transaction relationship information is difficult to obtain from typical blockchain data.

Firehose provides thorough and complete transaction data to avoid missed opportunities for potential data application development efforts.
During the extraction phase, our goals are:

### Transaction & State Data Together
* All data captured should be deterministic, with the single exception of the block number at which the chain reaches finality (this number can vary with a node's topology relative to the network and, for certain chains, may not arrive at the same moment for all nodes).
* Performance-wise, we want the impact to be minimal on the node that is usually doing write operations.

Query requests for either transaction status or state are available in some JSON-RPC protocols; however, both aren't available together.
Deep data extraction is also one of the goals of our design, for the purposes of rich indexing downstream. For example:

Data processes triggered by Ethereum log events can benefit from having knowledge of their source. The event could have been generated by the current contract, its parent (contract), or another known and trusted contract.
* Extracting both the previous and the new value on balance changes, and on state changes to accounts, storage locations, key/value stores, etc. This also helps with integrity checking (to know if any changes were missed, all `prev -> new` pairs can be checked to match up for a given storage key).
* Extracting all the relationships: between blocks and transactions, between transactions and the individual function calls executed within them, call trees, and a thing we call **total ordering**: an **ordinal** that orders everything tracked (beginning/end of transactions, function calls, state changes, events emitted during execution, etc.) relative to one another. For example, Ethereum has log indexes, allowing ordering of one log versus another, but not of a log versus a state change, or of a log within a tree of calls (where perhaps the input of a call is what you're watching).
* Some blockchains allow you to query state and query events separately. Oftentimes, it's not possible to link those things. We like the instrumentation to link changes to the source of events, to be able to build better downstream systems and not lose the relations between state and events.
* Most indexing strategies hinge on events, but having state changes allows for new opportunities of indexing, triggering on the actual values being mutated. On certain chains, this allows you to avoid some gas costs by limiting the events, as you're able to trigger "virtual" events based on state changes.
* Picking up on those changes can also avoid the need to (re-)design contracts when new data is needed that wasn't thought of at first.
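A "virtual" event derived from state changes can be sketched in Go. Every type and value here is a hypothetical illustration, not part of the Firehose API:

```go
package main

import "fmt"

// BalanceChange is a hypothetical record extracted by an instrumented node:
// the previous and new value for a storage key, plus an ordinal that places
// the change in the block's total ordering.
type BalanceChange struct {
	Address  string
	Previous uint64
	New      uint64
	Ordinal  uint64
}

// virtualTransfer derives a "virtual" Transfer event from a pair of balance
// changes, without the contract having emitted any log at all.
func virtualTransfer(debit, credit BalanceChange) string {
	amount := debit.Previous - debit.New
	return fmt.Sprintf("Transfer(%s -> %s, %d)", debit.Address, credit.Address, amount)
}

func main() {
	debit := BalanceChange{Address: "0xaaa", Previous: 200, New: 100, Ordinal: 7}
	credit := BalanceChange{Address: "0xbbb", Previous: 50, New: 150, Ordinal: 8}
	fmt.Println(virtualTransfer(debit, credit)) // Transfer(0xaaa -> 0xbbb, 100)
}
```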

### Reduced Need for Smart Contract Events

To access rich, complete data, smart contract developers are led to emit additional events. Emitting additional events increases gas fees.

{% hint style="info" %}
_Note: Enriched and complete transaction data is simply not easily or readily available._
{% endhint %}

### Contract Design Issues

The lack of availability of rich data also has effects on contract design.

Contract designers are required to reason and plan out how stored data will be queried and consumed by their application.

### Contract Simplification & Cost Reduction

Having access to richer external data processes allows developers to simplify contracts, reducing on-chain operation costs.

## Modeling With Extreme Care

### Data Model for Ingestion

The data model used by StreamingFast to ingest protocol data was created with extreme diligence and care.

{% hint style="success" %}
**Tip**_: StreamingFast encountered several peculiarities within many protocols during the design and development process of Firehose._
{% endhint %}

### Subtleties in Reverted Calls

Interpreting subtleties in bits of data, for things like the meaning of a reverted call in an Ethereum call stack, becomes impossible farther downstream.

{% hint style="info" %}
**Note**_: Firehose provides complete node data through carefully considered and implemented model definitions created with Protocol Buffer schemas._
{% endhint %}
Also, a goal when building an extractor is to **extract all the data necessary to recreate an archive node**. If nothing is missing, anyone indexing downstream should be satisfied.

### Running Full Archive Nodes
Another principle is: as in any database, **transactions/calls are the true boundaries of state changes**; blocks exist only to facilitate and accelerate consensus (there would be great overhead if networks needed to agree on each individual transaction) and are, as such, an artificial boundary.

Firehose provides enough comprehensive data to conceptually boot and run a full archive node.
RPC nodes usually round things up to the block level, but with Firehose, data should be extracted in a way that makes the transaction, or even the individual smart contract call, the unit of change. Concretely, this means state changes should be modeled at the lowest level.

## Pure Data, Files & Streams

85 changes: 67 additions & 18 deletions integrate-new-chains/firehose-starter.md
@@ -1,22 +1,18 @@
---
description: StreamingFast Firehose template
description: StreamingFast Firehose template sample for A Company that Makes Everything
---

# Firehose Acme

Instrumenting a new chain from scratch requires the node native code to be instrumented to output Firehose logs, but this is only one side of the coin. A Firehose instrumentation of a new chain requires also a `firehose-<chain>` repository that contains the chain specific code to run the Firehose stack.
Instrumenting a new chain from scratch requires the node's native code to be instrumented to output Firehose logs, but this is only one side of the coin. A Firehose instrumentation of a new chain also requires a `firehose-<chain>` program that contains chain-specific code to read the data output by the instrumented node and serve it throughout the Firehose stack.

This `firehose-<chain>` is a Golang project that contains the CLI, the reader code necessary to assemble Firehose Logs into chain specific logs and a bunch of other small boilerplate code around the Firehose set of libraries.
This `firehose-<chain>` is a Golang project that contains the CLI, the [reader code necessary to assemble Firehose node output into chain-specific Blocks](https://github.com/streamingfast/firehose-acme/blob/master/codec/console\_reader.go), and a bunch of other small boilerplate code around the Firehose set of libraries.

To ease the work of Firehose implementors, we provide a "template" project [firehose-acme](https://github.com/streamingfast/firehose-acme) that is the main starting point for instrumenting new, unsupported blockchain nodes.

It consists of basic code and a faux data provision application called the Dummy Blockchain, or `dchain`. The idea is that you can even play with this [firehose-acme](https://github.com/streamingfast/firehose-acme) instance to see blocks produced and test how Firehose looks like in its core.
It consists of basic code and a Dummy Blockchain prototype. The idea is that you can play with this [firehose-acme](https://github.com/streamingfast/firehose-acme) instance to see blocks produced and test some Firehose behaviors.

{% hint style="info" %}
A [Go](https://go.dev/doc/install) installation is required for the command below to work and the path where Golang install binaries should be available in your `PATH` (can be added with `export PATH=$PATH:$(go env GOPATH)/bin`, see [GOPATH](https://go.dev/doc/gopath_code#GOPATH) for further details).
{% endhint %}

## `firehose-acme` Installation
## Install `firehose-acme`

Clone the repository:

@@ -31,6 +27,10 @@ cd firehose-acme
go install ./cmd/fireacme
```

{% hint style="info" %}
A [Go](https://go.dev/doc/install) installation is required for the commands below to work, and the path where Go installs binaries must be in your `PATH` (it can be added with `export PATH=$PATH:$(go env GOPATH)/bin`; see [GOPATH](https://go.dev/doc/gopath\_code#GOPATH) for further details).
{% endhint %}

And validate that everything is working as expected:

```bash
@@ -39,10 +39,10 @@ fireacme version dev (Built 2023-02-02T13:42:20-05:00)
```

{% hint style="info" %}
When doing the `fireacme --version` command, if you see instead a message like `command not found: fireacme`, it's most probably because `$(go env GOPATH)/bin` is not in your `PATH` environment variable.
If `fireacme` is not found, please check [https://go.dev/doc/gopath\_code#GOPATH](https://go.dev/doc/gopath\_code#GOPATH)
{% endhint %}

## Dummy Blockchain
## Install the dummy blockchain

Obtain the Dummy Blockchain by installing from source:

@@ -57,19 +57,18 @@ dummy-blockchain --version
dummy-blockchain version 0.0.1 (build-commit="-" build-time="-")
```

## Testing `firehose-acme`

### YAML Configuration
## Run it

A simple shell script that starts `firehose-acme` with sane defaults is located at [devel/standard/start.sh](https://github.com/streamingfast/firehose-acme/blob/master/devel/standard/start.sh). The configuration file used to launch all the applications at once is located at [devel/standard/standard.yaml](https://github.com/streamingfast/firehose-acme/blob/master/devel/standard/standard.yaml).

Run the script from your local cloned `firehose-acme` version as done in [firehose-acme installation section](#firehose-acme-installation):
Run the script from your locally cloned `firehose-acme`, as done in the [firehose-acme installation section](firehose-starter.md#firehose-acme-installation):

```bash
./devel/standard/start.sh
```

The following messages will be printed to the terminal window if:

* All of the configuration changes were made correctly,
* All system paths have been set correctly,
* And the Dummy Blockchain was installed and set up correctly.
@@ -94,7 +93,57 @@ start --store-dir=/Users/maoueh/work/sf/firehose-acme/devel/standard/firehose-da
...
```

{% hint style="info" %}
We want to emphasize that `firehose-acme` is only a template project, showcasing how Firehose works and used by implementors to bootstrap the instrumentation of their chain.
Real-world Firehose implementations don't use or rely on the Dummy Blockchain application or its data; they deal with the blockchain's specific native process and a specific `firehose-<chain>` repository.
{% endhint %}

To integrate the target blockchain, modify `devel/standard/standard.yaml` and change the `start.flags.mindreader-node-path` flag to point to the custom integration's blockchain node binary.

## Define protobuf types

Update the proto file `sf/acme/type/v1/type.proto` to reflect your chain's data model.
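As a sketch, the stock `type.proto` could evolve into something like the following. Every field here is a hypothetical example of how a chain might be modeled, not a required schema:

```protobuf
syntax = "proto3";

package sf.acme.type.v1;

// Hypothetical minimal Block model; replace these fields with your
// chain's actual header, transaction and event structures.
message Block {
  uint64 height    = 1;
  string hash      = 2;
  string prev_hash = 3;
  uint64 timestamp = 4;
  repeated Transaction transactions = 5;
}

message Transaction {
  string hash     = 1;
  string sender   = 2;
  string receiver = 3;
  // ...chain-specific fields (calls, events, state changes)...
}
```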

### Generate structs

After updating the proto definitions, the Protocol Buffer Go code needs to be regenerated. Use the `generate.sh` shell script to make the updates.

```bash
./types/pb/generate.sh
```

## Implement the reader

The [`console_reader.go`](https://github.com/streamingfast/firehose-acme/blob/master/codec/console\_reader.go#L121) file is the interface between the instrumented node's output and the Firehose ingestion process.

Each blockchain has specific pieces of data and implementation details that are particular to that blockchain. Reach out to us if you need guidance here.

{% hint style="warning" %}
**Important**: studying the StreamingFast Ethereum implementation and the other existing instrumentations should serve as a foundation for custom integrations.
{% endhint %}

## Run tests

After completing all of the previous steps, the base integration is ready for initial testing.

```bash
go test ./...
```

If all changes were made correctly, the updated project should compile successfully.

## Wrap up the integration

You can reach out to the StreamingFast team on Discord. We usually maintain these Go-side integrations and keep them up-to-date. We can review your work and do the renames as needed.

### Rename

You can also rename the project yourself, changing all files and references to `acme` to your own chain's name. Choose two names, a long form and a short form, for the custom integration, following the naming conventions outlined below.

For example:

* `arweave` and `arw`

Then finalize the rename:

* Rename `cmd/fireacme` -> `cmd/firearw` (short form)
* Search and replace `fireacme` => `firearw` (short form)
* Search and replace `acme` => `arweave` (long form)
* Search and replace `ACME` => `ARWEAVE` (long form)
* Search and replace `Acme` => `Arweave` (long form)
