This is a demo application for order book aggregation. Order books are common in cryptocurrency and traditional financial systems.

This project illustrates one possible way of aggregating order books across a set of exchanges. The exchanges used in this project are Bitstamp and Binance, and the code is designed so that additional exchanges can be added easily.
To get started:

- Install Rust. For extra features, install a nightly build.
- Install the IDE of your choice (for example, IntelliJ).
- Clone the repository by clicking the green `Code` button on GitHub and then the copy button, which copies the repository location.
- In your IDE, create a new project using the copied location.
- Verify the installation by running:

  ```shell
  cargo test
  ```

- Run the server in one terminal window:

  ```shell
  cargo run --release --bin orderbook-aggregator-server
  ```

- Run the client in another terminal window:

  ```shell
  cargo run --release --example orderbook-aggregator-client
  ```
If there are problems, verify that the Rust and Cargo versions listed below are installed.

To verify your Rust version, run `rustc --version`. The Rust version used to build the code is:

```shell
rustc 1.60.0-nightly (ad46af247 2022-01-14)
```

To verify your Cargo version, run `cargo --version`. The Cargo version used to build the code is:

```shell
cargo 1.60.0-nightly (06b9d3174 2022-01-11)
```
Configuration can be handled using the following environment variables (which can be set from the command line).

Order Book Aggregator environment variables:

- `BUFFER_SIZE`: the size of the outgoing gRPC buffer. The default value is `1024`.
- `AGG_DEPTH`: the number of bids and asks in the outgoing aggregate order book. The default value is `10`.
- `DEFAULT_CURRENCY_PAIR`: the default currency pair to stream data for. The default value is `ethbtc`.
- `SOCKET_ADDR`: the gRPC socket address string. The default value is `0.0.0.0:50051`.

Binance-specific environment variables:

- `BINANCE_URL`: the URL string used to connect to Binance. The default value is `wss://stream.binance.com:9443/stream`.
- `BINANCE_EXCHANGE`: the logging name for the Binance exchange. The default value is `binance`.
- `BINANCE_DEPTH`: the number of bids and asks to return from Binance. The default value is `10`.
- `BINANCE_MILLIS`: how frequently Binance sends the order book. The default value is `100`. See the Binance documentation for more information.
- `RECONNECT_WAIT_SECONDS`: how many seconds to wait after an abnormal WebSocket close before reconnecting. See the WebSocket RFC for more information. The default value is `5`.

Bitstamp-specific environment variables:

- `BITSTAMP_URL`: the URL string used to connect to Bitstamp. The default value is `wss://ws.bitstamp.net`.
- `BITSTAMP_EXCHANGE`: the logging name for the Bitstamp exchange. The default value is `bitstamp`.
- `RECONNECT_WAIT_SECONDS`: how many seconds to wait after an abnormal WebSocket close before reconnecting. See the WebSocket RFC for more information. The default value is `5`.
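As a sketch of how such environment-variable configuration might be read, the snippet below uses only the standard library. The variable names and defaults match the list above, but the `env_or` helper is illustrative and is not necessarily how the project's configuration objects are implemented:

```rust
use std::env;

/// Read an environment variable, falling back to a default when it is unset.
fn env_or(key: &str, default: &str) -> String {
    env::var(key).unwrap_or_else(|_| default.to_string())
}

fn main() {
    // Documented defaults from the list above.
    let buffer_size: usize = env_or("BUFFER_SIZE", "1024")
        .parse()
        .expect("BUFFER_SIZE must be an integer");
    let agg_depth: usize = env_or("AGG_DEPTH", "10")
        .parse()
        .expect("AGG_DEPTH must be an integer");
    let pair = env_or("DEFAULT_CURRENCY_PAIR", "ethbtc");
    let socket_addr = env_or("SOCKET_ADDR", "0.0.0.0:50051");
    println!("buffer={buffer_size} depth={agg_depth} pair={pair} addr={socket_addr}");
}
```

Because every value has a default, the server can be started with no configuration at all, and a deployment environment can override any subset of values.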
The code is designed to address the following areas of the problem space:

- Isolate Idiomatic Data Organization
  - Order books share a common set of data, which each exchange exposes in its own idiomatic way. This idiomatic data is transformed into a common set of data structures that is exposed to the rest of the project.
- Isolate WebSocket-Specific Protocol Details
  - While it is common for exchanges to use WebSockets to expose order book data, this may not always be the case; an exchange could use gRPC or some other protocol. Therefore, from a design perspective, it makes sense to isolate the protocol-specific logic.
- Automatic Reconnects
  - Sockets close periodically for a variety of reasons. When exposing data to trading applications, the application should reconnect to exchanges automatically to avoid stale order book data.
- Invalidating Data
  - When a WebSocket is closed or disconnected, stale order book data should be invalidated so that downstream consumers do not make trading decisions based on stale data.
- Data Subscriptions
  - The ability to subscribe to various trading currency pairs.
- Data Streaming
  - Data should be exposed to consumers as quickly as possible.
- Configuration
  - The application parameters should be configurable.
- DRY Code
  - Where possible, the code should be DRY so that it is easier to maintain.
- Isolate gRPC Client-Facing Code
  - gRPC provides several advantages for exposing interfaces. Nevertheless, it should be possible to expose aggregated order book data through another protocol, such as WebSockets, without extensive refactoring.
- Well-Maintained Open Source
  - Use well-maintained (widely adopted) open source crates. Bugs are fixed quickly and new features are added routinely in well-maintained crates; unsupported crates could lead to significant refactoring down the road.
- Extensibility
  - Limit refactoring by providing decoupled modules for additional requirements.
  - Consumers of the data can be configured at any point to allow code reuse.
  - Additional data sources can be added with minimal effort.
  - WebSocket code can be replaced as needed.
  - Data can be exposed to clients through protocols other than gRPC.
- Deployment
  - Designed to be independent of the deployment ecosystem.
  - Many deployment environments allow configuration using environment variables.
- Maintainability
  - If needed, several developers should be able to work together on different parts of the code. This is accomplished by passing messages between Actors and by keeping roles and responsibilities isolated between Actors. For example, if there is a bug in one Actor, another developer can add features to a different Actor with minimal impact.
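The reconnect and invalidation goals above can be sketched together. There are no real sockets here: `ExchangeState` and `on_disconnect` are illustrative names, and in the real code this logic is driven by WebSocket close events. The key point is that the book is invalidated immediately, while the reconnect is delayed by `RECONNECT_WAIT_SECONDS`:

```rust
use std::time::Duration;

/// Per-exchange state; `book` is `None` whenever the data is stale.
struct ExchangeState {
    name: String,
    /// (price, amount) bid levels; `None` means "invalidated".
    book: Option<Vec<(f64, f64)>>,
}

impl ExchangeState {
    /// Called on an abnormal WebSocket close. Returns how long to wait
    /// before reconnecting (RECONNECT_WAIT_SECONDS, default 5).
    fn on_disconnect(&mut self) -> Duration {
        // Invalidate immediately so downstream consumers never see stale levels...
        self.book = None;
        // ...then wait before attempting to reconnect.
        Duration::from_secs(5)
    }
}

fn main() {
    let mut binance = ExchangeState {
        name: "binance".into(),
        book: Some(vec![(0.0671, 1.5)]),
    };
    let wait = binance.on_disconnect();
    assert!(binance.book.is_none());
    println!("{}: stale book dropped, reconnect in {:?}", binance.name, wait);
}
```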
The data for order books comes from exchanges, and each exchange may have its own idiomatic approach to order book data. Several exchanges use WebSockets for exposing data. This project consumes data from Binance and Bitstamp; additional technical specifications can be found in each exchange's documentation.

While not used in this project, Binance also exposes an endpoint that uses the industry-standard FIX protocol.
The following well-supported and maintained crates were chosen for this project:
- Actix - Provides asynchronous execution, isolation, and decoupling using a well-defined interface.
- Tokio - Allows for asynchronous execution of code so that cores can be used efficiently and delays in one execution path do not delay other execution paths.
- Tokio Tungstenite - Provides an asynchronous WebSocket communication wrapper around Tungstenite.
- Serde - Handles serialization and deserialization of Order Book data.
- slog - Provides asynchronous logging so that logging does not delay data streaming.
- Anyhow and thiserror - Provide error definition and handling.
- Tonic - Provides an easy way of exposing a gRPC interface to clients.
- DashMap - Provides a fast, easy to use concurrent HashMap and HashSet.
NOTE: Crate versions are pinned to exact versions instead of using semantic-versioning ranges. This ensures build reproducibility when a Cargo.lock file is not present.
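In Cargo, a bare version requirement such as `"1.17.0"` is treated as the caret requirement `^1.17.0`; pinning an exact version requires the `=` prefix. A hypothetical excerpt (the crate versions shown are illustrative, not necessarily the ones in this project's `Cargo.toml`):

```toml
[dependencies]
# '=' pins the exact version; a bare "1.17.0" would allow any 1.x >= 1.17.0.
tokio = { version = "=1.17.0", features = ["full"] }
serde = { version = "=1.0.136", features = ["derive"] }
```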
The core pattern for the application is the Actor model. Actors should not know about other Actors and ideally should communicate using messages. In Actix, decoupling between Actors is accomplished using Recipients.

Another core pattern is the pipeline: Actors are organized into pipelines.
```
Consuming Client
        ↑
      gRPC
        ↑
OrderbookAggregatorService
        ↑
SummaryStreamHandler
        ↑
OrderbookAggregator
        ↑
1 or more OrderbookDataSource components (e.g. Binance, Bitstamp, and future ECNs)
        ↑
WebSocketStreamingDataSource (one for each OrderbookDataSource)
```
The responsibilities for each component are:

- `OrderbookAggregatorService`: exposes data streams to consumers using gRPC.
- `SummaryStreamHandler`: handles stream connects and disconnects.
- `OrderbookAggregator`: aggregates order books.
- `OrderbookDataSource`: supplies data from exchanges in a common format.
- `WebSocketStreamingDataSource`: handles raw text data, WebSocket logic, and reconnects.
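The aggregation step performed by the `OrderbookAggregator` can be sketched as a merge of per-exchange books that keeps the best `AGG_DEPTH` levels. The `Level` type and `merge_bids` function below are illustrative shapes, not the project's actual types:

```rust
/// One price level: (price, amount, exchange). Illustrative shape only.
type Level = (f64, f64, &'static str);

/// Merge bid levels from several exchanges and keep the best `depth`
/// levels, highest prices first. Asks would sort ascending instead.
fn merge_bids(books: &[Vec<Level>], depth: usize) -> Vec<Level> {
    let mut all: Vec<Level> = books.iter().flatten().cloned().collect();
    // Bids: best (highest) price first. Prices are assumed non-NaN.
    all.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
    all.truncate(depth);
    all
}

fn main() {
    let binance = vec![(0.0671, 2.0, "binance"), (0.0670, 1.0, "binance")];
    let bitstamp = vec![(0.0672, 0.5, "bitstamp"), (0.0669, 3.0, "bitstamp")];
    // Keep the top 3 levels across both exchanges (AGG_DEPTH would be 10).
    let merged = merge_bids(&[binance, bitstamp], 3);
    assert_eq!(merged[0].2, "bitstamp"); // the best bid comes from Bitstamp
    assert_eq!(merged.len(), 3);
    println!("{merged:?}");
}
```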
There is an `OrderbookDataSource` for Binance, for Bitstamp, and for any other exchange. Each is organized as follows:

- `OrderbookDataSource`: holds logic common to all data sources.
- `OrderbookDataSourceMemento`: holds logic and data specific to a particular data source. This is a modified version of the GoF Memento design pattern.
- `OrderbookConfig`: holds configuration information for a specific data source.
- `Subscription`: holds the logic for subscribing to data streams from an exchange.
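The message-passing decoupling that the pipeline relies on can be illustrated without Actix by using `std::sync::mpsc` channels: each stage only knows the sending end of the next channel, much as an Actix Actor only knows a `Recipient`. The stage names below mirror the diagram but the code is a crate-free analogy, not the project's implementation:

```rust
use std::sync::mpsc;
use std::thread;

/// Run a miniature two-stage pipeline: a data-source stage feeds an
/// aggregator stage, which feeds a stream-handler inbox.
fn run_pipeline() -> String {
    // Data source -> aggregator.
    let (to_agg, agg_inbox) = mpsc::channel::<(&'static str, f64)>();
    // Aggregator -> stream handler.
    let (to_stream, stream_inbox) = mpsc::channel::<String>();

    // Data-source stage: emits (exchange, best bid) messages.
    let source = thread::spawn(move || {
        to_agg.send(("binance", 0.0671)).unwrap();
        to_agg.send(("bitstamp", 0.0672)).unwrap();
    });

    // Aggregator stage: keeps the best bid seen and forwards a summary.
    let agg = thread::spawn(move || {
        let mut best: Option<(&str, f64)> = None;
        for (exchange, bid) in agg_inbox {
            if best.map_or(true, |(_, b)| bid > b) {
                best = Some((exchange, bid));
            }
        }
        if let Some((exchange, bid)) = best {
            to_stream.send(format!("best bid {bid} on {exchange}")).unwrap();
        }
    });

    source.join().unwrap();
    agg.join().unwrap();
    stream_inbox.recv().unwrap()
}

fn main() {
    println!("{}", run_pipeline());
}
```

Because a stage holds only the channel's sending end, the stage on the other side can be replaced (for example, swapping the gRPC layer for WebSockets) without the upstream stages changing.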
An order book aggregation pipeline is built using `order_book_aggregator_pipeline()`. It builds the structure shown in the diagram above, and other pipelines can be added as needed. While its number of lines of code is small, it glues the pipeline together.
Ideally, performance testing would be performed on the deployment hardware. This may lead to the following modification, which allows more parallelism if needed:

- Use `SyncArbiter` and `SyncContext` to parallelize performance hot spots. This can be added late in the development cycle.
To increase scalability, additional servers can be added for different currency pairs and routing needs. For example, routing could use Path-Based Routing.
With modest modifications, the following enhancements are feasible without significant redesign:

- `SummaryStreamHandler` could be made generic so that it is not coupled to the `Summary` object.
- Configuration objects could be designed to load from either a YAML file or from environment variables.
- The pipeline code could be made independent of the number and types of `OrderbookDataSource`s.
- Actor `Message` structs can be regrouped and relocated as needs evolve.
- Some code can be moved into separate crates to encourage reuse.
- Procedural-macro syntactic sugar can be added to enhance DRY coding.
Additional security components can be added between the gRPC-specific code and the `SummaryStreamHandler`. Since gRPC uses HTTP/2 as an underlying protocol, several HTTP-specific attacks could adversely impact Layer 7 security.
Code hygiene is handled using `cargo fmt` and `cargo clippy`.