@@ -105,21 +107,16 @@ Your feedbacks about the features are very important.
# Framework of Qlib
-
+
-At the module level, Qlib is a platform that consists of the above components. The components are designed as loose-coupled modules, and each component could be used stand-alone.
+The high-level framework of Qlib can be found above (users can find the [detailed framework](https://qlib.readthedocs.io/en/latest/introduction/introduction.html#framework) of Qlib's design when getting into the nitty-gritty).
+The components are designed as loosely coupled modules, and each component can be used stand-alone.
-| Name | Description |
-| ------ | ----- |
-| `Infrastructure` layer | `Infrastructure` layer provides underlying support for Quant research. `DataServer` provides a high-performance infrastructure for users to manage and retrieve raw data. `Trainer` provides a flexible interface to control the training process of models, which enable algorithms to control the training process. |
-| `Workflow` layer | `Workflow` layer covers the whole workflow of quantitative investment. `Information Extractor` extracts data for models. `Forecast Model` focuses on producing all kinds of forecast signals (e.g. _alpha_, risk) for other modules. With these signals `Decision Generator` will generate the target trading decisions(i.e. portfolio, orders) to be executed by `Execution Env` (i.e. the trading market). There may be multiple levels of `Trading Agent` and `Execution Env` (e.g. an _order executor trading agent and intraday order execution environment_ could behave like an interday trading environment and nested in _daily portfolio management trading agent and interday trading environment_ ) |
-| `Interface` layer | `Interface` layer tries to present a user-friendly interface for the underlying system. `Analyser` module will provide users detailed analysis reports of forecasting signals, portfolios and execution results |
-
-* The modules with hand-drawn style are under development and will be released in the future.
-* The modules with dashed borders are highly user-customizable and extendible.
-
-(p.s. framework image is created with https://draw.io/)
+Qlib provides strong infrastructure to support Quant research, of which [Data](https://qlib.readthedocs.io/en/latest/component/data.html) is always an important part.
+A strong learning framework is designed to support diverse learning paradigms (e.g. [reinforcement learning](https://qlib.readthedocs.io/en/latest/component/rl.html), [supervised learning](https://qlib.readthedocs.io/en/latest/component/workflow.html#model-section)) and patterns at different levels (e.g. [market dynamic modeling](https://qlib.readthedocs.io/en/latest/component/meta.html)).
+By modeling the market, [trading strategies](https://qlib.readthedocs.io/en/latest/component/strategy.html) generate trade decisions that are then executed. Multiple trading strategies and executors at different levels or granularities can be [nested to be optimized and run together](https://qlib.readthedocs.io/en/latest/component/highfreq.html).
+Finally, a comprehensive [analysis](https://qlib.readthedocs.io/en/latest/component/report.html) is provided, and the model can be [served online](https://qlib.readthedocs.io/en/latest/component/online.html) at a low cost.
# Quick Start
@@ -404,6 +401,17 @@ Dataset plays a very important role in Quant. Here is a list of the datasets bui
[Here](https://qlib.readthedocs.io/en/latest/advanced/alpha.html) is a tutorial to build dataset with `Qlib`.
-Your PR to build new Quant dataset is highly welcomed.
+Your PRs to build new Quant datasets are highly welcome.
+
+# Learning Framework
+Qlib is highly customizable, and many of its components are learnable.
+The learnable components are instances of `Forecast Model` and `Trading Agent`. They are learned based on the `Learning Framework` layer and then applied to multiple scenarios in the `Workflow` layer.
+The learning framework leverages the `Workflow` layer as well (e.g. sharing `Information Extractor`, creating environments based on `Execution Env`).
+
+Based on their learning paradigms, they can be categorized into reinforcement learning and supervised learning.
+- For supervised learning, the detailed docs can be found [here](https://qlib.readthedocs.io/en/latest/component/model.html).
+- For reinforcement learning, the detailed docs can be found [here](https://qlib.readthedocs.io/en/latest/component/rl.html). Qlib's RL framework leverages `Execution Env` in the `Workflow` layer to create environments. It's worth noting that `NestedExecutor` is supported as well. This enables users to optimize strategies/models/agents at different levels together (e.g. optimizing an order execution strategy for a specific portfolio management strategy).
+
+
# More About Qlib
If you want to have a quick glance at the most frequently used components of qlib, you can try notebooks [here](examples/tutorial/).
diff --git a/docs/_static/img/QlibRL_framework.png b/docs/_static/img/QlibRL_framework.png
new file mode 100644
index 0000000000..4ff221d5ec
Binary files /dev/null and b/docs/_static/img/QlibRL_framework.png differ
diff --git a/docs/_static/img/RL_framework.png b/docs/_static/img/RL_framework.png
new file mode 100644
index 0000000000..cb972b16f8
Binary files /dev/null and b/docs/_static/img/RL_framework.png differ
diff --git a/docs/_static/img/framework-abstract.jpg b/docs/_static/img/framework-abstract.jpg
new file mode 100644
index 0000000000..5d07130725
Binary files /dev/null and b/docs/_static/img/framework-abstract.jpg differ
diff --git a/docs/_static/img/framework.svg b/docs/_static/img/framework.svg
index 1217fd544c..80671ebe80 100644
--- a/docs/_static/img/framework.svg
+++ b/docs/_static/img/framework.svg
@@ -1,4 +1,4 @@
-
\ No newline at end of file
+
diff --git a/docs/component/highfreq.rst b/docs/component/highfreq.rst
index de385a8232..0af0fa2915 100644
--- a/docs/component/highfreq.rst
+++ b/docs/component/highfreq.rst
@@ -8,31 +8,33 @@ Design of Nested Decision Execution Framework for High-Frequency Trading
Introduction
============
-Daily trading (e.g. portfolio management) and intraday trading (e.g. orders execution) are two hot topics in Quant investment and usually studied separately.
+Daily trading (e.g. portfolio management) and intraday trading (e.g. orders execution) are two hot topics in Quant investment and are usually studied separately.
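-To get the join trading performance of daily and intraday trading, they must interact with each other and run backtest jointly.
+To get the joint trading performance of daily and intraday trading, they must interact with each other and be backtested jointly.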
-In order to support the joint backtest strategies in multiple levels, a corresponding framework is required. None of the publicly available high-frequency trading frameworks considers multi-level joint trading, which make the backtesting aforementioned inaccurate.
+To support joint backtesting of strategies at multiple levels, a corresponding framework is required. None of the publicly available high-frequency trading frameworks considers multi-level joint trading, which makes the aforementioned backtesting inaccurate.
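-Besides backtesting, the optimization of strategies from different levels is not standalone and can be affected by each other.
+Besides backtesting, the optimization of strategies at different levels is not standalone: strategies at different levels can affect each other.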
-For example, the best portfolio management strategy may change with the performance of order executions(e.g. a portfolio with higher turnover may becomes a better choice when we improve the order execution strategies).
-To achieve the overall good performance , it is necessary to consider the interaction of strategies in different level.
+For example, the best portfolio management strategy may change with the performance of order execution (e.g. a portfolio with higher turnover may become a better choice when we improve the order execution strategies).
+To achieve good overall performance, it is necessary to consider the interaction of strategies at different levels.
-Therefore, building a new framework for trading in multiple levels becomes necessary to solve the various problems mentioned above, for which we designed a nested decision execution framework that consider the interaction of strategies.
+Therefore, building a new framework for trading at multiple levels becomes necessary to solve the problems mentioned above, for which we designed a nested decision execution framework that considers the interaction of strategies.
.. image:: ../_static/img/framework.svg
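-The design of the framework is shown in the yellow part in the middle of the figure above. Each level consists of ``Trading Agent`` and ``Execution Env``. ``Trading Agent`` has its own data processing module (``Information Extractor``), forecasting module (``Forecast Model``) and decision generator (``Decision Generator``). The trading algorithm generates the decisions by the ``Decision Generator`` based on the forecast signals output by the ``Forecast Module``, and the decisions generated by the trading algorithm are passed to the ``Execution Env``, which returns the execution results.
+The design of the framework is shown in the yellow part in the middle of the figure above. Each level consists of ``Trading Agent`` and ``Execution Env``. ``Trading Agent`` has its own data processing module (``Information Extractor``), forecasting module (``Forecast Model``) and decision generator (``Decision Generator``). The trading algorithm generates decisions with the ``Decision Generator`` based on the forecast signals output by the ``Forecast Model``, and these decisions are passed to the ``Execution Env``, which returns the execution results.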
-The frequency of trading algorithm, decision content and execution environment can be customized by users (e.g. intraday trading, daily-frequency trading, weekly-frequency trading), and the execution environment can be nested with finer-grained trading algorithm and execution environment inside (i.e. sub-workflow in the figure, e.g. daily-frequency orders can be turned into finer-grained decisions by splitting orders within the day). The flexibility of nested decision execution framework makes it easy for users to explore the effects of combining different levels of trading strategies and break down the optimization barriers between different levels of trading algorithm.
+The frequency of the trading algorithm, decision content and execution environment can be customized by users (e.g. intraday trading, daily-frequency trading, weekly-frequency trading), and the execution environment can be nested with a finer-grained trading algorithm and execution environment inside (i.e. sub-workflow in the figure, e.g. daily-frequency orders can be turned into finer-grained decisions by splitting orders within the day). The flexibility of the nested decision execution framework makes it easy for users to explore the effects of combining different levels of trading strategies and to break down the optimization barriers between trading algorithms at different levels.
+
+The nested decision execution framework can be optimized with the support of `QlibRL `_. To learn more about how to use QlibRL, see the API reference: `RL API <../reference/api.html#rl>`_.
Example
=======
-An example of nested decision execution framework for high-frequency can be found `here `_.
+An example of a nested decision execution framework for high-frequency trading can be found `here `_.
-Besides, the above examples, here are some other related work about high-frequency trading in Qlib.
+Besides the above examples, here are some other related works on high-frequency trading in Qlib.
- `Prediction with high-frequency data `_
-- `Examples `_ to extract features form high-frequency data without fixed frequency.
+- `Examples `_ to extract features from high-frequency data without fixed frequency.
- `A paper `_ for high-frequency trading.
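The nesting described in the Introduction of `highfreq.rst` (a daily order turned into finer-grained intraday decisions by an inner strategy and executor) can be illustrated with a minimal, self-contained Python sketch. It is a hypothetical toy, not Qlib's `NestedExecutor` API: `Order`, `ExecResult`, `twap_split`, `inner_execute` and `outer_step` are illustrative names, and fills are assumed to happen at the given intraday prices with no market impact or trading limits.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Order:
    stock_id: str
    amount: float  # number of shares (hypothetical convention: positive means buy)


@dataclass
class ExecResult:
    filled: float     # total shares filled
    avg_price: float  # volume-weighted average fill price


def twap_split(order: Order, n_steps: int) -> List[Order]:
    """Inner-level strategy: split one daily order into equal intraday slices."""
    slice_amount = order.amount / n_steps
    return [Order(order.stock_id, slice_amount) for _ in range(n_steps)]


def inner_execute(sub_orders: List[Order], prices: List[float]) -> ExecResult:
    """Inner-level executor: fill each slice at that step's price (toy assumption)."""
    filled = sum(o.amount for o in sub_orders)
    cost = sum(o.amount * p for o, p in zip(sub_orders, prices))
    return ExecResult(filled=filled, avg_price=cost / filled)


def outer_step(stock_id: str, target_amount: float, intraday_prices: List[float]) -> ExecResult:
    """Outer level: the daily decision is executed by the nested inner loop,
    and only the aggregated result is reported back to the daily level."""
    daily_order = Order(stock_id, target_amount)
    sub_orders = twap_split(daily_order, len(intraday_prices))
    return inner_execute(sub_orders, intraday_prices)


result = outer_step("SH600000", 800.0, [10.0, 10.2, 9.9, 10.1])
print(result.filled, round(result.avg_price, 4))
```

In this sketch the daily level only sees an `ExecResult`, so the inner splitting rule could be replaced (for example by a learned policy) without touching the outer level; that separation is the property the nested framework is meant to exploit.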
diff --git a/docs/component/rl/framework.rst b/docs/component/rl/framework.rst
new file mode 100644
index 0000000000..7edb08efd9
--- /dev/null
+++ b/docs/component/rl/framework.rst
@@ -0,0 +1,45 @@
+The Framework of QlibRL
+=======================
+
+QlibRL contains a full set of components that cover the entire lifecycle of an RL pipeline, including building the simulator of the market, shaping states & actions, training policies (strategies), and backtesting strategies in the simulated environment.
+
+QlibRL is implemented with the support of the Tianshou and Gym frameworks. The high-level structure of QlibRL is demonstrated below:
+
+.. image:: ../../_static/img/QlibRL_framework.png
+ :width: 600
+ :align: center
+
+Here, we briefly introduce each component in the figure.
+
+EnvWrapper
+------------
+EnvWrapper is the complete encapsulation of the simulated environment. It receives actions from outside (policy/strategy/agent), simulates the changes in the market, and then returns rewards and updated states, thus forming an interaction loop.
+
+In QlibRL, EnvWrapper is a subclass of ``gym.Env``, so it implements all necessary interfaces of ``gym.Env``. Any class or pipeline that accepts ``gym.Env`` should also accept EnvWrapper. Developers do not need to implement their own EnvWrapper to build their own environment. Instead, they only need to implement the 4 components of the EnvWrapper:
+
+- `Simulator`
+ The simulator is the core component responsible for the environment simulation. Developers can implement all the logic directly related to the environment simulation in the Simulator in any way they like. In QlibRL, there are already two implementations of Simulator for single asset trading: 1) ``SingleAssetOrderExecution``, which is built on Qlib's backtest toolkits and hence considers many practical trading details but is slow; 2) ``SingleAssetOrderExecutionSimple``, which is built on a simplified trading simulator that ignores many details (e.g. trading limitations, rounding) but is quite fast.
+- `State interpreter`
+ The state interpreter is responsible for "interpreting" states in the original format (the format provided by the simulator) into states in a format that the policy can understand, for example, transforming unstructured raw features into numerical tensors.
+- `Action interpreter`
+ The action interpreter is similar to the state interpreter, but instead of states, it interprets actions generated by the policy, converting them from the format provided by the policy into the format acceptable to the simulator.
+- `Reward function`
+ The reward function returns a numerical reward to the policy each time the policy takes an action.
+
+EnvWrapper organizes these components into a complete environment. Such decomposition allows for better flexibility in development. For example, if developers want to train multiple types of policies in the same environment, they only need to design one simulator and design different state interpreters/action interpreters/reward functions for different types of policies.
+
+QlibRL has well-defined base classes for all these 4 components. All the developers need to do is define their own components by inheriting the base classes and then implementing all interfaces required by the base classes. The API for the above base components can be found `here <../../reference/api.html#module-qlib.rl>`_.
+
+Policy
+------------
+QlibRL directly uses Tianshou's policy. Developers could use policies provided by Tianshou off the shelf, or implement their own policies by inheriting Tianshou's policies.
+
+Training Vessel & Trainer
+-------------------------
+As their names suggest, training vessels and trainers are helper classes used in training. A training vessel is a container that holds a simulator/interpreters/reward function/policy, and it controls the algorithm-related parts of training. Correspondingly, the trainer is responsible for controlling the runtime parts of training.
+
+As you may have noticed, a training vessel itself holds all the required components to build an EnvWrapper rather than holding an instance of EnvWrapper directly. This allows the training vessel to create duplicates of EnvWrapper dynamically when necessary (for example, under parallel training).
+
+With a training vessel, the trainer can then launch the training pipeline through a simple, scikit-learn-like interface (i.e., ``trainer.fit()``).
+
+The API for Trainer and TrainingVessel can be found `here <../../reference/api.html#module-qlib.rl.trainer>`_.
\ No newline at end of file
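The decomposition described in `framework.rst` (an EnvWrapper built from a simulator, a state interpreter, an action interpreter and a reward function) can be made concrete with a minimal toy sketch. Every class here is hypothetical and self-contained: it is not QlibRL's `EnvWrapper` nor its base classes (whose real interfaces are in the API reference linked above), `step` returns a simplified `(obs, reward, done)` tuple rather than Gym's full signature, and the reward formula is made up for illustration (it is not `PAPenaltyReward`). The action interpreter follows the `[0, 1/n, ..., n/n]` candidate rule stated in the quick-start config comments.

```python
from typing import Callable, List, Tuple


class ToySimulator:
    """Hypothetical single-asset liquidation simulator (not Qlib's simulator classes)."""

    def __init__(self, order_amount: float, max_step: int) -> None:
        self.order_amount = order_amount
        self.position = order_amount  # shares still to be sold
        self.max_step = max_step
        self.cur_step = 0
        self.last_exec = 0.0

    def step(self, amount: float) -> None:
        amount = min(amount, self.position)  # cannot sell more than what is left
        self.position -= amount
        self.last_exec = amount
        self.cur_step += 1

    def done(self) -> bool:
        return self.position <= 0 or self.cur_step >= self.max_step


class ToyStateInterpreter:
    def interpret(self, sim: ToySimulator) -> List[float]:
        # Simulator state -> numeric observation: remaining fraction and elapsed time
        return [sim.position / sim.order_amount, sim.cur_step / sim.max_step]


class ToyActionInterpreter:
    def __init__(self, n: int) -> None:
        self.values = [i / n for i in range(n + 1)]  # candidate fractions [0, 1/n, ..., 1]

    def interpret(self, sim: ToySimulator, action: int) -> float:
        # Policy output (an index) -> trade amount the simulator understands
        return min(sim.position, sim.order_amount * self.values[action])


class ToyReward:
    def __init__(self, penalty: float) -> None:
        self.penalty = penalty

    def __call__(self, sim: ToySimulator) -> float:
        ratio = sim.last_exec / sim.order_amount
        return ratio - self.penalty * ratio ** 2  # reward progress, penalize bursts


class ToyEnvWrapper:
    """Composes the four components into a simplified reset/step loop."""

    def __init__(self, make_sim: Callable[[], ToySimulator], state_interp: ToyStateInterpreter,
                 action_interp: ToyActionInterpreter, reward_fn: ToyReward) -> None:
        self.make_sim = make_sim
        self.state_interp = state_interp
        self.action_interp = action_interp
        self.reward_fn = reward_fn
        self.sim = make_sim()

    def reset(self) -> List[float]:
        self.sim = self.make_sim()  # fresh simulator per episode
        return self.state_interp.interpret(self.sim)

    def step(self, action: int) -> Tuple[List[float], float, bool]:
        amount = self.action_interp.interpret(self.sim, action)
        self.sim.step(amount)
        obs = self.state_interp.interpret(self.sim)
        return obs, self.reward_fn(self.sim), self.sim.done()


env = ToyEnvWrapper(lambda: ToySimulator(100.0, 4), ToyStateInterpreter(),
                    ToyActionInterpreter(4), ToyReward(penalty=0.5))
obs, done, total = env.reset(), False, 0.0
while not done:
    obs, reward, done = env.step(1)  # always sell 1/4 of the order
    total += reward
```

Because `ToySimulator` only ever receives trade amounts, the same simulator could be paired with different interpreters or reward functions, which mirrors the flexibility argument made for the EnvWrapper decomposition.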
diff --git a/docs/component/rl/overall.rst b/docs/component/rl/overall.rst
new file mode 100644
index 0000000000..4f59dd17a7
--- /dev/null
+++ b/docs/component/rl/overall.rst
@@ -0,0 +1,50 @@
+=====================================================
+Reinforcement Learning in Quantitative Trading
+=====================================================
+
+Reinforcement Learning
+======================
+Besides supervised learning tasks such as classification and regression, another important paradigm in machine learning is Reinforcement Learning (RL),
+which attempts to optimize a cumulative numerical reward signal by directly interacting with the environment under a few assumptions such as the Markov Decision Process (MDP).
+
+As demonstrated in the following figure, an RL system consists of four elements: 1) the agent, 2) the environment the agent interacts with, 3) the policy that the agent follows to take actions on the environment, and 4) the reward signal from the environment to the agent.
+In general, the agent can perceive and interpret its environment, take actions, and learn from rewards, seeking to maximize the long-term overall reward and thereby reach an optimal solution.
+
+.. image:: ../../_static/img/RL_framework.png
+ :width: 300
+ :align: center
+
+RL attempts to learn to produce actions by trial and error.
+By sampling actions and then observing which one leads to our desired outcome, a policy is obtained to generate optimal actions.
+In contrast to supervised learning, RL learns this not from an explicit label but from a time-delayed signal called a reward.
+This scalar value lets us know whether the current outcome is good or bad.
+In short, the goal of RL is to take actions that maximize the cumulative reward.
+
+The Qlib Reinforcement Learning toolkit (QlibRL) is an RL platform for quantitative investment, which provides support to implement the RL algorithms in Qlib.
+
+
+Potential Application Scenarios in Quantitative Trading
+=======================================================
+RL methods have already achieved outstanding results in many applications, such as game playing, resource allocation, recommendation, marketing and advertising.
+Investment is always a continuous process. Taking the stock market as an example, investors need to control their positions and stock holdings through one or more buy and sell actions to maximize investment returns.
+Moreover, investors make each buy and sell decision after fully considering overall market information and stock-specific information.
+From an investor's point of view, the process can be described as a continuous decision-making process driven by interaction with the market, and such problems can be solved by RL algorithms.
+Following are some scenarios where RL can potentially be used in quantitative investment.
+
+Portfolio Construction
+----------------------
+Portfolio construction is the process of optimally selecting securities, taking minimum risk to achieve maximum returns. With an RL-based solution, an agent allocates stocks at every time step by obtaining information about each stock and the market. The key is to develop a policy for building a portfolio that is able to pick the optimal portfolio.
+
+Order Execution
+---------------
+As a fundamental problem in algorithmic trading, order execution aims at fulfilling a specific trading order, either liquidation or acquirement, for a given instrument. Essentially, the goal of order execution is twofold: it requires not only fulfilling the whole order but also executing it economically, maximizing profit gain (or minimizing capital loss). Order execution with only one liquidation or acquirement order is called single-asset order execution.
+
+Since stock investment always aims to pursue long-term maximized profits, it usually manifests as a sequential process of continuously adjusting asset portfolios. Executing multiple orders, including both liquidation and acquirement orders, brings more constraints and requires considering the sequence in which different orders are executed, e.g. before executing an order to buy some stocks, we may first have to sell at least one stock to free up cash. Order execution with multiple assets is called multi-asset order execution.
+
+Given the sequential decision-making nature of order execution, RL-based solutions can be applied to it. With an RL-based solution, an agent optimizes the execution strategy by interacting with the market environment.
+
+With QlibRL, RL algorithms for the above scenarios can be easily implemented.
+
+Nested Portfolio Construction and Order Executor
+------------------------------------------------
+QlibRL makes it possible to jointly optimize strategies/models/agents at different levels. Taking the `Nested Decision Execution Framework `_ as an example, the optimization of the order execution strategy and that of the portfolio management strategy can interact with each other to maximize returns.
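The objective described under "Reinforcement Learning" in `overall.rst` (maximizing a cumulative, possibly time-delayed reward) is commonly written as a discounted return, G_t = r_t + gamma * G_{t+1}, where r_t is the reward received after the action at step t. The Python sketch below is a generic illustration of that convention, not code taken from QlibRL; the discount factor `gamma` is an assumption (gamma = 1 gives the plain cumulative reward). It shows how a reward that only arrives at the last step still assigns credit to earlier actions.

```python
from typing import List


def returns_to_go(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A delayed reward: nothing until the final step, yet earlier steps receive credit
delayed = returns_to_go([0.0, 0.0, 1.0], gamma=0.9)
# gamma = 1.0 gives the plain cumulative reward from each step onward
undiscounted = returns_to_go([1.0, 2.0, 3.0])
```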
diff --git a/docs/component/rl/quickstart.rst b/docs/component/rl/quickstart.rst
new file mode 100644
index 0000000000..5e98e3baff
--- /dev/null
+++ b/docs/component/rl/quickstart.rst
@@ -0,0 +1,175 @@
+
+Quick Start
+============
+.. currentmodule:: qlib
+
+QlibRL provides an example implementation of a single-asset order execution task. The following is an example of the config file for training with QlibRL.
+
+.. code-block:: yaml
+
+    simulator:
+      # Each step contains 30mins
+      time_per_step: 30
+      # Upper bound of volume, should be null or a float between 0 and 1. If it is a float, the upper bound is calculated as a fraction of the market volume.
+      vol_limit: null
+    env:
+      # Concurrent environment workers.
+      concurrency: 1
+      # dummy or subproc or shmem. Corresponding to `parallelism in tianshou `_.
+      parallel_mode: dummy
+    action_interpreter:
+      class: CategoricalActionInterpreter
+      kwargs:
+        # Candidate actions, it can be a list with length L: [a_1, a_2,..., a_L] or an integer n, in which case the list of length n+1 is auto-generated, i.e., [0, 1/n, 2/n,..., n/n].
+        values: 14
+        # Total number of steps (an upper-bound estimation)
+        max_step: 8
+      module_path: qlib.rl.order_execution.interpreter
+    state_interpreter:
+      class: FullHistoryStateInterpreter
+      kwargs:
+        # Number of dimensions in data.
+        data_dim: 6
+        # Equal to the total number of records. For example, in SAOE per minute, data_ticks is the length of the day in minutes.
+        data_ticks: 240
+        # The total number of steps (an upper-bound estimation). For example, 390min / 30min-per-step = 13 steps.
+        max_step: 8
+        # Provider of the processed data.
+        processed_data_provider:
+          class: PickleProcessedDataProvider
+          module_path: qlib.rl.data.pickle_styled
+          kwargs:
+            data_dir: ./data/pickle_dataframe/feature
+      module_path: qlib.rl.order_execution.interpreter
+    reward:
+      class: PAPenaltyReward
+      kwargs:
+        # The penalty for a large volume in a short time.
+        penalty: 100.0
+      module_path: qlib.rl.order_execution.reward
+    data:
+      source:
+        order_dir: ./data/training_order_split
+        data_dir: ./data/pickle_dataframe/backtest
+        # number of time indexes
+        total_time: 240
+        # start time index
+        default_start_time: 0
+        # end time index
+        default_end_time: 240
+        proc_data_dim: 6
+      num_workers: 0
+      queue_size: 20
+    network:
+      class: Recurrent
+      module_path: qlib.rl.order_execution.network
+    policy:
+      class: PPO
+      kwargs:
+        lr: 0.0001
+      module_path: qlib.rl.order_execution.policy
+    runtime:
+      seed: 42
+      use_cuda: false
+    trainer:
+      max_epoch: 2
+      # Number of times the policy is updated on each batch of collected data
+      repeat_per_collect: 5
+      earlystop_patience: 2
+      # Episodes per collect at training.
+      episode_per_collect: 20
+      batch_size: 16
+      # Perform validation every n epochs
+      val_every_n_epoch: 1
+      checkpoint_path: ./checkpoints
+      checkpoint_every_n_iters: 1
+
+
+And the config file for backtesting:
+
+.. code-block:: yaml
+
+    order_file: ./data/backtest_orders.csv
+    start_time: "9:45"
+    end_time: "14:44"
+    qlib:
+      provider_uri_1min: ./data/bin
+      feature_root_dir: ./data/pickle
+      # feature generated by today's information
+      feature_columns_today: [
+        "$open", "$high", "$low", "$close", "$vwap", "$volume",
+      ]
+      # feature generated by yesterday's information
+      feature_columns_yesterday: [
+        "$open_v1", "$high_v1", "$low_v1", "$close_v1", "$vwap_v1", "$volume_v1",
+      ]
+    exchange:
+      # the expressions for the buying and selling limitations of stocks
+      limit_threshold: ['$close == 0', '$close == 0']
+      # deal price for buying and selling
+      deal_price: ["If($close == 0, $vwap, $close)", "If($close == 0, $vwap, $close)"]
+      volume_threshold:
+        # volume limit applied to both buying and selling; "cum" means that this is a cumulative value over time
+        all: ["cum", "0.2 * DayCumsum($volume, '9:45', '14:44')"]
+        # the volume limit for buying
+        buy: ["current", "$close"]
+        # the volume limit for selling; "current" means that this is a real-time value and will not accumulate over time
+        sell: ["current", "$close"]
+    strategies:
+      30min:
+        class: TWAPStrategy
+        module_path: qlib.contrib.strategy.rule_strategy
+        kwargs: {}
+      1day:
+        class: SAOEIntStrategy
+        module_path: qlib.rl.order_execution.strategy
+        kwargs:
+          state_interpreter:
+            class: FullHistoryStateInterpreter
+            module_path: qlib.rl.order_execution.interpreter
+            kwargs:
+              max_step: 8
+              data_ticks: 240
+              data_dim: 6
+              processed_data_provider:
+                class: PickleProcessedDataProvider
+                module_path: qlib.rl.data.pickle_styled
+                kwargs:
+                  data_dir: ./data/pickle_dataframe/feature
+          action_interpreter:
+            class: CategoricalActionInterpreter
+            module_path: qlib.rl.order_execution.interpreter
+            kwargs:
+              values: 14
+              max_step: 8
+          network:
+            class: Recurrent
+            module_path: qlib.rl.order_execution.network
+            kwargs: {}
+          policy:
+            class: PPO
+            module_path: qlib.rl.order_execution.policy
+            kwargs:
+              lr: 1.0e-4
+              # Local path to the latest model. The model is generated during training, so please run training first if you want to run backtest with a trained policy. You could also remove this parameter to run backtest with a randomly initialized policy.
+              weight_file: ./checkpoints/latest.pth
+    # Concurrent environment workers.
+    concurrency: 5
+
+With the above config files, you can start training the agent with the following command:
+
+.. code-block:: console
+
+ $ python -m qlib.rl.contrib.train_onpolicy --config_path train_config.yml
+
+After the training, you can backtest with the following command:
+
+.. code-block:: console
+
+ $ python -m qlib.rl.contrib.backtest --config_path backtest_config.yml
+
+In this example, :class:`~qlib.rl.order_execution.simulator_qlib.SingleAssetOrderExecution` and :class:`~qlib.rl.order_execution.simulator_simple.SingleAssetOrderExecutionSimple` serve as examples of simulators, :class:`qlib.rl.order_execution.interpreter.FullHistoryStateInterpreter` and :class:`qlib.rl.order_execution.interpreter.CategoricalActionInterpreter` as examples of interpreters, :class:`qlib.rl.order_execution.policy.PPO` as an example of a policy, and :class:`qlib.rl.order_execution.reward.PAPenaltyReward` as an example of a reward function.
+For the single asset order execution task, if developers have already defined their simulator/interpreters/reward function/policy, they can launch the training and backtest pipelines by simply modifying the corresponding settings in the config files.
+The details about the example can be found `here `_.
+
+In the future, we will provide more examples for different scenarios such as RL-based portfolio construction.
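The comments in the quick-start training config state that an integer `values: n` expands to the n+1 candidates `[0, 1/n, ..., n/n]`, and that `max_step` is an upper-bound estimate such as 390min / 30min-per-step = 13 steps. The Python sketch below reproduces that arithmetic so the numbers in the example configs (`values: 14`, `data_ticks: 240`, `time_per_step: 30`, `max_step: 8`) can be sanity-checked. The helper names `expand_candidate_values` and `estimate_max_step` are hypothetical; this is not Qlib's own implementation, and rounding up a non-divisible tick count is an assumption.

```python
import math
from typing import List, Sequence, Union


def expand_candidate_values(values: Union[int, Sequence[float]]) -> List[float]:
    """Mirror the documented rule: an int n becomes [0, 1/n, ..., n/n]; a list is used as-is."""
    if isinstance(values, int):
        return [i / values for i in range(values + 1)]
    return list(values)


def estimate_max_step(data_ticks: int, time_per_step: int) -> int:
    """Upper bound on steps per episode, e.g. 240 one-minute ticks / 30 ticks per step = 8."""
    return math.ceil(data_ticks / time_per_step)


candidates = expand_candidate_values(14)  # 15 candidate fractions of the order
max_step = estimate_max_step(240, 30)     # consistent with `max_step: 8` in the configs
```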
diff --git a/docs/component/rl/toctree.rst b/docs/component/rl/toctree.rst
new file mode 100644
index 0000000000..d79d5e060d
--- /dev/null
+++ b/docs/component/rl/toctree.rst
@@ -0,0 +1,10 @@
+.. _rl:
+
+========================================================================
+Reinforcement Learning in Quantitative Trading
+========================================================================
+
+.. toctree::
+ Overall
+ Quick Start
+ Framework
diff --git a/docs/index.rst b/docs/index.rst
index 71ed8ccec5..0d8cad81ad 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -33,7 +33,7 @@ Document Structure
.. toctree::
:maxdepth: 3
- :caption: COMPONENTS:
+ :caption: MAIN COMPONENTS:
Workflow: Workflow Management
Data Layer: Data Framework & Usage
@@ -44,10 +44,11 @@ Document Structure
Qlib Recorder: Experiment Management
Analysis: Evaluation & Results Analysis
Online Serving: Online Management & Strategy & Tool
+ Reinforcement Learning
.. toctree::
:maxdepth: 3
- :caption: ADVANCED TOPICS:
+ :caption: OTHER COMPONENTS/FEATURES/TOPICS:
Building Formulaic Alphas
Online & Offline mode
diff --git a/docs/introduction/introduction.rst b/docs/introduction/introduction.rst
index 8ca2b41be0..52d58e1639 100644
--- a/docs/introduction/introduction.rst
+++ b/docs/introduction/introduction.rst
@@ -15,38 +15,56 @@ With ``Qlib``, users can easily try their ideas to create better Quant investmen
Framework
=========
+
.. image:: ../_static/img/framework.svg
:align: center
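-At the module level, Qlib is a platform that consists of above components. The components are designed as loose-coupled modules and each component could be used stand-alone.
+At the module level, Qlib is a platform that consists of the above components. The components are designed as loosely coupled modules, and each component can be used stand-alone.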
+This framework may be intimidating for users new to Qlib, as it tries to accurately capture many details of Qlib's design.
+New users can skip it at first and come back to it later.
+
+
+=========================== ==============================================================================
+Name                        Description
+=========================== ==============================================================================
+`Infrastructure` layer      `Infrastructure` layer provides underlying support for Quant research.
+                            `DataServer` provides high-performance infrastructure for users to manage
+                            and retrieve raw data. `Trainer` provides a flexible interface to control
+                            the training process of models, which enables algorithms to control the
+                            training process.
-======================== ==============================================================================
-Name Description
-======================== ==============================================================================
-`Infrastructure` layer `Infrastructure` layer provides underlying support for Quant research.
- `DataServer` provides high-performance infrastructure for users to manage
- and retrieve raw data. `Trainer` provides flexible interface to control
- the training process of models which enable algorithms controlling the
- training process.
-
-`Workflow` layer `Workflow` layer covers the whole workflow of quantitative investment.
- `Information Extractor` extracts data for models. `Forecast Model` focuses
- on producing all kinds of forecast signals (e.g. *alpha*, risk) for other
- modules. With these signals `Decision Generator` will generate the target
- trading decisions(i.e. portfolio, orders) to be executed by `Execution Env`
- (i.e. the trading market). There may be multiple levels of `Trading Agent`
- and `Execution Env` (e.g. an *order executor trading agent and intraday
- order execution environment* could behave like an interday trading
- environment and nested in *daily portfolio management trading agent and
- interday trading environment* )
-
-`Interface` layer `Interface` layer tries to present a user-friendly interface for the underlying
- system. `Analyser` module will provide users detailed analysis reports of
- forecasting signals, portfolios and execution results
-======================== ==============================================================================
+`Learning Framework` layer The `Forecast Model` and `Trading Agent` are learnable. They are learned
+ based on the `Learning Framework` layer and then applied to multiple scenarios
+ in `Workflow` layer. The supported learning paradigms can be categorized into
+ reinforcement learning and supervised learning. The learning framework
+ leverages the `Workflow` layer as well(e.g. sharing `Information Extractor`,
+ creating environments based on `Execution Env`).
+
+`Workflow` layer `Workflow` layer covers the whole workflow of quantitative investment.
+ Both supervised-learning-based strategies and RL-based strategies
+ are supported.
+ `Information Extractor` extracts data for models. `Forecast Model` focuses
+ on producing all kinds of forecast signals (e.g. *alpha*, risk) for other
+ modules. With these signals, `Decision Generator` will generate the target
+ trading decisions (i.e. portfolio, orders).
+ If RL-based strategies are adopted, the `Policy` is learned in an end-to-end way
+ and the trading decisions are generated directly.
+ Decisions will be executed by `Execution Env`
+ (i.e. the trading market). There may be multiple levels of `Strategy`
+ and `Executor` (e.g. an *order executor trading strategy and intraday order executor*
+ could behave like an interday trading loop and be nested in a
+ *daily portfolio management trading strategy and interday trading executor*
+ trading loop).
+
+`Interface` layer `Interface` layer tries to present a user-friendly interface for the underlying
+ system. The `Analyser` module provides users with detailed analysis reports of
+ forecasting signals, portfolios and execution results.
+=========================== ==============================================================================
- The modules with hand-drawn style are under development and will be released in the future.
- The modules with dashed borders are highly user-customizable and extendible.
+
+(P.S. The framework image was created with https://draw.io/.)
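The nested `Strategy`/`Executor` loops described in the `Workflow` layer row can be pictured with a toy model. An outer interday loop hands each daily order to an inner intraday executor. The inner executor splits the order TWAP-style, so from the outside it behaves like a single step. This is a hypothetical, self-contained sketch. `Order`, `twap_split`, `IntradayExecutor` and `InterdayLoop` are illustrative names, not qlib's backtest API, and every child order is assumed to fill completely.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Order:
    stock_id: str
    amount: float


def twap_split(order: Order, n_slices: int) -> List[Order]:
    """Inner-level strategy: split one order into equal child orders (TWAP-style)."""
    per_slice = order.amount / n_slices
    return [Order(order.stock_id, per_slice) for _ in range(n_slices)]


@dataclass
class IntradayExecutor:
    """Hypothetical inner executor that runs its own strategy over intraday slices."""
    n_slices: int
    filled: List[Order] = field(default_factory=list)

    def execute(self, order: Order) -> float:
        children = twap_split(order, self.n_slices)
        # A real executor would match each child order against market data;
        # this toy version assumes every child order is fully filled.
        self.filled.extend(children)
        return sum(child.amount for child in children)


@dataclass
class InterdayLoop:
    """Hypothetical outer loop: each daily decision is delegated to the inner executor,
    so the whole intraday loop looks like a single step from the outside."""
    inner: IntradayExecutor

    def run(self, daily_orders: List[List[Order]]) -> List[float]:
        filled_per_day = []
        for orders in daily_orders:  # outer (interday) loop
            filled_per_day.append(sum(self.inner.execute(o) for o in orders))
        return filled_per_day


loop = InterdayLoop(IntradayExecutor(n_slices=8))
print(loop.run([[Order("A", 800.0)], [Order("A", 400.0), Order("B", 80.0)]]))  # [800.0, 480.0]
```

Keeping the inner executor behind a single `execute` call lets the outer loop remain unaware of how many intraday steps run underneath. That is the property the nested-loop description relies on.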
diff --git a/docs/reference/api.rst b/docs/reference/api.rst
index 06d89b9a89..98f50fc281 100644
--- a/docs/reference/api.rst
+++ b/docs/reference/api.rst
@@ -256,3 +256,36 @@ Serializable
.. automodule:: qlib.utils.serial.Serializable
:members:
+
+RL
+==============
+
+Base Component
+--------------
+.. automodule:: qlib.rl
+ :members:
+ :imported-members:
+
+Strategy
+--------
+.. automodule:: qlib.rl.strategy
+ :members:
+ :imported-members:
+
+Trainer
+-------
+.. automodule:: qlib.rl.trainer
+ :members:
+ :imported-members:
+
+Order Execution
+---------------
+.. automodule:: qlib.rl.order_execution
+ :members:
+ :imported-members:
+
+Utils
+---------------
+.. automodule:: qlib.rl.utils
+ :members:
+ :imported-members:
\ No newline at end of file
diff --git a/examples/rl/README.md b/examples/rl/README.md
index db5cdf20d7..d8b4f4e493 100644
--- a/examples/rl/README.md
+++ b/examples/rl/README.md
@@ -41,7 +41,7 @@ data
Run:
```
-python ../../qlib/rl/contrib/train_onpolicy.py --config_path ./experiment_config/training/config.yml
+python -m qlib.rl.contrib.train_onpolicy --config_path ./experiment_config/training/config.yml
```
After training, checkpoints will be stored under `checkpoints/`.
@@ -49,7 +49,7 @@ After training, checkpoints will be stored under `checkpoints/`.
## Run backtest
```
-python ../../qlib/rl/contrib/backtest.py --config_path ./experiment_config/backtest/config.py
+python -m qlib.rl.contrib.backtest --config_path ./experiment_config/backtest/config.yml
```
The backtest workflow will use the trained model in `checkpoints/`. The backtest summary can be found in `outputs/`.
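`python -m` takes a dotted module name, not a file path, so the `.py` suffix must not appear in its argument. Because the commands are run from `examples/rl`, `qlib` also has to be importable from `sys.path` (for example, installed with pip). The minimal sketch below shows the path-to-module mapping. `script_path_to_module` is a hypothetical helper for illustration, not part of qlib.

```python
from pathlib import PurePosixPath


def script_path_to_module(path: str) -> str:
    """Turn a repo-relative script path into the dotted name accepted by `python -m`.

    Hypothetical helper for illustration only; it is not part of qlib.
    """
    p = PurePosixPath(path)
    if p.suffix != ".py":
        raise ValueError(f"expected a .py file, got {path!r}")
    if p.is_absolute() or ".." in p.parts:
        raise ValueError("path must be relative to the directory that contains the package")
    # Drop the ".py" suffix and join the remaining components with dots.
    return ".".join(p.with_suffix("").parts)


print(script_path_to_module("qlib/rl/contrib/backtest.py"))  # qlib.rl.contrib.backtest
```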
diff --git a/examples/rl/experiment_config/backtest/config.py b/examples/rl/experiment_config/backtest/config.py
deleted file mode 100644
index 9ac8357895..0000000000
--- a/examples/rl/experiment_config/backtest/config.py
+++ /dev/null
@@ -1,53 +0,0 @@
-_base_ = ["./twap.yml"]
-
-strategies = {
- "_delete_": True,
- "30min": {
- "class": "TWAPStrategy",
- "module_path": "qlib.contrib.strategy.rule_strategy",
- "kwargs": {},
- },
- "1day": {
- "class": "SAOEIntStrategy",
- "module_path": "qlib.rl.order_execution.strategy",
- "kwargs": {
- "state_interpreter": {
- "class": "FullHistoryStateInterpreter",
- "module_path": "qlib.rl.order_execution.interpreter",
- "kwargs": {
- "max_step": 8,
- "data_ticks": 240,
- "data_dim": 6,
- "processed_data_provider": {
- "class": "PickleProcessedDataProvider",
- "module_path": "qlib.rl.data.pickle_styled",
- "kwargs": {
- "data_dir": "./data/pickle_dataframe/feature",
- },
- },
- },
- },
- "action_interpreter": {
- "class": "CategoricalActionInterpreter",
- "module_path": "qlib.rl.order_execution.interpreter",
- "kwargs": {
- "values": 14,
- "max_step": 8,
- },
- },
- "network": {
- "class": "Recurrent",
- "module_path": "qlib.rl.order_execution.network",
- "kwargs": {},
- },
- "policy": {
- "class": "PPO",
- "module_path": "qlib.rl.order_execution.policy",
- "kwargs": {
- "lr": 1.0e-4,
- "weight_file": "./checkpoints/latest.pth",
- },
- },
- },
- },
-}
diff --git a/examples/rl/experiment_config/backtest/config.yml b/examples/rl/experiment_config/backtest/config.yml
new file mode 100644
index 0000000000..418780c2cc
--- /dev/null
+++ b/examples/rl/experiment_config/backtest/config.yml
@@ -0,0 +1,57 @@
+order_file: ./data/backtest_orders.csv
+start_time: "9:45"
+end_time: "14:44"
+qlib:
+ provider_uri_1min: ./data/bin
+ feature_root_dir: ./data/pickle
+ feature_columns_today: [
+ "$open", "$high", "$low", "$close", "$vwap", "$volume",
+ ]
+ feature_columns_yesterday: [
+ "$open_v1", "$high_v1", "$low_v1", "$close_v1", "$vwap_v1", "$volume_v1",
+ ]
+exchange:
+ limit_threshold: ['$close == 0', '$close == 0']
+ deal_price: ["If($close == 0, $vwap, $close)", "If($close == 0, $vwap, $close)"]
+ volume_threshold:
+ all: ["cum", "0.2 * DayCumsum($volume, '9:45', '14:44')"]
+ buy: ["current", "$close"]
+ sell: ["current", "$close"]
+strategies:
+ 30min:
+ class: TWAPStrategy
+ module_path: qlib.contrib.strategy.rule_strategy
+ kwargs: {}
+ 1day:
+ class: SAOEIntStrategy
+ module_path: qlib.rl.order_execution.strategy
+ kwargs:
+ state_interpreter:
+ class: FullHistoryStateInterpreter
+ module_path: qlib.rl.order_execution.interpreter
+ kwargs:
+ max_step: 8
+ data_ticks: 240
+ data_dim: 6
+ processed_data_provider:
+ class: PickleProcessedDataProvider
+ module_path: qlib.rl.data.pickle_styled
+ kwargs:
+ data_dir: ./data/pickle_dataframe/feature
+ action_interpreter:
+ class: CategoricalActionInterpreter
+ module_path: qlib.rl.order_execution.interpreter
+ kwargs:
+ values: 14
+ max_step: 8
+ network:
+ class: Recurrent
+ module_path: qlib.rl.order_execution.network
+ kwargs: {}
+ policy:
+ class: PPO
+ module_path: qlib.rl.order_execution.policy
+ kwargs:
+ lr: 1.0e-4
+ weight_file: ./checkpoints/latest.pth
+concurrency: 5
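Every component in this `config.yml` follows the same `class` / `module_path` / `kwargs` convention, with further components nested inside `kwargs` (e.g. `state_interpreter` containing `processed_data_provider`). qlib has its own helper for this convention (`qlib.utils.init_instance_by_config`). The sketch below is a simplified, hypothetical stand-in named `build_from_config` that resolves nested configs eagerly and depth-first. qlib's strategies may instead resolve their nested configs themselves. It uses standard-library classes so it runs without qlib installed.

```python
import importlib
from typing import Any


def build_from_config(cfg: Any) -> Any:
    """Recursively instantiate every {"class", "module_path", "kwargs"} mapping in `cfg`.

    Hypothetical simplified helper; qlib's own loaders may resolve nested
    configs differently (e.g. inside a strategy's constructor).
    """
    if isinstance(cfg, dict):
        if "class" in cfg and "module_path" in cfg:
            module = importlib.import_module(cfg["module_path"])
            cls = getattr(module, cfg["class"])
            # Build nested components first; `kwargs: {}` (or a missing/null kwargs) means no arguments.
            kwargs = {k: build_from_config(v) for k, v in (cfg.get("kwargs") or {}).items()}
            return cls(**kwargs)
        return {k: build_from_config(v) for k, v in cfg.items()}
    if isinstance(cfg, list):
        return [build_from_config(v) for v in cfg]
    return cfg


# Same shape as the YAML above, but pointing at standard-library classes.
config = {
    "class": "OrderedDict",
    "module_path": "collections",
    "kwargs": {
        "half": {"class": "Fraction", "module_path": "fractions", "kwargs": {"numerator": 3, "denominator": 6}},
        "max_step": 8,
    },
}
built = build_from_config(config)
print(built)
```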
diff --git a/examples/rl/experiment_config/backtest/twap.yml b/examples/rl/experiment_config/backtest/twap.yml
deleted file mode 100644
index e0c342502b..0000000000
--- a/examples/rl/experiment_config/backtest/twap.yml
+++ /dev/null
@@ -1,21 +0,0 @@
-order_file: ./data/backtest_orders.csv
-start_time: "9:45"
-end_time: "14:44"
-qlib:
- provider_uri_1min: ./data/bin
- feature_root_dir: ./data/pickle
- feature_columns_today: [
- "$open", "$high", "$low", "$close", "$vwap", "$volume",
- ]
- feature_columns_yesterday: [
- "$open_v1", "$high_v1", "$low_v1", "$close_v1", "$vwap_v1", "$volume_v1",
- ]
-exchange:
- limit_threshold: ['$close == 0', '$close == 0']
- deal_price: ["If($close == 0, $vwap, $close)", "If($close == 0, $vwap, $close)"]
- volume_threshold:
- all: ["cum", "0.2 * DayCumsum($volume, '9:45', '14:44')"]
- buy: ["current", "$close"]
- sell: ["current", "$close"]
-strategies: {} # Placeholder
-concurrency: 5
diff --git a/qlib/rl/__init__.py b/qlib/rl/__init__.py
index 59e481eb93..a12afc3996 100644
--- a/qlib/rl/__init__.py
+++ b/qlib/rl/__init__.py
@@ -1,2 +1,8 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
+
+from .interpreter import Interpreter, StateInterpreter, ActionInterpreter
+from .reward import Reward, RewardCombination
+from .simulator import Simulator
+
+__all__ = ["Interpreter", "StateInterpreter", "ActionInterpreter", "Reward", "RewardCombination", "Simulator"]
diff --git a/qlib/rl/order_execution/__init__.py b/qlib/rl/order_execution/__init__.py
index b7b47c3d15..318c774230 100644
--- a/qlib/rl/order_execution/__init__.py
+++ b/qlib/rl/order_execution/__init__.py
@@ -6,8 +6,33 @@
Multi-asset is on the way.
"""
-from .interpreter import *
-from .network import *
-from .policy import *
-from .reward import *
-from .simulator_simple import *
+from .interpreter import (
+ FullHistoryStateInterpreter,
+ CurrentStepStateInterpreter,
+ CategoricalActionInterpreter,
+ TwapRelativeActionInterpreter,
+)
+from .network import Recurrent
+from .policy import AllOne, PPO
+from .reward import PAPenaltyReward
+from .simulator_simple import SingleAssetOrderExecutionSimple
+from .state import SAOEStateAdapter, SAOEMetrics, SAOEState
+from .strategy import SAOEStrategy, ProxySAOEStrategy, SAOEIntStrategy
+
+__all__ = [
+ "FullHistoryStateInterpreter",
+ "CurrentStepStateInterpreter",
+ "CategoricalActionInterpreter",
+ "TwapRelativeActionInterpreter",
+ "Recurrent",
+ "AllOne",
+ "PPO",
+ "PAPenaltyReward",
+ "SingleAssetOrderExecutionSimple",
+ "SAOEStateAdapter",
+ "SAOEMetrics",
+ "SAOEState",
+ "SAOEStrategy",
+ "ProxySAOEStrategy",
+ "SAOEIntStrategy",
+]
diff --git a/qlib/rl/strategy/__init__.py b/qlib/rl/strategy/__init__.py
index 59e481eb93..26e12580ba 100644
--- a/qlib/rl/strategy/__init__.py
+++ b/qlib/rl/strategy/__init__.py
@@ -1,2 +1,5 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
+from .single_order import SingleOrderStrategy
+
+__all__ = ["SingleOrderStrategy"]
diff --git a/qlib/rl/trainer/__init__.py b/qlib/rl/trainer/__init__.py
index 0a197b3781..4c5121ecec 100644
--- a/qlib/rl/trainer/__init__.py
+++ b/qlib/rl/trainer/__init__.py
@@ -7,3 +7,5 @@
from .callbacks import Checkpoint, EarlyStopping
from .trainer import Trainer
from .vessel import TrainingVessel, TrainingVesselBase
+
+__all__ = ["Trainer", "TrainingVessel", "TrainingVesselBase", "Checkpoint", "EarlyStopping", "train", "backtest"]
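The `__init__.py` changes above replace wildcard imports with explicit imports plus an `__all__` list. `__all__` decides what `from package import *` exports, and Sphinx autodoc's `:members:` option also respects it when present. Every name it lists must actually exist in the module; otherwise the star import raises `AttributeError`. For `qlib/rl/trainer/__init__.py`, that means `train` and `backtest` must be imported somewhere in the file (any such import lies outside the hunk shown). The sketch below demonstrates the behavior with throwaway modules. The helper `star_import` is hypothetical and uses only the standard library.

```python
import sys
import types


def star_import(module_name: str, source: str) -> dict:
    """Build a throwaway module from `source`, then return the names that
    `from <module_name> import *` would bind."""
    module = types.ModuleType(module_name)
    exec(source, module.__dict__)
    sys.modules[module_name] = module  # make it importable by name
    namespace: dict = {}
    exec(f"from {module_name} import *", namespace)
    namespace.pop("__builtins__", None)  # added by exec itself, not by the import
    return namespace


# Without __all__, every public name is exported, including the imported `os` module.
print(sorted(star_import("demo_implicit", "import os\nTrainer = object\n_private = 1\n")))  # ['Trainer', 'os']
# With __all__, only the listed names are exported.
print(sorted(star_import("demo_explicit", "Trainer = object\n__all__ = ['Trainer']\n")))  # ['Trainer']
# A name listed in __all__ but never defined makes the star import fail.
try:
    star_import("demo_missing", "Trainer = object\n__all__ = ['Trainer', 'train']\n")
except AttributeError as exc:
    print("star import failed:", exc)
```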
diff --git a/setup.py b/setup.py
index 8ce3b93f6e..a796ecf4b7 100644
--- a/setup.py
+++ b/setup.py
@@ -1,6 +1,5 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
-import io
import os
import numpy
@@ -26,9 +25,6 @@ def get_version(rel_path: str) -> str:
DESCRIPTION = "A Quantitative-research Platform"
REQUIRES_PYTHON = ">=3.5.0"
-from pathlib import Path
-from shutil import copyfile
-
VERSION = get_version("qlib/__init__.py")
# Detect Cython
@@ -148,15 +144,16 @@ def get_version(rel_path: str) -> str:
# References: https://github.com/python/typeshed/issues/8799
"mypy<0.981",
"flake8",
+ # The 5.0.0 version of importlib-metadata removed a deprecated entry-points interface,
+ # which prevented flake8 from working properly, so we restrict the version of importlib-metadata.
+ # See https://github.com/python/importlib_metadata/issues/406 for details.
+ "importlib-metadata<5.0.0",
"readthedocs_sphinx_ext",
"cmake",
"lxml",
"baostock",
"yahooquery",
"beautifulsoup4",
- # The 5.0.0 version of importlib-metadata removed the deprecated endpoint,
- # which prevented flake8 from working properly, so we restricted the version of importlib-metadata.
- "importlib-metadata<5.0.0",
"tianshou",
"gym>=0.24", # If you do not put gym at the end, gym will degrade causing pytest results to fail.
],