Qlib RL framework (stage 1) - single-asset order execution #1076

ultmaster · 2022-04-25T13:10:25Z

This is the first stage of systematic Qlib RL support (aka NeuTrader), as planned in #1011.

What this PR includes:

Basic framework: simulator, state/action-interpreter, reward, logger.
Utilities: data queue, finite env, env wrapper.
The first single-asset order execution simulator built upon "OPD-styled" data, along with several interpreters and basic policies.
Tests: unit tests and an end-to-end runnable test for the TWAP strategy.

What this PR will include but has not been completed so far (we can discuss whether these can be deferred to stage 2):

[MUST DO] Put the test data on the shared storage, and check the correctness of several policy checkpoints, against the old framework.
The script to train a new policy. -- will be in stage 2

What this PR won't include:

More loggers including tensorboard, mlflow, memory buffer.
Fixes for compatibility with non-Linux platforms.
Integration with Qlib workflow.
Integration with Qlib config system - launching backtest / training via config.
The "RLStrategy" that enables testing a trained policy in qlib.backtest for deployment.
The second SAOE simulator that is built upon qlib.backtest.
Performance optimization: prefetching data, caching.
Multi-agent.
Tasks other than SAOE.

As this PR is already quite large by itself, I suggest reviewing it first and collecting some feedback.

Please squash this PR when merging. The commit history is too messy.

matluster · 2022-05-12T14:47:19Z

@lihuoran Are you using tools like pylance / pylint / pyright to scan the code? We already have mypy on GitHub Actions. But depending on your suggestions, I guess it doesn't hurt to have one or more additional linters in the CI.

Comments I didn't reply to shall be considered resolved. The changes have been pushed.

lihuoran · 2022-05-13T02:02:01Z

@matluster I use PyCharm as my local IDE, and PyCharm uses PEP8 to inspect code. Some of the issues are reported while editing rather than by mypy. The other issues mostly come from my personal experience, so it is fine if the team thinks they are unnecessary.

matluster · 2022-05-13T17:29:01Z

I failed to see how pep8 can generate those messages you listed above.

https://pep8.readthedocs.io/en/release-1.7.x/intro.html#error-codes

lihuoran · 2022-05-15T04:01:11Z

Not all of the issues I mentioned above are reported by PEP8. Some of them (for example, the mismatched parameter type issue) are reported by PyCharm's internal checker. I think that for code format issues we should NOT rely on any particular IDE, so we could see whether we can find an elegant way to enrich the current format checkers at a proper level. For "IDE related" issues, I will report them case by case and we can discuss whether they should be fixed.

It may take me a while to get familiar with our coding standards. After that, I should have a better grasp of which issues should be addressed and which ones should be ignored.

qlib/log.py

qlib/rl/data/__init__.py

you-n-g · 2022-05-11T03:17:48Z

qlib/rl/data/pickle_styled.py

@@ -0,0 +1,251 @@
+# Copyright (c) Microsoft Corporation.


It looks like this will not be shared by users.
Would it be better to place it in qlib/contrib/rl?

I think qlib.rl is a self-contained package, and pickle-styled data is the only type of supported data format. Let's keep it here for now.

qlib/rl/utils/data_queue.py

qlib/rl/utils/finite_env.py

you-n-g · 2022-05-15T15:12:11Z

tests/rl/test_saoe_simple.py

@@ -0,0 +1,306 @@
+# Copyright (c) Microsoft Corporation.


@ultmaster @matluster
I'm trying to debug it.
First, I tried to switch it to a single process instead of multiprocessing.
So, I made the following changes to use DummyVectorEnv.

But I got the following error when I ran it again.

Is this the correct way to debug the program?

matluster · 2022-05-16T07:33:17Z

I think it could be another task: adding another linter to qlib CI.

Let's keep as is for now. This PR is already super fat. :)

lihuoran · 2022-05-16T08:17:31Z

Sure~
Please pay attention to the logic parts and we can leave the format parts to future work.

you-n-g · 2022-05-17T02:36:20Z

@ultmaster @matluster ultmaster#1 Could you please merge this PR?

qlib/rl/entries/test.py

you-n-g · 2022-05-17T08:49:42Z

tests/rl/test_saoe_simple.py

@@ -0,0 +1,306 @@
+# Copyright (c) Microsoft Corporation.


@ultmaster @matluster
Thanks for your hot-fix.

I tried to debug it. Here is some information I got.

I started the test with: pytest -s --pdb --disable-warnings /sdc/home/xiaoyang/repos/qlib-main/libs/qlib/tests/rl/test_saoe_simple.py::test_twap_strategy
It creates 4 DummyVectorEnv instances; here is the first one.
It uses the deal prices over the whole day to compute twap_price.

I skipped the next 3 DummyVectorEnv initializations and stopped at the first step in the first DummyVectorEnv.
It uses the market prices of the first 30 ticks to compute the PA in the first step.

IIUC, the PA at this step will not be zero (though the average PA over the whole day will be 0).

But the test asserts that the PA of every step is zero. I don't think that is a reasonable assertion.
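
To make the point above concrete, here is a minimal sketch with made-up numbers. It assumes PA is the relative difference between the execution price and the whole-day TWAP, expressed in basis points for a sell order; that definition, the helper names, and the prices are illustrative assumptions, not taken from the PR code.

```python
def twap(prices):
    """Time-weighted average price: a plain mean over equally spaced ticks."""
    return sum(prices) / len(prices)


def price_advantage_bp(exec_price, baseline_price):
    """Price advantage of a sell order in basis points (assumed definition)."""
    return (exec_price / baseline_price - 1.0) * 10_000


# Hypothetical deal prices for one day, split into two steps of 3 ticks each.
day_prices = [10.0, 10.2, 10.4, 10.6, 10.8, 11.0]
steps = [day_prices[0:3], day_prices[3:6]]

# Baseline: TWAP over the whole day.
day_twap = twap(day_prices)

# Per-step PA: each step trades at that step's own average price.
step_pa = [price_advantage_bp(twap(step), day_twap) for step in steps]

# Whole-day PA: with equal volume per step, the overall execution price
# is the mean of the per-step prices.
day_exec_price = twap([twap(step) for step in steps])
day_pa = price_advantage_bp(day_exec_price, day_twap)

print(step_pa, day_pa)
```

In this toy trending market the first step has a negative PA and the second a positive one, while the whole-day PA cancels out to (numerically) zero, which is why an assertion on every step is stricter than an assertion on the day-level result.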

you-n-g · 2022-05-17T08:58:22Z

qlib/rl/data/pickle_styled.py

@@ -0,0 +1,251 @@
+# Copyright (c) Microsoft Corporation.


It looks like this feature provides a similar interface with

qlib/qlib/backtest/high_performance_ds.py

Line 35 in c428112

def get_data(

Maybe we can merge it in the future and leave a NOTE/TODO here.

you-n-g · 2022-05-17T15:11:23Z

qlib/rl/utils/env_wrapper.py

+    reward_fn
+        A callable that accepts the StateType and returns a float (at least in single-agent case).
+    aux_info_collector
+        Collect auxiliary informations. Could be useful in MARL.


Suggested change

Collect auxiliary informations. Could be useful in MARL.

Collect auxiliary information. Could be useful in MARL.

you-n-g · 2022-05-17T15:27:17Z

qlib/rl/utils/env_wrapper.py

+        logger: LogCollector | None = None,
+    ):
+        # assign weak reference to wrapper
+        for obj in [state_interpreter, action_interpreter, reward_fn, aux_info_collector]:


Is this only for faster garbage collection?
What will happen if weakref is not used?

Please add some comments about it if there are other purposes.

Comments added.
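
For readers of this thread, a minimal sketch of what a weak back-reference changes. The Wrapper and Component classes are hypothetical stand-ins, not the PR's classes, and the behaviour shown relies on CPython's reference counting. With a strong back-reference the wrapper and its components form a cycle that only the cyclic garbage collector can reclaim; with weakref.proxy the wrapper is freed as soon as its last strong reference goes away.

```python
import gc
import weakref


class Component:
    """Hypothetical stand-in for an interpreter or a reward function."""

    env = None  # back-reference to the owning wrapper


class Wrapper:
    """Hypothetical stand-in for the env wrapper."""

    def __init__(self, components, use_weakref):
        self.components = components
        for obj in components:
            # A weak proxy does not keep the wrapper alive; a strong
            # reference creates a wrapper <-> component cycle.
            obj.env = weakref.proxy(self) if use_weakref else self


def freed_immediately(use_weakref):
    """Return True if the wrapper is destroyed right after `del`."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the cyclic collector from interfering with the check
    try:
        wrapper = Wrapper([Component()], use_weakref)
        probe = weakref.ref(wrapper)
        del wrapper
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()


print(freed_immediately(True), freed_immediately(False))
```

Whether prompt cleanup is the only motivation in the PR is not stated in this thread; the sketch only illustrates the general mechanism.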

you-n-g · 2022-05-17T15:29:20Z

qlib/rl/utils/env_wrapper.py

+
+        if seed_iterator is None:
+            # in this case, there won't be any seed for simulator
+            self.seed_iterator = SEED_INTERATOR_MISSING


Why can't we simply use None instead?

Comments added.
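
A common reason to use a sentinel here rather than None is to keep two states apart: "no seed iterator was ever given" and "the seed iterator has been exhausted". The thread does not spell out the actual reasoning, so the sketch below is only an assumption about that general pattern; the class, method, and sentinel names are hypothetical.

```python
# Hypothetical sentinel meaning "no seed iterator was ever provided".
_MISSING = object()


class SeededWrapper:
    """Hypothetical wrapper that hands out one seed per episode."""

    def __init__(self, seed_iterator=None):
        if seed_iterator is None:
            # Run without seeds; this state never "runs out".
            self.seed_iterator = _MISSING
        else:
            self.seed_iterator = iter(seed_iterator)

    def next_seed(self):
        if self.seed_iterator is None:
            # None is reserved for "exhausted", so it cannot also mean "missing".
            raise RuntimeError("seed iterator exhausted; wrapper is dead")
        if self.seed_iterator is _MISSING:
            return None  # no seed for this episode
        try:
            return next(self.seed_iterator)
        except StopIteration:
            self.seed_iterator = None  # mark the wrapper as dead
            raise RuntimeError("seed iterator exhausted; wrapper is dead")


unseeded = SeededWrapper()
seeded = SeededWrapper([7])
print(unseeded.next_seed(), seeded.next_seed())
```

If None were used for both states, a wrapper constructed without seeds would be indistinguishable from one whose seeds had run out.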

ultmaster and others added 30 commits February 28, 2022 17:31

rl init

f09af43

aux info

65e775e

Reward config

f761e35

update

78f6662

simple

0eaea69

update saoe init

cb6dd9b

update simulator and seed

7ae01ea

minor

7693bed

minor

2613919

update sim

3a88e61

checkpoint

7a79b7f

obs

c5fd45c

Update interpreter

f86e664

init qlib simulator

736993a

checkpoint

ac57fa0

Refine codebase

fd3a9b3

Merge branch 'main' of https://github.com/microsoft/qlib into rl-simple

fe17bb3

checkpoint

2bfb979

checkpoint

587dd70

Add one test

80e8181

More tests

f915aa8

Simulator checkpoint

71ddc95

checkpoint

d41c68f

First-step tested

99fa4f0

Checkpoint

f245ab8

Update data_queue API

e035f3b

Checkpoint

0d3055b

Update test

0848b15

Move files

4b89907

Checkpoint

b1997cc

fix tests

2958c77

resolve comments

38bcf7e

you-n-g self-assigned this May 15, 2022

some typo

267cb67

you-n-g reviewed May 15, 2022

ultmaster added 3 commits May 16, 2022 06:58

style updates

4063f78

More comments

daa510f

fix dummy

60157a9

you-n-g removed their assignment May 16, 2022

you-n-g reviewed May 17, 2022

ultmaster and others added 6 commits May 18, 2022 01:45

add warning

68a88bb

Align precision in some system

25a26c0

Merge pull request #2 from you-n-g/rl-simple

a3ece1c

Added some impl notes

6292958

Merge branch 'rl-simple' of github.com:ultmaster/qlib into rl-simple

d649c2e

Merge pull request #1 from you-n-g/rl-simple-typo

5da6adb

some typos

you-n-g merged commit 9a40fd3 into microsoft:main May 21, 2022

matluster mentioned this pull request May 27, 2022

[Proposal] Systematic RL support in qlib #1011

Open

ultmaster mentioned this pull request Jun 13, 2022

Qlib RL framework (stage 2) - trainer #1125

Merged

5 tasks

you-n-g added the documentation and enhancement labels Jun 15, 2022
