Refine RL todos #1332
Conversation
qlib/rl/utils/env_wrapper.py
Outdated
  super().__init__()

- for obj in [state_interpreter, action_interpreter, reward_fn, aux_info_collector]:
+ for obj in [reward_fn, aux_info_collector]:
In the original design, the (1) state interpreter, (2) action interpreter, (3) reward function, and (4) auxiliary info collector are born equal and treated equally in the EnvWrapper. This change breaks that philosophy; extra discussion might be needed for this.
Another reason that motivated the previous design is that I've seen some algorithms that need extra communication between components. For example, a reward function might rely on extra states calculated by the state interpreter to speed up its calculation. Giving them access to the env wrapper makes such "hacking" possible.
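A minimal sketch of the kind of cross-component "hack" described above, using hypothetical class and attribute names (`EnvWrapperSketch`, `cached_feature`) rather than qlib's actual API:

    # Illustrative sketch only; names are assumptions, not qlib's real classes.
    class StateInterpreterSketch:
        env = None  # back-reference filled in by the wrapper

        def interpret(self, simulator_state):
            # cache an expensive intermediate result so other components can reuse it
            self.cached_feature = sum(simulator_state) / len(simulator_state)
            return self.cached_feature


    class RewardSketch:
        env = None

        def reward(self, simulator_state):
            # reuse the interpreter's cached value through the shared env reference
            return -abs(self.env.state_interpreter.cached_feature)


    class EnvWrapperSketch:
        def __init__(self, state_interpreter, reward_fn):
            self.state_interpreter = state_interpreter
            self.reward_fn = reward_fn
            # give every component equal access to the wrapper
            for obj in (state_interpreter, reward_fn):
                obj.env = self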
Understood. I reverted this change to let interpreters have `env` again. It's just that `env` won't be used at the moment.
  def reset(self, outer_trade_decision: BaseTradeDecision = None, **kwargs: Any) -> None:
      super().reset(outer_trade_decision=outer_trade_decision, **kwargs)

      # In backtest, env.reset() needs to be manually called since there is no outer trainer to call it
Why can't we simply create a `DummyEnvWrapper` here?
In my personal understanding, we don't need a `DummyEnvWrapper`, at least for now.
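For context, a rough sketch of what such a `DummyEnvWrapper` could look like if it were needed, assuming a minimal status dict with a `cur_step` counter (illustrative only, not an existing qlib class):

    # Hypothetical stand-in env wrapper for backtest, where no real trainer drives the loop.
    class DummyEnvWrapperSketch:
        def __init__(self):
            self.status = {"cur_step": 0, "done": False}

        def reset(self):
            # start a fresh episode
            self.status = {"cur_step": 0, "done": False}

        def step(self, action=None):
            # only advance the step counter; nothing is simulated here
            self.status["cur_step"] += 1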
      return pd.DatetimeIndex(ret)


  def fill_missing_data(
Hope #1333 can help to remove this function.
* Remove Dropna limitation of `quote_df` of Exchange
* Improve docstring
qlib/rl/interpreter.py
Outdated
@@ -35,12 +31,19 @@ class Interpreter:
      states by calling ``self.env.register_state()``, but it's not planned for first iteration.
      """

      def __init__(self) -> None:
It is weird to maintain these states in the Interpreter; we may have a discussion later about whether there is a better design.
I think it is more reasonable to get the current step from the state.
Agreed. I think it's better to have them in `EnvWrapperStatus` because it's designed for this purpose.
But it would require adding `DummyEnv` back.
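A small sketch of the direction suggested here, assuming the wrapper exposes its status as a dict-like structure with a `cur_step` entry (similar in spirit to `EnvWrapperStatus`, but the exact layout is an assumption) and the interpreter reads from it instead of counting ticks itself:

    # Sketch only; the status layout and class name are illustrative assumptions.
    class StepAwareInterpreterSketch:
        env = None  # expected to be set by the env wrapper (or a dummy one in backtest)

        def interpret(self, simulator_state):
            cur_step = self.env.status["cur_step"] if self.env is not None else 0
            # include the step index in the observation instead of tracking it internally
            return {"state": simulator_state, "step": cur_step}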
qlib/rl/order_execution/policy.py
Outdated
@@ -148,7 +148,11 @@ def __init__(
          action_space=action_space,
      )
      if weight_file is not None:
-         load_weight(self, weight_file)
+         loaded_weight = torch.load(weight_file, map_location="cpu")
+         if "vessel" in loaded_weight:
It seems that loading the weight makes some assumptions about the loaded data rather than simply loading the data directly.
It looks like a reusable feature.
Would it be better to create a standalone function for it?
Sounds good, but I didn't find a good way to do this. Any suggestions?
@lihuoran
If I understand it correctly, it tries to load a policy state dict from a dumped Trainer.
Do you think it is a good idea to create a static utility function for Trainer with a name like `get_policy_state_dict(ckpt_path)`?
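A possible shape for that utility, sketched under the assumption (based on the diff above) that a dumped Trainer checkpoint nests the policy weights under a "vessel" entry; the exact key layout would need to be confirmed against Trainer's real state dict:

    import torch

    def get_policy_state_dict(ckpt_path):
        """Hypothetical helper: return a policy state dict from either a bare
        policy checkpoint or a full Trainer dump."""
        loaded = torch.load(ckpt_path, map_location="cpu")
        if "vessel" in loaded:
            # assumed layout of a Trainer dump: {"vessel": {"policy": <state dict>, ...}, ...}
            loaded = loaded["vessel"]["policy"]
        return loaded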
qlib/rl/order_execution/strategy.py
Outdated
  # In backtest, env.step() needs to be manually called since there is no outer trainer to call it
  if self._backtest:
-     self._env.step(None)
+     self._state_interpreter.step()
It is weird to help the interpreter maintain its state.
Would it be better to create a `cur_step` attribute in the state returned by the simulator?
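A minimal sketch of that alternative, with illustrative field names (not the actual SAOEState definition): the simulator owns the counter and exposes it on the state it returns, so neither the strategy nor the interpreter has to maintain it.

    from dataclasses import dataclass

    @dataclass
    class SimulatorStateSketch:
        # hypothetical fields; a real SAOE state carries much richer information
        cur_step: int = 0
        position: float = 0.0

    class SimulatorSketch:
        def __init__(self, initial_position: float) -> None:
            self._state = SimulatorStateSketch(position=initial_position)

        def step(self, amount: float) -> None:
            # executing an action both updates the position and advances the counter
            self._state.position -= amount
            self._state.cur_step += 1

        def get_state(self) -> SimulatorStateSketch:
            return self._state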
qlib/rl/utils/env_wrapper.py
Outdated
@@ -196,6 +179,8 @@ def reset(self, **kwargs: Any) -> ObsType:
      )

      self.simulator.env = cast(EnvWrapper, weakref.proxy(self))
      self.state_interpreter.reset()
Please refer to other discussions about interpreter.
qlib/rl/utils/env_wrapper.py
Outdated
@@ -230,6 +215,8 @@ def step(self, policy_action: PolicyActType, **kwargs: Any) -> Tuple[ObsType, fl

      # Use the converted action of update the simulator
      self.simulator.step(action)
      self.state_interpreter.step()
Please refer to other discussions about interpreter.
tests/rl/test_qlib_simulator.py
Outdated
@@ -183,8 +183,7 @@ def test_interpreter() -> None:
      order = get_order()
      simulator = get_simulator(order)
      interpreter_action = CategoricalActionInterpreter(values=NUM_EXECUTION)
-     interpreter_action.env = CollectDataEnvWrapper()
-     interpreter_action.env.reset()
+     interpreter_action.reset()
Please refer to other discussions about interpreter.
tests/rl/test_saoe_simple.py
Outdated
-     interpreter_action.env.reset()
-     interpreter_action_twap.env.reset()
+     interpreter_action.reset()
+     interpreter_action_twap.reset()
Please refer to other discussions about interpreter.
Maybe further discussions are required to support
* Refine several todos
* CI issues
* Remove Dropna limitation of `quote_df` in Exchange (microsoft#1334)
* Remove Dropna limitation of `quote_df` of Exchange
* Improve docstring
* Fix type error when expression is specified (microsoft#1335)
* Refine fill_missing_data()
* Remove several TODO comments
* Add back env for interpreters
* Change Literal import
* Resolve PR comments
* Move to SAOEState
* Add Trainer.get_policy_state_dict()
* Mypy issue

Co-authored-by: you-n-g <[email protected]>
Description
* `collect_data_loop`: make it more structural & add more annotations. Refine the report generation logic in RL backtest accordingly.
* Decouple `Interpreter` and `Env`. Now, each interpreter has its own tick counter, so it does not need to get this information from `Env`. Remove `CollectDataEnvWrapper` since it is no longer needed.
* Move `SAOEStateAdapter` and related functions from `qlib/rl/order_execution/state.py` to `qlib/rl/order_execution/strategy.py`. (This was supposed to be done in PR 1316 but we somehow missed it.)

Motivation and Context
How Has This Been Tested?
Run `pytest qlib/tests/test_all_pipeline.py` under the upper directory of `qlib`.

Screenshots of Test Results (if appropriate):
Types of changes