[MRG] StableBaselines Agent #148

Merged: 15 commits merged into rlberry-py:main on Apr 22, 2022

Conversation

@mmcenta (Collaborator) commented Mar 3, 2022

I'm finishing up the StableBaselines wrapper I was working on before. Unfortunately, SB3 algorithms don't seem to reliably satisfy check_fit_additive, so I removed that check from the tests.

Let me know what you think!
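For context, here is a minimal usage sketch of the wrapper as described in this PR; the import path and constructor arguments (algo_cls, policy) are assumptions and may differ from the final code.

```python
from stable_baselines3 import A2C

from rlberry.envs import gym_make
from rlberry.agents.stable_baselines import StableBaselinesAgent  # import path assumed

# Wrap an SB3 algorithm so it behaves like any other rlberry agent.
env = gym_make(id="CartPole-v0")
agent = StableBaselinesAgent(env, algo_cls=A2C, policy="MlpPolicy", verbose=0)

agent.fit(budget=1_000)             # budget = number of SB3 timesteps
action = agent.policy(env.reset())  # act with the trained policy
```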

@mmcenta mmcenta self-assigned this Mar 3, 2022
@mmcenta mmcenta added the enhancement New feature or request label Mar 3, 2022
@mmcenta (Collaborator, Author) commented Mar 3, 2022

It seems the tests are failing because stable_baselines isn't a requirement. Should we set it up as an optional requirement, as we do with torch?

@TimotheeMathieu (Collaborator) commented:

Yes, I think it is OK to add it as an optional requirement, because people who install torch will probably also want to install Stable Baselines.
My only concern is the install time when running the tests, but I guess it should not be a huge increase, because a lot of Stable Baselines' dependencies are already dependencies of rlberry.
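For reference, optional dependencies are typically declared as packaging extras. A hypothetical sketch of what this could look like in setup.py (the extra's name and version pins are made up, not rlberry's actual configuration):

```python
from setuptools import setup

setup(
    name="rlberry",
    # ... other metadata ...
    extras_require={
        # installed with: pip install rlberry[torch_agents]
        "torch_agents": [
            "torch>=1.6",
            "stable-baselines3",
        ],
    },
)
```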

@mmcenta (Collaborator, Author) commented Mar 11, 2022

@omardrwch mentioned that maybe we can create a dummy SB3 algorithm class and avoid the import issues. I will look into this.

Edit: Actually, it was a recommendation to use only one SB3 algorithm in the tests - done!

@mmcenta changed the title from [WIP] StableBaselines Agent to [MRG] StableBaselines Agent on Mar 11, 2022
Review thread on rlberry/utils/tests/test_sb_agent.py (outdated, resolved)
@TimotheeMathieu TimotheeMathieu added this to To Do in V 0.3.0 Mar 15, 2022
@TimotheeMathieu (Collaborator) left a review comment

Thanks, this looks nice!
I have just a small comment on the test and some comments on the doc.
Concerning the doc, could you move the link to the Stable Baselines tutorial (and also the link to the gym tutorial) from the index page to the user guide? I think it would make more sense there. Also, could you add the wrapper to the API reference (file docs/api.rst)?

Comment on lines 20 to 21
Does not check that the fit is additive, because that doesn't seem to be
always the case with StableBaselines3.
@TimotheeMathieu (Collaborator) commented:

This seems to be due to the callback that can stop the fit early (e.g. the _on_step method, see https://stable-baselines.readthedocs.io/en/master/guide/callbacks.html). Would it make sense to redefine the default Stable Baselines callback in order to avoid stopping early? And is this early stopping a problem for hyperparameter optimization? (also @omardrwch on this)

@mmcenta (Collaborator, Author) commented Mar 15, 2022

Oh, I just added the agent to the API reference and I've rewritten its documentation page so it reads more like a tutorial. I will move the links as you asked soon.

About the additive fit, the comment is a bit of a lie: I'm pretty sure that SB3 should support additive fitting, it just doesn't behave that way when I test it. I actually spent some time on the following experiment:

  1. Create two identical agents (same seed too);
  2. Train one for X steps;
  3. Compare their policy logits on a fixed trajectory (policy logits involve no sampling, so we eliminate a source of randomness here); a rough sketch of this comparison is shown below.
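Roughly, the logit comparison looks like the sketch below; policy_logits is a hypothetical helper, and it assumes a discrete action space and an SB3 version where policy.get_distribution() is available.

```python
import numpy as np
import torch

from stable_baselines3.common.utils import obs_as_tensor


def policy_logits(model, observations):
    """Return the policy logits of an SB3 actor-critic model on fixed observations."""
    obs_tensor = obs_as_tensor(np.asarray(observations), model.device)
    with torch.no_grad():
        dist = model.policy.get_distribution(obs_tensor)
    return dist.distribution.logits.cpu().numpy()


# Compare two agents on the same fixed trajectory of states:
# max_abs_diff = np.max(np.abs(policy_logits(agent_a, traj) - policy_logits(agent_b, traj)))
```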

And what I got was a bit difficult to explain, but I will summarize here:

  • For our agents, the agents have the same logits when X is small (<10) but for bigger values, it is clear that they are different.
  • For SB3 agents... There doesn't seem to be a lot of consistency: sometimes they have identical results even though X = 100 and sometimes they have different results when they should be equal. The additive fit constraint is only respected sometimes. I also found out that adding more training steps can change results, for example, agents trained for 100 and 200 steps give different results but agents trained for 500 and 600 steps give the same results...

In the end, I shelved the whole thing and ignored the test! I think that there are some interesting questions in checking if RL agents are almost equal, namely:

  • How do we set an appropriate budget difference such that the algorithms can be differentiated? Budget usually refers to the amount of collected experience, but the agent may spend some time collecting experience before updating its weights;
  • How does the environment affect the testing? I had some problems with small environments such as Chain-v0 where the agent learns to always go right very quickly. As a result, differentiating between two trained agents is difficult with policy samples since the policies are so similar. Additionally, how should we design an environment for such tests?
  • On which states should we compare the policies? Currently, we use only initial states, but I also had problems with environments that have simple optimal policies on the initial states. I ended up using states obtained by running one of the policies once.

This investigation was part of my look into the deep RL agents, and I noticed some things we could improve in the seeding (namely, setting the external library seeds in the __init__ method of torch agents to make sure that the random weight initialization is the same; a rough sketch is below). However, even though I spent quite some time aggressively seeding everything, I still cannot get identical training runs (same episode lengths and returns), which may indicate that there are still things to correct.
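To illustrate what I mean by seeding in __init__, here is a bare-bones sketch using plain torch/numpy seeding; this is only the idea, not the actual rlberry code (rlberry's own seeding utilities would be used in practice).

```python
import random

import numpy as np
import torch


class MyTorchAgent:
    """Toy agent illustrating external-library seeding at construction time."""

    def __init__(self, env, seed=42):
        # Seed all external sources of randomness *before* building the networks,
        # so that the random weight initialization is reproducible.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)

        self.env = env
        self.policy_net = torch.nn.Linear(
            env.observation_space.shape[0], env.action_space.n
        )
```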

@TimotheeMathieu (Collaborator) commented:

Some comments on this

  • When you run your tests, do you use a single thread, without parallelization? Parallelization may have some unexpected consequences, which is why I used non-parallelized agents in the tests.
  • For the Stable Baselines algorithms, I think this is linked to the early stopping they perform, which seems a bit ad hoc to me. I don't know how well understood the tradeoff behind this kind of early stopping is, which is why I think we may want to avoid it if we are after stable/reliable/reproducible results.
  • How does the environment affect the testing?
    I think you are right. One way would be to test on an environment with a lot of possible actions, so that at first the agent basically has to do pure exploration. Maybe we could design an ad-hoc environment with the FiniteMDP class to do that (a rough sketch is at the end of this comment). Another way would be to test the agent on a bandit environment (bandit environments can have many possible actions).
  • Generally, you are right that the budget is not very precise. That is by design in rlberry, to keep things flexible, but it makes testing a bit awkward. One way to solve this would be to specify a set of fit arguments (budget, n_simulations...) for each algorithm in the test; it would be a bit tedious but still doable.

About fixing the external seed in __init__, I think there may be some caveats linked to parallelization and the spawning of new seeds in AgentManager and multimanager. It would be great if all the seeding in rlberry could be handled by rlberry itself, and not manually via set_external_seed, but I am not sure how easy that is.
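A rough sketch of such an ad-hoc environment; the import path and the FiniteMDP constructor (reward matrix R of shape (S, A), transition tensor P of shape (S, A, S)) are assumptions to check against the actual class.

```python
import numpy as np

from rlberry.envs.finite import FiniteMDP  # import path assumed

S, A = 5, 50  # few states, many actions -> early training is essentially pure exploration

rng = np.random.default_rng(0)
R = rng.uniform(size=(S, A))            # rewards: no action is obviously best at first
P = rng.uniform(size=(S, A, S))
P /= P.sum(axis=-1, keepdims=True)      # normalize to valid transition probabilities

env = FiniteMDP(R, P)
```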

@mmcenta (Collaborator, Author) commented Mar 15, 2022

Just some quick responses:

  • Unfortunately, the tests are indeed all single-threaded;
  • I didn't realize that they apply early stopping by default; this might indeed explain it;
  • I thought about doing something similar to what you suggested, namely writing some tools/environments for these "almost equal" tests;
  • I think it is best to change the budget per test. The current value works well for the basic agents, but I am 80% sure it won't work for all deep RL algorithms.
  • That's a good point about using global seeds, but unfortunately torch doesn't use Seeder objects, so we have to rely on set_external_seed for the Torch agents and the StableBaselinesAgent.

I will take a look into this early stopping issue.

@TimotheeMathieu (Collaborator) commented Mar 15, 2022

After looking into it, I think you are right: the default in Stable Baselines is no early stopping (I had to go back through several inherited modules in Stable Baselines, it is quite the maze). The problem seems to be somewhere else.

On the other hand, the learn method has a loop of the form
for update in range(1, total_timesteps // self.n_batch + 1):
which would not cover exactly total_timesteps timesteps unless self.n_batch divides total_timesteps. Maybe the problem comes from here?
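A quick numeric illustration of that loop (numbers made up):

```python
total_timesteps = 100
n_batch = 64  # e.g. n_steps * n_envs for A2C

n_updates = total_timesteps // n_batch  # -> 1 update
trained = n_updates * n_batch           # -> 64 timesteps actually trained, not 100
```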

@mmcenta (Collaborator, Author) commented Mar 15, 2022

On another note, I added a test for the tensorboard integration, but it fails when trying to import tensorboard. Should I remove it?

@TimotheeMathieu (Collaborator) commented Mar 15, 2022

I don't really know what we should do. I mean, tensorboard is not huge (5 MB), so I think we could just add it to the requirements and be done with it.
On the other hand, if we want to stay compatible with everyone, we may encounter this problem again in the future, and it would not be good to always add requirements like that. One solution would be to have a directory for additional "extra" tests; these tests could run in a parallel Azure pipeline check (a pipeline on linux that runs only these tests, with its own additional dependencies that would not appear in requirements.txt, so the main checks don't take too long).
Or we could also not test them at all. Still, I think it is good to have these tests, so we keep them; the only question is whether we check them in the Azure pipeline or not.

Review thread on docs/api.rst (outdated, resolved)
@mmcenta (Collaborator, Author) commented Mar 15, 2022

@TimotheeMathieu On another note, I think PR #144 broke my git pre-commit setup - flake8 now stops me because of the unused imports in the __init__.py files, which used to be ignored. Do you know how to fix this?

@TimotheeMathieu (Collaborator) commented:

No, this is not due to PR #144. pre-commit already checked the __init__ files before; it is just that pre-commit only checks the files you modified for the current commit, so maybe you never made a commit where you modified an __init__?
But you are right that we should also ignore the inits in pre-commit. I think adding --per-file-ignores="__init__.py:F401" on line 29 of pre-commit.yaml works:

       args: ['--select=F401,F405,D410,D411,D412', '--per-file-ignores="__init__.py:F401"']

but I did not test it.

Either way, you can skip the pre-commit checks with git commit -n if you ever run into this kind of problem later.


@mmcenta (Collaborator, Author) commented Mar 15, 2022

> I don't really know what we should do. I mean, tensorboard is not huge (5 MB), so I think we could just add it to the requirements and be done with it. On the other hand, if we want to stay compatible with everyone, we may encounter this problem again in the future, and it would not be good to always add requirements like that. One solution would be to have a directory for additional "extra" tests; these tests could run in a parallel Azure pipeline check (a pipeline on linux that runs only these tests, with its own additional dependencies that would not appear in requirements.txt). Or we could also not test them at all. Still, I think it is good to have these tests, so we keep them; the only question is whether we check them in the Azure pipeline or not.

@omardrwch What do you think about this? I'm tempted to just add it to the requirements for now.

Also, I'll look into the batch_size problem that @TimotheeMathieu talked about. Maybe we can set batch_size=1 and solve the issues I was having, but I'll have to run some tests again.

@mmcenta (Collaborator, Author) commented Mar 16, 2022

@TimotheeMathieu So I reran the reproducibility experiments with the SB3 A2C algorithm and our new agent; here are the results:

Difference Experiments

In these experiments, I choose a batch size X and a budget difference Y and analyze the results for every (X, Y) pair. Additionally, you can specify the base budget B. In summary, this experiment:

  1. Creates two SB3 A2C agents with batch size X;
  2. Trains the first one with a budget of B on the "CartPole-v0" environment;
  3. Trains the second one with a budget of B + Y on the "CartPole-v0" environment;
  4. Collects trajectories of states from the first agent (128 steps total);
  5. Compares both agents on the collected trajectory, using the error rate (that is, the number of disagreements divided by the trajectory length). A rough sketch of this comparison is shown below.
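The comparison in step 5 is essentially the following sketch (error_rate is a hypothetical helper; .predict() is the standard SB3 interface):

```python
import numpy as np


def error_rate(agent_a, agent_b, observations):
    """Fraction of observations on which the two agents' greedy actions disagree."""
    disagreements = 0
    for obs in observations:
        action_a, _ = agent_a.predict(obs, deterministic=True)
        action_b, _ = agent_b.predict(obs, deterministic=True)
        disagreements += int(not np.array_equal(action_a, action_b))
    return disagreements / len(observations)
```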

The figure below shows the results with B = 0:

[Figure: results_diff_0 (error rate as a function of batch size X and budget difference Y, with B = 0)]

Note that we would expect a significant error rate for all experiments where Y is not zero. For completeness, here are the results for B = 100 and B = 1000:

[Figures: results_diff_100 and results_diff_1000 (same experiment with B = 100 and B = 1000)]

As you can see, using a higher base budget only attenuates the errors.

Additive Fit Experiments

This experiment is similar to the previous setting. Instead of varying the budget difference, we vary the total budget T. This experiment:

  1. Creates two SB3 A2C agents with batch size X;
  2. Trains the first one with a budget of T on the "CartPole-v0" environment;
  3. Trains the second one with a budget of T // 2, then with a budget of T - (T // 2), on the "CartPole-v0" environment;
  4. Collects trajectories of states from the first agent (128 steps total);
  5. Compares both agents on the collected trajectory, using the error rate (that is, the number of disagreements divided by the trajectory length).

The results are shown below:

[Figure: results_add_0 (additive-fit error rate as a function of batch size X and total budget T)]

Note that we would expect all error rates to be zero, since both agents are trained for the same total budget. However, the additive fit always seems to make some mistakes when T > 0, for all batch sizes. It looks like something we can't rely on, but if you have any further suggestions, please let me know.

Edit: I also uploaded my code for these results here.

@TimotheeMathieu (Collaborator) commented:

I did some experiments and I think it may be because of reset_num_timesteps. With the following fit function:

    def fit(
        self,
        budget: int,
        tb_log_name: Optional[str] = None,
        reset_num_timesteps: bool = True,
        **kwargs,
    ):
        """Fit the agent.

        Note
        ----
        This method wraps the :code:`learn` method of the algorithm.
        Logging parameters are processed by rlberry in order to use the
        agent.writer.

        Parameters
        ----------
        budget: int
            Number of timesteps to train the agent for.
        tb_log_name: str
            Name of the log to use in tensorboard.
        reset_num_timesteps: bool
            Whether or not to reset the :code:`num_timesteps` attribute.
        """
        # if not self.wrapped._custom_logger:
        #     if tb_log_name is None:
        #         tb_log_name = self.wrapped.__class__.__name__
        #     sb_logger = configure_logger(
        #         self._verbose, self._tb_log, tb_log_name, reset_num_timesteps
        #     )
        #     self.set_logger(sb_logger)
        self.wrapped.learn(total_timesteps=budget, reset_num_timesteps=False, **kwargs)

I obtain the following additive-fit graph:

[Figure: results_add_0 (additive-fit error rates with the modified fit function)]

and I guess this is OK: with batch size 1, the fit is additive (which is what we want).

@mmcenta (Collaborator, Author) commented Mar 18, 2022

@TimotheeMathieu
I think you're right: I assumed that reset_num_timesteps was only used for logging purposes (as their docs suggest), but it seems that it may also reset the environment.
However, I can't just comment out that part of the code, because it is what integrates their logging with rlberry. I came up with a solution that seems to solve the issue - I'll push the changes as soon as I'm finished testing.
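One plausible shape for that fix (only a sketch of the idea, not necessarily the change that was actually pushed): configure the SB3 logger as before, but only reset the internal timestep counter on the very first fit call.

```python
from typing import Optional

from stable_baselines3.common.utils import configure_logger


def fit(self, budget: int, tb_log_name: Optional[str] = None, **kwargs):
    # Reset SB3's internal counters only on the first fit, so later fits accumulate.
    first_fit = self.wrapped.num_timesteps == 0

    if not self.wrapped._custom_logger:
        if tb_log_name is None:
            tb_log_name = self.wrapped.__class__.__name__
        sb_logger = configure_logger(
            self._verbose, self._tb_log, tb_log_name, reset_num_timesteps=first_fit
        )
        self.set_logger(sb_logger)

    self.wrapped.learn(
        total_timesteps=budget, reset_num_timesteps=first_fit, **kwargs
    )
```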

@mmcenta (Collaborator, Author) commented Mar 22, 2022

Just an update: I've been able to (partially) fix the additive fit issues for the agent.

I encountered one last problem: it seems difficult to guarantee additive fits when a model is saved and then loaded during training. First, the standard .save() and .load() methods from Stable Baselines don't support additive fits by themselves. By using .get_parameters() and .set_parameters() instead, I can get additive fits to work in isolation, but the approach breaks when I implement it in the agent - although it does look like some sort of approximation error, since the predicted values are very close. (A sketch of the parameter round-trip is at the end of this comment.)

This is relevant because our AgentManager saves and loads agents all the time, so I can't guarantee additive fits in that case. More specifically, our additive fit test uses AgentManagers, so it still fails :(

Edit: I am also pretty sure that the AgentManager cannot preserve additive fits for agents that use external libraries (such as PyTorch), because it reseeds the external library at the beginning of each fit. (I created #157 to discuss this.)
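For reference, the parameter round-trip referred to above is essentially this (get_parameters/set_parameters are existing SB3 BaseAlgorithm methods; the rest is a toy reproduction, not the agent code):

```python
from stable_baselines3 import A2C

model = A2C("MlpPolicy", "CartPole-v0", seed=0)
model.learn(total_timesteps=500)

# Snapshot only the trainable parameters instead of pickling the whole model.
params = model.get_parameters()  # dict of state_dicts (policy, optimizer, ...)

# Rebuild an identical model and restore the snapshot.
new_model = A2C("MlpPolicy", "CartPole-v0", seed=0)
new_model.set_parameters(params, exact_match=True)

# Continue training without resetting the internal timestep counter.
new_model.learn(total_timesteps=500, reset_num_timesteps=False)
```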

TimotheeMathieu added a commit to TimotheeMathieu/rlberry that referenced this pull request Mar 24, 2022
@omardrwch (Member) commented:

> @omardrwch What do you think about this? I'm tempted to just add it to the requirements for now.

Sorry, I'm a bit (very) late to this PR. Regarding the tensorboard requirement, I think it's OK to add it for now. The SB3 wrapper is a very important feature, so testing it is worth the extra requirement. Later, we could implement @TimotheeMathieu's suggestion (a directory for additional, heavier tests) when needed (e.g. also for more comprehensive testing of our agents, to make sure they match some benchmarks).

@TimotheeMathieu (Collaborator) left a review comment

OK for me. We can make separate PRs for the tests and the seeder later. Thanks.

@mmcenta mmcenta merged commit 847c54a into rlberry-py:main Apr 22, 2022
@mmcenta mmcenta deleted the sb-agent branch April 22, 2022 11:29
@TimotheeMathieu TimotheeMathieu moved this from To Do to Done in V 0.3.0 Apr 26, 2022