Update docs (custom policy, type hints) #167

araffin · 2020-09-28T11:29:01Z

Description

Add custom feature extractor example
Add full custom policy example for on-policy algorithms
Re-enable autodoc type hint extension (looks much nicer with the latest version) and fix some issues due to that
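
For readers unfamiliar with the extension, re-enabling it mostly comes down to listing it in the Sphinx configuration. The snippet below is a minimal sketch, not the actual conf.py changed in this PR: the surrounding file contents are assumed, and only the extension name sphinx_autodoc_typehints comes from the commit list.

```python
# Hypothetical excerpt of a Sphinx conf.py (assumed layout, not the file from this PR).
# sphinx.ext.autodoc pulls docstrings from the code; sphinx_autodoc_typehints
# then adds the parameter and return types taken from the function annotations.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx_autodoc_typehints",
]
```

The extension also has to be installed in the environment that builds the docs, which is what the "Add sphinx_autodoc_typehints to read the doc env" commit appears to address.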

You can see the difference by comparing these links:
current doc: https://stable-baselines3.readthedocs.io/en/master/modules/td3.html#parameters
this PR: https://stable-baselines3.readthedocs.io/en/doc-custom-policy/modules/td3.html#parameters

Motivation and Context

I have raised an issue to propose this change (required for new features and bug fixes)

Addresses part of #10 and #166
closes #144
closes #168

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Checklist:

I've read the CONTRIBUTION guide (required)
I have updated the changelog accordingly (required).
My change requires a change to the documentation.
I have updated the tests accordingly (required for a bug fix or a new feature).
I have updated the documentation accordingly.
I have reformatted the code using make format (required)
I have checked the codestyle using make check-codestyle and make lint (required)
I have ensured make pytest and make type both pass. (required)

Note: we are using a maximum length of 127 characters per line

Miffyli · 2020-09-28T22:40:47Z

Looking good and indeed much cleaner docs! Some comments though:

Does the example run as expected (wondering because of [bug] Unexpected keyword argument "use_sde" in custom policy #168)?
With types now included automatically from the type hints, we should remove the duplicated type annotations from the docstrings. I think that should be done either in this update or as part of Missing Documentation #166.
Type hints in the docs would be cleaner if the parentheses were also formatted in a code-like style, but I figure this is on the Sphinx side, where modifications can be difficult.
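
To illustrate the docstring cleanup mentioned above: once the types live in the signature, the docstring only needs to describe each parameter. The sketch below uses a hypothetical function (make_schedule is not part of the library) and the standard typing.get_type_hints call to show that the annotations can be read back from the signature, which is roughly the information an autodoc type hint extension relies on.

```python
from typing import Optional, get_type_hints


def make_schedule(initial_value: float, final_value: Optional[float] = None) -> float:
    """
    Hypothetical helper, used only to illustrate the docstring style.

    :param initial_value: Value at the start of training
    :param final_value: Value at the end of training, if different
    :return: The value to use
    """
    # No "(float)" annotations in the docstring: the signature already has them.
    return initial_value if final_value is None else final_value


# The annotations are available programmatically from the signature alone.
hints = get_type_hints(make_schedule)
```

Keeping the types in one place avoids the docstring and the signature drifting apart.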

araffin · 2020-09-29T07:59:17Z

Does the example run as expected (wondering because of #168)?

Actually, the current example runs, but it is simpler than the one in #168.
That's why I am not closing #166.
I will probably add a full advanced usage example today, then.

For your last two points, I had the same thoughts, but I would address them in a separate PR to keep this one small ;)

Miffyli

In that case LGTM! I added the two remarks to respective issues as TODOs.

pengzhi1998 · 2022-07-20T08:19:41Z

Sorry to bother you.

I'm building a network architecture with self-attention, which is not a sequential network, and the actor and critic networks do not share parameters.

I'm wondering whether I could use the advanced custom policy to build my attention block for the PPO network inside the function _build_mlp_extractor, as shown in the example. Or should I use a custom feature extractor for this purpose?

I really appreciate your great help!

Miffyli · 2022-07-20T13:41:28Z

@pengzhi1998 Hey. Unfortunately, we do not have time to offer tech support for custom scenarios. For your scenario, you should use the fully custom policy (which you already found) to define things from the ground up. That is the most customizable way: modifying only the feature extractor can be limiting.
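
As a rough illustration of the scenario discussed here, the following is a sketch of a non-sequential network with separate self-attention branches for the actor and the critic, written in plain PyTorch. Everything in it is an assumption made for illustration: the class names, the way flat features are reshaped into tokens, and the layer sizes are hypothetical. The attribute names latent_dim_pi and latent_dim_vf follow the convention of the documented custom network example as far as I recall; a module like this would then be assigned inside _build_mlp_extractor of a custom policy, which is not shown.

```python
from typing import Tuple

import torch
from torch import nn


class AttentionBranch(nn.Module):
    """Self-attention over a token sequence, then mean pooling and a linear head."""

    def __init__(self, embed_dim: int, num_heads: int, out_dim: int):
        super().__init__()
        self.attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(embed_dim, out_dim), nn.Tanh())

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens has shape (batch, seq_len, embed_dim)
        attended, _ = self.attention(tokens, tokens, tokens)
        return self.head(attended.mean(dim=1))


class AttentionActorCritic(nn.Module):
    """Hypothetical actor-critic network; the two branches share no parameters."""

    def __init__(self, seq_len: int, embed_dim: int, num_heads: int = 2,
                 last_layer_dim_pi: int = 64, last_layer_dim_vf: int = 64):
        super().__init__()
        self.seq_len = seq_len
        self.embed_dim = embed_dim
        # Output sizes, named as in the documented custom network example (assumed).
        self.latent_dim_pi = last_layer_dim_pi
        self.latent_dim_vf = last_layer_dim_vf
        self.policy_net = AttentionBranch(embed_dim, num_heads, last_layer_dim_pi)
        self.value_net = AttentionBranch(embed_dim, num_heads, last_layer_dim_vf)

    def _to_tokens(self, features: torch.Tensor) -> torch.Tensor:
        # Assumption: flat features of size seq_len * embed_dim are split into tokens.
        return features.reshape(-1, self.seq_len, self.embed_dim)

    def forward(self, features: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        return self.forward_actor(features), self.forward_critic(features)

    def forward_actor(self, features: torch.Tensor) -> torch.Tensor:
        return self.policy_net(self._to_tokens(features))

    def forward_critic(self, features: torch.Tensor) -> torch.Tensor:
        return self.value_net(self._to_tokens(features))


net = AttentionActorCritic(seq_len=4, embed_dim=8)
latent_pi, latent_vf = net(torch.zeros(3, 4 * 8))
```

Here latent_pi and latent_vf each have one row per batch element and 64 columns. Whether the custom policy needs further changes depends on the library version, so the documented example should be checked before adapting this.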

PS: next time please open a new issue for questions instead of continuing a closed PR :)

pengzhi1998 · 2022-07-20T14:06:33Z

Thank you!! And sorry for pestering you and for the inconvenience. I'll open a new issue.

Have a great day!

araffin added 3 commits September 28, 2020 12:27

Change import

30d37d6

Update custom policy doc

e84f76c

Re-enable sphinx_autodoc_typehints

97e150b

araffin requested review from AdamGleave, Miffyli, ernestum and hill-a September 28, 2020 11:30

araffin added 4 commits September 28, 2020 13:31

Update docker image

a4b0522

Attempt to fix read the doc build error

0b033ed

Add sphinx_autodoc_typehints to read the doc env

f7542ba

Fix pip version

47d1bd5

araffin mentioned this pull request Sep 29, 2020

[bug] Unexpected keyword argument "use_sde" in custom policy #168

Closed

araffin added 2 commits September 29, 2020 17:27

Add full custom policy example

986bb67

Fix

177c96b

araffin mentioned this pull request Sep 29, 2020

Custom Policy Example (Question) #144

Closed

This was referenced Sep 29, 2020

Missing Documentation #166

Closed

Custom parser for type hints #10

Closed

Miffyli approved these changes Sep 29, 2020

Miffyli merged commit 2c924f5 into master Sep 29, 2020

Miffyli deleted the doc/custom-policy branch September 29, 2020 17:41

araffin mentioned this pull request Sep 30, 2020

Cleanup docstring types #169

Merged

15 tasks

pengzhi1998 mentioned this pull request Nov 6, 2022

[Question] Changing observation space during training #1157

Closed

4 tasks
