Update docs (custom policy, type hints) #167
Looking good and indeed much cleaner docs! Some comments though:
Actually, the current example runs, but it is simpler than the one in #168. As for your last two points, I made the same remarks to myself, but I would address them in a separate PR to keep this one small ;)
In that case, LGTM! I added the two remarks as TODOs to the respective issues.
Sorry to bother you. I'm building a different network architecture with self-attention, which is not a sequential network, and the actor and critic networks do not share parameters. I'm wondering whether I could use the advanced custom policy to construct my attention block for the PPO network with the function. I would really appreciate your help!
@pengzhi1998 Hey. Unfortunately, we do not have time to offer tech support for custom scenarios. For your scenario, you should use the fully custom policy (which you already found) to define everything from the ground up. That is the most customizable option: modifying the feature extractor can be limiting. PS: next time, please open a new issue for questions instead of continuing a closed PR :)
Thank you! And sorry for pestering you and for the inconvenience. I'll open a new issue. Have a great day!
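For readers with the same question, here is a minimal sketch of the "fully custom policy" route for a non-sequential, self-attention network whose actor and critic share no parameters. It assumes a recent Stable-Baselines3 release, where a custom `mlp_extractor` module is expected to expose `latent_dim_pi`, `latent_dim_vf`, `forward`, `forward_actor` and `forward_critic`. The class names `SelfAttentionBranch` and `AttentionActorCritic`, the token split and all sizes are hypothetical illustration choices, not part of SB3. Only PyTorch is needed to run it; the SB3 wiring is shown as comments.

```python
import torch
from torch import nn


class SelfAttentionBranch(nn.Module):
    """Hypothetical branch: split features into tokens, apply self-attention, pool."""

    def __init__(self, feature_dim: int, n_tokens: int, embed_dim: int, n_heads: int, out_dim: int):
        super().__init__()
        if feature_dim % n_tokens != 0:
            raise ValueError("feature_dim must be divisible by n_tokens")
        self.n_tokens = n_tokens
        self.embed = nn.Linear(feature_dim // n_tokens, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(embed_dim, out_dim), nn.ReLU())

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        batch = features.shape[0]
        tokens = self.embed(features.reshape(batch, self.n_tokens, -1))  # (B, T, E)
        attended, _ = self.attn(tokens, tokens, tokens)  # self-attention over tokens
        x = tokens + attended  # residual connection: not a plain nn.Sequential chain
        return self.head(x.mean(dim=1))  # mean-pool tokens -> (B, out_dim)


class AttentionActorCritic(nn.Module):
    """Hypothetical custom extractor with separate (unshared) actor and critic branches."""

    def __init__(self, feature_dim: int, latent_dim_pi: int = 64, latent_dim_vf: int = 64,
                 n_tokens: int = 4, embed_dim: int = 32, n_heads: int = 4):
        super().__init__()
        # Attribute names assumed to match what recent SB3 versions read from mlp_extractor
        self.latent_dim_pi = latent_dim_pi
        self.latent_dim_vf = latent_dim_vf
        self.policy_net = SelfAttentionBranch(feature_dim, n_tokens, embed_dim, n_heads, latent_dim_pi)
        self.value_net = SelfAttentionBranch(feature_dim, n_tokens, embed_dim, n_heads, latent_dim_vf)

    def forward(self, features: torch.Tensor):
        return self.forward_actor(features), self.forward_critic(features)

    def forward_actor(self, features: torch.Tensor) -> torch.Tensor:
        return self.policy_net(features)

    def forward_critic(self, features: torch.Tensor) -> torch.Tensor:
        return self.value_net(features)


# Wiring into SB3 (not executed here; assumes a recent SB3 release):
# from stable_baselines3 import PPO
# from stable_baselines3.common.policies import ActorCriticPolicy
#
# class AttentionPolicy(ActorCriticPolicy):
#     def _build_mlp_extractor(self) -> None:
#         self.mlp_extractor = AttentionActorCritic(self.features_dim)
#
# model = PPO(AttentionPolicy, "CartPole-v1")
```

For example, `AttentionActorCritic(feature_dim=8)(torch.randn(5, 8))` returns two tensors of shape `(5, 64)`. Keeping two independent `SelfAttentionBranch` instances is what guarantees that the actor and critic do not share parameters.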
Description
You can see the difference at these links:
current doc: https://stable-baselines3.readthedocs.io/en/master/modules/td3.html#parameters
this PR: https://stable-baselines3.readthedocs.io/en/doc-custom-policy/modules/td3.html#parameters
Motivation and Context
Addresses part of #10 and #166
closes #144
closes #168
Types of changes
Checklist:
- `make format` (required)
- `make check-codestyle` and `make lint` (required)
- `make pytest` and `make type` both pass (required)

Note: we are using a maximum length of 127 characters per line.