-
-
Notifications
You must be signed in to change notification settings - Fork 952
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update docstrings for improved documentation #160
Conversation
- 3: Move left | ||
|
||
## Observation Space | ||
There are 3 x 12 + 1 possible states. The player cannot be at the cliff, nor at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is outside the scope of this pr, but should we be interested in adding an output with the coordinates of the agent. This could be an additional constructor argument and would change the output input, but would be easier to debug and as a demonstration to new users than an integer representing the state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it would be easier to understand and remove a layer of complexity for new users. It took me a while to figure out how it works. FYI: taxi and frozen lake use the same approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I don't think, this can be done in a future PR, possibly in the same PR as the transition probability PR
## Information | ||
|
||
`step()` and `reset()` return a dict with the following keys: | ||
- "p" - transition proability for the state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the transition probability is always 1.0, what information does the transition probability actually help with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, not a lot. It's there for completeness. Taxi, frozen lake and cliff walking have proability returned in their information, though only frozen lake actually has it implemented (see next comment regarding taxi).
For cliff walking, we could implement transitional probabilities similar to frozen lake / taxi. If we are not then suggest we change the info returned to {}
. Though that may break some people's code...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is more interesting to add the transition probability, we can do this in a future PR.
Could you make an issue outlining the problem and proposed solution.
For the taxi environment, I would only apply randomness on the movement actions and not pick up and drop off
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue #161 created. It doesn't include the suggestion to output coordinates of the agent for Cliff Walking, Frozen Lake and Taxi to keep the changes discrete. Will add the output co-ordinate enhancement proposal at a later date.
As the steps are deterministic, "p" represents the probability of the transition which is always 1.0 | ||
As taxi is not stochastic, the transition probability is always 1.0. Implementing | ||
a transitional probability in line with the Dietterich paper ('The fickle taxi task') | ||
is a TODO. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this TODO be here. Do we need someone to implement this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've put my hand up to implement transitional probablities in taxi it once I finish the doc review. Replaces the "bugged will be fixed soon" comment :)
Sphinx tracks anchors globally. Each has to have a unique name to avoid linking to wrong anchor.
Description
Updates the toy_text docstrings to have better consitency and more information. As discussed on Discord. No dependcies required.
No related issue.
Type of change
Please delete options that are not relevant.
Screenshots
Please attach before and after screenshots of the change if applicable.
n/a
Checklist:
pre-commit
checks withpre-commit run --all-files
(seeCONTRIBUTING.md
instructions to set it up)