Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update docstrings for improved documentation #160

Merged
merged 2 commits into from
Nov 29, 2022

Conversation

axb2035
Copy link
Contributor

@axb2035 axb2035 commented Nov 26, 2022

Description

Updates the toy_text docstrings to have better consitency and more information. As discussed on Discord. No dependcies required.

No related issue.

Type of change

Please delete options that are not relevant.

  • This change requires a documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

n/a

Checklist:

  • I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

- 3: Move left

## Observation Space
There are 3 x 12 + 1 possible states. The player cannot be at the cliff, nor at

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outside the scope of this pr, but should we be interested in adding an output with the coordinates of the agent. This could be an additional constructor argument and would change the output input, but would be easier to debug and as a demonstration to new users than an integer representing the state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it would be easier to understand and remove a layer of complexity for new users. It took me a while to figure out how it works. FYI: taxi and frozen lake use the same approach.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I don't think, this can be done in a future PR, possibly in the same PR as the transition probability PR

## Information

`step()` and `reset()` return a dict with the following keys:
- "p" - transition proability for the state.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the transition probability is always 1.0, what information does the transition probability actually help with?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, not a lot. It's there for completeness. Taxi, frozen lake and cliff walking have proability returned in their information, though only frozen lake actually has it implemented (see next comment regarding taxi).

For cliff walking, we could implement transitional probabilities similar to frozen lake / taxi. If we are not then suggest we change the info returned to {}. Though that may break some people's code...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is more interesting to add the transition probability, we can do this in a future PR.
Could you make an issue outlining the problem and proposed solution.
For the taxi environment, I would only apply randomness on the movement actions and not pick up and drop off

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue #161 created. It doesn't include the suggestion to output coordinates of the agent for Cliff Walking, Frozen Lake and Taxi to keep the changes discrete. Will add the output co-ordinate enhancement proposal at a later date.

As the steps are deterministic, "p" represents the probability of the transition which is always 1.0
As taxi is not stochastic, the transition probability is always 1.0. Implementing
a transitional probability in line with the Dietterich paper ('The fickle taxi task')
is a TODO.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this TODO be here. Do we need someone to implement this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've put my hand up to implement transitional probablities in taxi it once I finish the doc review. Replaces the "bugged will be fixed soon" comment :)

Sphinx tracks anchors globally. Each has to have a unique name to avoid linking to wrong anchor.
@pseudo-rnd-thoughts pseudo-rnd-thoughts merged commit ae75ad2 into Farama-Foundation:main Nov 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants