Improve documentation/comments on the random walk example #208

Merged (5 commits), Jan 31, 2023

Contributor

@alan-cooney alan-cooney commented Jan 21, 2023

Makes this example more readable:

  • Add some context to the README
  • Add docstrings to the functions
  • Add typings & fix all typing issues
  • Give variables descriptive names
  • Add thorough comments

@alan-cooney alan-cooney marked this pull request as ready for review January 21, 2023 20:34
@alan-cooney alan-cooney changed the title Add additional documentation for the random walk example Improve documentation/comments on the random walk example Jan 21, 2023
Collaborator

@jon-tow jon-tow left a comment

Thanks for putting this together @alan-cooney. I had some trouble running things locally. Can you fix the requested changes and provide a wandb report for both examples to ensure everything is working as expected? Thanks!

examples/randomwalks/README.md (outdated, resolved)
examples/randomwalks/ppo_randomwalks.py (outdated, resolved)
examples/randomwalks/randomwalks.py (outdated, resolved)
examples/randomwalks/README.md (outdated, resolved)
examples/randomwalks/randomwalks.py (outdated, resolved)
examples/randomwalks/randomwalks.py (outdated, resolved)
Contributor Author

alan-cooney commented Jan 22, 2023

Thanks for the quick review @jon-tow

PPO results - https://wandb.ai/alancooney/trlx/runs/eo1vxg53
ILQL results - https://wandb.ai/alancooney/trlx/runs/onfn4je1

Typings fix

By the way, I had to fix the typings in trlx/trlx.py so that they work with the approach you prefer here. This fix is needed in any case, as the typings were incorrect (the metric function only takes samples - it's just the reward function that also takes prompts and outputs):

metrics = self.metric_fn(str_samples)

However, I'm happy to move this fix to a different PR if you want to keep the commit history clean (it's a small change, but it doesn't really belong here).
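
For illustration, the kind of annotation being described might look like the following sketch. The alias names `RewardFn` and `MetricFn` and the toy metric are hypothetical and are not taken from trlx/trlx.py; they only show a metric function that takes samples alone, next to a reward function that also takes prompts and outputs.

```python
from typing import Callable, Dict, List

# Hypothetical aliases, named here only for illustration (not trlx's own types).
# A reward function receives samples, prompts and outputs.
RewardFn = Callable[[List[str], List[str], List[str]], List[float]]
# A metric function, as described in the comment above, receives only samples.
MetricFn = Callable[[List[str]], Dict[str, List[float]]]


def count_characters(samples: List[str]) -> Dict[str, List[float]]:
    """Toy metric: report the length of each sample under one metric name."""
    return {"num_characters": [float(len(sample)) for sample in samples]}


# A static type checker accepts this assignment because the signature matches.
metric_fn: MetricFn = count_characters

str_samples = ["0,1,2", "0,3"]
metrics = metric_fn(str_samples)
```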

Collaborator

@jon-tow jon-tow left a comment

Oh huh, this slipped under the radar. Thanks for the find 🙏 The metric_fn should match the reward_fn signature and be called as

metrics = self.metric_fn(
  samples=str_samples,
  prompts=str_prompts,
  outputs=str_outputs,
)

on the line that you've highlighted. It's a small enough change that we can squeeze it into this PR. Let me know if that's alright with you! Once that's done we should be good to merge 👍
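
A minimal sketch of what a matching pair of functions could look like is shown here. The function bodies, the metric name `output_length`, and the sample data are assumptions made up for illustration; only the keyword arguments `samples`, `prompts`, and `outputs` come from the call above.

```python
from typing import Dict, List


def reward_fn(samples: List[str], prompts: List[str], outputs: List[str]) -> List[float]:
    """Hypothetical reward: shorter outputs score higher."""
    return [-float(len(output)) for output in outputs]


def metric_fn(samples: List[str], prompts: List[str], outputs: List[str]) -> Dict[str, List[float]]:
    """Hypothetical metric that shares the reward function's signature."""
    return {"output_length": [float(len(output)) for output in outputs]}


# Made-up data: each sample is a prompt followed by its generated output.
str_prompts = ["a", "b"]
str_outputs = ["bc", "cde"]
str_samples = [prompt + output for prompt, output in zip(str_prompts, str_outputs)]

# Both functions can now be called with the same keyword arguments.
rewards = reward_fn(samples=str_samples, prompts=str_prompts, outputs=str_outputs)
metrics = metric_fn(samples=str_samples, prompts=str_prompts, outputs=str_outputs)
```

Calling by keyword means a function that ignores `prompts` or `outputs` still has to accept them, which keeps the two call sites interchangeable.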

examples/randomwalks/ppo_randomwalks.py (outdated, resolved)
trlx/trlx.py (outdated, resolved)
trlx/trlx.py (outdated, resolved)
examples/randomwalks/ppo_randomwalks.py (outdated, resolved)
@alan-cooney
Contributor Author

Makes sense! All done, and I've checked the runs work as well:

PPO - https://wandb.ai/alancooney/trlx/runs/agtv55vb
ILQL - https://wandb.ai/alancooney/trlx/runs/feb6oky6

@maxreciprocate
Collaborator

I have to say this is great work! However, it's rather peculiar that I cannot reproduce your PPO wandb run from this branch, even though my run on this branch is identical to my run on main. It looks like a dependency difference. If it's not too laborious, could you also make a run from main on your side to confirm the suspicion? Thanks!

https://wandb.ai/sorry/trlx/reports/random_walks_document-v-main--VmlldzozMzkyNjE1

Contributor Author

alan-cooney commented Jan 23, 2023

Sure, this seems plausible. I get the same results on main as on this branch, so I think it's all fine.

Main - https://wandb.ai/alancooney/trlx/runs/ixhfy261
This branch - https://wandb.ai/alancooney/trlx/runs/agtv55vb (from above)

As I understand it, CUDA and different hardware can produce different results even with the same random seed. As for the environment, I'm using this Docker image (#196) followed by pip install, if you want to reproduce it.
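
One general mechanism behind this kind of divergence (not confirmed as the cause in this thread) is that floating-point addition is not associative, so a reduction that adds the same numbers in a different order can give a slightly different result even when every random draw is identical. The sketch below uses only the Python standard library; the seed value and variable names are arbitrary choices for illustration.

```python
import random

# Same seed and same generator: the sampled values are identical.
draws_a = random.Random(1000).sample(range(100), 5)
draws_b = random.Random(1000).sample(range(100), 5)
same_draws = draws_a == draws_b

# Floating-point addition is not associative: grouping the same three
# numbers differently changes the last bits of the result.
left_grouping = (0.1 + 0.2) + 0.3
right_grouping = 0.1 + (0.2 + 0.3)
order_matters = left_grouping != right_grouping

print(same_draws, order_matters)
```

Small differences of this kind can compound over many training steps, so two runs with the same seed may still drift apart on different hardware or library versions.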

Note: sorry about the force push - committed to the wrong branch by mistake, so I've put this branch back to where you both reviewed it.

Collaborator

@jon-tow jon-tow left a comment

I can also reproduce the runs from main.

Thanks a bunch, @alan-cooney!

@jon-tow jon-tow merged commit dcbf7b0 into CarperAI:main Jan 31, 2023