Improve documentation/comments on the random walk example #208

alan-cooney · 2023-01-21T16:19:53Z

Makes this example more readable:

Add some context to the README
Add docstrings to the functions
Add typings & fix all typing issues
Give variables descriptive names
Add thorough comments

jon-tow

Thanks for putting this together @alan-cooney. I had some trouble running things locally. Can you fix the requested changes and provide a wandb report for both examples to ensure everything is working as expected? Thanks!

examples/randomwalks/README.md

examples/randomwalks/ppo_randomwalks.py

examples/randomwalks/randomwalks.py

examples/randomwalks/README.md

examples/randomwalks/randomwalks.py

alan-cooney · 2023-01-22T09:07:03Z

Thanks for the quick review @jon-tow

PPO results - https://wandb.ai/alancooney/trlx/runs/eo1vxg53
ILQL results - https://wandb.ai/alancooney/trlx/runs/onfn4je1

Typings fix

By the way, I had to fix the typings in trlx/trlx.py so that they work with the approach you prefer here. This fix is needed in any case, as the typings were incorrect (the metric function only takes samples - it's just the reward function that also takes prompts and outputs):

trlx/trlx/trainer/accelerate_base_trainer.py

Line 357 in 84a0711

metrics = self.metric_fn(str_samples)

However, I'm happy to move this fix to a different PR if you want to keep the commit history clean (it's a small change, but it doesn't really belong here).

jon-tow

By the way, I had to fix the typings in trlx/trlx.py so that they work with the approach you prefer here. This fix is needed in any case, as the typings were incorrect (the metric function only takes samples - it's just the reward function that also takes prompts and outputs):

trlx/trlx/trainer/accelerate_base_trainer.py

Line 357 in 84a0711

metrics = self.metric_fn(str_samples)

However, I'm happy to move this fix to a different PR if you want to keep the commit history clean (it's a small change, but it doesn't really belong here).

Oh huh; this slipped under the radar. Thanks for the find 🙏 metric_fn should match the reward_fn signature and be called as

metrics = self.metric_fn(
  samples=str_samples,
  prompts=str_prompts,
  outputs=str_outputs,
)

on the line that you've highlighted. It's a small enough change that we can squeeze it into this PR. Let me know if that's alright with you! Once that's done we should be good to merge 👍

examples/randomwalks/ppo_randomwalks.py

trlx/trlx.py

examples/randomwalks/ppo_randomwalks.py

alan-cooney · 2023-01-22T21:26:49Z

Makes sense! All done, and I've checked the runs work as well:

PPO - https://wandb.ai/alancooney/trlx/runs/agtv55vb
ILQL - https://wandb.ai/alancooney/trlx/runs/feb6oky6

maxreciprocate · 2023-01-22T23:47:43Z

Have to say this is a great work! However it's rather peculiar that I cannot reproduce your PPO wandb run from this branch despite my run being identical as from the main. It seems like a dependency difference and if it's not too laborious can you make a run from the main as well on your side just to confirm the suspicion? Thanks!

https://wandb.ai/sorry/trlx/reports/random_walks_document-v-main--VmlldzozMzkyNjE1

alan-cooney · 2023-01-23T10:08:44Z

Sure, this seems plausible. I get the same results on main as on this branch, so I think it's all fine.

Main - https://wandb.ai/alancooney/trlx/runs/ixhfy261
This branch - https://wandb.ai/alancooney/trlx/runs/agtv55vb (from above)

As I understand it, CUDA + different hardware can cause different results with the same random seed. But in terms of environments, I'm using this docker image #196 and then pip install, if you want to reproduce it.

Note: sorry about the force push - committed to the wrong branch by mistake, so I've put this branch back to where you both reviewed it.

jon-tow

I can also reproduce the runs from main:

Thanks a bunch, @alan-cooney!

Improve documentation for the Random walk example

acade3e

alan-cooney force-pushed the random_walks_document branch from 4c79c09 to acade3e Compare January 21, 2023 17:04

Add additional notes on PPO random walks

0a8e2bc

alan-cooney marked this pull request as ready for review January 21, 2023 20:34

alan-cooney changed the title ~~Add additional documentation for the random walk example~~ Improve documentation/comments on the random walk example Jan 21, 2023

Add image for documentation

62d7ce0

jon-tow requested changes Jan 22, 2023

View reviewed changes

Fix requested changes

f8c3bda

alan-cooney force-pushed the random_walks_document branch from 41fe413 to f8c3bda Compare January 22, 2023 08:49

alan-cooney requested a review from jon-tow January 22, 2023 10:04

jon-tow requested changes Jan 22, 2023

View reviewed changes

examples/randomwalks/ppo_randomwalks.py Outdated Show resolved Hide resolved

trlx/trlx.py Outdated Show resolved Hide resolved

trlx/trlx.py Outdated Show resolved Hide resolved

examples/randomwalks/ppo_randomwalks.py Outdated Show resolved Hide resolved

Fix remaining issues

c8a776a

alan-cooney requested a review from jon-tow January 22, 2023 21:28

alan-cooney force-pushed the random_walks_document branch from 3378119 to c8a776a Compare January 23, 2023 09:55

jon-tow approved these changes Jan 31, 2023

View reviewed changes

jon-tow merged commit dcbf7b0 into CarperAI:main Jan 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve documentation/comments on the random walk example #208

Improve documentation/comments on the random walk example #208

alan-cooney commented Jan 21, 2023 •

edited

Loading

jon-tow left a comment

alan-cooney commented Jan 22, 2023 •

edited

Loading

jon-tow left a comment

alan-cooney commented Jan 22, 2023

maxreciprocate commented Jan 22, 2023

alan-cooney commented Jan 23, 2023 •

edited

Loading

jon-tow left a comment

Improve documentation/comments on the random walk example #208

Improve documentation/comments on the random walk example #208

Conversation

alan-cooney commented Jan 21, 2023 • edited Loading

jon-tow left a comment

Choose a reason for hiding this comment

alan-cooney commented Jan 22, 2023 • edited Loading

Typings fix

jon-tow left a comment

Choose a reason for hiding this comment

alan-cooney commented Jan 22, 2023

maxreciprocate commented Jan 22, 2023

alan-cooney commented Jan 23, 2023 • edited Loading

jon-tow left a comment

Choose a reason for hiding this comment

alan-cooney commented Jan 21, 2023 •

edited

Loading

alan-cooney commented Jan 22, 2023 •

edited

Loading

alan-cooney commented Jan 23, 2023 •

edited

Loading