WIP: Implement CrossQ for continuous actions #473

noahfarr · 2024-07-19T10:50:04Z

Description

Implementing #463

Types of changes

Bug fix
New feature
New algorithm
Documentation

Checklist:

I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
I have updated the tests accordingly (if applicable).
I have updated the documentation and previewed the changes via mkdocs serve.
- I have explained note-worthy implementation details.
- I have explained the logged metrics.
- I have added links to the original paper and related papers.

If you need to run benchmark experiments for performance-impacting changes:

I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture_video.
I have performed RLops with python -m openrlbenchmark.rlops.
- For new feature or bug fix:
  - I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
- For new algorithm:
  - I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
- I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

vercel · 2024-07-19T10:50:08Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
cleanrl	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Jul 19, 2024 0:44am

pseudo-rnd-thoughts · 2024-07-19T13:25:01Z

It might be helpful to note in the documentation that, for environments with sparse rewards, BatchNorm has been proven to be myopic and unable to learn long-horizon rewards / goals.
Therefore, for MuJoCo, BatchNorm doesn't have a significant negative impact, but for Atari and the like, it might.
See https://arxiv.org/pdf/2407.04811, Sections 3.1 and 5.1 + Appendix G

noahfarr added 2 commits July 19, 2024 12:39

Implement CrossQ for continuous actions

330f15e

Fix bug

66c3801

vercel bot deployed to Preview July 19, 2024 10:50 View deployment

Remove extra import

afa7e81

vercel bot deployed to Preview July 19, 2024 11:12 View deployment

Run pre-commit

4146d0a

vercel bot deployed to Preview July 19, 2024 12:44 View deployment

noahfarr marked this pull request as draft July 19, 2024 13:01

noahfarr closed this by deleting the head repository Aug 27, 2024
