
Debugging best practice/tools? #281

Open
asmith26 opened this issue Aug 3, 2023 · 1 comment
asmith26 commented Aug 3, 2023

Hi, I've created a new environment, but I'm struggling to determine whether the RL agent is learning correctly. It doesn't seem to be improving much, so I'm wondering whether I've implemented the environment correctly.

Just wondering if you have any tips on how I might best check that everything is implemented correctly, e.g.:

  1. Is there anything I should be checking on TensorBoard?
  2. Do you have any advice on what I should be normalizing (e.g. action, states,...)?
  3. (Any other advice/tips greatly appreciated...)

Many thanks for any help, and for this amazing lib! :)

@alex-petrenko (Owner) commented

Hmm, maybe I should write up a FAQ at some point since this is a very common question.

There's no easy answer; RL is still more of an art than a science.

  1. Check that your agents get some reward under a random policy. If they never reach rewarding states with random actions, they won't learn (see the sketch after this list).
  2. Check your TensorBoard for any signs of exploding gradients, e.g. a very high KL divergence or Adam momentum. These values should typically stay pretty low. You can also disable gradient norm clipping and see whether the gradients reach very high values.
  3. Watch what the policy is doing visually, even if it's not good yet. This often gives a good idea of what's going on.
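
For point 1, here's a minimal sketch of a random-policy sanity check. It assumes the Gymnasium-style step API (`terminated`/`truncated`), and `make_my_env` is a placeholder for your own environment constructor:

```python
import numpy as np

# Sanity check: can a uniformly random policy ever collect reward?
# `make_my_env` is a placeholder for your own environment constructor.
env = make_my_env()

returns = []
for _ in range(100):  # roll out 100 random episodes
    obs, info = env.reset()
    done, ep_return = False, 0.0
    while not done:
        action = env.action_space.sample()  # uniform random policy
        obs, reward, terminated, truncated, info = env.step(action)
        ep_return += reward
        done = terminated or truncated
    returns.append(ep_return)

print(f"random-policy return: mean={np.mean(returns):.3f}, "
      f"max={np.max(returns):.3f}, nonzero episodes={np.count_nonzero(returns)}")
# If the max return is always 0, the agent never sees a learning signal.
```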

Normalization typically helps, especially when reward or observation scales are not tuned correctly, but it can hurt in some cases. Try disabling all normalization as a sanity check.
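
For intuition, here's what a generic running mean/std observation normalizer looks like. This is a sketch of the common approach used across RL codebases, not Sample Factory's built-in implementation:

```python
import numpy as np

class RunningObsNorm:
    """Illustrative running mean/std observation normalizer.

    Uses the standard parallel-variance update so statistics stay
    stable as they drift over the course of training.
    """

    def __init__(self, shape, eps=1e-8, clip=10.0):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps
        self.clip = clip

    def update(self, obs_batch):
        # obs_batch: array of shape (batch_size, *shape)
        batch_mean = obs_batch.mean(axis=0)
        batch_var = obs_batch.var(axis=0)
        batch_count = obs_batch.shape[0]

        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean += delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta**2 * self.count * batch_count / total) / total
        self.count = total

    def normalize(self, obs):
        return np.clip((obs - self.mean) / np.sqrt(self.var + 1e-8),
                       -self.clip, self.clip)

# Usage: observations with a bad scale come out roughly zero-mean, unit-variance.
norm = RunningObsNorm(shape=(8,))
batch = np.random.randn(32, 8) * 5 + 3
norm.update(batch)
print(norm.normalize(batch).mean(), norm.normalize(batch).std())  # ~0, ~1
```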

You might also simply not be training long enough. What is your throughput (environment frames per second), and how long are your training sessions? Some environments take hundreds of millions of steps before learning takes off.
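
As a rough back-of-the-envelope check (the numbers below are illustrative, not from this thread):

```python
# Rough wall-clock estimate for reaching a target number of environment steps.
target_steps = 100_000_000  # "hundreds of millions of steps" territory
fps = 20_000                # hypothetical training throughput, env frames/sec

hours = target_steps / fps / 3600
print(f"~{hours:.1f} hours to collect {target_steps:,} steps at {fps:,} FPS")
# -> ~1.4 hours; at 2,000 FPS the same run would take ~14 hours
```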

It's hard to say more without knowing more about your environment or config. If you can share some details, maybe I can help; sharing your config would help too.
