Model "remembers" instead of learning #260
You're encountering a general machine learning problem called "overfitting". Making sure a model generalizes beyond its training distribution is a general challenge, not something specific to RL or Sample Factory. Some things to look at:
Thanks again for your reply @alex-petrenko !
I tried the following, but none of it worked. I'd like to try dropout; I noticed it's possible to apply in PyTorch, but I'm not sure how to do it in the SF2 code (maybe add an optional parameter?). Update: I also tried editing sample-factory/tests/test_precheck.py (lines 15, 18), and adding noise to observations, up to +/-5%.
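One way to add that kind of +/-5% noise is a small helper that perturbs each observation component by a random multiplicative factor before it reaches the model. This is only an illustrative sketch (the function name `add_observation_noise` is hypothetical, not part of Sample Factory); in practice you would call it from an observation wrapper around your env.

```python
import numpy as np

def add_observation_noise(obs, noise_frac=0.05, rng=None):
    """Scale each observation component by a random factor in
    [1 - noise_frac, 1 + noise_frac] (e.g. +/-5% by default)."""
    rng = np.random.default_rng() if rng is None else rng
    scale = rng.uniform(1.0 - noise_frac, 1.0 + noise_frac, size=np.shape(obs))
    return np.asarray(obs, dtype=np.float64) * scale
```

Noise augmentation like this only helps if the test-time variation resembles the noise you inject, so it is worth checking how your unseen data actually differs from the training data.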
Hi @alex-petrenko , would it be possible to reply to the latest request from 2 days ago, above?
I think your best option is to implement a custom model (an encoder alone should be sufficient, but you can also override the entire actor-critic module). See the documentation here: https://www.samplefactory.dev/03-customization/custom-models/ Just add dropout as a layer and, fingers crossed, it should work. You should be careful about
Hmmm, I guess your confusion might come from the fact that Dropout can't just be added as a model layer; you have to actually call it explicitly in forward(). If I were you, I would simply modify the forward() method of the actor_critic class to call dropout when needed. Sorry, I don't think I can properly help with the problem without knowing the context and details of your setup. Overfitting is one of the hardest problems in all of ML, and there's no single magic recipe for fixing it.
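The explicit-call-in-forward() idea can be sketched as follows. This is a minimal hypothetical module, not Sample Factory's actual ActorCritic class; the point is only that `self.dropout(x)` is invoked inside forward(), so it is active in train() mode and becomes a no-op in eval() mode.

```python
import torch
import torch.nn as nn

class ActorCriticWithDropout(nn.Module):
    """Illustrative sketch only, not SF's real actor-critic module."""

    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 64, p: float = 0.2):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.dropout = nn.Dropout(p)
        self.actor_head = nn.Linear(hidden, num_actions)
        self.critic_head = nn.Linear(hidden, 1)

    def forward(self, obs):
        x = torch.relu(self.fc1(obs))
        x = self.dropout(x)  # explicit call: randomly zeroes units during training
        x = torch.relu(self.fc2(x))
        x = self.dropout(x)  # same module can be reused after each layer
        return self.actor_head(x), self.critic_head(x)
```

Remember to put the model in train() mode during learning and eval() mode during rollout/evaluation, otherwise dropout will (or won't) fire when you don't expect it.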
Hi @alex-petrenko , sorry, I'm not an expert at this. I'm using a customized cartpole-like gym env. 1/30 Update: I also updated sample_factory/model/encoder.py (lines 216, 221). Also, would it make sense to add dropout as a switch option?
First thing I would try would be to add dropout after each layer in the encoder.
The convolutional encoder probably has nothing to do with your task if your observations are just vectors of numbers; the convolutional encoder is for images.
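For vector observations, "dropout after each layer in the encoder" amounts to interleaving nn.Dropout with the Linear/activation pairs of an MLP. A sketch under that assumption (the builder name `mlp_encoder_with_dropout` is made up for illustration and is not an SF API):

```python
import torch
import torch.nn as nn

def mlp_encoder_with_dropout(sizes, p=0.2):
    """Build an MLP encoder where Dropout follows every activation.

    sizes: layer widths, e.g. [obs_dim, 64, 64].
    """
    layers = []
    for in_features, out_features in zip(sizes[:-1], sizes[1:]):
        layers += [nn.Linear(in_features, out_features), nn.ReLU(), nn.Dropout(p)]
    return nn.Sequential(*layers)
```

When wired into a custom SF encoder class, the returned nn.Sequential would replace the default MLP; the dropout probability `p` is a hyperparameter you would likely need to tune.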
I added it in the model_utils.py file, line 52. So the layers are: RecursiveScriptModule( But, alas, that still isn't solving the overfitting...
Dropout is one way to combat overfitting, but it is not a panacea. I'm sorry I can't help figure out your exact issue; as I said previously, overfitting is a general machine learning phenomenon, and most likely your problem has nothing to do with Sample Factory but rather with the overall problem formulation and approach.
Hi @alex-petrenko , I understand. I appreciate the guidance and advice!
@jarlva not sure if this is realistic right now. I'm starting a full-time job very soon, which will keep me busy for the foreseeable future. You said you're able to fit to your training data, right? If I could get some idea of what your environment is and what exactly the difference between your training and test data is, I could be more helpful. Maybe we can set up a call in ~2 weeks. Feel free to reach out via Discord DM or email to discuss further.
Hey, after training (~200M) shows good reward, Enjoy shows bad reward numbers on unseen data. When the training data is included in Enjoy, the reward matches training. So it seems the model "remembers" the data, as opposed to learning.
What's the best way to deal with that (other than adding more data and introducing random noise)? Are there settings to try?
Training a gym-like env with the following: