
Repeated runs yield very different results #16

Open · dreamflasher opened this issue Oct 2, 2019 · 1 comment

@dreamflasher commented:

This is my code:

import dill
from temperature_scaling import ModelWithTemperature

model = dill.load(open("model.p", "rb"))
scaled_model = ModelWithTemperature(model)
scaled_model.set_temperature(train_dl)
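
(For comparison, the temperature_scaling README appears to fit the temperature on a held-out validation loader rather than on the training loader. A minimal sketch of that usage, assuming a hypothetical valid_dataset:)

from torch.utils.data import DataLoader

# Hypothetical held-out validation set; the temperature is fit here,
# not on the data the model was trained on.
valid_dl = DataLoader(valid_dataset, batch_size=128, shuffle=False)
scaled_model = ModelWithTemperature(model)
scaled_model.set_temperature(valid_dl)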

Every time I run it, I get different results:

Before temperature - NLL: 0.174, ECE: 0.014, Brier: 10.509
Optimal temperature: 0.625
After temperature - NLL: 0.189, ECE: 0.031, Brier: 24.748


Before temperature - NLL: 0.177, ECE: 0.023, Brier: 10.140
Optimal temperature: 0.765
After temperature - NLL: 0.184, ECE: 0.030, Brier: 16.430

Before temperature - NLL: 0.155, ECE: 0.018, Brier: 10.403
Optimal temperature: 0.602
After temperature - NLL: 0.158, ECE: 0.027, Brier: 26.237


Before temperature - NLL: 0.195, ECE: 0.023, Brier: 10.846
Optimal temperature: 0.876
After temperature - NLL: 0.199, ECE: 0.024, Brier: 13.740

I added the Brier score myself. Also notice that none of the metrics improve after temperature scaling: NLL, ECE, and Brier all get worse.
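
For reference, the standard multiclass Brier score is the mean squared difference between the softmax probabilities and the one-hot labels, which lies in [0, 2] regardless of the number of classes, so values like 10.5 above suggest a sum over samples rather than a mean. A minimal sketch (brier_score is a hypothetical helper, not part of the temperature_scaling repo):

import torch
import torch.nn.functional as F

def brier_score(logits, labels):
    # Mean over samples of the squared distance between the predicted
    # probability vector and the one-hot label; bounded by [0, 2].
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(labels, num_classes=probs.size(1)).float()
    return (probs - one_hot).pow(2).sum(dim=1).mean().item()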

@dreamflasher (Author) commented:

When I set random seeds, the scores stay the same:

import numpy as np
import torch

np.random.seed(0)
torch.manual_seed(0)

Still, shouldn't the results be robust to the choice of random seed?
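
(If the run-to-run variance comes from data-loader shuffling or CUDA nondeterminism, seeding only NumPy and the torch CPU RNG may not cover everything. A minimal sketch that seeds more broadly, assuming a CUDA/cuDNN setup; seed_everything is a hypothetical helper:)

import random
import numpy as np
import torch

def seed_everything(seed=0):
    # Seed every RNG that can plausibly affect a run.
    random.seed(seed)                          # Python's built-in RNG
    np.random.seed(seed)                       # NumPy
    torch.manual_seed(seed)                    # torch CPU (and CUDA on recent versions)
    torch.cuda.manual_seed_all(seed)           # all GPUs explicitly
    torch.backends.cudnn.deterministic = True  # deterministic cuDNN kernels only
    torch.backends.cudnn.benchmark = False     # disable the nondeterministic autotuner

seed_everything(0)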
