
Repeated runs yield very different results #16

Open · dreamflasher opened this issue Oct 2, 2019 · 1 comment

@dreamflasher commented:

This is my code:

import dill
from temperature_scaling import ModelWithTemperature

model = dill.load(open("model.p", "rb"))
scaled_model = ModelWithTemperature(model)
scaled_model.set_temperature(train_dl)
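
(For comparison, the temperature_scaling README appears to fit the temperature on a held-out validation loader rather than on the training loader. A minimal sketch of that usage, assuming a hypothetical valid_dataset:)

from torch.utils.data import DataLoader

# Hypothetical held-out validation set; the temperature is fit here,
# not on the data the model was trained on.
valid_dl = DataLoader(valid_dataset, batch_size=128, shuffle=False)
scaled_model = ModelWithTemperature(model)
scaled_model.set_temperature(valid_dl)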

Every time I run it, I get different results:

Before temperature - NLL: 0.174, ECE: 0.014, Brier: 10.509
Optimal temperature: 0.625
After temperature - NLL: 0.189, ECE: 0.031, Brier: 24.748


Before temperature - NLL: 0.177, ECE: 0.023, Brier: 10.140
Optimal temperature: 0.765
After temperature - NLL: 0.184, ECE: 0.030, Brier: 16.430

Before temperature - NLL: 0.155, ECE: 0.018, Brier: 10.403
Optimal temperature: 0.602
After temperature - NLL: 0.158, ECE: 0.027, Brier: 26.237


Before temperature - NLL: 0.195, ECE: 0.023, Brier: 10.846
Optimal temperature: 0.876
After temperature - NLL: 0.199, ECE: 0.024, Brier: 13.740

I added the Brier score myself. Also notice that none of the metrics improve after temperature scaling: NLL, ECE, and Brier all get worse.
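
For reference, the standard multiclass Brier score is the mean squared difference between the softmax probabilities and the one-hot labels, which lies in [0, 2] regardless of the number of classes, so values like 10.5 above suggest a sum over samples rather than a mean. A minimal sketch (brier_score is a hypothetical helper, not part of the temperature_scaling repo):

import torch
import torch.nn.functional as F

def brier_score(logits, labels):
    # Mean over samples of the squared distance between the predicted
    # probability vector and the one-hot label; bounded by [0, 2].
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(labels, num_classes=probs.size(1)).float()
    return (probs - one_hot).pow(2).sum(dim=1).mean().item()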

@dreamflasher (Author) commented:

When I set random seeds, the scores stay the same:

import numpy as np
import torch

np.random.seed(0)
torch.manual_seed(0)

Still, shouldn't the results be robust to the choice of random seed?
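
(If the run-to-run variance comes from data-loader shuffling or CUDA nondeterminism, seeding only NumPy and the torch CPU RNG may not cover everything. A minimal sketch that seeds more broadly, assuming a CUDA/cuDNN setup; seed_everything is a hypothetical helper:)

import random
import numpy as np
import torch

def seed_everything(seed=0):
    # Seed every RNG that can plausibly affect a run.
    random.seed(seed)                          # Python's built-in RNG
    np.random.seed(seed)                       # NumPy
    torch.manual_seed(seed)                    # torch CPU (and CUDA on recent versions)
    torch.cuda.manual_seed_all(seed)           # all GPUs explicitly
    torch.backends.cudnn.deterministic = True  # deterministic cuDNN kernels only
    torch.backends.cudnn.benchmark = False     # disable the nondeterministic autotuner

seed_everything(0)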
