-
-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
loo_compare for crps and loo_crps #220
Comments
Seems like a good idea, thanks @bnicenboim. @avehtari what do you think?
Maybe just "score"? |
On Sat, Apr 1, 2023, 10:14 PM Jonah Gabry ***@***.***> wrote:
Seems like a good idea, thanks @bnicenboim <https://github.com/bnicenboim>.
@avehtari <https://github.com/avehtari> what do you think?
Maybe "ps"? (predictive score)?
Maybe just "score"?
The problem is that the Brier score is kind of related and a higher score
indicates a worst prediction. I think it's important to indicate that this
is a predictive score.
… —
Reply to this email directly, view it on GitHub
<#220 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABNUQ6TDLLCPIAGLG4RZMVLW7CEAVANCNFSM6AAAAAAWOPVNTY>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Good point. Let's see if @avehtari or any other loo developers (or users) have an opinion on the name. |
Can you be more specific? Are you talking about what is printed or what are the names in the data structure storing the needed information. elpd is computed using predictions and data, so it's not a prediction either. We have also When printing in
Yeah, it's annoying that Brier is in that direction, but Brier score can be a predictive score, too. |
Sorry, I was being too sloppy, I mean the following, this is the output when you print a loo object, and the structure of the object: suppressMessages(library(loo))
suppressMessages(library(rstanarm))
data("kidiq")
fit <- stan_glm(kid_score ~ mom_hs + mom_iq, data = kidiq, refresh=0)
(l <- loo(fit))
#>
#> Computed from 4000 by 434 log-likelihood matrix
#>
#> Estimate SE
#> elpd_loo -1876.2 14.2
#> p_loo 4.1 0.4
#> looic 3752.3 28.4
#> ------
#> Monte Carlo SE of elpd_loo is 0.0.
#>
#> All Pareto k estimates are good (k < 0.5).
#> See help('pareto-k-diagnostic') for details.
l$pointwise |> head()
#> elpd_loo mcse_elpd_loo p_loo looic influence_pareto_k
#> [1,] -5.696742 0.0026537646 0.034212237 11.393485 0.10213248
#> [2,] -4.208480 0.0009781805 0.004128595 8.416959 -0.02607779
#> [3,] -4.032512 0.0007344157 0.002412484 8.065024 0.02448749
#> [4,] -3.855000 0.0005129338 0.001242125 7.710000 -0.09151790
#> [5,] -5.290719 0.0019263523 0.016870922 10.581438 0.09749258
#> [6,] -4.027275 0.0012510816 0.006134031 8.054551 0.05941903 Created on 2023-04-04 with reprex v2.0.2 And this is for suppressMessages(library(loo))
suppressMessages(library(rstanarm))
data("kidiq")
fit <- stan_glm(kid_score ~ mom_hs + mom_iq, data = kidiq, refresh=0)
ypred1 <- posterior_predict(fit)
ypred2 <- posterior_predict(fit)
(ps <- crps(ypred1, ypred2, y = fit$y))
#> $estimates
#> Estimate SE
#> -10.2131439 0.3440731
#>
#> $pointwise
#> 1 2 3 4 5 6 7
#> -25.441469 -9.469547 -7.159489 -4.931622 -21.550196 -7.035573 -30.444535
#> 8 9 10 11 12 13 14
#> ... I meant how to organize
I also asked about this here #221, why do they need an extra argument, one could split the predictions array, and use half for the first argument and half for the second one: If you just divide the predictions in two, the estimates are virtually the same: (ps2 <- crps(ypred1[1:2000,], ypred1[2001:4000,], y = fit$y)) |
I think
This would also allow that that matrix could include both columns
I think it would be clearer to have this discussed in a separate issue (and tag @LeeviLindgren). I don't expect there is much difference in the easy case, but if the variability of the importance weights is big then a) the importance weights are different for the two halves (while in the current implementation they are the same), b) the number of draws for the importance weighting is halved, which increases Monte Carlo error. It is possible that in practice a + b are not a problem, but this should be investigated more thoroughly. |
ok, sure, it makes sense
But what if one uses And also this: |
For kfold and held out data, we have https://mc-stan.org/loo/articles/loo2-elpd.html, and could follow that or do complete redesign.
For non-loo, it could still report the size of the yrep and y |
@bnicenboim should we have a video call to discuss this? |
I'm sorry, as I mentioned in the other issue, I won't have time until mid January (hopefully). |
I just discovered
crps
! This is great, I was actually asking about how to compare probabilistic vs non-probabilistic predictions https://discourse.mc-stan.org/t/comparison-of-machine-learning-models-and-bayesian-models/30833 and this is exactly what I needed.I just noticed that
loo_compare
andprint
are not implemented. If it's ok, I volunteer to implement it. I just want to know how would you call the predictions, "elpd" doesn't make much sense. Maybe "ps"? (predictive score)?The text was updated successfully, but these errors were encountered: