Incomplete info in documentation regarding the combinations for <group>=NA and re_formula=NULL #1652

Closed
mattansb opened this issue May 12, 2024 · 5 comments

Comments

@mattansb

Currently, the docs for prepare_predictions() read:

  • newdata
    [...] NA values within factors are interpreted as if all dummy variables of this factor are zero.
  • re_formula
    [...] If NULL (default), include all group-level effects; if NA, include no group-level effects.

The newdata argument seems to suggest that setting newdata = data.frame(..., group = NA) should have the same effect as re_formula = NA since in both cases the group-specific coefficients are set to 0.

But this is not the case.

Instead, it seems that

prepare_predictions(
  newdata = data.frame(..., group = NA), 
  re_formula = NULL, # default
  allow_new_levels = FALSE # default
)

is closer to

prepare_predictions(
  newdata = data.frame(..., group = "<NEW>"), 
  re_formula = NULL, # default
  allow_new_levels = TRUE
)

(even though new levels throw an error when allow_new_levels = FALSE).

It is not clear which of sample_new_levels = c("uncertainty", "gaussian") is used in this case.
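
One way to probe this empirically is to compare the posterior draws from each combination side by side. The sketch below is only an illustration: it uses the epilepsy data shipped with brms and the argument names quoted above, while the model formula and the helper summ() are assumptions made for the example. The result of the NA call may differ across brms versions (it may even raise an error), so it is wrapped in tryCatch() and no output is shown.

```r
library(brms)

# Assumed example model: random intercept per patient (epilepsy ships with brms)
fit <- brm(count ~ 1 + (1 | patient),
           data = epilepsy, family = poisson())

nd_na  <- data.frame(patient = NA)
nd_new <- data.frame(patient = "<NEW>")

# NA in the grouping factor, all defaults; behavior may depend on the brms version
ep_na <- tryCatch(
  posterior_epred(fit, newdata = nd_na),
  error = function(e) NULL
)

# No group-level effects at all
ep_fixed <- posterior_epred(fit, newdata = nd_na, re_formula = NA)

# Explicit new level, once per sampling scheme
ep_unc <- posterior_epred(fit, newdata = nd_new,
                          allow_new_levels = TRUE,
                          sample_new_levels = "uncertainty")
ep_gau <- posterior_epred(fit, newdata = nd_new,
                          allow_new_levels = TRUE,
                          sample_new_levels = "gaussian")

# Hypothetical helper: summarise the draws of a one-column prediction matrix
summ <- function(x) c(mean = mean(x), sd = sd(x))

rbind(
  na_group    = if (is.null(ep_na)) c(mean = NA, sd = NA) else summ(ep_na),
  re_none     = summ(ep_fixed),
  new_uncert  = summ(ep_unc),
  new_gauss   = summ(ep_gau)
)
```

If the na_group row sits close to one of the new-level rows rather than to re_none, that would be consistent with NA being treated as a new level, though a summary comparison like this is suggestive rather than conclusive.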

@paul-buerkner
Owner

newdata = data.frame(..., group = NA) just defines a new grouping level, which does not affect any dummy variables, since random effects don't have dummy variables. Dummy variables only apply to fixed effects. How can we make this clearer?

@mattansb
Author

I was expecting newdata = data.frame(..., group = NA) to be the same as re_formula = NA because I interpreted "NA values within factors are interpreted as if all dummy variables of this factor are zero." to mean that in a mixed model

$$ y = bX + uZ + e $$

all entries of $Z$ are set to 0, just as the corresponding columns of $X$ would be set to 0 if group were a fixed effect.
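
The two readings can be contrasted with a small base R sketch on the linear-predictor scale. All numbers below are hypothetical values chosen for illustration, not estimates from any fitted model, and drawing the new level from a normal distribution is only one possible scheme (roughly what sample_new_levels = "gaussian" suggests).

```r
set.seed(1)

# Hypothetical values for illustration only
b    <- 1.5                               # fixed intercept
u    <- c(p1 = -0.4, p2 = 0.1, p3 = 0.3)  # group-level intercepts of known levels
sd_u <- 0.5                               # group-level standard deviation

# Known level: the row of Z is a one-hot indicator selecting that level's effect
z_known   <- c(p1 = 0, p2 = 1, p3 = 0)
eta_known <- b + sum(z_known * u)

# Reading 1: every entry of Z is zero, so the group term drops out entirely
z_zero   <- c(p1 = 0, p2 = 0, p3 = 0)
eta_zero <- b + sum(z_zero * u)

# Reading 2: NA is a new level with its own effect, here drawn from N(0, sd_u)
u_new   <- rnorm(1, mean = 0, sd = sd_u)
eta_new <- b + u_new

c(known = eta_known, z_zero = eta_zero, new_level = eta_new)
```

Under the first reading the prediction equals the fixed part alone, which is what re_formula = NA gives; under the second it carries extra variability from the unknown group effect.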

But if newdata = data.frame(..., group = NA) is just another "new" level, then it should also give an error when allow_new_levels is not set to TRUE:

library(brms)

fit <- brm(count ~ 1 + (1|patient),
           data = epilepsy, family = poisson())


posterior_epred(fit,
  newdata = data.frame(patient = "<NEW>")
)
#> Error: Levels '<NEW>' of grouping factor 'patient' cannot be found in the 
#> fitted model. Consider setting argument 'allow_new_levels' to TRUE.

# Does not throw an error...
posterior_epred(fit,
  newdata = data.frame(patient = NA)
)
#>           [,1]
#> [1,]  1.772992
#> [2,]  4.682992
#> [3,] 11.606553
#> [4,]  2.182194
#> [5,]  1.660112
#> [6,]  2.234523
#> .....

If this is the intended behavior, it should also require setting allow_new_levels = TRUE, and maybe the docs should read:

NA values within fixed factors are interpreted as if all dummy variables of this factor are zero. NA values within random factors are treated as a new level.

@paul-buerkner
Owner

paul-buerkner commented May 15, 2024 via email

@paul-buerkner paul-buerkner added this to the brms 2.22.0 milestone May 21, 2024
paul-buerkner added a commit that referenced this issue Sep 12, 2024
@paul-buerkner
Owner

This should now be fixed :-)

@mattansb
Author

mattansb commented Sep 13, 2024

Thanks!
