Use dot to include all the predictors in Logistic regression model after imputation #265

Open
tiantiy opened this issue Sep 11, 2020 · 4 comments


tiantiy commented Sep 11, 2020

Hi there,

I have used mice to do the imputation. Because our dataset has many variables, I tried to use a dot in the glm formula to include all the predictors instead of typing out their names. However, this produces an error.

Here I use the nhanes dataset as an example:

require(mice, warn.conflicts = FALSE)
set.seed(123)
nhanes$hyp <- as.factor(nhanes$hyp)

imputed_data <- mice::mice(nhanes, m = 5, method = "pmm", 
                           maxit = 10, seed = 12345, print = FALSE)
imputed_model <- with(imputed_data, 
                      glm(hyp ~ ., family = binomial(link = 'logit')))

Error in terms.formula(formula, data = data) :
'.' in formula and no 'data' argument

Is there another quick way to include all the predictors in the dataset without typing out the names in the analysis model? Many thanks for any assistance.

@stefvanbuuren
Member

stefvanbuuren commented Sep 15, 2020

Thanks for bringing this to my attention.

No quick fix. I wrote with.mids() many years ago, hoping that it would work most of the time. It fared well, but here you report a clear limitation.

  • The quickest solution is to supply all variable names;
  • Alternatively, you could write your own function that loops mice::complete() and your complete-data analysis.
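
A minimal sketch of the second option, assuming it is acceptable to bypass with.mids() entirely: loop over the completed datasets with mice::complete(), fit the model with an explicit data argument so the dot can be expanded, and pool the fits. The object names (dat, imp, fits, pooled) are illustrative and not part of the original report.

```r
library(mice)

# Work on a copy so the built-in nhanes data stays untouched
# ('dat', 'imp', 'fits' and 'pooled' are illustrative names).
dat <- mice::nhanes
dat$hyp <- as.factor(dat$hyp)

imp <- mice(dat, m = 5, method = "pmm", maxit = 10,
            seed = 12345, printFlag = FALSE)

# Fit the complete-data model once per completed dataset.
# glm() receives an explicit data argument here, so the dot
# expands to all remaining columns (age, bmi, chl).
fits <- lapply(seq_len(imp$m), function(i) {
  glm(hyp ~ ., family = binomial(link = "logit"),
      data = mice::complete(imp, i))
})

# Convert the list of fits to a mira object and pool with Rubin's rules.
pooled <- pool(as.mira(fits))
summary(pooled)
```

With a small dataset such as nhanes, glm may warn about fitted probabilities close to 0 or 1; the loop itself is unaffected by that.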

I would need to rethink the design and see what is possible with modern tools and recent texts, especially Hadley Wickham's Advanced R book. If there is somebody out there with a good understanding of the R evaluation model, please feel free to drop by. It might well be that the solution is a lot simpler than the current with.mids().

@prockenschaub
Contributor

Another quick option for the case of glm would be to mget the variables. with.mids() evaluates the glm expression within each completed data.frame, i.e. the columns of the data.frame are accessible in the evaluation environment as if they were variables. If you add data = mget(names(nhanes)) to your glm call, mget() collects those variables again and passes them to glm's data argument as a list, which allows the dot in the formula to be expanded.

require(mice, warn.conflicts = FALSE)
#> Loading required package: mice
set.seed(123)
nhanes$hyp <- as.factor(nhanes$hyp)

imputed_data <- mice::mice(nhanes, m = 5, method = "pmm", 
                           maxit = 10, seed = 12345, print = FALSE)
imputed_model <- with(imputed_data, 
                      glm(hyp ~ ., family = binomial(link = 'logit'), data = mget(names(nhanes))))
imputed_model$analyses[[1]]
#> 
#> Call:  glm(formula = hyp ~ ., family = binomial(link = "logit"), data = mget(names(nhanes)))
#> 
#> Coefficients:
#> (Intercept)          age          bmi          chl  
#>   -31.45358      5.24968      0.89800     -0.02669  
#> 
#> Degrees of Freedom: 24 Total (i.e. Null);  21 Residual
#> Null Deviance:       21.98 
#> Residual Deviance: 13.06     AIC: 21.06
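
The mechanism can also be seen without mice. The sketch below assumes that with.mids() behaves like a plain base::eval() of the call inside a completed data frame. It uses lm() on a small slice of mtcars purely for illustration (lm() and glm() both build their model frame through the same formula machinery), and df, res_no_data and fit are hypothetical names.

```r
# Small complete data frame standing in for one completed dataset
# ('df' and the choice of columns are illustrative).
df <- mtcars[, c("mpg", "wt", "hp")]

# Evaluate the call with the columns of df visible as variables,
# but without a data argument: the dot cannot be expanded.
res_no_data <- tryCatch(
  eval(quote(lm(mpg ~ .)), envir = df, enclos = parent.frame()),
  error = function(e) e
)
inherits(res_no_data, "error")

# mget() gathers the visible columns back into a named list, which
# is passed as 'data' so the dot can be expanded to wt + hp.
fit <- eval(quote(lm(mpg ~ ., data = mget(c("mpg", "wt", "hp")))),
            envir = df, enclos = parent.frame())
names(coef(fit))
```

The column names are spelled out inside mget() here; in the mice example above, names(nhanes) plays that role because it is found in the enclosing environment.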

@stefvanbuuren
Member

Commit 4634094 simplifies with.mids() by calling eval_tidy() on a quosure. Although this is a compact replacement for multiple lines of old code, it still gives the error '.' in formula and no 'data' argument. This limitation is now noted in the documentation.

@stefvanbuuren
Member

Because of downstream issues, mice 3.12.2 reverts to the previous version of with.mids() that relies on base::eval().
