How should mice behave when variables are not specified in the model #583

Open
stefvanbuuren opened this issue Sep 13, 2023 · 1 comment · May be fixed by #582
@stefvanbuuren
Member

The test file test-blocks.R specifies a mice setup with two non-standard features:

  • bmi appears twice (in blocks B1 and bmi), which the blocks specification permits;
  • variable hyp is not mentioned in any block.

The current policy is not satisfactory. Currently, where[, "hyp"] is set to FALSE, so hyp is not imputed. However, hyp is still a predictor for blocks B1, bmi and age, so its missing values propagate: rows with a missing hyp keep NA in the imputed variables (rows 1, 4 and 6 in the output below).
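The propagation mechanism is not specific to mice: a regression-based prediction whose predictor row contains NA yields NA. The base-R analogy below uses toy data (not nhanes) and predict.lm(), whose default na.action is na.pass. It only illustrates the general idea and does not reproduce how mice's pmm handles missing predictors internally.

```r
# Toy data (not nhanes), used only to illustrate NA propagation.
train <- data.frame(y  = c(1.0, 2.1, 2.9, 4.2, 5.1),
                    x1 = c(1, 2, 3, 4, 5),
                    x2 = c(0, 1, 0, 1, 0))
fit <- lm(y ~ x1 + x2, data = train)

# The second row has a missing predictor x2, like a row with a missing hyp.
newdata <- data.frame(x1 = c(2.5, 3.5), x2 = c(1, NA))
pred <- predict(fit, newdata = newdata)  # na.action = na.pass by default
print(pred)  # first prediction is a number, second is NA
```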

Using commit c2da03c:

library(mice)   # branch support_blocks 
#> 
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following objects are masked from 'package:base':
#> 
#>     cbind, rbind
imp <- mice(nhanes, blocks = make.blocks(list(c("bmi", "chl"), "bmi", "age")), m = 1, print = FALSE)

head(complete(imp))
#>   age  bmi hyp chl
#> 1   1   NA  NA  NA
#> 2   2 22.7   1 187
#> 3   1 27.2   1 187
#> 4   3   NA  NA  NA
#> 5   1 20.4   1 113
#> 6   3   NA  NA 184
imp$blocks
#> $B1
#> [1] "bmi" "chl"
#> 
#> $bmi
#> [1] "bmi"
#> 
#> $age
#> [1] "age"
#> 
#> attr(,"calltype")
#>        B1       bmi       age 
#> "formula" "formula" "formula"
imp$formulas
#> $B1
#> bmi + chl ~ age + hyp
#> <environment: 0x11e6e1750>
#> 
#> $bmi
#> bmi ~ age + hyp + chl
#> <environment: 0x11e6e1750>
#> 
#> $age
#> age ~ bmi + hyp + chl
#> <environment: 0x11e6e1750>
head(imp$where)
#>     age   bmi   hyp   chl
#> 1 FALSE  TRUE FALSE  TRUE
#> 2 FALSE FALSE FALSE FALSE
#> 3 FALSE  TRUE FALSE FALSE
#> 4 FALSE  TRUE FALSE  TRUE
#> 5 FALSE FALSE FALSE FALSE
#> 6 FALSE  TRUE FALSE FALSE
imp$method
#>    B1   bmi   age 
#> "pmm" "pmm"    ""
imp$predictorMatrix
#>     age bmi hyp chl
#> age   0   0   0   0
#> bmi   1   0   1   1
#> hyp   1   1   0   1
#> chl   1   1   1   0

Created on 2023-09-13 with reprex v2.0.2

A better policy might be to inactivate any unmentioned variable j as follows:

  1. set method[j] to "" (we can always do that since j is not mentioned in the model)
  2. set predictorMatrix[, j] to 0 (take j out as predictor)
  3. leave predictorMatrix[j, ] untouched (so we can still see which variables j would need to be imputed)
  4. leave where[, j] untouched
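The four steps can be sketched in base R as below. The helper inactivate_unmentioned() is hypothetical and not part of mice. It assumes a method vector named by variable (as in step 1), whereas mice itself indexes method by block; the setup mirrors the reprex, with hyp left out of every block.

```r
# Hypothetical helper, not part of the mice API. Assumes `method` is a
# character vector named by variable, not by block as in mice itself.
inactivate_unmentioned <- function(method, predictorMatrix, where, blocks) {
  mentioned <- unique(unlist(blocks))
  unmentioned <- setdiff(colnames(predictorMatrix), mentioned)
  method[unmentioned] <- ""            # step 1: do not impute j
  predictorMatrix[, unmentioned] <- 0  # step 2: j is not a predictor anywhere
  # step 3: predictorMatrix[unmentioned, ] is left untouched
  # step 4: where[, unmentioned] is left untouched
  list(method = method, predictorMatrix = predictorMatrix,
       where = where, unmentioned = unmentioned)
}

# Setup mirroring the reprex: hyp is not mentioned in any block.
vars <- c("age", "bmi", "hyp", "chl")
pred <- matrix(c(0, 0, 0, 0,
                 1, 0, 1, 1,
                 1, 1, 0, 1,
                 1, 1, 1, 0),
               nrow = 4, byrow = TRUE, dimnames = list(vars, vars))
method <- c(age = "", bmi = "pmm", hyp = "pmm", chl = "pmm")  # assumed
where <- matrix(FALSE, nrow = 2, ncol = 4, dimnames = list(NULL, vars))
blocks <- list(B1 = c("bmi", "chl"), bmi = "bmi", age = "age")

res <- inactivate_unmentioned(method, pred, where, blocks)
res$unmentioned  # [1] "hyp"
res$predictorMatrix
```

Because row hyp of the predictor matrix is kept, a user who later adds hyp to a block still sees which predictors it was set up to use.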

As a result, j is not imputed and is not a predictor anywhere. The policy might encourage starting small (with a few variables) and gradually building up the model. Is this a good approach? Are there any downsides?

@stefvanbuuren stefvanbuuren changed the title How should mice behave when variables are not specified through blocks or formulas How should mice behave when variables are not specified in the model Sep 13, 2023
@stefvanbuuren
Member Author

After some discussion, I suggest the following NA-PROPAGATION policy:

  • We use NA-PROPAGATION by default (continuing the policy used in mice 3.0). The user sees NA in the imputed data and becomes aware of a potential model specification problem (e.g. not imputing a variable used as a predictor).
  • mice() should offer two easy ways to solve the problem, "autoremove" and "autoimpute". Both options would "magically" make the problem disappear:
      • "autoremove" removes any incomplete predictors from the imputation model;
      • "autoimpute" imputes any incomplete predictors.

Note that these options are not yet implemented.
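Since the options do not exist yet, the following is only a sketch of what they could do, on toy data. The functions unimputed_incomplete_predictors(), autoremove() and autoimpute() are hypothetical, as is the choice of "pmm" as the default method for autoimpute. As in the previous sketch, method is assumed to be named by variable rather than by block.

```r
# Hypothetical helpers, not part of mice. `method` is assumed to be named
# by variable; `predictorMatrix` has one row and column per variable.

# Incomplete variables that some imputed variable uses as a predictor,
# but that are not imputed themselves.
unimputed_incomplete_predictors <- function(data, method, predictorMatrix) {
  incomplete <- colnames(data)[colSums(is.na(data)) > 0]
  imputed <- names(method)[method != ""]
  rows <- predictorMatrix[imputed, , drop = FALSE]
  used <- colnames(predictorMatrix)[colSums(rows) > 0]
  setdiff(intersect(incomplete, used), imputed)
}

# "autoremove": drop such variables as predictors.
autoremove <- function(data, method, predictorMatrix) {
  bad <- unimputed_incomplete_predictors(data, method, predictorMatrix)
  predictorMatrix[, bad] <- 0
  predictorMatrix
}

# "autoimpute": impute such variables, repeating because a newly imputed
# variable may itself depend on other incomplete, unimputed predictors.
autoimpute <- function(data, method, predictorMatrix, default = "pmm") {
  repeat {
    bad <- unimputed_incomplete_predictors(data, method, predictorMatrix)
    if (length(bad) == 0) break
    method[bad] <- default
  }
  method
}

# Toy data (not nhanes) with hyp incomplete and not imputed.
dat <- data.frame(age = c(1, 2, 1, 3),
                  bmi = c(NA, 22, 27, NA),
                  hyp = c(NA, 1, 1, NA),
                  chl = c(NA, 187, 187, NA))
vars <- names(dat)
pred <- matrix(c(0, 0, 0, 0,
                 1, 0, 1, 1,
                 1, 1, 0, 1,
                 1, 1, 1, 0),
               nrow = 4, byrow = TRUE, dimnames = list(vars, vars))
method <- c(age = "", bmi = "pmm", hyp = "", chl = "pmm")

unimputed_incomplete_predictors(dat, method, pred)  # [1] "hyp"
autoremove(dat, method, pred)
autoimpute(dat, method, pred)
```

The repeat loop in autoimpute() terminates because every pass moves at least one variable into the imputed set.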
