Order of variables in `by` matters? #1373

strengejacke · 2025-01-27T21:40:57Z

I would expect both function calls to return the same output (that of the second call), and the order of variables in `by` not to matter. Or am I missing something?

set.seed(123)
n <- 200
d <- data.frame(
  score = rnorm(n),
  grp = as.factor(sample(c("treatment", "control"), n, TRUE)),
  time = as.factor(sample(1:3, n, TRUE))
)

model2 <- lm(score ~ grp * time, data = d)
grid <- insight::get_datagrid(model2, c("time", "grp"), factors = "all")

marginaleffects::avg_predictions(
  model2,
  newdata = grid,
  by = c("time", "grp"),
  hypothesis = ~pairwise | time
)
#> 
#> [1] time       Hypothesis Estimate   Std. Error z          Pr(>|z|)   S         
#> [8] 2.5 %      97.5 %    
#> <0 rows> (or 0-length row.names)
#> 
#> Type:  response


marginaleffects::avg_predictions(
  model2,
  newdata = grid,
  by = c("grp", "time"),
  hypothesis = ~pairwise | time
)
#> 
#>  time                  Hypothesis Estimate Std. Error      z Pr(>|z|)   S
#>     1 (treatment 1) - (control 1)  -0.4183      0.233 -1.797   0.0724 3.8
#>     2 (treatment 2) - (control 2)  -0.0554      0.238 -0.233   0.8161 0.3
#>     3 (treatment 3) - (control 3)  -0.0503      0.222 -0.226   0.8212 0.3
#>   2.5 % 97.5 %
#>  -0.875  0.038
#>  -0.522  0.411
#>  -0.486  0.386
#> 
#> Type:  response

Created on 2025-01-27 with reprex v2.1.1


strengejacke · 2025-01-28T10:46:45Z

If we switch the order of columns in the data grid, the results are exactly reversed:

set.seed(123)
n <- 200
d <- data.frame(
  score = rnorm(n),
  grp = as.factor(sample(c("treatment", "control"), n, TRUE)),
  time = as.factor(sample(1:3, n, TRUE))
)

model2 <- lm(score ~ grp * time, data = d)

grid <- insight::get_datagrid(model2, c("grp", "time"), factors = "all")
marginaleffects::avg_predictions(
  model2,
  newdata = grid,
  by = c("time", "grp"),
  hypothesis = ~pairwise | time
)
#> 
#>  time                  Hypothesis Estimate Std. Error      z Pr(>|z|)   S
#>     1 (1 treatment) - (1 control)  -0.4183      0.233 -1.797   0.0724 3.8
#>     2 (2 treatment) - (2 control)  -0.0554      0.238 -0.233   0.8161 0.3
#>     3 (3 treatment) - (3 control)  -0.0503      0.222 -0.226   0.8212 0.3
#>   2.5 % 97.5 %
#>  -0.875  0.038
#>  -0.522  0.411
#>  -0.486  0.386
#> 
#> Type:  response

marginaleffects::avg_predictions(
  model2,
  newdata = grid,
  by = c("grp", "time"),
  hypothesis = ~pairwise | time
)
#> 
#> [1] time       Hypothesis Estimate   Std. Error z          Pr(>|z|)   S         
#> [8] 2.5 %      97.5 %    
#> <0 rows> (or 0-length row.names)
#> 
#> Type:  response

Created on 2025-01-28 with reprex v2.1.1

vincentarelbundock · 2025-01-28T23:10:45Z

Yes, this is expected.

`pairwise` takes the outer product of estimates in the order they arrive, and returns the upper-triangular elements. Similarly, `reference` takes the difference between each estimate and the first estimate. In both cases, the order matters.
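To make the order dependence concrete, here is a minimal base R sketch of the two comparison schemes described above. It is not the marginaleffects implementation; the helper names `pairwise_diffs()` and `reference_diffs()` and the estimate values are made up for illustration, and the "later minus earlier" direction is an assumption chosen to match the labels in the outputs in this thread.

```r
# Hypothetical estimates, in the order they "arrive"
est <- c(a = 1.0, b = 2.5, c = 4.0)

# Pairwise: build all differences with outer(), keep the upper triangle.
# d[i, j] = x[j] - x[i], so each kept element is "later minus earlier".
pairwise_diffs <- function(x) {
  d <- outer(x, x, function(xi, xj) xj - xi)
  idx <- which(upper.tri(d), arr.ind = TRUE)
  out <- d[upper.tri(d)]
  names(out) <- paste(names(x)[idx[, "col"]], "-", names(x)[idx[, "row"]])
  out
}

# Reference: every estimate minus the first one.
reference_diffs <- function(x) {
  out <- unname(x[-1]) - unname(x[1])
  names(out) <- paste(names(x)[-1], "-", names(x)[1])
  out
}

pairwise_diffs(est)        # differences labelled b - a, c - a, c - b
pairwise_diffs(rev(est))   # same magnitudes, opposite signs
reference_diffs(est)       # b - a, c - a
reference_diffs(rev(est))  # now relative to c: b - c, a - c
```

Reversing the input only flips signs or changes the reference level in this sketch; it never produces an empty result, which is the distinction the follow-up comments draw.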

strengejacke · 2025-01-29T05:45:20Z

I can understand that the sign will differ, but I wouldn't expect one approach to return no results at all.

strengejacke · 2025-01-29T05:46:56Z

Or that the column order in the data grid determines whether you get a result or not?

strengejacke · 2025-01-29T05:58:34Z

To be clear: the rationale would be that you only get a result if the variables in `by` are in the opposite order to the columns in the data grid?

strengejacke · 2025-01-29T07:01:49Z

Another example: the same function call with essentially the same data grid produces no output in one situation and output in the other. I doubt this is expected; otherwise, it should be very clearly documented how the column order in the data grid affects the computation:

library(marginaleffects)

# data
dat <- get_dataset("thornton")
dat$incentive <- as.factor(dat$incentive)
dat$hiv2004 <- as.factor(dat$hiv2004)

# model
mod <- glm(
    outcome ~ incentive * agecat,
    data = dat,
    family = binomial
)

# data grid
grid <- marginaleffects::datagrid(model = mod, by = c("incentive", "agecat"))

# column order
head(grid)
#>   incentive   agecat
#> 1         0      <18
#> 2         1      <18
#> 3         0 18 to 35
#> 4         1 18 to 35
#> 5         0      >35
#> 6         1      >35

# no results / output!
marginaleffects::avg_predictions(
    mod,
    by = c("incentive", "agecat"),
    newdata = grid,
    hypothesis = ~ pairwise | agecat
)
#> 
#> [1] agecat     Hypothesis Estimate   Std. Error z          Pr(>|z|)   S         
#> [8] 2.5 %      97.5 %    
#> <0 rows> (or 0-length row.names)
#> 
#> Type:  response

# change order of data grid columns
grid <- marginaleffects::datagrid(model = mod, by = c("agecat", "incentive"))

# column order, essentially same datagrid
head(grid)
#>     agecat incentive
#> 1      <18         0
#> 2 18 to 35         0
#> 3      >35         0
#> 4      <18         1
#> 5 18 to 35         1
#> 6      >35         1

# results / output!
marginaleffects::avg_predictions(
    mod,
    by = c("incentive", "agecat"),
    newdata = grid,
    hypothesis = ~ pairwise | agecat
)
#> 
#>    agecat                  Hypothesis Estimate Std. Error     z Pr(>|z|)     S
#>  <18      (1 <18) - (0 <18)              0.474     0.0609  7.79   <0.001  47.1
#>  18 to 35 (1 18 to 35) - (0 18 to 35)    0.458     0.0293 15.65   <0.001 181.0
#>  >35      (1 >35) - (0 >35)              0.432     0.0341 12.66   <0.001 119.6
#>  2.5 % 97.5 %
#>  0.355  0.593
#>  0.401  0.515
#>  0.365  0.499
#> 
#> Type:  response

Created on 2025-01-29 with reprex v2.1.1
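One way to back up the "essentially same datagrid" remark is to compare two grids while ignoring column and row order. The following is a minimal base R sketch, assuming `expand.grid()` as a stand-in for the two `datagrid()` calls above; `same_grid()` is a hypothetical helper, not part of marginaleffects or insight.

```r
# Stand-in grids: same combinations, different column and row order
ages <- c("<18", "18 to 35", ">35")
grid_a <- expand.grid(incentive = factor(0:1), agecat = factor(ages, levels = ages))
grid_b <- expand.grid(agecat = factor(ages, levels = ages), incentive = factor(0:1))

# TRUE if both grids contain the same set of rows, regardless of
# the order of columns or rows
same_grid <- function(x, y) {
  cols <- sort(names(x))
  if (!identical(cols, sort(names(y)))) {
    return(FALSE)
  }
  # One string key per row, built from the columns in a fixed order
  key <- function(g) {
    parts <- unname(lapply(g[cols], as.character))
    sort(do.call(paste, c(parts, sep = " | ")))
  }
  identical(key(x), key(y))
}

same_grid(grid_a, grid_b)
```

If such a check returns `TRUE` for the two real grids, any difference in the `avg_predictions()` output can only come from row or column order, not from the grid contents.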

vincentarelbundock · 2025-01-31T01:53:32Z

Can you try that again with latest main?

strengejacke mentioned this issue Jan 28, 2025

Prepare CRAN release easystats/modelbased#369

Merged

7 tasks

vincentarelbundock closed this as completed Jan 28, 2025

strengejacke mentioned this issue Jan 29, 2025

Row order in data grid has an impact on whether avg_prediction() returns an output or not #1374

Closed

vincentarelbundock reopened this Jan 30, 2025

vincentarelbundock added a commit that referenced this issue Jan 31, 2025

issue #1373

d281782

strengejacke closed this as completed Jan 31, 2025
