Why do standard errors differ slightly between PyFixest and fixest? The short answer: the two packages use different small sample correction defaults.
In Python, we get
import pandas as pd
import numpy as np
from pyfixest.estimation import feols
from pyfixest.utils import get_data
data = get_data()
fml = "Y ~ X1 | f1"
fit = feols(fml, data = data)
fit.tidy()
# Estimate Std. Error t value Pr(>|t|) 2.5 % 97.5 %
# Coefficient
# X1 -0.949441 0.068857 -13.788641 2.886580e-14 -1.090269 -0.808613
whereas in R, we have
library(fixest)
library(reticulate)
data = py$data
fml = as.formula(py$fml)
fit = feols(fml, data = data)
summary(fit)
# OLS estimation, Dep. Var.: Y
# Observations: 997
# Fixed-effects: f1: 30
# Standard-errors: Clustered (f1)
# Estimate Std. Error t value Pr(>|t|)
# X1 -0.949441 0.068891 -13.7817 2.9236e-14 ***
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# RMSE: 1.7297 Adj. R2: 0.419595
# Within R2: 0.161418
fixest::feols() produces a standard error of 0.068891, but PyFixest.feols() reports a standard error of 0.068857.
To reproduce the PyFixest results in fixest, pass ssc = fixest::ssc(fixef.K = "none") instead of relying on the default, fixef.K = "nested":
feols(fml, data = data, ssc = ssc(fixef.K = "none")) |> summary()
# OLS estimation, Dep. Var.: Y
# Observations: 997
# Fixed-effects: f1: 30
# Standard-errors: Clustered (f1)
# Estimate Std. Error t value Pr(>|t|)
# X1 -0.949441 0.068857 -13.7886 2.8867e-14 ***
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# RMSE: 1.7297 Adj. R2: 0.419595
# Within R2: 0.161418
Now the standard errors, t statistics, and p values of the two libraries match to the displayed precision.
See the fixest vignette on standard errors for details.
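As a rough sanity check, the gap between the two standard errors can be reproduced by hand from the numbers printed above. The sketch below assumes that the clustered small sample adjustment takes the form (N - 1) / (N - K), and that K is 1 under fixef.K = "none" (only X1 is counted) and 2 under fixef.K = "nested". Both the formula and the values of K are assumptions made for illustration; they are not taken from either package's source code, so please check them against the fixest vignette.

```python
import math

# Values copied from the outputs above.
n_obs = 997
se_none = 0.068857    # PyFixest, or fixest with ssc(fixef.K = "none")
se_nested = 0.068891  # fixest default, ssc(fixef.K = "nested")


def dof_adjustment(n: int, k: int) -> float:
    """Assumed small sample adjustment factor (N - 1) / (N - K)."""
    return (n - 1) / (n - k)


# Assumption: K = 1 under "none" and K = 2 under "nested".
k_none, k_nested = 1, 2

# The adjustment scales the variance, so standard errors scale with its square root.
ratio_implied = math.sqrt(
    dof_adjustment(n_obs, k_nested) / dof_adjustment(n_obs, k_none)
)
ratio_observed = se_nested / se_none

print(f"implied ratio:  {ratio_implied:.6f}")
print(f"observed ratio: {ratio_observed:.6f}")
print(f"rescaled PyFixest SE: {se_none * ratio_implied:.6f}")
```

Under these assumptions, the implied and observed ratios agree up to the rounding of the printed standard errors, which is consistent with the difference coming entirely from how the fixed effects enter K.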
PyFixest implements all options that fixest::ssc() offers, with the exception of fixef.K = "nested". I plan to add this option in the future. Please let me know if you think it is high priority; as of now, I do not consider it especially important.