Merged
35 commits
6e540dd
run extended tests on merge to master
s3alfisc Mar 8, 2025
a74ee33
Merge branch 'master' of https://github.com/s3alfisc/pyfixest into ma…
s3alfisc Mar 8, 2025
01e4ffb
Merge branch 'master' of https://github.com/s3alfisc/pyfixest into ma…
s3alfisc Mar 10, 2025
e1b75d9
Merge branch 'master' of https://github.com/s3alfisc/pyfixest into ma…
s3alfisc Mar 11, 2025
223faac
Merge branch 'master' of https://github.com/s3alfisc/pyfixest into ma…
s3alfisc Mar 12, 2025
126f083
Merge branch 'master' of https://github.com/s3alfisc/pyfixest into ma…
s3alfisc Mar 14, 2025
05c6c9e
Merge branch 'master' of https://github.com/s3alfisc/pyfixest into ma…
s3alfisc Mar 16, 2025
48b9ea8
Merge branch 'master' of https://github.com/s3alfisc/pyfixest into ma…
s3alfisc Mar 16, 2025
ff58541
Merge branch 'master' of https://github.com/s3alfisc/pyfixest into ma…
s3alfisc Mar 17, 2025
bcc7e3b
add function to compute non-nested fixed effects
s3alfisc Mar 22, 2025
30564e7
some progress
s3alfisc Mar 23, 2025
3baa47e
confint does not match any more
s3alfisc Mar 24, 2025
8ad116b
better tests, only confint fails
s3alfisc Mar 24, 2025
fbb5095
clarify
s3alfisc Mar 24, 2025
bba695e
small tweak
s3alfisc Mar 24, 2025
52b30e1
fix problem with confint and nested/full fixef_K
s3alfisc Mar 25, 2025
9d4310c
feols tests pass
s3alfisc Mar 25, 2025
f4fc8c7
code simplifications in FixestMulti
s3alfisc Mar 29, 2025
ba15370
cleanups
s3alfisc Mar 29, 2025
d09d7ba
fix test bug
s3alfisc Mar 29, 2025
d92aef0
lint
s3alfisc Mar 29, 2025
63207cc
fixed effects **fully** nested
s3alfisc Mar 29, 2025
229b222
tighten IWLS tolerance
s3alfisc Mar 29, 2025
4b69c61
update readme
s3alfisc Mar 29, 2025
5427993
tighter tolerance for fepois
s3alfisc Mar 29, 2025
c0cdb14
add vignette on ssc details
s3alfisc Mar 30, 2025
6cbe779
update test_ssc and delete vcov tests, as they do the same thing
s3alfisc Mar 30, 2025
134798b
update test_ssc and delete vcov tests, as they do the same thing
s3alfisc Mar 30, 2025
b5d94c9
update nested logic, move to numba
s3alfisc Mar 30, 2025
05d1c67
fix mypy
s3alfisc Mar 30, 2025
8e5377c
delete test_ssc, part of test_vs_fixest now
s3alfisc Mar 30, 2025
8bbcca2
tweak
s3alfisc Mar 30, 2025
88d739c
delete coverage
s3alfisc Mar 30, 2025
3b62993
ignore linter
s3alfisc Mar 31, 2025
71bc907
update tests
s3alfisc Mar 31, 2025
2 changes: 1 addition & 1 deletion README.md
@@ -19,7 +19,7 @@

`PyFixest` is a Python implementation of the formidable [fixest](https://github.com/lrberge/fixest) package for fast high-dimensional fixed effects regression.

The package aims to mimic `fixest` syntax and functionality as closely as Python allows: if you know `fixest` well, the goal is that you won't have to read the docs to get started! In particular, this means that all of `fixest's` defaults are mirrored by `PyFixest` - currently with only [one small exception](https://github.com/py-econometrics/pyfixest/issues/260).
The package aims to mimic `fixest` syntax and functionality as closely as Python allows: if you know `fixest` well, the goal is that you won't have to read the docs to get started! In particular, this means that all of `fixest's` defaults are mirrored by `PyFixest`.

Nevertheless, for a quick introduction, you can take a look at the [documentation](https://py-econometrics.github.io/pyfixest/pyfixest.html) or the regression chapter of [Arthur Turrell's](https://github.com/aeturrell) book on [Coding for Economists](https://aeturrell.github.io/coding-for-economists/econmt-regression.html#imports).

2 changes: 2 additions & 0 deletions docs/_quarto.yml
@@ -39,6 +39,8 @@ website:
text: "Multiple Testing Corrections"
- file: regression_decomposition.ipynb
text: "Regression Decomposition"
- file: ssc.qmd
text: "On Small Sample Corrections"
- text: "Compare fixest & PyFixest"
file: compare-fixest-pyfixest.qmd
- text: "PyFixest on the GPU"
1 change: 1 addition & 0 deletions docs/changelog.qmd
@@ -13,6 +13,7 @@ fit3 = pf.feols("Y ~ X1 + X2 | f1", data = df)

## PyFixest (In Development, can be installed from github)

- We add options `fixef_k = "nested"` and `fixef_k = "full"` for computing small sample corrections via `pf.ssc()`. We set the defaults for `pf.feols()` and other estimation functions to `fixef_k = "nested"` to 100% mimic the defaults of `r-fixest`. This is a "breaking change" in the sense that it might (slightly) impact the standard errors of your estimations.
- We add R2-within values to the default `etable()` output.
- We fix a small bug in the Gelbach `decompose()` method, which would fail if a user selected `only_coef = True`.
- We fix a small bug in the `predict()` method with `newdata`, see [here](https://github.com/py-econometrics/pyfixest/issues/840) for details.
9 changes: 0 additions & 9 deletions docs/compare-fixest-pyfixest.qmd
@@ -63,15 +63,13 @@ r_fit = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2"),
data=data,
vcov="iid",
ssc=fixest.ssc(True, "none", True, "min", "min", False),
)

r_fit_weights = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2"),
data=data,
weights=ro.Formula("~weights"),
vcov="iid",
ssc=fixest.ssc(True, "none", True, "min", "min", False),
)
```

@@ -135,15 +133,13 @@ r_fit = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2"),
data=data,
vcov="hetero",
ssc=fixest.ssc(True, "none", True, "min", "min", False),
)

r_fit_weights = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2"),
data=data,
weights=ro.Formula("~weights"),
vcov="hetero",
ssc=fixest.ssc(True, "none", True, "min", "min", False),
)
```

@@ -192,14 +188,12 @@ r_fit = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2"),
data=data,
vcov=ro.Formula("~f1"),
ssc=fixest.ssc(True, "none", True, "min", "min", False),
)
r_fit_weights = fixest.feols(
ro.Formula("Y ~ X1 + X2 | f1 + f2"),
data=data,
weights=ro.Formula("~weights"),
vcov=ro.Formula("~f1"),
ssc=fixest.ssc(True, "none", True, "min", "min", False),
)
```

@@ -250,21 +244,18 @@ fit_r_iid = fixest.fepois(
ro.Formula("Y ~ X1 + X2 | f1 + f2"),
data=data,
vcov="iid",
ssc=fixest.ssc(True, "none", True, "min", "min", False),
)

fit_r_hetero = fixest.fepois(
ro.Formula("Y ~ X1 + X2 | f1 + f2"),
data=data,
vcov="hetero",
ssc=fixest.ssc(True, "none", True, "min", "min", False),
)

fit_r_crv = fixest.fepois(
ro.Formula("Y ~ X1 + X2 | f1 + f2"),
data=data,
vcov=ro.Formula("~f1"),
ssc=fixest.ssc(True, "none", True, "min", "min", False),
)
```

2 changes: 1 addition & 1 deletion docs/pyfixest.md
@@ -20,7 +20,7 @@

`PyFixest` is a Python implementation of the formidable [fixest](https://github.com/lrberge/fixest) package for fast high-dimensional fixed effects regression.

The package aims to mimic `fixest` syntax and functionality as closely as Python allows: if you know `fixest` well, the goal is that you won't have to read the docs to get started! In particular, this means that all of `fixest's` defaults are mirrored by `PyFixest` - currently with only [one small exception](https://github.com/py-econometrics/pyfixest/issues/260).
The package aims to mimic `fixest` syntax and functionality as closely as Python allows: if you know `fixest` well, the goal is that you won't have to read the docs to get started! In particular, this means that all of `fixest's` defaults are mirrored by `PyFixest`.

Nevertheless, for a quick introduction, you can take a look at the [quickstart](https://py-econometrics.github.io/pyfixest/quickstart.html) or the regression chapter of [Arthur Turrell's](https://github.com/aeturrell) book on [Coding for Economists](https://aeturrell.github.io/coding-for-economists/econmt-regression.html#imports).

120 changes: 120 additions & 0 deletions docs/ssc.qmd
@@ -0,0 +1,120 @@
---
title: On Small Sample Corrections
format:
html:
html-table-processing: none
toc: true
toc-title: "On this page"
toc-location: left
---

The `fixest` R package offers a myriad of options to control small sample corrections. Despite an excellent [vignette](https://cran.r-project.org/web/packages/fixest/vignettes/standard_errors.html) on the topic, I have spent way too much time trying to match `pyfixest` standard errors with `fixest`. Before I forget **why** things
are implemented the way they are, I thought I'd document them here.

In both `fixest` and `pyfixest`, small sample corrections can be controlled via the `ssc` function. In `pyfixest`, `ssc` has four function arguments:

- `adj`: controls if a first ssc adjustment should be applied.
- `fixef_k`: controls how to compute the "effective" number of estimated parameters "k".
- `cluster_adj`: controls if a second ssc adjustment should be applied.
- `cluster_df`: controls how to compute ssc adjustments in case of multi-way clustering.

To compute the adjusted variance-covariance matrix, the following formula is used:

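$$
\mathrm{vcov\_adj} = \mathbf{1}\big(\mathrm{adj},\ \mathrm{adj\_val}(N, \mathrm{dof\_k}(\mathrm{fixef\_k})),\ 1\big) \times \mathbf{1}\big(\mathrm{cluster\_adj},\ \mathrm{cluster\_adj\_val}(G, \mathrm{cluster\_df}),\ 1\big) \times \mathrm{vcov}
$$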

where

- `1()` denotes an indicator function: `1(flag, value, 1)` evaluates to `value` if `flag` is true and to `1` otherwise
- `adj` is a boolean that denotes if there is a scalar adjustment
- `cluster_adj` is a boolean that denotes if there is another scalar adjustment
- `dof_k` sets the number of 'variables' used to compute `adj_val`.
- `fixef_k` controls how `dof_k` is computed
- `G` is the number of unique clusters in the data. In case of heteroskedastic errors, `G = N`.
- `vcov` is the unadjusted variance-covariance matrix.

Last, but not part of the formula above, is `df_t`, which denotes the degrees of freedom used in computing the t-statistics needed for p-values and confidence intervals.
For non-clustered errors, we have `df_t = N - dof_k`; for clustered errors, we use `df_t = G - 1`.

# Small Sample Adjustments

## `adj = True`

If `adj=True`, we multiply the unadjusted variance covariance matrix with

$$
\mathrm{adj\_val} = \frac{N-1}{N-\mathrm{dof\_k}}.
$$

If not, no adjustment is made.
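
As a small illustration, the adjustment can be written as a one-line function. This is only a sketch: the helper name `adj_val` mirrors the symbol used above and is not a function exposed by `pyfixest`.

```python
def adj_val(N, dof_k, adj=True):
    """First small sample adjustment (hypothetical helper, not pyfixest API)."""
    if not adj:
        return 1.0  # adj = False: the vcov matrix is left unchanged
    return (N - 1) / (N - dof_k)


# N = 100 observations and dof_k = 3 give a factor of 99 / 97
factor = adj_val(N=100, dof_k=3)
```

The factor is always at least 1 and grows with `dof_k`, which is why the choice of `fixef_k` discussed next matters for the reported standard errors.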

## `fixef_k`

The value of `adj_val` depends on `dof_k`. The `fixef_k` argument allows us to control how `dof_k` is computed. Three options are supported: `fixef_k in ["none", "full", "nested"]`.

### `fixef_k = "none"`

If `fixef_k = "none"`, the fixed effects are discarded. For example, if we fit a model `Y ~ X1 | f1`, we have `k = 1`, where `k` denotes the number of covariates that are not specified as fixed effects using two-part formula syntax. For a model `Y ~ X1 + X2 | f1`, we would set `k = 2`, etc.

### `fixef_k = "full"`

Here, we add the levels of fixed effects. To avoid multicollinearity, we have to drop one level from all but one of the fixed effects whenever we have more than one fixed effect. Hence, if `n_fe` denotes the total number of fixed effects and `f_i` a given fixed effect, we compute `dof_k` as `dof_k = k + k_fe`, where $\mathrm{k\_fe} = \sum_{i=1}^{\mathrm{n\_fe}} \mathrm{levels}(f_i) - (\mathrm{n\_fe} - 1)$ if we have more than one fixed effect and $\mathrm{k\_fe} = \mathrm{levels}(f_1)$ if we have only one fixed effect.
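
To make the counting concrete, here is a minimal sketch that subtracts the `n_fe - 1` dropped levels from the total number of levels. The helper `k_fe_full` is hypothetical and only serves the illustration; it is not how `pyfixest` exposes this logic.

```python
def k_fe_full(levels):
    """Number of fixed effect parameters under fixef_k = "full".

    `levels` lists the number of levels of each fixed effect.
    Hypothetical helper for illustration, not part of pyfixest.
    """
    n_fe = len(levels)
    # with more than one fixed effect, n_fe - 1 levels are dropped
    # to avoid multicollinearity; with a single fixed effect nothing is dropped
    return sum(levels) - max(n_fe - 1, 0)


# Y ~ X1 + X2 | f1 + f2 with 30 levels in f1 and 20 levels in f2
k = 2
dof_k = k + k_fe_full([30, 20])  # 2 + (30 + 20 - 1) = 51
```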

### `fixef_k = "nested"`

If we have **clustered** standard errors, fixed effects might be fully nested within the cluster variable. One simple example is **cluster fixed effects**.
If we run a model

```
pf.feols("Y ~ X1 | f1", data=data, vcov={"CRV1": "f1"})
```
the fixed effect `f1` is fully nested in the cluster variable `f1` because they are identical.

A common example of fixed effects nested within a cluster would be district fixed effects and state level clustering - each district is fully nested in a given state.

If `fixef_k` is set to `"nested"`, any "nested" fixed effects are dropped from the computation of `k_fe`, hence we have

$$
\mathrm{k\_fe} = \sum_{i=1}^{\mathrm{n\_fe}} \mathrm{levels}(f_i) - \mathrm{k\_fe\_nested} - (\mathrm{n\_fe} - 1)
$$

where $\mathrm{k\_fe\_nested}$ is the cardinality of the nested fixed effects. For a cluster fixed effect, `k_fe_nested = G`, where `G` is the number of clusters.

Note that if you previously subtracted a level from a nested fixed effect, you might have to add it back (I have lost a few hours on figuring this one out).
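
Whether a fixed effect is nested can be checked by verifying that each of its levels occurs in exactly one cluster. The pure-Python sketch below illustrates the idea on toy data; the function `is_nested` and the variable names are made up for this example and do not reproduce the `pyfixest` implementation.

```python
def is_nested(fe, cluster):
    """Return True if each level of `fe` occurs in exactly one cluster."""
    first_cluster = {}
    for level, g in zip(fe, cluster):
        # store the first cluster seen for this level, compare later ones to it
        if first_cluster.setdefault(level, g) != g:
            return False
    return True


# toy data: districts are nested in states, years are not
state = ["A", "A", "A", "A", "B", "B", "B", "B"]
district = ["a1", "a1", "a2", "a2", "b1", "b1", "b2", "b2"]
year = [2000, 2001, 2000, 2001, 2000, 2001, 2000, 2001]

district_nested = is_nested(district, state)  # True
year_nested = is_nested(year, state)          # False
state_nested = is_nested(state, state)        # True: a cluster fixed effect
```

Only the levels of fixed effects for which such a check succeeds would enter `k_fe_nested`.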

## `cluster_adj`

If `cluster_adj = True`, another small sample correction is applied:

$$
\mathrm{cluster\_adj\_val} = \frac{G}{G - 1}
$$

with $G$ the number of clusters in case of clustered standard errors and `G = N` for heteroskedastic errors. This was a point of great confusion for me - why would the
cluster adjustment be applied to heteroskedastic errors too? But it turns out that this is consistent with R's sandwich package, which is the benchmark implementation
for sandwich covariance matrices. One way to think about this: if errors are heteroskedastic, we have "singleton" clusters, hence $G = N$.

One other point of great confusion for me was that for "iid" errors, `cluster_adj_val` is set to `1` even if `cluster_adj = True`.
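
The three cases described in this section can be summarised in a short sketch. The function below is hypothetical, and the labels `"iid"`, `"hetero"` and `"CRV"` for the type of standard errors are an assumption made purely for this illustration.

```python
def cluster_adj_val(vcov_type, N, G=None, cluster_adj=True):
    """Second small sample adjustment (hypothetical helper, not pyfixest API).

    vcov_type is "iid", "hetero" or "CRV" (labels assumed for this sketch).
    """
    if not cluster_adj or vcov_type == "iid":
        return 1.0  # iid errors never receive the cluster adjustment
    if vcov_type == "hetero":
        G = N  # every observation is treated as its own "singleton" cluster
    return G / (G - 1)


iid_val = cluster_adj_val("iid", N=100)        # 1.0
hetero_val = cluster_adj_val("hetero", N=100)  # 100 / 99
crv_val = cluster_adj_val("CRV", N=100, G=10)  # 10 / 9
```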

## `cluster_df`

This is only relevant when we use multi-way clustered standard errors. Recall that we can write two-way clustered errors as

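$$
\mathrm{vcov} = \mathrm{ssc}_{A} \times \mathrm{vcov}_{A} + \mathrm{ssc}_{B} \times \mathrm{vcov}_{B} - \mathrm{ssc}_{AB} \times \mathrm{vcov}_{AB}
$$

where $A$ and $B$ denote the clustering variables and $AB$ their intersection, with $G_{AB} > G_{A} > G_{B}$ clusters.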

If we set `cluster_df = "min"`, we compute `ssc_{A}`, `ssc_{B}`, `ssc_{AB}` setting `G_{A} = G_{B} = G_{AB} = min(G_{A}, G_{B}, G_{AB})`.
If we set `cluster_df = "conventional"`, we use `G_{A}` to compute `ssc_{A}`, `G_{B}` to compute `ssc_{B}`, etc.
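
The difference between the two options is easiest to see with numbers. The sketch below only computes the `G / (G - 1)` part of each `ssc` term, i.e. it assumes `cluster_adj = True` and leaves out the `adj` factor; the helper `twoway_cluster_adj` is hypothetical.

```python
def twoway_cluster_adj(G_A, G_B, G_AB, cluster_df="min"):
    """Return the G / (G - 1) factors for the A, B and AB terms (illustrative only)."""
    if cluster_df == "min":
        # all three terms use the smallest number of clusters
        G_A = G_B = G_AB = min(G_A, G_B, G_AB)
    elif cluster_df != "conventional":
        raise ValueError('cluster_df must be "min" or "conventional"')
    return tuple(G / (G - 1) for G in (G_A, G_B, G_AB))


# 50 clusters in A, 10 clusters in B, 400 clusters in the intersection AB
adj_min = twoway_cluster_adj(50, 10, 400, cluster_df="min")            # 10/9 three times
adj_conv = twoway_cluster_adj(50, 10, 400, cluster_df="conventional")  # 50/49, 10/9, 400/399
```

With `"min"`, every term receives the largest of the three possible corrections, which makes it the more conservative choice.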

# More on Inference

For computing critical values, we compute degrees of freedom `df_t` as `N - dof_k` unless errors are **clustered**, in which case we use `G - 1`.

To compute critical values for OLS and IV regression, we compute t-statistics using `df_t` degrees of freedom. For GLMs, we compute critical values
based on a normal distribution (z-statistics). See [here](https://github.com/py-econometrics/pyfixest/blob/864da9c0d1797aff70e3f5b420e4c73f7256642d/pyfixest/estimation/feols_.py#L851) for the implementation.

For multiway-clustered errors, we set `df_t = min(G_1 - 1, G_2 - 1)` in case of two-way clustering, `df_t = min(G_1 - 1, G_2 - 1, G_3 - 1)` in case of three-way clustering (currently not supported), etc.
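
The rules for `df_t` in this section can be collected in one small function. This is a sketch under the conventions stated above; the helper `df_t` and its `cluster_counts` argument are hypothetical and not taken from the `pyfixest` code base.

```python
def df_t(N, dof_k, cluster_counts=None):
    """Degrees of freedom for t-statistics (hypothetical helper).

    cluster_counts lists the number of clusters of each clustering variable;
    leave it as None for non-clustered errors.
    """
    if not cluster_counts:
        return N - dof_k  # iid or heteroskedastic errors
    # one-way clustering: G - 1; multi-way clustering: the smallest G_i - 1
    return min(G - 1 for G in cluster_counts)


df_hetero = df_t(N=100, dof_k=3)                           # 97
df_oneway = df_t(N=100, dof_k=3, cluster_counts=[10])      # 9
df_twoway = df_t(N=100, dof_k=3, cluster_counts=[50, 10])  # 9
```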