Add a comparison with GAM tools to the docs? #471

Open
tbenthompson opened this issue Oct 14, 2021 · 3 comments

Comments

@tbenthompson
Collaborator

tbenthompson commented Oct 14, 2021

I've gotten a few different questions about how glum compares to something like pygam, and whether we have plans to support GAMs.

Strictly speaking, GLMs are a subset of GAMs so this question seems very appropriate. Looking over pygam, for example, a few things seem missing/different:

  • L1/elastic net penalties are not available.
  • No regularization path features.
  • No handling of sparse and categorical data.

On the other hand, the space of models described by GAMs is much broader. I might be wrong about these specific points since I'm not experienced with the library.

It would be interesting to run a quick benchmark, or even add pygam to the benchmark suite, to get a performance comparison of glum vs pygam. My guess is that glum is substantially faster for GLMs because it's tailored specifically to the problem and also handles the sparsity issues well.

A final question is how much of the feature set (e.g. elastic net regularization along a path) could be ported to the GAM setting, and whether some of the work we did here could be extended to provide a basis for a GAM library.

@lbittarello
Member

lbittarello commented Oct 15, 2021

A final question is [...] whether some of the work we did here could be extended to provide a basis for a GAM library

There are different ways to fit GAMs. As far as I understand, pygam is basically using splines to approximate the unknown relationships between outcomes and continuous regressors. For any given sample size, this approach is effectively identical to a GLM with feature engineering (i.e. glum + sklearn). Indeed, it only qualifies as a GAM if the complexity of the splines increases in some automated fashion with the sample size. The difference is conceptual (parametric vs semiparametric) rather than practical. One could also fit GAMs with local regressions or something fancier, in which case the equivalence with GLMs breaks down.
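To make the "GLM with feature engineering" point concrete, here is a minimal sketch under stated assumptions. It expands one continuous regressor into a cubic truncated-power spline basis and fits it by ordinary least squares, i.e. a Gaussian GLM with identity link. NumPy's `lstsq` stands in for glum here, and the helper name `truncated_power_basis`, the knot grid, and the simulated data are all illustrative choices, not anything from glum or pygam.

```python
import numpy as np


def truncated_power_basis(x, knots, degree=3):
    """Return columns x, ..., x^degree plus (x - k)_+^degree for each knot."""
    x = np.asarray(x, dtype=float)
    poly = [x**p for p in range(1, degree + 1)]
    hinge = [np.clip(x - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(poly + hinge)


# Simulated data: a smooth signal plus noise (illustrative only).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 200))
signal = np.sin(2 * np.pi * x)
y = signal + rng.normal(scale=0.1, size=x.size)

# Feature engineering step: the knots are fixed up front, so the model is
# an ordinary linear model in the expanded columns.
knots = np.linspace(0.1, 0.9, 9)
X = np.column_stack([np.ones_like(x), truncated_power_basis(x, knots)])

# Fitting step: plain least squares on the engineered design matrix.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
rmse = float(np.sqrt(np.mean((fitted - signal) ** 2)))
```

With the knots held fixed, nothing distinguishes this from a parametric regression. The semiparametric reading only enters if the number of knots is chosen to grow with the sample size.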

@mattmills49

If this group is interested in including smoothing spline functionality in glum, I'd be happy to help out. I put together a guide on how to fit penalized splines using your own custom penalty matrix here: http://statmills.com/2023-11-20-Penalized_Splines_Using_glum/

While you can theoretically fit GAMs in any way you like with your own penalty matrix, I think incorporating separate penalties per smooth term and interaction splines into glum would make it much easier for users to actually use this functionality. There are obviously more considerations than just mine, but I thought I'd offer to help if there is an appetite for more here.
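As a rough sketch of what "separate penalties per smooth term" could look like when the penalty matrix is built by hand: each smooth term gets its own second-order difference penalty (the usual P-spline choice), scaled by its own smoothing parameter, and the blocks are placed on the diagonal of one combined matrix. The function names `difference_penalty` and `combined_penalty`, the term sizes, and the lambda values are hypothetical and are not part of glum.

```python
import numpy as np


def difference_penalty(n_coefs, order=2):
    """Return D.T @ D, where D takes order-th differences of adjacent coefficients."""
    D = np.diff(np.eye(n_coefs), n=order, axis=0)
    return D.T @ D


def combined_penalty(sizes, lambdas, order=2):
    """Block-diagonal penalty: one scaled difference penalty per smooth term."""
    total = sum(sizes)
    P = np.zeros((total, total))
    start = 0
    for size, lam in zip(sizes, lambdas):
        stop = start + size
        P[start:stop, start:stop] = lam * difference_penalty(size, order)
        start = stop
    return P


# Two smooth terms with 8 and 5 spline coefficients; the first is smoothed
# much more heavily than the second (values chosen only for illustration).
P = combined_penalty(sizes=[8, 5], lambdas=[10.0, 0.1])
```

A matrix like `P` is the kind of object that can be supplied as a custom penalty, for example through glum's `P2` argument as in the tutorial linked above. This assumes the columns of the design matrix are ordered term by term and that the intercept is handled outside the penalty; the overall scaling relative to `alpha` would also need checking against glum's objective.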

@MatthiasSchmidtblaicherQC
Contributor

MatthiasSchmidtblaicherQC commented Jan 30, 2024

Thanks for the cool tutorial and for the offer to help out here. With the upcoming release of glum v3, there will be two main user interfaces:

  1. The current one, in which the user passes a design matrix that presumably comes from preprocessing some dataframe, and
  2. a formula interface building on formulaic, which does preprocessing such as creating B-splines "under the hood".

With 1., I find the approach in the tutorial quite natural: given that the user already built a model matrix, she can also specify a custom penalty matrix. Here, I see scope for convenience helpers for creating custom penalty matrices though. With 2., it seems more natural to create penalty matrices inside the model, so smoothing or cyclic constraints that work within the formula interface would be welcome contributions!
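One possible shape for such a convenience helper, sketched for the cyclic case: a difference penalty whose differences wrap around, so the last spline coefficient is treated as a neighbour of the first. This is only meaningful together with a periodic (wrapped) spline basis, which is not constructed here. The name `cyclic_difference_penalty` is hypothetical and does not exist in glum or formulaic.

```python
import numpy as np


def cyclic_difference_penalty(n_coefs, order=2):
    """Penalty on circular differences of adjacent spline coefficients."""
    # Row i of `shift` selects coefficient (i + 1) mod n_coefs.
    shift = np.roll(np.eye(n_coefs), 1, axis=1)
    first_diff = shift - np.eye(n_coefs)
    # Applying the circular first difference `order` times.
    D = np.linalg.matrix_power(first_diff, order)
    return D.T @ D


P_cyclic = cyclic_difference_penalty(12)
```

Unlike the non-cyclic second-order difference penalty, this one leaves only constant coefficient vectors unpenalized; a linear trend is penalized at the wrap-around, which is what forces the fitted curve to join up at the ends.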
