A data science cookiecutter #415

Open
sfmig opened this issue Jun 12, 2024 · 9 comments
Labels
enhancement New feature or request

Comments

@sfmig
Contributor

sfmig commented Jun 12, 2024

Is Your Feature Request Related to a Problem? Please Describe

We recently chatted with @samcunliffe and Niko from SWC (cannot tag him) about a cookiecutter for data science / scientific analysis / exploratory Python projects.

The idea would be to have lighter requirements than for a fully-fledged Python package, and maybe some data-science-specific additions (like for example functionality for formatting and checking notebooks). This could be useful for many researchers, and maybe a good entry point to getting into good software practices.
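
To make "checking notebooks" a bit more concrete, here is a minimal sketch of one possible check: flagging code cells that were committed with outputs or execution counts still in them. It uses only the standard library and assumes the nbformat 4 JSON layout; the function name `cells_with_outputs` and the example notebook are hypothetical, not part of any existing template.

```python
import json


def cells_with_outputs(notebook_json: str) -> list[int]:
    """Return indices of code cells that still carry outputs or execution counts."""
    notebook = json.loads(notebook_json)
    flagged = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have no outputs to strip
        if cell.get("outputs") or cell.get("execution_count") is not None:
            flagged.append(index)
    return flagged


# Tiny in-memory notebook (assumed nbformat 4 layout), for demonstration only.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["1 + 1"],
            "execution_count": 1,
            "outputs": [{
                "output_type": "execute_result",
                "data": {"text/plain": ["2"]},
                "metadata": {},
                "execution_count": 1,
            }],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["x = 3"],
            "execution_count": None,
            "outputs": [],
        },
    ],
})

print(cells_with_outputs(example))  # → [1]
```

A check like this could run as a pre-commit hook or in CI; in practice an existing tool would probably be preferable to hand-rolled code.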

Describe the Solution You'd Like

If we find this could be useful, we could have it:

  • in a separate repo, or
  • as part of this repo, and ask the user in the initial config phase which version they want to instantiate. @samcunliffe mentioned we could maybe use hidden variables for this.

Alternatively, we can just point to a good cookiecutter for this purpose if that already exists.
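
For the single-repo option, one way this might work is a choice variable in `cookiecutter.json` plus a post-generation hook that deletes whatever the lighter variant doesn't need. The sketch below only shows the pruning logic; the variable value `"analysis"`, the function `prune_for_project_type`, and the list of paths are all hypothetical and not taken from this template. In a real `hooks/post_gen_project.py` the project type would come from the rendered cookiecutter context rather than a function argument.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical: paths that only a fully-fledged package would need,
# relative to the root of the generated project.
PACKAGE_ONLY_PATHS = ["docs", "tests", ".github/workflows/release.yml"]


def prune_for_project_type(project_root: Path, project_type: str) -> list[str]:
    """Delete package-only paths when the lighter 'analysis' variant is requested."""
    removed: list[str] = []
    if project_type != "analysis":
        return removed  # full package: keep everything
    for relative in PACKAGE_ONLY_PATHS:
        target = project_root / relative
        if target.is_dir():
            shutil.rmtree(target)
        elif target.exists():
            target.unlink()
        else:
            continue  # path was never generated, nothing to do
        removed.append(relative)
    return removed


# Demonstration on a throwaway directory standing in for a generated project.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "notebooks").mkdir()
    removed = prune_for_project_type(root, "analysis")
    remaining = sorted(p.name for p in root.iterdir())

print(removed, remaining)  # → ['docs'] ['notebooks']
```

The trade-off is that every template file then has to be classified as "package only" or "shared", which is extra maintenance compared with a separate repo.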

Describe Alternatives You've Considered

There are some examples of research cookiecutters:

  • this one is very basic (mostly a directory structure) but maybe a good starting point if we decide we want to implement this ourselves.
  • this one seems more recent and quite well documented. It may be a good one to point to if we decide that doing one ourselves is out of scope

Additional Context

No response

@sfmig sfmig added the enhancement New feature or request label Jun 12, 2024
@samcunliffe
Member

A minimal solution would be to add a couple of trusted/tested data science templates to The Templates Page. I'd propose a section above "Community-specific..."

@sfmig
Contributor Author

sfmig commented Jun 14, 2024

this website and the associated cookiecutter may be a nice one to try out

@paddyroddy
Member

this website and the associated cookiecutter may be a nice one to try out

Does feel a shame to be recommending a rival 🤔

@samcunliffe
Member

this one seems more recent and quite well documented. It may be a good one to point to if we decide that doing one ourselves is out of scope

I heard good things about ccds. Never actually used it. Should we ask in #datascience on Slack? If we're making a recommendation under the ARC logo, perhaps they should be consulted.

@dstansby
Member

(like for example functionality for formatting and checking notebooks)

This sounds like something we could add here anyway?

@paddyroddy
Member

This sounds like something we could add here anyway?

👍 #49

@dstansby
Member

I've been playing around with uv projects recently, and am finding them a really nice halfway house between a single Python script and a full-blown package. @sfmig when you have time, it would be good to hear if uv projects are the kind of thing you were thinking of here? If so, I think we can close this as something we won't duplicate in this repo.

@sfmig
Contributor Author

sfmig commented Nov 1, 2024

thanks for pointing this out @dstansby!

I had a look and asked around a bit. It does seem like uv's functionality for creating Python projects could be useful. In particular, its distinction between applications, libraries and packages seems to cover more cases than the standard cookiecutter, which mainly targets people who want to make a fully-fledged Python package. I agree that for a scientist with a few Python scripts, setting up a uv application may be a softer entry to good software development practices than starting off with a cookiecutter.

However, uv doesn't support conda, which seems like a big downside for many science and data science projects.

I think if we were to recommend that people use uv to manage Python projects that are more data-sciency, we may be sending them down terrible rabbit holes, having to deal with uv and conda environments simultaneously (something I have no experience with). But you have used uv, so do let me know if this is not accurate.

@paddyroddy
Member

@sfmig https://github.com/prefix-dev/pixi


4 participants