
All code is not equal #17

Open
khinsen opened this issue Mar 29, 2018 · 1 comment
khinsen commented Mar 29, 2018

I very much like the spirit of this manifesto! But as so often, the devil is in the details.

My main problem with the current version is that different types of code are involved in doing science, and the principles cited in the manifesto cannot be applied straightforwardly to all of them.

Here is an illustration of how I see the typical scientific software stack:
software-stack.pdf

I'd say that the top three layers are in the domain of the manifesto, so let's go through the principles one by one and see how they fit:

  1. Open over closed

Makes sense for all layers, but "released by the time of publication" applies only to the top one, assuming that "publication" refers to a scientific paper. The lower layers evolve independently of any specific paper.

  2. Code for the future

As a principle this is fine for everything, but "testing, writing documentation, instructions on how to run and maintain your code" is not always reasonable or practical for the top layer. Nobody maintains project-specific workflows or notebooks today, and it isn't clear that this is a practice one could reasonably move towards unless scientific papers become less numerous and more substantial. Testing is also of very limited interest for code that computes things that have never been computed before.
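To make that last point concrete: for a computation with no known reference result, about the only tests one can write are generic sanity checks. A minimal pytest-style sketch (all names are hypothetical, and the "simulation" is a toy stand-in for a project-specific computation):

```python
import numpy as np

def simulate(n_steps, dt):
    """Toy stand-in for a project-specific computation (hypothetical)."""
    state = np.ones(100)
    for _ in range(n_steps):
        state = state + dt * (-0.1 * state)  # toy dynamics
    return state

def test_result_is_finite():
    # There is no reference value to compare against, but the output
    # must at least be finite and have the expected shape.
    result = simulate(1000, 0.01)
    assert result.shape == (100,)
    assert np.all(np.isfinite(result))

def test_run_is_deterministic():
    # Same inputs must give bit-identical outputs.
    assert np.array_equal(simulate(10, 0.01), simulate(10, 0.01))
```

Such checks catch gross breakage, but they say nothing about whether the novel result is right, which is exactly the sense in which testing is of limited interest here.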

  3. Incorrect code results in incorrect science

"Code published in journals should be peer reviewed." I'd say that all scientific code should be peer reviewed. For the top layer, this should be part of reviewing the scientific paper, because the review must also check if the code actually does what the paper says. But this requires changes in the review process that are not obvious to implement. For example, the experience of ReScience suggests that effective code review requires rapid interaction between authors and reviewers referring to a common codebase.

For the bottom two layers, code review needs to be continuous as the code evolves, meaning that it must be separate from any journal publication. No infrastructure for this exists today. Is it reasonable for a manifesto to call for something that is impossible in the immediate future? Honest question: I don't know how pragmatic manifestos should be.

  4. Availability over perfection

This mostly applies to the top layer. The further down the stack you move, the more professionalism can and should be expected.

  5. Code deserves credit

Certainly, but how far down the stack should one cite? For the top two layers, the obligation seems obvious. But should you cite NumPy? BLAS? Python? zlib? gcc? Linux?
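Whatever the answer, one pragmatic minimum is to record the versions of the whole stack alongside the paper, so that readers can reconstruct it even for the layers that go uncited. A minimal sketch (assuming only NumPy beyond the standard library; the function name is hypothetical):

```python
import platform
import sys

import numpy

def describe_stack():
    """Collect version information for a few layers of the software stack."""
    return {
        "python": sys.version.split()[0],
        "numpy": numpy.__version__,
        "platform": platform.platform(),
    }

if __name__ == "__main__":
    for layer, version in describe_stack().items():
        print(f"{layer}: {version}")
```

Recording is of course weaker than citing, but it makes the dependency question answerable after the fact.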

@npscience

I found this review really useful, in particular:

> the experience of ReScience suggests that effective code review requires rapid interaction between authors and reviewers referring to a common codebase.

This is valuable insight, thank you @khinsen.
