Skip to content
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
118 changes: 118 additions & 0 deletions docs/contributing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -199,3 +199,121 @@ The documentation can be built by running::
$ tox -e docs

The resulting built documentation will be available in the ``.tox/docs/tmp/html`` folder.

Development best practices, policies and procedures
---------------------------------------------------

The following information is mainly for core developers, but may also be of interest to
contributors.

Merging pull requests
~~~~~~~~~~~~~~~~~~~~~

If at all possible, pull requests submitted by an external contributor should be
reviewed and approved by all core developers before being merged. Pull requests
Copy link
Member

@jakirkham jakirkham Jan 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to constrain it to all? It may be difficult to achieve due to time constraints and competing demands. Admittedly "all" is small right now though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As contrast, scikit-learn requires two approvals for merging (see Pull Request Checklist). Dask has a more informal mechanism by which core developers say they plan to merge in a particular time frame if they get no comments (typical in a day).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll edit this to say we'd like to aim for pull requests to be reviewed by at least one other core developer. With only two of us at the moment it amounts to the same thing, but I think it's probably worth saying we'd like to aim for some review of PRs. This will slow things down if either of us is unable to review due to other commitments, but I think it's probably ok to move slowly for a while and focus on consolidation and sharing knowledge between core devs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I completely agree. We also added a lot of features. So we might need to be in maintenance mode for a bit just in case any bugs show up.

submitted by a core developer should be reviewed and approved by all other core
developers before being merged.

Pull requests should not be merged until all CI checks have passed (Travis, AppVeyor,
Coveralls) against code that has had the latest master merged in.

Compatibility and versioning policies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Because Zarr is a data storage library, there are two types of compatibility to
consider: API compatibility and data format compatibility.

Here we consider all functions, classes and methods that do not begin with an
underscore as part of the Zarr public API. Any change to the public API that does **not**
break existing third party code importing Zarr, or cause third party code to behave in
a different way, is a **backwards-compatible API change**. For example, adding a new
function, class or method is usually a backwards compatible change. However, removing a
function, class or method; removing an argument to a function or method; adding a
required argument to a function or method; or changing the behaviour of a function or
method, are examples of **backwards-incompatible API changes**.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we did add a few notes about functions not having a stable API. We should probably include a similar note here.

Also how do you feel about documented API vs. functions found inside Zarr? Some projects only consider the documented part public. In that case even functions that don't have a leading underscore are still considered private as long as they are not in the docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry. Just got to the section on not stable API below. Thoughts about moving that section after this one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I will change the definition of public API to include only what is documented in the API docs, probably safer for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. We can always expand the scope later.


If a release contains no changes to the public API (e.g., contains only bug fixes or
other maintenance work), then the micro version number should be incremented (e.g.,
2.2.0 -> 2.2.1). If a release contains API changes, but all API changes are
backwards-compatible, then the minor version number should be incremented
(e.g., 2.2.1 -> 2.3.0). If a release contains any backwards-incompatible API changes,
the major version number should be incremented (e.g., 2.3.0 -> 3.0.0).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to go down the road of having release branches? Asking this partially as I would like to get an idea of how we want to handle breaking changes should they be proposed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would we use release branches? Is it to allow work to be going on on the next major version (including breaking API changes) in parallel with bug fixes on the current major version?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like a reasonable solution to me.


Exceptions can be made for any function, class or method which has been documented as
an experimental feature, i.e., backwards-incompatible changes can be included in a
minor release, although this should be avoided wherever possible.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted above, it would be nice to have this right after the API section as they seem more logically connected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will restructure.


The data format used by Zarr is defined by a specification document, which should be
platform-independent and contain sufficient detail to construct an interoperable
software library to read and/or write Zarr data using any programming language. The
latest version of the specification document is available from the :ref:`spec` page.
Here, **data format compatibility** means that all software libraries that implement a
particular version of the Zarr storage specification are interoperable, in the sense
that data written by any one library could be read by all others. It is obviously
desirable to maintain data format compatibility wherever possible. However, if a change
is needed to the storage specification, and that change would break data format
compatibility in any way, then the storage specification version number should be
incremented (e.g., 2 -> 3).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when we add something to the data format spec which is not breaking? Is there a way to version this?

Also related how do we migrate between these enhancement versions? Lastly do we have a way to check the data format spec version of Zarr and/or the files themselves?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, I cannot imagine any change to the data format spec that does not break data format compatibility in some way, in the sense that data written by any one implementation can be read correctly by all others. Hence the storage spec has only a major version number, there is no minor or micro version number. This simplifies the definition of data format compatibility, i.e., all libraries implementing a given spec version number should be fully interoperable. That said, if you have an example of a spec change that does not fit into this model, happy to discuss.

The spec version of the data files is included in the metadata (.zarray or .zgroup). The spec version that zarr implements is currently stored as the ZARR_FORMAT variable in the zarr.meta module. There is a check on reading array or group metadata, if there is a difference between the spec version in the metadata files and the ZARR_FORMAT variable then an exception is raised.

When we moved from spec version 1 to 2 I implemented a function zarr.storage.migrate_1to2 which updates the metadata resources to conform to spec version 2. I imagined that we would do something similar for any future spec changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an example, we recently opted to not always write metadata files and noted this in the spec. We have also made sure that current Zarr treats this as empty metadata. That said, I'm not sure if an older version of Zarr (claiming to support the same spec version) would handle this correctly. Even though what we added didn't fundamentally break the version 2 spec. Instead it provided a superset of the existing functionality. IOW this seems like a feature we added to the spec that would make it 2.1. Though this could just be me being pedantic. :)

Everything else sounds great. We might want to mention the ZARR_FORMAT variable here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a fair point, although that change was a bit of a grey area in that zarr 2.1 is happy if .zattrs is missing and the spec did not say whether .zattrs is required or not, so saying that .zattrs is optional was a clarification and did not break compatibility with previous zarr versions implementing spec version 2. I might have been a bit more cautious if I knew there were other zarr implementations out there, but it still would have been a clarification rather than a change to the spec.

We may still come across other cases which are more clear cut, in that some change is required in the spec to support a new feature, and where using something more fine-grained than a single major version number for the spec version could be useful, but I find it hard to think that through without a concrete case in front of me. The questions we would need to answer are things like, what kinds of changes would be allowed in a minor spec change (e.g., 2.0 -> 2.1), how should an implementation behave if it implements spec version 2.0 and encounters data using spec version 2.1, etc. I think we can cross those bridges if and when the need arises.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably smooth this over by adding one last patch release to 2.1 with this change. Alternatively we could make the behavior something users opt-in to. That said, I'm not too worried about it. In any event, this discussion is diverging from the spec version and can be continued in an issue if we would like.

Sure, make sense. Easy to add once we need it.


The versioning of the Zarr software library is related to the versioning of the storage
specification as follows. A particular version of the Zarr library will
implement a particular version of the storage specification. For example, Zarr version
2.2.0 implements the Zarr storage specification version 2. If a release of the Zarr
library implements a different version of the storage specification, then the major
version number of the Zarr library should be incremented. E.g., if Zarr version 2.2.0
implements the storage spec version 2, and the next release of the Zarr library
implements storage spec version 3, then the next library release should have version
number 3.0.0. Note however that the major version number of the Zarr library may not
always correspond to the spec version number. For example, Zarr versions 2.x, 3.x, and
4.x might all implement the same version of the storage spec and thus maintain data
format compatibility, although they will not maintain API compatibility.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we store the spec version in a variable somewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zarr.meta.ZARR_FORMAT


Note that the Zarr test suite includes a data fixture and tests to try and ensure that
data format compatibility is not accidentally broken. See the
:func:`test_format_compatibility` function in the :mod:`zarr.tests.test_storage` module
for details.

When to make a release
~~~~~~~~~~~~~~~~~~~~~~

Ideally, any bug fixes that don't change the public API should be released as soon as
possible. It is fine for a micro release to contain only a single bug fix.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hence the branching question above. Do we want to branch out 2.2 and allow mater to get enhancements and backport patches to 2.2?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this would be great, although given that developer time is limited I think I wouldn't want to make any promises to backport bug fixes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. :)


When to make a minor release is at the discretion of the core developers. There are no
hard-and-fast rules, e.g., it is fine to make a minor release to make a single new
feature available; equally, it is fine to make a minor release that includes a number of
changes.

Major releases obviously need to be given careful consideration, and should be done as
infrequently as possible, as they will break existing code and/or affect data
compatibility in some way.

Release procedure
~~~~~~~~~~~~~~~~~

Checkout and update the master branch::

$ git checkout master
$ git pull

Verify all tests pass on all supported Python versions, and docs build::

$ tox

Tag the version (where "X.X.X" stands for the version number, e.g., "2.2.0")::

$ version=X.X.X
$ git tag -a v$version -m v$version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any other rules about the message? Are we allowed to highlight changes or is that undesirable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind about the message, happy with more information if useful.

$ git push --tags

Release source code to PyPI::

$ python setup.py register sdist
$ twine upload dist/zarr-${version}.tar.gz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there some rules about who gets permissions to PyPI?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy for other core devs to have PyPI permissions.


Obtain checksum for release to conda-forge::

$ openssl sha256 dist/zarr-${version}.tar.gz

Release to conda-forge by making a pull request against the zarr-feedstock conda-forge
repository, incrementing the version number.