Core protocol v3.0 - data types #18

Merged: alimanfoo merged 4 commits into zarr-developers:core-protocol-v3.0-dev from alimanfoo:core-protocol-v3.0-dtypes on May 21, 2019

@alimanfoo (Member) commented May 7, 2019

This PR proposes a section of the v3.0 core protocol specification describing a set of data types for array elements.

Some discussion/decision points: should each of the following be defined in the core protocol, or via a protocol extension?

  • Fixed length byte string types (corresponding to types like 'S4' in numpy for an array containing length 4 byte strings)?
  • Fixed length unicode type (i.e., corresponding to types like '<U4' in numpy for an array containing length 4 unicode code points)?
  • Datetime/timedelta data types (corresponding to e.g. 'M8[ns]' and 'm8[ns]' types in numpy)?
  • Structured (i.e., struct-like) data types?
  • Variable length data types, such as variable length arrays of a primitive type, or variable length byte or text strings?
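For reference, here is a minimal sketch of the per-element byte layouts that the fixed-length numpy types mentioned above imply, written with the Python standard library only. The helper names (`encode_s4`, `encode_u4_le`, `encode_m8_ns_le`) are hypothetical, and the little-endian byte order used for the datetime example is an assumption for illustration, not something this PR specifies.

```python
import struct


def encode_s4(value: bytes) -> bytes:
    """Layout of one numpy 'S4' element: exactly 4 bytes, null-padded."""
    if len(value) > 4:
        raise ValueError("value does not fit in 4 bytes")
    return value.ljust(4, b"\x00")


def encode_u4_le(value: str) -> bytes:
    """Layout of one numpy '<U4' element: 4 little-endian UCS-4 code points."""
    if len(value) > 4:
        raise ValueError("value does not fit in 4 code points")
    # Each code point takes 4 bytes, so the element is always 16 bytes long.
    return value.encode("utf-32-le").ljust(16, b"\x00")


def encode_m8_ns_le(ns_since_epoch: int) -> bytes:
    """Layout of one 'M8[ns]' element (assumed little-endian): a signed
    64-bit count of nanoseconds since the Unix epoch."""
    return struct.pack("<q", ns_since_epoch)


s4 = encode_s4(b"ab")
u4 = encode_u4_le("ab")
m8 = encode_m8_ns_le(1_000_000_000)  # one second after the epoch

print(len(s4), len(u4), len(m8))  # 4 16 8
```

The numeric case reduces to a single `struct` call, while the string cases each need their own padding and encoding rules.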

Core data types
~~~~~~~~~~~~~~~

.. list-table:: Data types

This should render as a table when built via Sphinx.

I know it's a bit repetitive to enumerate every data type explicitly, given that many have both a big-endian and a little-endian form. However, for now I thought it simplest to just enumerate them.
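To illustrate why multi-byte types need both forms listed, here is a small sketch using Python's `struct` module. The numpy-style names in the comments (`'<i4'`, `'>i4'`) are shown for orientation only and are an assumption about how such types might be labelled, not a quote from the table in this PR.

```python
import struct

value = 258  # 0x00000102

little = struct.pack("<i", value)  # 4-byte signed int, little-endian (numpy-style '<i4')
big = struct.pack(">i", value)     # 4-byte signed int, big-endian (numpy-style '>i4')

print(little.hex())  # 02010000
print(big.hex())     # 00000102

# Single-byte types have no byte order, so they need only one entry.
one_byte_le = struct.pack("<b", 5)
one_byte_be = struct.pack(">b", 5)
print(one_byte_le == one_byte_be)  # True
```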

@alimanfoo (Member, Author)

A straw man for discussion. Note that tentatively I'm suggesting the core protocol is limited to boolean, integer, float and complex types, but that protocol extensions can define other data types.

So, e.g., this would mean that other features like datetime/timedelta data types, structured (struct) data types, and variable length data types could each be addressed via separate protocol extensions. The idea behind dividing it up this way is just to keep the core protocol small and simple to implement.
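As a rough sketch of how an implementation might apply this split, the snippet below classifies numpy-style type strings by their kind character. The function name `classify`, the `CORE_KINDS` table, and the use of numpy-style type strings as identifiers are all assumptions made for illustration; the proposal itself only says that boolean, integer, float and complex types are core.

```python
# Kind characters as used in numpy type strings (e.g. '<f8', '|b1').
# Treating exactly these kinds as "core" mirrors the straw man above;
# the table itself is hypothetical.
CORE_KINDS = {
    "b": "boolean",
    "i": "signed integer",
    "u": "unsigned integer",
    "f": "float",
    "c": "complex",
}


def classify(typestr: str) -> str:
    """Return 'core' or 'extension' for a numpy-style type string."""
    # Drop an optional leading byte-order character, then read the kind.
    kind = typestr.lstrip("<>|=")[:1]
    if not kind:
        raise ValueError(f"not a type string: {typestr!r}")
    return "core" if kind in CORE_KINDS else "extension"


for ts in ["|b1", "<i4", ">f8", "<c16", "S4", "<U4", "M8[ns]", "m8[ns]"]:
    print(ts, classify(ts))
```

With this split, an implementation that supports only the core protocol can reject anything classified as `extension` up front instead of special-casing each additional type.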

@alimanfoo changed the title from "Core protocol v3.0: dtypes" to "WIP: Core protocol v3.0 - dtypes" on May 7, 2019
@alimanfoo force-pushed the core-protocol-v3.0-dtypes branch 2 times, most recently from 4631c80 to de32e65, on May 7, 2019 21:00
@alimanfoo changed the title from "WIP: Core protocol v3.0 - dtypes" to "WIP: Core protocol v3.0 - data types" on May 7, 2019
@alimanfoo requested a review from a team on May 7, 2019 22:00
@alimanfoo (Member, Author)

I've also realised that I haven't mentioned a fixed length bytes type (i.e., corresponding to types like 'S4' in numpy for an array containing length 4 byte strings) or a fixed length unicode type (i.e., corresponding to types like '<U4' in numpy for an array containing length 4 unicode code points). It's up for discussion whether these should be in the core protocol spec.

@meggart (Member) commented May 8, 2019

I think for the ease of implementation in different languages I would tend to move string-type specs (S4, <U4) to a protocol extension. From our experience implementing the specs in Julia, it was very simple to implement the numeric types, but the different fixed-size and variable-sized string encodings needed a lot of special-casing and made the code much less generic, because the Julia String type does not directly map to numpy's representation. So for me it would feel natural to move these parts of the code to some extension module.

However, I don't have strong feelings about this and can definitely see the advantage of simply supporting all numpy dtypes.

@alimanfoo (Member, Author)

> I think for the ease of implementation in different languages I would tend to move string-type specs (S4, <U4) to a protocol extension.

Thanks @meggart, that's very useful to know. I'm in favour of making the core protocol as easy as possible to implement in different languages, and so would be happy if these types were addressed via a protocol extension.

@alimanfoo force-pushed the core-protocol-v3.0-dtypes branch from de32e65 to 32f4ab2 on May 9, 2019 16:43
@alimanfoo changed the title from "WIP: Core protocol v3.0 - data types" to "Core protocol v3.0 - data types" on May 14, 2019
@alimanfoo (Member, Author)

In the interests of having content together in one place, I'd like to merge this PR into the core-protocol-v3.0-dev branch. We can still discuss, revise and revisit anything after merge. I'll merge tomorrow if no objections.

@alimanfoo merged commit 9630297 into zarr-developers:core-protocol-v3.0-dev on May 21, 2019
@alimanfoo deleted the core-protocol-v3.0-dtypes branch on May 21, 2019 11:45