Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 26 additions & 12 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -16,20 +16,34 @@ xarray: N-D labeled arrays and datasets
.. image:: http://img.shields.io/badge/benchmarked%20by-asv-green.svg?style=flat
:target: http://pandas.pydata.org/speed/xarray/

**xarray** (formerly **xray**) is an open source project and Python package that aims to bring the
labeled data power of pandas_ to the physical sciences, by providing
N-dimensional variants of the core pandas data structures.

Our goal is to provide a pandas-like and pandas-compatible toolkit for
analytics on multi-dimensional arrays, rather than the tabular data for which
pandas excels. Our approach adopts the `Common Data Model`_ for self-
describing scientific data in widespread use in the Earth sciences:
``xarray.Dataset`` is an in-memory representation of a netCDF file.

**xarray** (formerly **xray**) is an open source project and Python package
that makes working with labelled multi-dimensional arrays simple,
efficient, and fun!

Multi-dimensional (a.k.a. N-dimensional, ND) arrays (sometimes called
"tensors") are an essential part of computational science.
They are encountered in a wide range of fields, including physics, astronomy,
geoscience, bioinformatics, engineering, finance, and deep learning.
In Python, NumPy_ provides the fundamental data structure and API for
working with raw ND arrays.
However, real-world datasets are usually more than just raw numbers;
they have labels which encode information about how the array values map
to locations in space, time, etc.

By introducing *dimensions*, *coordinates*, and *attributes* on top of raw
NumPy-like arrays, xarray is able to understand these labels and use them to
provide a more intuitive, more concise, and less error-prone experience.
Xarray also provides a large and growing library of functions for advanced
analytics and visualization with these data structures.
Xarray was inspired by and borrows heavily from pandas_, the popular data
analysis package focused on labelled tabular data.
Xarray can read and write data from most common labeled ND-array storage
formats and is particularly tailored to working with netCDF_ files, which were
the source of xarray's data model.

.. _NumPy: http://www.numpy.org/
.. _pandas: http://pandas.pydata.org
.. _Common Data Model: http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM
.. _netCDF: http://www.unidata.ucar.edu/software/netcdf
.. _OPeNDAP: http://www.opendap.org/

Why xarray?
-----------
Expand Down
38 changes: 26 additions & 12 deletions doc/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,19 +2,33 @@ xarray: N-D labeled arrays and datasets in Python
=================================================

**xarray** (formerly **xray**) is an open source project and Python package
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shoyer can we drop the reference to xray? The set of people that know the old xray and don't know the new xarray name is probably next to empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly, just today in the twitter thread under discussion, someone referenced xray and linked to the v0.2 documentation. 🤦‍♂️

that aims to bring the labeled data power of pandas_ to the physical sciences,
by providing N-dimensional variants of the core pandas data structures.

Our goal is to provide a pandas-like and pandas-compatible toolkit for
analytics on multi-dimensional arrays, rather than the tabular data for which
pandas excels. Our approach adopts the `Common Data Model`_ for self-
describing scientific data in widespread use in the Earth sciences:
``xarray.Dataset`` is an in-memory representation of a netCDF file.

that makes working with labelled multi-dimensional arrays simple,
efficient, and fun!

Multi-dimensional (a.k.a. N-dimensional, ND) arrays (sometimes called
"tensors") are an essential part of computational science.
They are encountered in a wide range of fields, including physics, astronomy,
geoscience, bioinformatics, engineering, finance, and deep learning.
In Python, NumPy_ provides the fundamental data structure and API for
working with raw ND arrays.
However, real-world datasets are usually more than just raw numbers;
they have labels which encode information about how the array values map
to locations in space, time, etc.

By introducing *dimensions*, *coordinates*, and *attributes* on top of raw
NumPy-like arrays, xarray is able to understand these labels and use them to
provide a more intuitive, more concise, and less error-prone experience.
Xarray also provides a large and growing library of functions for advanced
analytics and visualization with these data structures.
Xarray was inspired by and borrows heavily from pandas_, the popular data
analysis package focused on labelled tabular data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to still see the words "netCDF" somewhere (or maybe that's implicit in our mentioning of the "Common Data Model"?).

Roughly speaking we have three audiences here:

  • NumPy users who want labels
  • pandas users who want to work with higher-dimensional data
  • netCDF users who want good in-memory data-structures

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it in response to @alexamici's comments. But in retrospect I agree that it belongs there. (I personally had never heard of CDM before xarray.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prioritize mentioning netCDF over the CDM and maybe drop CDM entirely from the brief intro. I don't think many people know what the "common data model" refers to, and worse it seems to be a heavily overloaded term, even in technical contexts (e.g., the top hit from Google is something unrelated from Microsoft).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roughly speaking we have three audiences here:

* NumPy users who want labels

* pandas users who want to work with higher-dimensional data

* netCDF users who want good in-memory data-structures

This seems key enough that I might even put this somewhere in the docs?

and

* pandas users who want to work with higher-dimensional data
->
* pandas users who want to work with higher-dimensional data and an explicit, production-capable API

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be good stuff to add to the “Why xarray” page.

Xarray can read and write data from most common labeled ND-array storage
formats and is particularly tailored to working with netCDF_ files, which were
the source of xarray's data model.

.. _NumPy: http://www.numpy.org/
.. _pandas: http://pandas.pydata.org
.. _Common Data Model: http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM
.. _netCDF: http://www.unidata.ucar.edu/software/netcdf
.. _OPeNDAP: http://www.opendap.org/

Documentation
-------------
Expand Down Expand Up @@ -106,7 +120,7 @@ See also
.. _2015 Unidata Users Workshop talk: https://www.youtube.com/watch?v=J9ypQOnt5l8
.. _tutorial: https://github.com/Unidata/unidata-users-workshop/blob/master/notebooks/xray-tutorial.ipynb
.. _with answers: https://github.com/Unidata/unidata-users-workshop/blob/master/notebooks/xray-tutorial-with-answers.ipynb
.. _Nicolas Fauchereau's tutorial: http://nbviewer.ipython.org/github/nicolasfauchereau/metocean/blob/master/notebooks/xray.ipynb
.. _Nicolas Fauchereau's tutorial: http://nbviewer.iPython.org/github/nicolasfauchereau/metocean/blob/master/notebooks/xray.ipynb

Get in touch
------------
Expand Down
32 changes: 23 additions & 9 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,19 +33,33 @@
DESCRIPTION = "N-D labeled arrays and datasets in Python"
LONG_DESCRIPTION = """
**xarray** (formerly **xray**) is an open source project and Python package
that aims to bring the labeled data power of pandas_ to the physical sciences,
by providing N-dimensional variants of the core pandas data structures.
that makes working with labelled multi-dimensional arrays simple,
efficient, and fun!

Our goal is to provide a pandas-like and pandas-compatible toolkit for
analytics on multi-dimensional arrays, rather than the tabular data for which
pandas excels. Our approach adopts the `Common Data Model`_ for self-
describing scientific data in widespread use in the Earth sciences:
``xarray.Dataset`` is an in-memory representation of a netCDF file.
Multi-dimensional (a.k.a. N-dimensional, ND) arrays (sometimes called
"tensors") are an essential part of computational science.
They are encountered in a wide range of fields, including physics, astronomy,
geoscience, bioinformatics, engineering, finance, and deep learning.
In Python, NumPy_ provides the fundamental data structure and API for
working with raw ND arrays.
However, real-world datasets are usually more than just raw numbers;
they have labels which encode information about how the array values map
to locations in space, time, etc.

By introducing *dimensions*, *coordinates*, and *attributes* on top of raw
NumPy-like arrays, xarray is able to understand these labels and use them to
provide a more intuitive, more concise, and less error-prone experience.
Xarray also provides a large and growing library of functions for advanced
analytics and visualization with these data structures.
Xarray was inspired by and borrows heavily from pandas_, the popular data
analysis package focused on labelled tabular data.
Xarray can read and write data from most common labeled ND-array storage
formats and is particularly tailored to working with netCDF_ files, which were
the source of xarray's data model.

.. _NumPy: http://www.numpy.org/
.. _pandas: http://pandas.pydata.org
.. _Common Data Model: http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/CDM
.. _netCDF: http://www.unidata.ucar.edu/software/netcdf
.. _OPeNDAP: http://www.opendap.org/

Important links
---------------
Expand Down