2 changes: 1 addition & 1 deletion doc/source/_toc.yml
@@ -60,7 +60,7 @@ parts:
- file: data/advanced-pipelines
- file: data/random-access
- file: data/faq
- file: data/package-ref
- file: data/api/api
- file: data/integrations

- file: train/train
16 changes: 16 additions & 0 deletions doc/source/data/api/api.rst
@@ -0,0 +1,16 @@
.. _data-api:

Ray Datasets API
================

.. toctree::
   :maxdepth: 2

   input_output.rst
   dataset.rst
   dataset_pipeline.rst
   grouped_dataset.rst
   dataset_context.rst
   data_representations.rst
   random_access_dataset.rst
   utility.rst
48 changes: 48 additions & 0 deletions doc/source/data/api/data_representations.rst
@@ -0,0 +1,48 @@
.. _data-representations:

Data Representations
====================

Block API
---------

.. autoclass:: ray.data.block.Block

.. autoclass:: ray.data.block.BlockExecStats
   :members:

.. autoclass:: ray.data.block.BlockMetadata
   :members:

.. autoclass:: ray.data.block.BlockAccessor
   :members:


Batch API
---------

.. autoclass:: ray.data.block.DataBatch

Row API
--------

.. autoclass:: ray.data.row.TableRow
   :members:


.. _dataset-tensor-extension-api:

Tensor Column Extension API
---------------------------

.. autoclass:: ray.data.extensions.tensor_extension.TensorDtype
   :members:

.. autoclass:: ray.data.extensions.tensor_extension.TensorArray
   :members:

.. autoclass:: ray.data.extensions.tensor_extension.ArrowTensorType
   :members:

.. autoclass:: ray.data.extensions.tensor_extension.ArrowTensorArray
   :members:
285 changes: 285 additions & 0 deletions doc/source/data/api/dataset.rst
@@ -0,0 +1,285 @@
.. _dataset-api:

Dataset API
===========

.. autoclass:: ray.data.Dataset

**Basic Transformations**

.. autosummary::
   :nosignatures:

   ray.data.Dataset.map
Contributor

A stretch request: add a ref label to each API so that we can link directly to an individual API, which would be useful. I think what's needed is adding something like ".. _dataset-map-ref:".

Contributor Author

This should already be natively supported!

:py:meth:`ray.data.Dataset.map`

Contributor

clarkzinzow, Aug 12, 2022

@jianoaix For the individual APIs, that should already be doable with cross-referencing via e.g. :meth:`ray.data.Dataset.map`

Contributor

OK, but can it be linked with a URL, like https://docs.ray.io/en/master/data/package-ref.html#dataset-api? If we point a user to map(), can we give them a URL that goes straight to it?

Contributor

Found it: https://docs.ray.io/en/master/data/package-ref.html#ray.data.Dataset.map. I guess I was looking at the navigation list on the right-hand side, which doesn't show it.
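
For anyone who wants to build such deep links programmatically, here is a minimal sketch (not part of this PR). It assumes, as the URL above suggests, that the HTML anchor for an autodoc entry is simply the fully qualified object name. The helper names `api_anchor_url` and `meth_role` and the `DEFAULT_PAGE` constant are hypothetical, and the page path may change once the API reference is split into separate files.

```python
# Hypothetical helpers, not part of Ray or Sphinx. They assume the HTML anchor
# of an autodoc entry equals its fully qualified Python name.
DEFAULT_PAGE = "https://docs.ray.io/en/master/data/package-ref.html"


def _check_dotted_name(qualified_name: str) -> None:
    """Raise ValueError unless the input looks like `pkg.module.Class.method`."""
    parts = qualified_name.split(".")
    if any(not part.isidentifier() for part in parts):
        raise ValueError(f"not a dotted Python name: {qualified_name!r}")


def api_anchor_url(qualified_name: str, page: str = DEFAULT_PAGE) -> str:
    """Return `page` with the qualified name appended as a URL fragment."""
    _check_dotted_name(qualified_name)
    return f"{page}#{qualified_name}"


def meth_role(qualified_name: str) -> str:
    """Return the Sphinx cross-reference role text for a method."""
    _check_dotted_name(qualified_name)
    return f":meth:`{qualified_name}`"


if __name__ == "__main__":
    print(api_anchor_url("ray.data.Dataset.map"))
    print(meth_role("ray.data.Dataset.map"))
```

The role form is what you would write inside the docs themselves; the URL form is what you would paste into an issue or chat message.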

   ray.data.Dataset.map_batches
   ray.data.Dataset.flat_map
   ray.data.Dataset.filter
   ray.data.Dataset.add_column
   ray.data.Dataset.drop_columns
   ray.data.Dataset.random_sample
   ray.data.Dataset.limit

**Sorting, Shuffling, Repartitioning**

.. autosummary::
   :nosignatures:

   ray.data.Dataset.sort
   ray.data.Dataset.random_shuffle
   ray.data.Dataset.randomize_block_order
   ray.data.Dataset.repartition

**Splitting and Merging Datasets**

.. autosummary::
   :nosignatures:

   ray.data.Dataset.split
   ray.data.Dataset.split_at_indices
   ray.data.Dataset.split_proportionately
   ray.data.Dataset.train_test_split
   ray.data.Dataset.union
   ray.data.Dataset.zip

**Grouped and Global Aggregations**

.. autosummary::
   :nosignatures:

   ray.data.Dataset.groupby
   ray.data.Dataset.aggregate
   ray.data.Dataset.sum
   ray.data.Dataset.min
   ray.data.Dataset.max
   ray.data.Dataset.mean
   ray.data.Dataset.std

**Converting to Pipelines**

.. autosummary::
   :nosignatures:

   ray.data.Dataset.repeat
   ray.data.Dataset.window

**Consuming Datasets**

.. autosummary::
   :nosignatures:

   ray.data.Dataset.show
   ray.data.Dataset.take
   ray.data.Dataset.take_all
   ray.data.Dataset.iter_rows
   ray.data.Dataset.iter_batches
   ray.data.Dataset.iter_torch_batches
   ray.data.Dataset.iter_tf_batches

**I/O and Conversion**

.. autosummary::
   :nosignatures:

   ray.data.Dataset.write_parquet
   ray.data.Dataset.write_json
   ray.data.Dataset.write_csv
   ray.data.Dataset.write_numpy
   ray.data.Dataset.write_datasource
   ray.data.Dataset.to_torch
Contributor

Should we actually deprecate these two in 2.0?

   ray.data.Dataset.to_tf
   ray.data.Dataset.to_dask
   ray.data.Dataset.to_mars
   ray.data.Dataset.to_modin
   ray.data.Dataset.to_spark
   ray.data.Dataset.to_pandas
   ray.data.Dataset.to_pandas_refs
   ray.data.Dataset.to_numpy_refs
   ray.data.Dataset.to_arrow_refs
   ray.data.Dataset.to_random_access_dataset

**Inspecting Metadata**

.. autosummary::
   :nosignatures:

   ray.data.Dataset.count
   ray.data.Dataset.schema
   ray.data.Dataset.num_blocks
   ray.data.Dataset.size_bytes
   ray.data.Dataset.input_files
   ray.data.Dataset.stats
   ray.data.Dataset.get_internal_block_refs

**Execution**

.. autosummary::
   :nosignatures:

   ray.data.Dataset.fully_executed
   ray.data.Dataset.is_fully_executed
   ray.data.Dataset.lazy

**Serialization**

.. autosummary::
   :nosignatures:

   ray.data.Dataset.has_serializable_lineage
   ray.data.Dataset.serialize_lineage
   ray.data.Dataset.deserialize_lineage

Basic Transformations
---------------------

.. automethod:: ray.data.Dataset.map

.. automethod:: ray.data.Dataset.map_batches

.. automethod:: ray.data.Dataset.flat_map

.. automethod:: ray.data.Dataset.filter

.. automethod:: ray.data.Dataset.add_column

.. automethod:: ray.data.Dataset.drop_columns

.. automethod:: ray.data.Dataset.random_sample

.. automethod:: ray.data.Dataset.limit

Sorting, Shuffling, Repartitioning
----------------------------------

.. automethod:: ray.data.Dataset.sort

.. automethod:: ray.data.Dataset.random_shuffle

.. automethod:: ray.data.Dataset.randomize_block_order

.. automethod:: ray.data.Dataset.repartition

Splitting and Merging Datasets
------------------------------

.. automethod:: ray.data.Dataset.split

.. automethod:: ray.data.Dataset.split_at_indices

.. automethod:: ray.data.Dataset.split_proportionately

.. automethod:: ray.data.Dataset.train_test_split

.. automethod:: ray.data.Dataset.union

.. automethod:: ray.data.Dataset.zip

Grouped and Global Aggregations
-------------------------------

.. automethod:: ray.data.Dataset.groupby

.. automethod:: ray.data.Dataset.aggregate

.. automethod:: ray.data.Dataset.sum

.. automethod:: ray.data.Dataset.min

.. automethod:: ray.data.Dataset.max

.. automethod:: ray.data.Dataset.mean

.. automethod:: ray.data.Dataset.std

Converting to Pipelines
-----------------------

.. automethod:: ray.data.Dataset.repeat

.. automethod:: ray.data.Dataset.window

Consuming Datasets
------------------

.. automethod:: ray.data.Dataset.show

.. automethod:: ray.data.Dataset.take

.. automethod:: ray.data.Dataset.take_all

.. automethod:: ray.data.Dataset.iter_rows

.. automethod:: ray.data.Dataset.iter_batches

.. automethod:: ray.data.Dataset.iter_torch_batches

.. automethod:: ray.data.Dataset.iter_tf_batches

I/O and Conversion
------------------

.. automethod:: ray.data.Dataset.write_parquet

.. automethod:: ray.data.Dataset.write_json

.. automethod:: ray.data.Dataset.write_csv

.. automethod:: ray.data.Dataset.write_numpy

.. automethod:: ray.data.Dataset.write_datasource

.. automethod:: ray.data.Dataset.to_torch

.. automethod:: ray.data.Dataset.to_tf

.. automethod:: ray.data.Dataset.to_dask

.. automethod:: ray.data.Dataset.to_mars

.. automethod:: ray.data.Dataset.to_modin

.. automethod:: ray.data.Dataset.to_spark

.. automethod:: ray.data.Dataset.to_pandas

.. automethod:: ray.data.Dataset.to_pandas_refs

.. automethod:: ray.data.Dataset.to_numpy_refs

.. automethod:: ray.data.Dataset.to_arrow_refs

.. automethod:: ray.data.Dataset.to_random_access_dataset

Inspecting Metadata
-------------------

.. automethod:: ray.data.Dataset.count

.. automethod:: ray.data.Dataset.schema

.. automethod:: ray.data.Dataset.num_blocks

.. automethod:: ray.data.Dataset.size_bytes

.. automethod:: ray.data.Dataset.input_files

.. automethod:: ray.data.Dataset.stats

.. automethod:: ray.data.Dataset.get_internal_block_refs

Execution
---------

.. automethod:: ray.data.Dataset.fully_executed

.. automethod:: ray.data.Dataset.is_fully_executed

.. automethod:: ray.data.Dataset.lazy

Serialization
-------------

.. automethod:: ray.data.Dataset.has_serializable_lineage

.. automethod:: ray.data.Dataset.serialize_lineage

.. automethod:: ray.data.Dataset.deserialize_lineage
7 changes: 7 additions & 0 deletions doc/source/data/api/dataset_context.rst
@@ -0,0 +1,7 @@
.. _dataset-context-api:

DatasetContext API
==================

.. autoclass:: ray.data.context.DatasetContext
   :members: