diff --git a/docs/gallery/advanced_io/streaming.py b/docs/gallery/advanced_io/streaming.py
index 4b03a9f2c..23c291db6 100644
--- a/docs/gallery/advanced_io/streaming.py
+++ b/docs/gallery/advanced_io/streaming.py
@@ -11,7 +11,7 @@ using the dandi API library.
 
 Getting the location of the file on DANDI
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+-----------------------------------------
 
 The :py:class:`~dandi.dandiapi.DandiAPIClient` can be used to get the S3 URL of any NWB file stored in the DANDI
 Archive. If you have not already, install the latest release of the ``dandi`` package.
 
@@ -34,36 +34,8 @@
 
 s3_url = asset.get_content_url(follow_redirects=1, strip_query=True)
 
-Streaming Method 1: ROS3
-~~~~~~~~~~~~~~~~~~~~~~~~
-ROS3 is one of the supported methods for reading data from a remote store. ROS3 stands for "read only S3" and is a
-driver created by the HDF5 Group that allows HDF5 to read HDF5 files stored remotely in s3 buckets. Using this method
-requires that your HDF5 library is installed with the ROS3 driver enabled. This is not the default configuration,
-so you will need to make sure you install the right version of ``h5py`` that has this advanced configuration enabled.
-You can install HDF5 with the ROS3 driver from `conda-forge `_ using ``conda``. You may
-first need to uninstall a currently installed version of ``h5py``.
-
-.. code-block:: bash
-
-    pip uninstall h5py
-    conda install -c conda-forge "h5py>=3.2"
-
-Now instantiate a :py:class:`~pynwb.NWBHDF5IO` object with the S3 URL and specify the driver as "ros3". This
-will download metadata about the file from the S3 bucket to memory. The values of datasets are accessed lazily,
-just like when reading an NWB file stored locally. So, slicing into a dataset will require additional time to
-download the sliced data (and only the sliced data) to memory.
-
-.. code-block:: python
-
-    from pynwb import NWBHDF5IO
-
-    with NWBHDF5IO(s3_url, mode='r', load_namespaces=True, driver='ros3') as io:
-        nwbfile = io.read()
-        print(nwbfile)
-        print(nwbfile.acquisition['lick_times'].time_series['lick_left_times'].data[:])
-
-Streaming Method 2: fsspec
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Streaming Method 1: fsspec
+--------------------------
 fsspec is another data streaming approach that is quite flexible and has several performance advantages. This library
 creates a virtual filesystem for remote stores. With this approach, a virtual file is created for the file and the
 virtual filesystem layer takes care of requesting data from the S3 bucket whenever data is
@@ -113,6 +85,49 @@
 The S3 backend, in particular, may provide additional functionality for accessing data on DANDI. See the
 `fsspec documentation on known implementations `_
 for a full updated list of supported store formats.
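+
+fsspec can also cache previously requested byte ranges locally, which greatly speeds up repeated reads of the same
+region of a file. Below is a rough sketch of that approach (it assumes the ``fsspec``, ``requests``, and ``aiohttp``
+packages are installed, and the local cache directory name is arbitrary): the HTTP-based virtual filesystem is
+wrapped in fsspec's ``CachingFileSystem``.
+
+.. code-block:: python
+
+    import fsspec
+    import h5py
+    from fsspec.implementations.cached import CachingFileSystem
+    from pynwb import NWBHDF5IO
+
+    # wrap an HTTP-based virtual filesystem so that downloaded byte ranges are cached on local disk
+    fs = CachingFileSystem(
+        fs=fsspec.filesystem("http"),
+        cache_storage="nwb-cache",  # arbitrary local directory for the cache
+    )
+
+    # open the remote file through the virtual filesystem and read it with PyNWB as usual
+    with fs.open(s3_url, "rb") as f:
+        with h5py.File(f) as file:
+            with NWBHDF5IO(file=file, load_namespaces=True) as io:
+                nwbfile = io.read()
+                print(nwbfile.acquisition['lick_times'].time_series['lick_left_times'].data[:])
+
+On a second read of the same region, the data is served from the local cache instead of being downloaded again.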
+
+Streaming Method 2: ROS3
+------------------------
+ROS3 is one of the supported methods for reading data from a remote store. ROS3 stands for "read-only S3" and is a
+driver created by the HDF5 Group that allows HDF5 to read HDF5 files stored remotely in S3 buckets. Using this method
+requires that your HDF5 library is installed with the ROS3 driver enabled. This is not the default configuration,
+so you will need to make sure you install the right version of ``h5py`` that has this advanced configuration enabled.
+You can install HDF5 with the ROS3 driver from `conda-forge `_ using ``conda``. You may
+first need to uninstall a currently installed version of ``h5py``.
+
+.. code-block:: bash
+
+    pip uninstall h5py
+    conda install -c conda-forge "h5py>=3.2"
+
+Now instantiate a :py:class:`~pynwb.NWBHDF5IO` object with the S3 URL and specify the driver as "ros3". This
+will download metadata about the file from the S3 bucket to memory. The values of datasets are accessed lazily,
+just like when reading an NWB file stored locally. So, slicing into a dataset will require additional time to
+download the sliced data (and only the sliced data) to memory.
+
+.. code-block:: python
+
+    from pynwb import NWBHDF5IO
+
+    with NWBHDF5IO(s3_url, mode='r', load_namespaces=True, driver='ros3') as io:
+        nwbfile = io.read()
+        print(nwbfile)
+        print(nwbfile.acquisition['lick_times'].time_series['lick_left_times'].data[:])
+
+Which streaming method to choose?
+---------------------------------
+
+fsspec has many advantages over ros3:
+
+1. fsspec is easier to install.
+2. fsspec supports caching, which will dramatically speed up repeated requests for the
+   same region of data (see the caching sketch above).
+3. fsspec automatically retries when S3 fails to respond.
+4. fsspec works with other storage backends.
+5. fsspec works with other types of files.
+6. In our hands, fsspec is faster out of the box.
+
+For these reasons, we recommend using fsspec for most Python users.
 '''
 
 # sphinx_gallery_thumbnail_path = 'figures/gallery_thumbnails_streaming.png'