
Update iterative write and parallel I/O tutorial #1633

Merged
11 commits merged into dev on Jan 11, 2023

Conversation

oruebel
Contributor

@oruebel oruebel commented Jan 11, 2023

Motivation

This PR is related to HDMF #623 and fixes #1514

  • Update the iterative write tutorial to:
    • mention GenericDataChunkIterator and crosslink to the corresponding tutorial on HDMF
    • use the new HDF5IO.dataset property to avoid having to close and reopen a file
    • rename the tutorial to add the plot_ prefix so that outputs are captured directly from the tutorial rather than being hardcoded in it
  • Update the parallel I/O tutorial to use HDF5IO to set up a dataset in a file rather than an empty DataChunkIterator
  • Update the Makefile for the docs to clean up files generated by the advanced_io tutorial
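The parallel I/O change replaces an empty DataChunkIterator with a dataset that is pre-allocated up front and then filled slice-by-slice by independent writers. The shape of that pattern can be sketched with a numpy memmap standing in for the HDF5 dataset (the file name, shapes, and the serial "rank" loop here are hypothetical stand-ins; real parallel HDF5 writes would use MPI ranks):

```python
import numpy as np

shape = (4, 10)
fname = "parallel_pattern_demo.dat"

# Step 1 (normally done once, collectively): pre-allocate the full dataset.
mm = np.memmap(fname, dtype="float64", mode="w+", shape=shape)
mm.flush()
del mm

# Step 2 (normally one MPI rank each): every writer fills only its own,
# disjoint slice of the pre-allocated dataset.
for rank in range(shape[0]):
    view = np.memmap(fname, dtype="float64", mode="r+", shape=shape)
    view[rank, :] = rank  # each "rank" writes only its own row
    view.flush()
    del view

# Read back and confirm each row holds its writer's id.
result = np.array(np.memmap(fname, dtype="float64", mode="r", shape=shape))
print(result[:, 0])  # → [0. 1. 2. 3.]
```

The key point the tutorial change illustrates is that the dataset's full shape must be known and allocated before the writers start, since the writers only fill slices.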

How to test the behavior?

Build the docs

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked by running flake8 from the source directory.
  • Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

@codecov

codecov bot commented Jan 11, 2023

Codecov Report

Merging #1633 (1251ed3) into dev (f4bbbd6) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##              dev    #1633   +/-   ##
=======================================
  Coverage   91.31%   91.31%           
=======================================
  Files          25       25           
  Lines        2534     2534           
  Branches      481      481           
=======================================
  Hits         2314     2314           
  Misses        139      139           
  Partials       81       81           
Flag          Coverage   Δ
integration   70.44%     <ø> (ø)
unit          84.37%     <ø> (ø)

Flags with carried forward coverage won't be shown.


@oruebel oruebel marked this pull request as ready for review January 11, 2023 01:54
@oruebel
Contributor Author

oruebel commented Jan 11, 2023

@CodyCBakerPhD while going through the iterative write tutorial to fix a few issues, I noticed that we were not discussing the GenericDataChunkIterator here. I added a few references to the corresponding tutorial; however, it would be nice to also show the usage of GenericDataChunkIterator here. I think we could update the "Convert large binary data arrays" section of the tutorial to use GenericDataChunkIterator instead of DataChunkIterator. If you have time, could you add those changes to this PR? It should be a fairly simple change, but since you are the expert on GenericDataChunkIterator, it would be best if you could make it.
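For context on the buffering that the tutorial below relies on: wrapping a per-row generator with `buffer_size=10` groups single rows into (10, 10) slabs before each write. DataChunkIterator does this internally; the following is only a pure-numpy illustration of that grouping step, with all names (`row_gen`, `buffered`) hypothetical:

```python
import numpy as np
from itertools import islice

def row_gen(arr):
    """Yield one shape-(ncol,) row at a time, like the tutorial's generator."""
    for row in arr:
        yield row

def buffered(gen, buffer_size):
    """Group single rows from a generator into (buffer_size, ncol) chunks,
    mimicking what a buffer_size setting does for a row-wise iterator."""
    it = iter(gen)
    while True:
        block = list(islice(it, buffer_size))
        if not block:
            return
        yield np.stack(block)

arr = np.arange(1000.0).reshape(100, 10)
chunks = list(buffered(row_gen(arr), buffer_size=10))
print(len(chunks), chunks[0].shape)  # → 10 (10, 10)
```

Concatenating the chunks reproduces the original array, which is why the chunked write path and the all-in-memory path yield the same stored data.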

####################
# Example: Convert large binary data arrays
# -----------------------------------------------------
#
# When converting large data files, a typical problem is that it is often too expensive to load all the data
# into memory. This example is very similar to the data generator example, except that instead of generating
# data on-the-fly in memory, we load data from a file one chunk at a time in our generator.
#
####################
# Create example data
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
import numpy as np
# Create the test data
datashape = (100, 10)  # OK, this is not really large, but we just want to show how it works
num_values = np.prod(datashape)
arrdata = np.arange(num_values).reshape(datashape)
# Write the test data to disk
temp = np.memmap('basic_sparse_iterwrite_testdata.npy', dtype='float64', mode='w+', shape=datashape)
temp[:] = arrdata
del temp # Flush to disk
####################
# Step 1: Create a generator for our array
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
# Note, we here use a generator for simplicity but we could equally well also implement our own
# :py:class:`~hdmf.data_utils.AbstractDataChunkIterator`.
def iter_largearray(filename, shape, dtype='float64'):
    """
    Generator reading [chunk_size, :] elements from our array in each iteration.
    """
    for i in range(shape[0]):
        # Open the file and read the next chunk
        newfp = np.memmap(filename, dtype=dtype, mode='r', shape=shape)
        curr_data = newfp[i:(i + 1), ...][0]
        del newfp  # Reopen the file in each iteration to prevent accumulation of data in memory
        yield curr_data
    return
####################
# Step 2: Wrap the generator in a DataChunkIterator
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
from hdmf.data_utils import DataChunkIterator
data = DataChunkIterator(data=iter_largearray(filename='basic_sparse_iterwrite_testdata.npy',
                                              shape=datashape),
                         maxshape=datashape,
                         buffer_size=10)  # Buffer 10 elements into a chunk, i.e., create chunks of shape (10, 10)
####################
# Step 3: Write the data as usual
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#
write_test_file(filename='basic_sparse_iterwrite_largearray.nwb',
                data=data)
####################
# .. tip::
#
#    Again, if we want to explicitly control how our data will be chunked (compressed, etc.)
#    in the HDF5 file, then we need to wrap our :py:class:`~hdmf.data_utils.DataChunkIterator`
#    using :py:class:`~hdmf.backends.hdf5.h5_utils.H5DataIO`.
####################
# Discussion
# ^^^^^^^^^^
# Let's verify that our data was written correctly
# Read the NWB file
from pynwb import NWBHDF5IO # noqa: F811
with NWBHDF5IO('basic_sparse_iterwrite_largearray.nwb', 'r') as io:
    nwbfile = io.read()
    data = nwbfile.get_acquisition('synthetic_timeseries').data
    # Compare all the data values of our two arrays
    data_match = np.all(arrdata == data[:])  # Don't do this for very large arrays!
# Print result message
if data_match:
    print("Success: All data values match")
else:
    print("ERROR: Mismatch between data")
####################
# ``[Out]:``
#
# .. code-block:: python
#
#    Success: All data values match
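The tutorial's verification step notes that `np.all(arrdata == data[:])` should not be used for very large arrays, since `data[:]` loads everything into memory. The same chunking idea applies to verification: compare slab-by-slab so only one chunk is resident at a time. A minimal numpy-only sketch (the helper name `chunks_equal` and the chunk size are hypothetical):

```python
import numpy as np

def chunks_equal(a, b, rows_per_chunk=10):
    """Compare two array-likes row-slab by row-slab to bound memory use.

    Works for any objects supporting .shape and row slicing, e.g. an
    in-memory array vs. a lazily-read HDF5 dataset.
    """
    if a.shape != b.shape:
        return False
    for start in range(0, a.shape[0], rows_per_chunk):
        stop = start + rows_per_chunk
        if not np.array_equal(np.asarray(a[start:stop]), np.asarray(b[start:stop])):
            return False
    return True

x = np.arange(1000).reshape(100, 10)
y = x.copy()
print(chunks_equal(x, y))  # → True (identical arrays)
y[57, 3] += 1
print(chunks_equal(x, y))  # → False (one value differs)
```

Passing the lazy dataset handle itself (rather than `data[:]`) as one of the arguments keeps peak memory proportional to a single chunk.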

@oruebel oruebel added category: enhancement improvements of code or code behavior priority: medium non-critical problem and/or affecting only a small set of NWB users topic: docs issues related to documentation labels Jan 11, 2023
@oruebel oruebel added this to the Next Release milestone Jan 11, 2023
@oruebel oruebel merged commit 8395176 into dev Jan 11, 2023
@oruebel oruebel deleted the update/iter_write_tutorial branch January 11, 2023 02:31
@oruebel
Contributor Author

oruebel commented Jan 11, 2023

@rly thanks for the fixes

mavaylon1 added a commit that referenced this pull request Jan 17, 2023
* Check nwb_version on read (#1612)

* Added NWBHDF5IO.nwb_version property and check for version on NWBHDF5IO.read
* Updated icephys tests to skip version check when writing non NWBFile container
* Add tests for NWB version check on read
* Add unit tests for NWBHDF5IO.nwb_version property
* Updated changelog

Co-authored-by: Ryan Ly <[email protected]>

* Bump setuptools from 65.4.1 to 65.5.1 (#1614)

Bumps [setuptools](https://github.com/pypa/setuptools) from 65.4.1 to 65.5.1.
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/CHANGES.rst)
- [Commits](pypa/setuptools@v65.4.1...v65.5.1)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* modify export.rst to have proper links to the NWBFile API docs (#1615)

* Create project_action.yml (#1617)

* Create project_action.yml

* Update project_action.yml

* Update project_action.yml

* Update project_action.yml (#1620)

* Update project_action.yml (#1623)

* Project action (#1626)

* Create project_action.yml

* Update project_action.yml

* Update project_action.yml

* Update project_action.yml

* Show recommended usage for hdf5plugin in tutorial (#1630)

* Show recommended usage for hdf5plugin in tutorial

* Update docs/gallery/advanced_io/h5dataio.py

* Update docs/gallery/advanced_io/h5dataio.py

Co-authored-by: Heberto Mayorquin <[email protected]>

Co-authored-by: Ben Dichter <[email protected]>
Co-authored-by: Heberto Mayorquin <[email protected]>

* Update iterative write and parallel I/O tutorial (#1633)

* Update iterative write tutorial
* Update doc makefiles to clean up files created by the advanced io tutorial
* Fix #1514  Update parallel I/O tutorial to use H5DataIO instead of DataChunkIterator to setup data for parallel write
* Update changelog
* Fix flake8
* Fix broken external links
* Update make.bat
* Update CHANGELOG.md
* Update plot_iterative_write.py
* Update docs/gallery/advanced_io/plot_iterative_write.py

Co-authored-by: Ryan Ly <[email protected]>

* Update project_action.yml (#1632)

* nwb_schema_2.6.0

* Update CHANGELOG.md

* remove

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Oliver Ruebel <[email protected]>
Co-authored-by: Ryan Ly <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ben Dichter <[email protected]>
Co-authored-by: Heberto Mayorquin <[email protected]>
Development

Successfully merging this pull request may close these issues.

[Documentation]: Update parallel I/O and iterative write tutorial