add time series check #1426

Merged: 10 commits merged into dev on Feb 7, 2022
Conversation

bendichter (Contributor)

Check if the length of data is equal to the length of timestamps

* rmv s3 utils
* rmv corresponding tests
* update tutorial to use latest dandi tools instead
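
As an illustration of the check this PR adds (a minimal sketch assuming pynwb's TimeSeries constructor; the exact warning text is defined by the PR, not here):

import numpy as np
from pynwb import TimeSeries

# 5 data points but only 4 timestamps: with this change, constructing the
# TimeSeries warns about the length mismatch instead of failing silently
ts = TimeSeries(
    name="test_ts",
    data=np.arange(5),
    unit="n/a",
    timestamps=[0.0, 0.1, 0.2, 0.3],
)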
bendichter requested a review from rly on February 2, 2022

rly (Contributor) commented Feb 2, 2022

This approach is good for now.

@oruebel and I discussed adding a general validate or check method to hdmf.container.AbstractContainer that can be manually called by the user and automatically called by the BuildManager or ObjectMapper on write. By having these checks only in __init__ in this PR, users can still set or modify certain properties later and bypass those checks.

We can refactor these checks (as well as the other ones currently in various other __init__ methods) to the new validate method later on.
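
For illustration only, such a hook might look roughly like this (a hypothetical sketch; the method name, signature, and call sites are assumptions, not an existing HDMF API):

# Hypothetical sketch of the validate hook discussed above; not actual HDMF API.
class AbstractContainer:
    def validate(self):
        """Return a list of consistency problems; subclasses extend this."""
        return []

class TimeSeries(AbstractContainer):
    def validate(self):
        issues = super().validate()
        if self.timestamps is not None and len(self.data) != len(self.timestamps):
            issues.append("length of data does not match length of timestamps")
        return issues

# The BuildManager or ObjectMapper could then call container.validate() on
# write, so the checks cannot be bypassed by setting attributes after __init__.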

Comment on lines 151 to 155
if (
isinstance(kwargs["data"], (np.ndarray, list))
and isinstance(kwargs["timestamps"], (np.ndarray, list))
and not len(kwargs["data"]) == len(kwargs["timestamps"])
):

For these shape-based checks, I think it may be useful to use get_data_shape from HDMF. One advantage is that we don't need to handle the differences between lists, np.ndarray, h5py.Dataset, DataChunkIterator, etc. in each check; those details are handled inside get_data_shape. Another benefit is that it would cover more cases than just the np.ndarray and list cases handled by the current check; it should also work for a DataChunkIterator when, e.g., maxshape is set on it. Using get_data_shape, I think the check would look something like this:

from warnings import warn

from hdmf.utils import get_data_shape
...

data_shape = get_data_shape(data=kwargs["data"], strict_no_data_load=True)
timestamps_shape = get_data_shape(data=kwargs["timestamps"], strict_no_data_load=True)
if (
    # check that both shapes are known
    data_shape is not None and timestamps_shape is not None
    # guard against scalars; should never happen
    and len(data_shape) > 0 and len(timestamps_shape) > 0
    # check that the length of the first dimension is known
    and data_shape[0] is not None and timestamps_shape[0] is not None
    # check that the lengths of data and timestamps match
    and data_shape[0] != timestamps_shape[0]
):
    warn(...)
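
For context, a quick illustration of how get_data_shape resolves shapes without loading data (assuming hdmf's DataChunkIterator, which exposes a maxshape attribute; the results in the comments are what I would expect, not output taken from the PR):

import numpy as np
from hdmf.data_utils import DataChunkIterator
from hdmf.utils import get_data_shape

print(get_data_shape([1, 2, 3]))         # (3,)
print(get_data_shape(np.zeros((5, 2))))  # (5, 2)

# for an iterator, the shape is taken from maxshape without consuming the data
dci = DataChunkIterator(data=iter(range(10)), maxshape=(10,))
print(get_data_shape(dci, strict_no_data_load=True))  # (10,)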

oruebel (Contributor) commented Feb 3, 2022

> We can refactor these checks (as well as the other ones currently in various other __init__ methods) to the new validate method later on.

I agree. This approach is fine for now. I think the overhead of refactoring this later should be reasonable.

rly mentioned this pull request on Feb 3, 2022
oruebel (Contributor) previously approved these changes Feb 7, 2022

Looks good to me.

rly (Contributor) previously approved these changes and commented Feb 7, 2022

Please add an entry to the changelog and then we are good to merge.

bendichter dismissed the stale reviews from rly and oruebel via c81d238 on February 7, 2022
bendichter merged commit 21524b3 into dev on Feb 7, 2022