validator erroneously catches non-vector experimenter and related_publications #1090
Comments
Yes, and that is how it is supposed to be, right? The schema is the reference.
Yes, I was half explaining the divergent behavior for others and half wondering what everyone thought about this. It seems non-ideal to throw validation errors for older files that were compliant with the spec when they were created and can be read perfectly well in pynwb and matnwb.
We should be careful here not to mix backward compatibility of the API with backward compatibility of the schema. Testing whether the API can open a file and whether the file is compliant with the latest schema are two separate questions. For the validator, I think we have to take the schema as ground truth for validation. If we have a 2.1 file, then we should validate against the 2.1 schema. In that sense, we should check the version of the file and the schema we are validating against and issue a warning if they don't match, to let users know that validation against a different version may result in warnings due to changes in the schema.
@oruebel I agree these are two separate issues. While we aim for the API to support all NWB 2.0+ data, it could be important for a user to know precisely which version of the schema their file is valid against. I could see tool support for NWB where a tool would specify which versions of the NWB schema it supports (like pip version requirements). In that case, I could imagine a user wanting to know the validity of a file against a number of schemas, not just the latest one. What should the workflow for this be? Should the validator provide a way to specify a version of the schema to validate against?
Just to help me understand our support: every NWB file created after that change should include, by default, the schema it was written with. And the validation tool uses the shipped schema if present. So if the schema changes in an incompatible way, do old NWB files with an included schema still validate against the latest pynwb/hdmf/schema version? And can pynwb 2.x read all NWB data of version 2.x?
Correct, and MatNWB also implements this now.
Correct.
I'm not sure what the default behavior of the validator is right now, i.e., whether it uses the schema from the file or from the version installed with PyNWB, but I agree that the default behavior should be to validate against the schema in the file.
Yes. One minor detail: PyNWB and the schema are versioned separately, i.e., PyNWB 1.x will be able to read any 2.x file as long as 2.x is less than or equal to the schema version that PyNWB ships with.
If the schema changes in an incompatible way then (at least currently) this means that the old file will validate against the schema version it was generated with, but not necessarily against the latest schema, as the incompatible changes would be seen as errors.
Yes, we cache the schema by default and recommend that users keep that default, but we do not require it, so unless we want to make caching required we'll need to handle the case where the schema is not cached. We also need to account for files that were written before caching the schema was an option.
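For illustration, here is a minimal sketch of what "cached schema" looks like on disk: the cached specifications are stored in a `/specifications` group inside the HDF5 file, which can be inspected directly with h5py (the file name is hypothetical):

```python
import h5py

# Sketch: list any cached schema namespaces/versions in an NWB file.
# "example.nwb" is a hypothetical file name.
with h5py.File("example.nwb", "r") as f:
    if "specifications" in f:
        # One subgroup per cached namespace, each containing version subgroups.
        for namespace, group in f["specifications"].items():
            print(namespace, "->", list(group.keys()))
    else:
        print("No cached schema; validation must fall back to an installed schema.")
```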
I made that change with Andrew's help.
Currently, I believe the approach for this would be that the user would have to check out the appropriate version of the schema from the nwb-schema repo (if the schema is not cached) to use for validation (or install the corresponding version of PyNWB). Alternatively, we'd need to ship all releases of the schema with PyNWB. |
@oruebel How does the user know which schema version to fetch? |
That is in the nwb_version attribute of the file |
We can access that using h5py or potentially h5dump, but I couldn't find a way to access nwb_version using pynwb |
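For reference, a minimal h5py sketch for reading that attribute (file name hypothetical); in NWB 2.x the version string is stored as the `nwb_version` attribute on the root group:

```python
import h5py

# Sketch: read the schema version an NWB 2.x file was written with.
with h5py.File("example.nwb", "r") as f:
    version = f.attrs["nwb_version"]
    # h5py may return bytes depending on how the attribute was written.
    if isinstance(version, bytes):
        version = version.decode("utf-8")
    print(version)  # e.g. "2.1.0"
```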
@oruebel Good. So in principle we could add support for the validator to fetch the appropriate schema from github and use that for validation. I'm not volunteering though ;) |
FWIW, that was my unstated idea in #1091 for how to deal with multiple versions of a schema. Since they are largely identical text files, adding them directly into git should compress them nicely. I don't think you would want to breed a submodule per version.
The submodule is used for active development, i.e., to develop the schema and API in conjunction. Prior versions of the schema that have been released are not changing as part of the active development. Another question is, what do we need prior versions of the schema for? Currently the focus in this issue is squarely on validation (which is probably the main use-case for this). But I think we are jumping ahead here a bit. First we need to resolve:
I am in favor of shipping prior versions of the schema with the API, even though it is only useful for validation. Doing so avoids the need for a validation tool to connect to the internet and for us to write tools to handle download and file management. The cost is that the file sizes would be slightly larger; the current schema is 82 KB (nwb.core) + 5 KB (hdmf.common).
I think that's fine. It would be good to have that happen as part of either the setup or packaging process rather than as direct copies in the git repo.
My 1c as an outsider: I would vote for the opposite (just keeping copies of the different versions in git), since git would compress them efficiently and setup/packaging wouldn't need to be unnecessarily complicated and thus possibly more fragile.
FWIW, for |
EDIT: wrong issue. |
Description
We recently changed the schema to accept a vector (array of strings) for experimenter and for related_publications. Our pynwb I/O maps were changed to accept either form, but our validator is based entirely on the schema and will output validation errors for older files that use plain strings instead of vectors of strings for these fields.
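A minimal sketch of how this surfaces (the file name old_file.nwb is hypothetical, and the exact return value of validate may differ across pynwb versions):

```python
from pynwb import NWBHDF5IO, validate

# Sketch: the older file reads fine, but validating it against the current
# schema reports errors for the string-valued (non-vector) fields.
with NWBHDF5IO("old_file.nwb", mode="r") as io:
    nwbfile = io.read()   # succeeds
    errors = validate(io) # flags experimenter / related_publications
    for err in errors:
        print(err)
```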
Steps to Reproduce
Environment
Checklist