Updating BIDS validator and schema to contemporary upstream equivalent #1050

Merged
merged 30 commits into from
Jul 28, 2022

@TheChymera TheChymera commented Jul 8, 2022

Brought our code back in sync with upstream BIDS, as we finally have upstream ome.zarr support and there is no longer any reason to maintain a fork of the specification.
This also fixes: #1037

Ideally this will be the last update of the bundled code, as we are working on importing the built-in Python validator as a library.
Should the need to fork the specification arise in the future, it would best be handled via the parameterized schema directory.

Draft for now as there might be inconsistencies I haven't spotted.

codecov bot commented Jul 8, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.48%. Comparing base (50ca7b0) to head (468e39c).
Report is 1023 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1050      +/-   ##
==========================================
+ Coverage   88.41%   88.48%   +0.07%     
==========================================
  Files          72       73       +1     
  Lines        9251     9291      +40     
==========================================
+ Hits         8179     8221      +42     
+ Misses       1072     1070       -2     
Flag Coverage Δ
unittests 88.48% <100.00%> (+0.07%) ⬆️


@TheChymera

Additionally, we might want to start using the reference bids-examples repository for validation (rather than our own fork).
To be able to perform all relevant tests, this would require having at least one broken dataset in the examples, so that we can verify errors as well: bids-standard/bids-examples#327

In any case, using vanilla upstream would add another ~10s to the BIDS validator tests, as our fork had stripped out many of the datasets. While we will prospectively no longer run the validator tests themselves (these would be run by the bids-schemacode package), we still need BIDS example data to construct our derivative BIDS datasets for higher-level function testing...

Any ideas on whether we can somehow check out only some of the files to reduce download time? @yarikoptic
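
One possible way to fetch only part of the examples repository is a partial clone combined with a sparse checkout. The sketch below is an editorial illustration, not something proposed in this thread: it only builds the git command lines, the helper name `sparse_clone_commands` is hypothetical, and the dataset directory names are placeholders. It assumes a git version that ships `sparse-checkout` (2.25 or later).

```python
from typing import List, Sequence

# Upstream examples repository discussed above.
BIDS_EXAMPLES_URL = "https://github.com/bids-standard/bids-examples.git"


def sparse_clone_commands(
    url: str, dest: str, datasets: Sequence[str]
) -> List[List[str]]:
    """Build git commands that check out only the listed directories.

    Hypothetical helper: it returns the command lines and does not run them.
    """
    if not datasets:
        raise ValueError("at least one dataset directory is required")
    return [
        # Partial, shallow clone: file contents are downloaded only when
        # they are actually checked out.
        ["git", "clone", "--filter=blob:none", "--sparse", "--depth=1", url, dest],
        # Limit the working tree to the requested dataset directories.
        ["git", "-C", dest, "sparse-checkout", "set", *datasets],
    ]
```

Each returned command could then be passed to `subprocess.run(cmd, check=True)`; whether this actually saves time depends on how much of the repository the tests need.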

@TheChymera TheChymera marked this pull request as ready for review July 8, 2022 06:22
@TheChymera TheChymera marked this pull request as draft July 9, 2022 00:55
@TheChymera TheChymera force-pushed the bids_update branch 2 times, most recently from 45b386f to 036e41d Compare July 9, 2022 01:01
lgtm-com bot commented Jul 9, 2022

This pull request introduces 1 alert when merging 4257ef8 into 50ca7b0 - view on LGTM.com

new alerts:

  • 1 for Unused local variable

@TheChymera TheChymera marked this pull request as ready for review July 14, 2022 09:23
@yarikoptic yarikoptic left a comment

I have only started reviewing and need to switch to something else; please attend to the --report option in the meantime.

     "-r",
     is_flag=True,
-    help="Whether to write a report under a unique path in the current directory. "
-    "Only usable if `--report` is not already used.",
+    help="Whether to write a report under a unique path in the current directory. ",
 )
Member

So we are breaking the CLI... Why not just keep --report as an option that takes the path and, if it is not specified, assume that no report was requested and print to the screen?
Note: I might not even be able to write to the current directory (the dataset might be owned by someone else).

Contributor Author

The functionality wasn't broken; the help text was just incorrect.

Member

You removed --report-flag and changed --report from taking a string (path) to being an is_flag option, so it is broken in that the user can no longer say --report myvalidation.log. I think it is worth removing the somewhat odd --report-flag, so it is OK to break the CLI, but we had better "make it right" this time and avoid future breakage.
If you really want to be fancy and also support bool-like behavior, I guess you would need to make it nargs='?' (if click supports it; @jwodder could help) and then, if no report path is given, assign the default path to the log, so that the user could do both --report and --report mylog.log.
Alternative: do not bother with bool-like behavior in the CLI, and just make it demand a path string.

NB: please do not mark such comments Resolved. Let the original author decide whether they were resolved, since otherwise it is harder for a reviewer to locate prior comments while re-reviewing and see whether prior concerns were addressed.
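
The optional-value behavior described in this comment can be illustrated with the standard library. The sketch below uses `argparse` rather than click (which is what the DANDI CLI uses), purely to show the `nargs='?'` semantics; the parser name, the `paths` argument, and the `DEFAULT_REPORT` value are assumptions made for the example and are not taken from the PR.

```python
import argparse

# Hypothetical default report path, used when --report is given without a value.
DEFAULT_REPORT = "bids-validator-report.log"


def build_parser() -> argparse.ArgumentParser:
    """Parser whose --report option accepts an optional path."""
    parser = argparse.ArgumentParser(prog="validate-bids")
    parser.add_argument("paths", nargs="*", help="Datasets to validate.")
    parser.add_argument(
        "-r",
        "--report",
        nargs="?",             # the value after the option is optional
        default=None,          # option absent: no report file is written
        const=DEFAULT_REPORT,  # option present without a value
        metavar="PATH",
        help="Write a report, optionally to PATH.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.report if args.report else "no report requested; printing to screen")
```

A caveat of this design: an optional-value option placed directly before a positional argument consumes it as the report path (`--report some/dataset`), which may be an argument for the simpler alternative of always requiring a path.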

@yarikoptic yarikoptic left a comment

--report handling still needs to be fixed and tested better

@TheChymera

@yarikoptic

so, it got broken that user no longer would be able to say --report myvalidation.log

It didn't get broken; the feature was simply removed. I added it initially because I thought it would be helpful, and I have since decided that it is not. The validation() function, which I understand we want to merge BIDS into at some point, does not have such a parameter, and since DANDI doesn't manage logging this way, I see no reason to do it for this function specifically...

@TheChymera TheChymera marked this pull request as ready for review July 20, 2022 21:51
@@ -453,7 +467,7 @@ def validate_all(

 def write_report(
     validation_result,
-    report_path="{logdir}/bids-validator-report_{datetime}-{pid}.log",
+    report_path="/var/tmp/bids-validator/report_{datetime}-{pid}.log",
Member

I suggest still using a variable for the log base directory, but set it to $TMPDIR (falling back to /tmp) by default.

@TheChymera TheChymera Jul 22, 2022

This default is for upstream usage; we wrap this command and pass the DANDI log directory via appdirs. /tmp is tricky, as it is ephemeral on some systems and users might not know how their system handles it. /var/tmp is the safer bet.

Compare:
https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch05s15.html
to
https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s18.html
Particularly:

Programs must not assume that any files or directories in /tmp are preserved between invocations of the program.

is pretty much a non-starter for logging which might need to be consulted over a longer period.
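
The two defaults under discussion can be put side by side in a small sketch. It keeps the `{logdir}` placeholder from the original template and lets the caller choose between the `$TMPDIR`-with-`/tmp`-fallback suggestion and the `/var/tmp` default. The function name `resolve_report_path`, the `persistent` switch, and the timestamp format are assumptions made for illustration and are not the PR's code.

```python
import os
from datetime import datetime
from typing import Optional

# Template as it appeared before this change, with a variable base directory.
REPORT_TEMPLATE = "{logdir}/bids-validator-report_{datetime}-{pid}.log"


def resolve_report_path(logdir: Optional[str] = None, persistent: bool = True) -> str:
    """Fill in the report path template (hypothetical helper).

    An explicit logdir always wins. Otherwise the base directory is /var/tmp
    when persistent is True, or $TMPDIR (falling back to /tmp) when False.
    """
    if logdir is None:
        if persistent:
            # FHS: /var/tmp is preserved between reboots.
            logdir = "/var/tmp"
        else:
            # FHS: nothing in /tmp is guaranteed to survive between invocations.
            logdir = os.environ.get("TMPDIR") or "/tmp"
    return REPORT_TEMPLATE.format(
        logdir=logdir.rstrip("/"),
        datetime=datetime.now().strftime("%Y%m%d%H%M%S"),  # assumed format
        pid=os.getpid(),
    )
```

A wrapper such as DANDI's would pass its own log directory explicitly, for example `resolve_report_path(logdir=my_log_dir)`, so neither default would apply there.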

 def validate_bids(
     bids_paths,
-    schema_reference_root="{module_path}/support/bids/schemadata/",
+    schema_reference_root="/usr/share/bids-schema/",
Member

Under what conditions is this default used, and why would we expect anything to be in this directory?

Contributor Author

In the DANDI use case this default is never used, but this is bundled code from upstream, which we will soon stop tracking and instead import as a Python dependency. Prospectively, the dependency would find the BIDS schema index in that directory, as that is where architecture-independent read-only user data belongs as per the FHS.

@TheChymera TheChymera requested a review from jwodder July 22, 2022 21:22
@jwodder jwodder added the BIDS label Jul 26, 2022
@TheChymera

@jwodder thanks for your input :3 good to go now?

@yarikoptic

OK, I think the main discussions/questions were addressed/resolved. Let's proceed and see where it takes us. Thank you @TheChymera !

@yarikoptic yarikoptic merged commit 3655aa1 into master Jul 28, 2022
@yarikoptic yarikoptic deleted the bids_update branch July 28, 2022 19:24
Successfully merging this pull request may close these issues.

dandi validate-bids crashes with IndexError: list index out of range
3 participants