Updating BIDS validator and schema to contemporary upstream equivalent #1050

Merged
merged 30 commits into from
Jul 28, 2022
a13418a
Upstream validation features up to and including:
TheChymera Jul 8, 2022
46988a3
Schema update
TheChymera Jul 8, 2022
8405a99
Reverting to DANDI fork of bids-examples
TheChymera Jul 8, 2022
fc5a720
Docstring fix
TheChymera Jul 8, 2022
4baf689
Record BIDS schema version, analogous to:
TheChymera Jul 8, 2022
34a2af2
Logging to appdirs when BIDS validation is called via DANDI wrapper
TheChymera Jul 9, 2022
a67c91e
Moved external modules to dandi/support/*/
TheChymera Jul 12, 2022
cff8c89
formatting changes to stay in sync with upstream BIDS
TheChymera Jul 12, 2022
2ed8929
Corrected help text
TheChymera Jul 14, 2022
73a2061
Updated report docstring in wrapper
TheChymera Jul 19, 2022
c367e05
verify report file creation
TheChymera Jul 19, 2022
b514d8a
Moved bash usage example to CLI function
TheChymera Jul 19, 2022
24038e8
Formatting
TheChymera Jul 19, 2022
2e53449
Better windows support
TheChymera Jul 19, 2022
d4640bf
Improved windows support
TheChymera Jul 20, 2022
d35e5f2
Dropped debugging print line
TheChymera Jul 20, 2022
cd0b36c
Nicer conditional for recursion
TheChymera Jul 20, 2022
a444368
Removed deprecated parameter
TheChymera Jul 20, 2022
4d4a7cf
Reinstated option for report path specification
TheChymera Jul 20, 2022
3bcc542
code style improvement
TheChymera Jul 22, 2022
140a7cf
style improvements
TheChymera Jul 22, 2022
335bf62
Merge branch 'bids_update' of github.com:dandi/dandi-cli into bids_up…
TheChymera Jul 22, 2022
b582485
Improved windows support
TheChymera Jul 22, 2022
02a6e95
Using pytest's tmp_path
TheChymera Jul 22, 2022
8c1dbcc
Updated library name
TheChymera Jul 22, 2022
41fa4ac
removed unused default values from function wrapped with click
TheChymera Jul 22, 2022
75e8fc3
Simplified directory exclusion logic
TheChymera Jul 26, 2022
ae6970a
Merge branch 'bids_update' of github.com:dandi/dandi-cli into bids_up…
TheChymera Jul 26, 2022
3f43905
Typo
TheChymera Jul 26, 2022
468e39c
Do not index top-level files from pseudofile directories
TheChymera Jul 27, 2022
2 changes: 1 addition & 1 deletion dandi/bids_utils.py
@@ -11,7 +11,7 @@ def is_valid(
Parameters
----------
validation_result: dict
Dictionary as returned by `dandi.bids_validator_xs.validate_bids()`.
Dictionary as returned by `dandi.support.bids.validator.validate_bids()`.
allow_missing_files: bool, optional
Whether to consider the dataset invalid if any mandatory files are not present.
allow_invalid_filenames: bool, optional
31 changes: 21 additions & 10 deletions dandi/cli/cmd_validate.py
@@ -7,33 +7,44 @@


@click.command()
@devel_option(
@click.option(
"--schema", help="Validate against new BIDS schema version", metavar="VERSION"
)
@click.option("--report", help="Specify path to write a report under.")
@click.option(
"--report-flag",
"--report-path",
help="Write report under path, this option implies `--report/-r`.",
)
@click.option(
"--report",
"-r",
is_flag=True,
help="Whether to write a report under a unique path in the current directory. "
"Only usable if `--report` is not already used.",
help="Whether to write a report under a unique path in the DANDI log directory.",
)
Member

So we are breaking the CLI... Why not just keep `--report` as an option that takes a path and, if it is not specified, assume that no report was requested and print to the screen?
Note: I might not even be able to write to the current directory (the dataset might be owned by someone else).

Contributor Author

Functionality wasn't broken; the help text was just incorrect.

Member

You removed `--report-flag` and changed `--report` from taking a string (path) to being `is_flag`, so it did break: a user can no longer say `--report myvalidation.log`. I think it is worth removing the somewhat odd `--report-flag`, so it is OK to break, but we had better "make it right" this time and avoid future breakage.
If you really want to be fancy and also support bool-like behavior, I guess you would need to make it `nargs='?'` (if click supports it; @jwodder could help) and then, if no report path is given, assign the default log path, so the user could do both `--report` and `--report mylog.log`.
Alternative: do not bother with bool-like behavior in the CLI, and just make it demand a path string.

NB: please do not mark such comments Resolved. Let the original author decide whether they were resolved, since resolving them makes it harder for a reviewer to locate prior comments while re-reviewing and to see whether prior concerns were addressed.
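
The bool-like behavior discussed in the comment above (a bare `--report` versus `--report mylog.log`) can be sketched with the standard library's `argparse`, where `nargs="?"` combined with `const` gives an option an optional value. This only illustrates the semantics; it is not the `click` code used in this PR, and whether `click` supports the same pattern would need to be checked. `DEFAULT_REPORT_PATH` and `parse_report` are hypothetical names.

```python
import argparse

# Hypothetical default path; the real DANDI log location is not shown here.
DEFAULT_REPORT_PATH = "bids-validator-report.log"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="validate-bids-sketch")
    parser.add_argument(
        "--report",
        "-r",
        nargs="?",                  # the value after --report is optional
        const=DEFAULT_REPORT_PATH,  # used when --report is given without a value
        default=None,               # used when --report is absent: no report
        help="Write a report, optionally to the given path.",
    )
    return parser


def parse_report(argv):
    """Return the report path implied by argv, or None if no report was requested."""
    return build_parser().parse_args(argv).report


if __name__ == "__main__":
    print(parse_report([]))                         # no report requested
    print(parse_report(["--report"]))               # default path
    print(parse_report(["--report", "mylog.log"]))  # explicit path
```

An option with an optional value becomes ambiguous when positional arguments can follow it (`--report /my/path` would be read as a report path), which is one argument for the simpler alternative of always requiring a path.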

@click.argument("paths", nargs=-1, type=click.Path(exists=True, dir_okay=True))
@devel_debug_option()
@map_to_click_exceptions
def validate_bids(
paths, schema=None, devel_debug=False, report=False, report_flag=False
paths,
schema,
report,
report_path,
devel_debug=False,
):
"""Validate BIDS paths."""
"""Validate BIDS paths.

Notes
-----
Used from bash, eg:
dandi validate-bids /my/path
"""

from ..bids_utils import is_valid, report_errors
from ..validate import validate_bids as validate_bids_

if report_flag and not report:
report = report_flag

validator_result = validate_bids_(
*paths,
report=report,
report_path=report_path,
schema_version=schema,
devel_debug=devel_debug,
)
2 changes: 1 addition & 1 deletion dandi/metadata.py
@@ -119,7 +119,7 @@ def get_metadata(path: Union[str, Path]) -> Optional[dict]:
# could still be augmented with `_is_nwb` to disambiguate both cases
# at the detection level.
if _path_in_bids(path):
from .bids_validator_xs import validate_bids
from .validate import validate_bids

_meta = validate_bids(path)
meta = _meta["match_listing"][0]
11 changes: 0 additions & 11 deletions dandi/support/bids/schemadata/1.7.0+012/rules/associated_data.yaml

This file was deleted.

@@ -1,9 +1,9 @@
# BIDS-schema

Portions of the BIDS specification are defined using YAML files, in order to
Portions of the BIDS specification are defined using YAML files in order to
make the specification machine-readable.

Currently, the portions of the specification that rely on this schema are
Currently the portions of the specification that rely on this schema are
the entity tables, entity definitions, filename templates, and metadata tables.
Any changes to the specification should be mirrored in the schema.

@@ -32,14 +32,14 @@ The types of objects currently supported in the schema are:
- suffixes,
- metadata,
- top-level files,
- and non-BIDS associated folders.
- and non-BIDS associated directories.

Each of these object types has a single file in the `objects/` folder.
Each of these object types has a single file in the `objects/` directory.

- `modalities.yaml`: The modalities, or types of technology, used to acquire data in a BIDS dataset.
These modalities are not reflected directly in the specification.
For example, while both fMRI and DWI data are acquired with an MRI,
in a BIDS dataset they are stored in different folders reflecting the two different `datatypes`.
in a BIDS dataset they are stored in different directories reflecting the two different `datatypes`.

- `datatypes.yaml`: Data types supported by the specification.
The only information provided in the file is:
@@ -48,7 +48,7 @@ Each of these object types has a single file in the `objects/` folder.
1. each datatype's full name
1. a free text description of the datatype.

- `entities.yaml`: Entities (key/value pairs in folder and filenames).
- `entities.yaml`: Entities (key-value pairs in directory and filenames).

- `metadata.yaml`: All valid metadata fields that are explicitly supported in BIDS sidecar JSON files.

@@ -58,7 +58,7 @@ Each of these object types has a single file in the `objects/` folder.

- `top_level_files.yaml`: Valid top-level files which may appear in a BIDS dataset.

- `associated_data.yaml`: Folders that may appear within a dataset folder without following BIDS rules.
- `associated_data.yaml`: Directories that may appear within a dataset directory without following BIDS rules.

### On re-used objects with different definitions

@@ -73,7 +73,7 @@ For objects with `snake_case` names, two underscores must be used.
There should also be a comment near the object definition in the YAML file describing the nature of the different objects.

For example, the TSV column `"reference"` means different things when used for EEG data, as compared to iEEG data.
As such, there are two definitions in `columns.yaml` for the `"reference"` column: `"reference__eeg"` and `"reference_ieeg"`.
As such, there are two definitions in `columns.yaml` for the `"reference"` column: `"reference__eeg"` and `"reference__ieeg"`.

```yaml
# reference column for channels.tsv files for EEG data
@@ -115,15 +115,15 @@ The `description` field is a freeform description of the modality.
### `datatypes.yaml`

This file contains a dictionary in which each datatype is defined.
Keys are the folder names associated with each datatype (for example, `anat` for anatomical MRI),
Keys are the directory names associated with each datatype (for example, `anat` for anatomical MRI),
and each associated value is a dictionary with two keys: `name` and `description`.

The `name` field is the full name of the datatype.
The `description` field is a freeform description of the datatype.

### `entities.yaml`

This file contains a dictionary in which each entity (key/value pair in filenames) is defined.
This file contains a dictionary in which each entity (key-value pair in filenames) is defined.
Keys are long-form versions of the entities, which are distinct from both the entities as
they appear in filenames _and_ their full names.
For example, the key for the "Contrast Enhancing Agent" entity, which appears in filenames as `ce-<label>`,
@@ -155,11 +155,11 @@ The `format` field defines the specific format the value should take.
Entities are broadly divided into either `label` or `index` types.

When `format` is `index`, then the entity's associated value should be a non-zero integer, optionally with leading zeros.
For example, `run` should have an index, so a valid key-value pair in a filename would be `run-01`.
For example, `run` should have an index, so a valid entity would be `run-01`.

When `format` is `label`, then the value should be an alphanumeric string.
Beyond limitations on which characters are allowed, labels have few restrictions.
For example, `acq` should have a label, so a valid key-value pair might be `acq-someLabel`.
For example, `acq` should have a label, so a valid entity might be `acq-someLabel`.

For a small number of entities, only certain labels are allowed.
In those cases, instead of a `format` field, there will be an `enum` field, which will provide a list of allowed values.
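
The `format` and `enum` rules described above can be sketched as a small check on a single `key-value` entity string. This is an illustration under stated assumptions, not the validator shipped in this PR: the regular expressions are derived from the prose (positive non-zero integer with optional leading zeros, alphanumeric label), and the rule dictionaries with their `name` field, as well as the `mt` example values, are hypothetical simplifications of the schema entries.

```python
import re

# Assumed patterns derived from the prose above, not copied from the schema.
FORMAT_PATTERNS = {
    "index": re.compile(r"0*[1-9][0-9]*"),  # non-zero integer, leading zeros allowed
    "label": re.compile(r"[0-9a-zA-Z]+"),   # alphanumeric string
}

# Hypothetical, simplified rule dictionaries for illustration only.
RUN = {"name": "run", "format": "index"}
ACQ = {"name": "acq", "format": "label"}
MT = {"name": "mt", "enum": ["on", "off"]}


def entity_is_valid(entity: str, rule: dict) -> bool:
    """Check one `key-value` entity string against a simplified rule."""
    key, sep, value = entity.partition("-")
    if not sep or key != rule["name"]:
        return False
    if "enum" in rule:
        # An `enum` field replaces `format` and lists the allowed values.
        return value in rule["enum"]
    return FORMAT_PATTERNS[rule["format"]].fullmatch(value) is not None


if __name__ == "__main__":
    for entity, rule in [("run-01", RUN), ("run-00", RUN), ("acq-someLabel", ACQ), ("mt-maybe", MT)]:
        print(entity, entity_is_valid(entity, rule))
```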
@@ -218,7 +218,7 @@ There are additional fields which may define rules that apply to a given type.

- `dataset_relative` (relative paths from dataset root),

- `participant_relative` (relative paths from participant folder).
- `participant_relative` (relative paths from participant directory).

- `enum` defines a list of valid values for the field.
The minimum string length (`minLength`) defaults to 1.
@@ -269,7 +269,7 @@ There are additional fields which may define rules that apply to a given type.
- `object`: If `type` is `object`, then there MAY be any of the following
fields at the same level as `type`: `additionalProperties`,
`properties`.
Objects are defined as sets of key/value pairs.
Objects are defined as sets of key-value pairs.
Keys MUST be strings, while values may have specific attributes,
which is what `additionalProperties` describes.
Here is an example of a field which MUST be an object,
@@ -388,29 +388,29 @@ The `description` field is a freeform description of the file.

### `associated_data.yaml`

This file contains a dictionary in which each non-BIDS folder is defined.
Keys are folder names, and each associated value is a dictionary with two keys: `name` and `description`.
This file contains a dictionary in which each non-BIDS directory is defined.
Keys are directory names, and each associated value is a dictionary with two keys: `name` and `description`.

The `name` field is the full name of the folder.
The `description` field is a freeform description of the folder.
The `name` field is the full name of the directory.
The `description` field is a freeform description of the directory.

## Rule files

The files in the `rules/` folder are less standardized than the files in `objects/`,
The files in the `rules/` directory are less standardized than the files in `objects/`,
because rules governing how different object types interact in a valid dataset are more variable
than the object definitions.

- `modalities.yaml`: This file simply groups `datatypes` under their associated modality.

- `datatypes/*.yaml`: Files in the `datatypes` folder contain information about valid filenames within a given datatype.
- `datatypes/*.yaml`: Files in the `datatypes` directory contain information about valid filenames within a given datatype.
Specifically, each datatype's YAML file contains a list of dictionaries.
Each dictionary contains a list of suffixes, entities, and file extensions which may constitute a valid BIDS filename.

- `entities.yaml`: This file simply defines the order in which entities, when present, MUST appear in filenames.

- `top_level_files.yaml`: Requirement levels and valid file extensions of top-level files.

- `associated_data.yaml`: Requirement levels of associated non-BIDS folders.
- `associated_data.yaml`: Requirement levels of associated non-BIDS directories.

### `modalities.yaml`

@@ -419,7 +419,7 @@ The `datatypes` dictionary contains a list of datatypes that fall under that mod

### `datatypes/*.yaml`

The files in this folder are currently the least standardized of any part of the schema.
The files in this directory are currently the least standardized of any part of the schema.

Each file corresponds to a single `datatype`.
Within the file is a list of dictionaries.
@@ -496,5 +496,24 @@ In cases where there is a data file and a metadata file, the `.json` extension f

### `associated_data.yaml`

This file contains a dictionary in which each key is a folder and the value is a dictionary with one key: `required`.
The `required` entry contains a boolean value to indicate if that folder is required for BIDS datasets or not.
This file contains a dictionary in which each key is a directory and the value is a dictionary with one key: `required`.
The `required` entry contains a boolean value to indicate if that directory is required for BIDS datasets or not.

## Version of the schema

File `SCHEMA_VERSION` in the top of the directory contains a semantic
version (`MAJOR.MINOR.PATCH`) for the schema (how it is organized).
Note that while in `0.` series, breaking changes are
permitted without changing the `MAJOR` (leading) component of the version.
Going forward, the 2nd, `MINOR` indicator should be
incremented whenever schema organization introduces "breaking changes":
changes which would cause existing tools reading schema to
adjust their code to be able to read it again.
Additions of new components to the schema should increment the last,
`PATCH`, component of the version so that tools could selectively
enable/disable loading specific components of the schema.
With the release of `1.0.0` version of the schema,
we expect that the `MAJOR` component
will be incremented whenever schema organization introduces "breaking changes",
`MINOR` - when adding new components to the schema,
and `PATCH` - when fixing errors in existing components.
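
The versioning policy above can be summarized as a small helper that picks which component to increment. This is a sketch, not part of the schema tooling: the function name and the `change` labels are hypothetical, resetting lower components to zero follows the usual semantic-versioning convention rather than anything stated in the text, and bumping `PATCH` for fixes in the `0.` series is an assumption because the text does not cover that case.

```python
def bump_schema_version(version: str, change: str) -> str:
    """Return the next schema version for a 'breaking', 'addition' or 'fix' change."""
    if change not in {"breaking", "addition", "fix"}:
        raise ValueError(f"unknown change type: {change!r}")
    major, minor, patch = (int(part) for part in version.split("."))
    if major == 0:
        # 0.x series: breaking changes bump MINOR, additions bump PATCH.
        if change == "breaking":
            return f"0.{minor + 1}.0"
        # Assumption: fixes also bump PATCH while in the 0.x series.
        return f"0.{minor}.{patch + 1}"
    # From 1.0.0 on: breaking -> MAJOR, additions -> MINOR, fixes -> PATCH.
    if change == "breaking":
        return f"{major + 1}.0.0"
    if change == "addition":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


if __name__ == "__main__":
    print(bump_schema_version("0.3.0", "breaking"))  # 0.4.0
    print(bump_schema_version("0.3.0", "addition"))  # 0.3.1
    print(bump_schema_version("1.2.3", "breaking"))  # 2.0.0
```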
1 change: 1 addition & 0 deletions dandi/support/bids/schemadata/1.7.0+369/SCHEMA_VERSION
@@ -0,0 +1 @@
0.3.0