-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Measurements command #879
Measurements command #879
Conversation
Codecov Report
@@ Coverage Diff @@
## master #879 +/- ##
===========================================
- Coverage 59.36% 36.51% -22.85%
===========================================
Files 43 43
Lines 6014 7292 +1278
Branches 1539 2138 +599
===========================================
- Hits 3570 2663 -907
- Misses 2185 4474 +2289
+ Partials 259 155 -104
Continue to review full report at Codecov.
|
Adds a new Snakemake file that mirrors `Snakefile_WHO` and adds logic to build measurements JSONs for HA-based builds with the new `augur measurements` command [1]. The current prototype only works for H3N2. [1] nextstrain/augur#879
3a4dc98
to
5010833
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice work, @joverlee521! This is probably the most thoroughly documented and tested a new Augur subcommand has ever been. The implementation is generally quite clear and easy to follow. The API docs and tests go a long way toward demonstrating how to use this new command. We might eventually want a how-to guide or tutorial for using this, but that's better for a later PR.
Although I've requested changes, my comments below are primarily stylistic. The exception are the comments about how functions are defined in the new measurements.py
module (which are more about function than style). We can chat about these comments more synchronously with anyone else who has feedback on the command.
Updates lists of valid panels in Auspice config JSON schema, Auspice JSON v2 schema, and the export v2 command's argparse choices. These changes allow users to add "measurements" to the list of panels in an Auspice config JSON or the command line.
A schema for the measurements collection config JSON to be supplied to `augur measurements export`. Basically a copy of the config properties for a collection within the measurements schema. In the future, we could look into using `jsonschema.RefResolver` to use refs that refer to a separate schema file.
Validates the provided measurements collection config JSON against the JSON schema `augur/data/schema-measurements-collection-config.json`
Adds validations for constraints on the values within measurements and collection config JSONs that cannot be checked via JSON schema validation. These include: 1. A collection's fields, groupings, and filters are valid fields in the collection's measurements. 2. A collection's display default group-by is included in the groupings 3. All collections within measurements JSON have unique keys 4. The default collection value is a valid key that matches one of the the collections
The `augur measurements export` subcommand creates the measurements JSON for a single collection of measurements provided in a TSV. The most basic measurements export command takes the command-line options for the required fields to create the minimal measurements JSON.
Add ability to provide config options via a collection config JSON or via command line args. The config JSON includes all available configs but there are only command line args for configs that have 1:1 options. Nested options such as fields' titles and groupings' orders have been excluded to reduce complexity of command line args. Similar to the auspice export command, the command line args will override the values of the config JSON.
The `augur measurements concat` subcommand concatentates multiple measurements JSONs into a single measurements JSON. Depends on the measurements validation to verify each measurements JSON is valid and the final produced measurments JSON is valid.
1. Pass long strings as multiple strings to `print()` and let it handle the formatting of the output with a default space separator so we don't have think about it. 2. Standardizes error messages by removing '«»' in favor of Python's built-in `repr()` formatting with `!r` in f-strings.
1. consistently capitalize "Auspice" 2. fully spell out "Concatenate" 3. add generic description for `--threshold` option
A subset of optional args (currently just `title` and `x-axis-label`) need a default value for Auspice but we also do not want to overwrite the config file with the argument default values every time. Use a global dict `DEFAULT_ARGS` to hold these default values instead of the argparse `default` argument.
Seems like a handy argparse action that can be used by other modules. Suggested by @huddlej in review
Only write stdout to `/dev/null` so that we can track the expected error messages in the Cram tests.
Suggested by @huddlej in review. The error printed within the function seems like a side effect. The code would be clearer if the error printing is handled directly with the boolean check.
If any of the groupings provided by the user does not exist as a column, then exit command with an error message to ensure that these are not ignored by the user. If any defined grouping does not exist, then it might indicate a greater error such as passing the wrong collection TSV.
Makes it clearer that these functions will read and return the file contents. Similar to other `io` functions in Augur, the validation of the file structure is an expected side effect.
Renamed argument group variables for export as well to differentiate the argument groups for each subcommand.
Follow the pattern used by most Augur subcommands, which uses the argparse.Namespace object in their "main" logic
Allows users to specify a list of columns from their collection TSV to include in the output measurements JSON. All columns will be included by default if option is not provided. The 'strain' and 'value' columns do not have to be included in the list since these are required columns. However, other configuration columns (e.g. groupings) will need to be explicitly included in the list if users are using the option. This ensures that we do not give unexpected outputs by auto-including grouping columns. Includes new functional test for the new option.
The measurements sub-subcommands are very straightforward that is seems to add unnecessary complexity by breaking out into functions that are mainly used to raise errors that then have to be caught by the "main" function.
fc70c8b
to
8fbdc82
Compare
Rebased on master and fixed up some small fix commits. I'll merge after all of the CI tests have passed. |
@joverlee521 Looks great! Let's merge this, release it, and start using it in our refactored flu builds. 🚀 |
Description of proposed changes
Adds the
measurements
subcommand with two sub-subcommands,export
andconcat
.See commits for details.
Related issue(s)
Based on proposal in #869
Testing
Added unit tests for the changes in the
validate
subcommand.Added functional tests for
measurements export
andmeasurements concat
.