-
Notifications
You must be signed in to change notification settings - Fork 128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
filter: Add --empty-results-reporting={error,warn,skip} option #1175
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #1175 +/- ##
==========================================
+ Coverage 68.19% 68.25% +0.05%
==========================================
Files 63 63
Lines 6748 6772 +24
Branches 1654 1659 +5
==========================================
+ Hits 4602 4622 +20
- Misses 1838 1841 +3
- Partials 308 309 +1
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
augur/filter/__init__.py
Outdated
output_group.add_argument( | ||
'--empty-results-reporting', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option doesn't fit under the "outputs" group:
outputs:
possible representations of filtered data (at least one required)
--output OUTPUT, --output-sequences OUTPUT, -o OUTPUT
filtered sequences in FASTA format (default: None)
--output-metadata OUTPUT_METADATA
metadata for strains that passed filters (default: None)
--output-strains OUTPUT_STRAINS
list of strains that passed filters (no header) (default: None)
--output-log OUTPUT_LOG
tab-delimited file with one row for each filtered strain and the reason it was filtered. Keyword arguments
used for a given filter are reported in JSON format in a `kwargs` column. (default: None)
--empty-results-reporting {error,warn,skip}
How should empty filter results be reported. (default: error)
Maybe it would fit better under the default "options" group?
options:
-h, --help show this help message and exit
--empty-results-reporting {error,warn,skip}
How should empty filter results be reported. (default: error)
output_group.add_argument( | |
'--empty-results-reporting', | |
parser.add_argument( | |
'--empty-results-reporting', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conceptually it sort of fits under output options, no? It's an option that controls what to do if output is empty.
Though maybe the option name should use "output" instead of "results", e.g. --empty-output-reporting
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would fit better under the default "options" group?
Hmm, I would look for options related to the outputs in the outputs
group.
I can reword the argument group description to make it clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though maybe the option name should use "output" instead of "results", e.g. --empty-output-reporting?
I specifically named the option with "results" since we have the --output-log
option, which is not taken into account by the --empty-results-reporting
option. That might just be me being nitpicky about words 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated description in force push:
outputs:
options related to outputs, at least one of the possible representations
of filtered data (--output, --output-metadata, --output-strains) is
required
--output OUTPUT, --output-sequences OUTPUT, -o OUTPUT
filtered sequences in FASTA format (default: None)
--output-metadata OUTPUT_METADATA
metadata for strains that passed filters (default:
None)
--output-strains OUTPUT_STRAINS
list of strains that passed filters (no header)
(default: None)
--output-log OUTPUT_LOG
tab-delimited file with one row for each filtered
strain and the reason it was filtered. Keyword
arguments used for a given filter are reported in JSON
format in a `kwargs` column. (default: None)
--empty-results-reporting {error,warn,skip}
How should empty filter results be reported. (default:
error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I like --empty-results-reporting
over --empty-output-reporting
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking
I think results vs. output is splitting hairs that and no one will be confused by it not applying to --output-log
. We also don't use "results" anywhere in the augur filter
help text and nearly nowhere else in Augur's user-facing interface, whereas we do use "output" everywhere for this concept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the resolution, the rewording of
possible representations of filtered data (at least one required)
to
options related to outputs, at least one of the possible representations
of filtered data (--output, --output-metadata, --output-strains) is
required`
is great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also don't use "results" anywhere in the
augur filter
help text and nearly nowhere else in Augur's user-facing interface, whereas we do use "output" everywhere for this concept.
This is a good point. I'm on board with --empty-output-reporting
now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the all the feedback! I updated the option to --empty-output-reporting
and reworded the help message to be clear that the option applies to the filtered data:
--empty-output-reporting {error,warn,silent}
How should empty outputs be reported when no strains
pass filtering and/or subsampling. (default: error)
057db7b
to
f7d9e7d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for putting this together, @joverlee521! I confirmed that this new functionality allows the seasonal flu workflow to move past the step where it had previously crashed. In doing so, I realized that we've only kicked the can down the road a bit and now have to make augur frequencies
handle empty alignment input or modify the workflow to conditionally check for empty inputs. 😅
My only major request below is for a slightly expanded functional test. Other comments are mostly about naming of things and why we use enum
instead of a list of choices. I realize that the enum
pattern predates this PR, but I'm not sure why we'd prefer to use the enum class as an argument type here specifically.
augur/types.py
Outdated
Enum representation of string values that represent how validation should | ||
be handled. | ||
Enum representation of string values that represent how empty results should | ||
be reported. | ||
""" | ||
ERROR = 'error' | ||
WARN = 'warn' | ||
SKIP = 'skip' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a minor point, but "skip" doesn't quite match the context in augur filter like it did in the augur validation context. I'd think "ignore" might be a more appropriate verb for the empty filter results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I'm also a stickler for wording context, but in this case I think it does match: the "empty results" are being ignored, but the "reporting" of them is being skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that "skip" matches "reporting", but I'm suggesting that "reporting" reflects the internals of the code and not what the user is asking augur filter to do.
As a user, I want to tell augur filter what to do (or how to handle the case) when there are no results in the output: throw an error, print a warning, or do nothing (i.e., ignore the empty output). Asking augur filter to "skip reporting empty results" doesn't quite match how I normally think about responding to errors and warnings (which would be to "ignore empty results"). There might be a better verb than "ignore" or "skip" that captures "do nothing" in a single word though...
By the same logic, asking augur export to "skip validation" matches what I want as a user (when I get the annoying augur version mismatch errors). I don't think I would ask augur export to "ignore validation" because I would want to avoid running validation altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, I'm thinking the opposite, where "handled" reflects internals of the code.
As a user, I'm telling augur filter how to report empty results to me. Report it as an error, a warning, or skip the report altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, well, your examples of what makes sense to you for empty results vs. validation seem inconsistent to me... 🙃 so I think we either have different connotations here or else we're accidentally talking past each other.
This is minor, and I'm happy to drop it in favor of whatever @joverlee521 and you think is best.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There might be a better verb than "ignore" or "skip" that captures "do nothing" in a single word though...
We can continue the augur curate model and go with silent
😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, yeah, silent
could work, too. Maybe it's too minor to dwell on more. I'm happy with whatever you prefer, @joverlee521.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to silent
in the forced-push.
augur/filter/__init__.py
Outdated
type=EmptyResultsReportingMethod, | ||
choices=list(EmptyResultsReportingMethod), | ||
default=EmptyResultsReportingMethod.ERROR, | ||
help="How should empty filter results be reported.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
non-blocking
Related to my minor comment about using "ignore" instead of "skip", I slightly prefer "handled" to "reported" in the context of "how should empty filter results be...".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's precedent for --*-reporting
options in augur curate
commands.
The older head (BSD June 6, 1993) on older macOS (10.14.6) can potentially read too much of a file into the buffer.¹ In this particular test, the file is so small that the `head` command reads the entire file, which causes the test to fail. ¹ https://stackoverflow.com/a/13736974
Moved the custom `__str__` method from `ValidationMode` to a parent class `ArgparseEnum` so that it can be inherited by other classes. This class can be replaced by `enum.StrEnum` once Augur's minimum supported Python version is 3.11.
23f4fdb
to
4decbe4
Compare
Allows users to specify reporting behavior if filtering produces an empty result. The default behavior is still to raise an error if there's an empty result. This change was prompted by the refactoring of the Nextstrain seasonal influenza builds, where we use `augur filter` to filter sequences for a specific region.¹ The lack of B Victoria sequences in the Japan/Korea region in the past 2 years leads to an empty filtered result that raises an error, which stops the entire pipeline.² The downstream rules are able to handle an empty filtered result, so we just need an option to silence the error. ¹ https://github.com/nextstrain/seasonal-flu/blob/de7875364a52d05cfa6c66cc49d53a5f8b2de1a0/profiles/private.nextflu.org/export.smk#L67-L87 ² https://bedfordlab.slack.com/archives/C03KWDET9/p1677875939132259?thread_ts=1677792435.138489&cid=C03KWDET9
Intended to be used as the type converter method for argparse options that use the enum values as choices. It raises a custom `argparse.ArgumentTypeError` so that we can provide a helpful error message that lists all of the valid enum values.
4decbe4
to
361ee78
Compare
Description of proposed changes
Allows users to specify reporting behavior if filtering produces an empty result. The default behavior is still to raise an error if there's an empty result.
Related issue(s)
Related to nextstrain/seasonal-flu#76 (see discussion on Slack)
Testing
Added Cram test for new
--empty-results-reporting
option.Checklist