Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter: Add --empty-results-reporting={error,warn,skip} option #1175

Merged
merged 5 commits into from
Mar 13, 2023

Conversation

joverlee521
Copy link
Contributor

@joverlee521 joverlee521 commented Mar 8, 2023

Description of proposed changes

Allows users to specify reporting behavior if filtering produces an empty result. The default behavior is still to raise an error if there's an empty result.

Related issue(s)

Related to nextstrain/seasonal-flu#76 (see discussion on Slack)

Testing

Added Cram test for new --empty-results-reporting option.

Checklist

  • Add a message in CHANGES.md summarizing the changes in this PR that are end user focused. Keep headers and formatting consistent with the rest of the file.

@joverlee521 joverlee521 requested a review from a team March 8, 2023 00:24
@codecov
Copy link

codecov bot commented Mar 8, 2023

Codecov Report

Patch coverage: 83.33% and project coverage change: +0.05 🎉

Comparison is base (3cf9671) 68.19% compared to head (4decbe4) 68.25%.

❗ Current head 4decbe4 differs from pull request most recent head 361ee78. Consider uploading reports for the commit 361ee78 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1175      +/-   ##
==========================================
+ Coverage   68.19%   68.25%   +0.05%     
==========================================
  Files          63       63              
  Lines        6748     6772      +24     
  Branches     1654     1659       +5     
==========================================
+ Hits         4602     4622      +20     
- Misses       1838     1841       +3     
- Partials      308      309       +1     
Impacted Files Coverage Δ
augur/filter/_run.py 95.50% <77.77%> (-0.86%) ⬇️
augur/types.py 89.28% <83.33%> (-3.58%) ⬇️
augur/filter/__init__.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Comment on lines 72 to 73
output_group.add_argument(
'--empty-results-reporting',
Copy link
Member

@victorlin victorlin Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option doesn't fit under the "outputs" group:

outputs:
  possible representations of filtered data (at least one required)

  --output OUTPUT, --output-sequences OUTPUT, -o OUTPUT
                        filtered sequences in FASTA format (default: None)
  --output-metadata OUTPUT_METADATA
                        metadata for strains that passed filters (default: None)
  --output-strains OUTPUT_STRAINS
                        list of strains that passed filters (no header) (default: None)
  --output-log OUTPUT_LOG
                        tab-delimited file with one row for each filtered strain and the reason it was filtered. Keyword arguments
                        used for a given filter are reported in JSON format in a `kwargs` column. (default: None)
  --empty-results-reporting {error,warn,skip}
                        How should empty filter results be reported. (default: error)

Maybe it would fit better under the default "options" group?

options:
  -h, --help            show this help message and exit
  --empty-results-reporting {error,warn,skip}
                        How should empty filter results be reported. (default: error)
Suggested change
output_group.add_argument(
'--empty-results-reporting',
parser.add_argument(
'--empty-results-reporting',

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually it sort of fits under output options, no? It's an option that controls what to do if output is empty.

Though maybe the option name should use "output" instead of "results", e.g. --empty-output-reporting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would fit better under the default "options" group?

Hmm, I would look for options related to the outputs in the outputs group.
I can reword the argument group description to make it clearer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though maybe the option name should use "output" instead of "results", e.g. --empty-output-reporting?

I specifically named the option with "results" since we have the --output-log option, which is not taken into account by the --empty-results-reporting option. That might just be me being nitpicky about words 🤷

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated description in force push:

outputs:
  options related to outputs, at least one of the possible representations
  of filtered data (--output, --output-metadata, --output-strains) is
  required

  --output OUTPUT, --output-sequences OUTPUT, -o OUTPUT
                        filtered sequences in FASTA format (default: None)
  --output-metadata OUTPUT_METADATA
                        metadata for strains that passed filters (default:
                        None)
  --output-strains OUTPUT_STRAINS
                        list of strains that passed filters (no header)
                        (default: None)
  --output-log OUTPUT_LOG
                        tab-delimited file with one row for each filtered
                        strain and the reason it was filtered. Keyword
                        arguments used for a given filter are reported in JSON
                        format in a `kwargs` column. (default: None)
  --empty-results-reporting {error,warn,skip}
                        How should empty filter results be reported. (default:
                        error)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I like --empty-results-reporting over --empty-output-reporting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking

I think results vs. output is splitting hairs that and no one will be confused by it not applying to --output-log. We also don't use "results" anywhere in the augur filter help text and nearly nowhere else in Augur's user-facing interface, whereas we do use "output" everywhere for this concept.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the resolution, the rewording of

possible representations of filtered data (at least one required)

to

options related to outputs, at least one of the possible representations
  of filtered data (--output, --output-metadata, --output-strains) is
  required`

is great!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also don't use "results" anywhere in the augur filter help text and nearly nowhere else in Augur's user-facing interface, whereas we do use "output" everywhere for this concept.

This is a good point. I'm on board with --empty-output-reporting now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the all the feedback! I updated the option to --empty-output-reporting and reworded the help message to be clear that the option applies to the filtered data:

  --empty-output-reporting {error,warn,silent}
                        How should empty outputs be reported when no strains
                        pass filtering and/or subsampling. (default: error)

augur/filter/__init__.py Outdated Show resolved Hide resolved
@joverlee521 joverlee521 force-pushed the filter-empty-results branch from 057db7b to f7d9e7d Compare March 8, 2023 21:01
Copy link
Contributor

@huddlej huddlej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for putting this together, @joverlee521! I confirmed that this new functionality allows the seasonal flu workflow to move past the step where it had previously crashed. In doing so, I realized that we've only kicked the can down the road a bit and now have to make augur frequencies handle empty alignment input or modify the workflow to conditionally check for empty inputs. 😅

My only major request below is for a slightly expanded functional test. Other comments are mostly about naming of things and why we use enum instead of a list of choices. I realize that the enum pattern predates this PR, but I'm not sure why we'd prefer to use the enum class as an argument type here specifically.

augur/types.py Outdated
Enum representation of string values that represent how validation should
be handled.
Enum representation of string values that represent how empty results should
be reported.
"""
ERROR = 'error'
WARN = 'warn'
SKIP = 'skip'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a minor point, but "skip" doesn't quite match the context in augur filter like it did in the augur validation context. I'd think "ignore" might be a more appropriate verb for the empty filter results.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I'm also a stickler for wording context, but in this case I think it does match: the "empty results" are being ignored, but the "reporting" of them is being skipped.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that "skip" matches "reporting", but I'm suggesting that "reporting" reflects the internals of the code and not what the user is asking augur filter to do.

As a user, I want to tell augur filter what to do (or how to handle the case) when there are no results in the output: throw an error, print a warning, or do nothing (i.e., ignore the empty output). Asking augur filter to "skip reporting empty results" doesn't quite match how I normally think about responding to errors and warnings (which would be to "ignore empty results"). There might be a better verb than "ignore" or "skip" that captures "do nothing" in a single word though...

By the same logic, asking augur export to "skip validation" matches what I want as a user (when I get the annoying augur version mismatch errors). I don't think I would ask augur export to "ignore validation" because I would want to avoid running validation altogether.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I'm thinking the opposite, where "handled" reflects internals of the code.

As a user, I'm telling augur filter how to report empty results to me. Report it as an error, a warning, or skip the report altogether.

Copy link
Member

@tsibley tsibley Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, well, your examples of what makes sense to you for empty results vs. validation seem inconsistent to me... 🙃 so I think we either have different connotations here or else we're accidentally talking past each other.

This is minor, and I'm happy to drop it in favor of whatever @joverlee521 and you think is best.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a better verb than "ignore" or "skip" that captures "do nothing" in a single word though...

We can continue the augur curate model and go with silent 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, yeah, silent could work, too. Maybe it's too minor to dwell on more. I'm happy with whatever you prefer, @joverlee521.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to silent in the forced-push.

augur/types.py Show resolved Hide resolved
type=EmptyResultsReportingMethod,
choices=list(EmptyResultsReportingMethod),
default=EmptyResultsReportingMethod.ERROR,
help="How should empty filter results be reported.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking
Related to my minor comment about using "ignore" instead of "skip", I slightly prefer "handled" to "reported" in the context of "how should empty filter results be...".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's precedent for --*-reporting options in augur curate commands.

augur/filter/__init__.py Outdated Show resolved Hide resolved
The older head (BSD June 6, 1993) on older macOS (10.14.6) can
potentially read too much of a file into the buffer.¹

In this particular test, the file is so small that the `head` command
reads the entire file, which causes the test to fail.

¹ https://stackoverflow.com/a/13736974
Moved the custom `__str__` method from `ValidationMode` to a parent
class `ArgparseEnum` so that it can be inherited by other classes.

This class can be replaced by `enum.StrEnum` once Augur's minimum
supported Python version is 3.11.
@joverlee521 joverlee521 force-pushed the filter-empty-results branch 2 times, most recently from 23f4fdb to 4decbe4 Compare March 9, 2023 01:49
augur/types.py Outdated Show resolved Hide resolved
Allows users to specify reporting behavior if filtering produces an
empty result. The default behavior is still to raise an error if
there's an empty result.

This change was prompted by the refactoring of the Nextstrain
seasonal influenza builds, where we use `augur filter` to filter
sequences for a specific region.¹ The lack of B Victoria sequences in
the Japan/Korea region in the past 2 years leads to an empty filtered
result that raises an error, which stops the entire pipeline.²
The downstream rules are able to handle an empty filtered result, so
we just need an option to silence the error.

¹ https://github.com/nextstrain/seasonal-flu/blob/de7875364a52d05cfa6c66cc49d53a5f8b2de1a0/profiles/private.nextflu.org/export.smk#L67-L87
² https://bedfordlab.slack.com/archives/C03KWDET9/p1677875939132259?thread_ts=1677792435.138489&cid=C03KWDET9
Intended to be used as the type converter method for argparse options
that use the enum values as choices.

It raises a custom `argparse.ArgumentTypeError` so that we can provide
a helpful error message that lists all of the valid enum values.
@joverlee521 joverlee521 force-pushed the filter-empty-results branch from 4decbe4 to 361ee78 Compare March 13, 2023 19:00
@joverlee521 joverlee521 merged commit 8d24a79 into master Mar 13, 2023
@joverlee521 joverlee521 deleted the filter-empty-results branch March 13, 2023 21:01
@victorlin victorlin mentioned this pull request Apr 29, 2024
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

5 participants