Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw an error when any requested filter or group-by columns don't exist #754

Closed
huddlej opened this issue Jul 28, 2021 · 1 comment · Fixed by #933
Closed

Throw an error when any requested filter or group-by columns don't exist #754

huddlej opened this issue Jul 28, 2021 · 1 comment · Fixed by #933
Assignees
Labels
bug Something isn't working

Comments

@huddlej
Copy link
Contributor

huddlej commented Jul 28, 2021

Current Behavior

Augur's filter command allows filter or group-by columns to be completely missing and (in most cases) will only print a warning when this happens and continue with regular filtering behavior.

The exception is the --query filter for which a missing column will trigger a pandas KeyError that does not get properly handled by the calling code: #939

Expected behavior

Augur should throw an error when users request group-by any columns that do not exist, to prevent unexpected output from propagating through a workflow.

How to reproduce

# Change into filter tests.
cd tests/functional/filter/

# Create metadata without a date column.
cut -f 1-3,5- metadata.tsv > metadata_without_date.tsv

# Run filter requesting one column that does not exist and one that does.
../../../bin/augur filter --metadata metadata_without_date.tsv --group-by region month --sequences-per-group 1 --output-strains test.txt
WARNING: A 'date' column could not be found to group-by year or month.
Filtering by group may behave differently than expected!
8 strains were dropped during filtering
	8 of these were dropped because of subsampling criteria
3 strains passed all filters

# Run filter requesting one column that does not exist.
../../../bin/augur filter --metadata metadata_without_date.tsv --group-by date --sequences-per-group 1 --output-strains test.txt
WARNING: The specified group-by categories (['date']) were not found. No sequences-per-group sampling will be done.
0 strains were dropped during filtering
11 strains passed all filters

Possible solution

In the check for group column, instead of populating groups with an "unknown" value, raise an exception that can be handled by the calling code and print a helpful error to the user.

Raise exceptions in filter functions that require specific columns that do not exist and properly handle those exceptions in the calling code.

Additional context

#749 (comment)

@huddlej huddlej added the bug Something isn't working label Jul 28, 2021
@huddlej huddlej changed the title Throw an error when any requested group-by columns don't exist Throw an error when any requested filter or group-by columns don't exist Jul 28, 2021
@victorlin
Copy link
Member

Following #794, this now (unintentionally) raises a KeyError similar to #795:

WARNING: The specified group-by categories (['invalid']) were not found. No sequences-per-group sampling will be done.
Sampling at 10 per group.
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniconda/base/envs/nextstrain-native/bin/augur", line 33, in <module>
    sys.exit(load_entry_point('nextstrain-augur', 'console_scripts', 'augur')())
  File "/Users/vlin/repos/augur/augur/__main__.py", line 10, in main
    return augur.run( argv[1:] )
  File "/Users/vlin/repos/augur/augur/__init__.py", line 75, in run
    return args.__command__.run(args)
  File "/Users/vlin/repos/augur/augur/filter.py", line 1523, in run
    queues_by_group[group].add(
KeyError: ('_dummy',)

In #809 the behavior gets reverted back to the original description of this bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

2 participants