You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Augur's filter command allows filter or group-by columns to be completely missing and (in most cases) will only print a warning when this happens and continue with regular filtering behavior.
The exception is the --query filter for which a missing column will trigger a pandas KeyError that does not get properly handled by the calling code: #939
Expected behavior
Augur should throw an error when users request group-by any columns that do not exist, to prevent unexpected output from propagating through a workflow.
How to reproduce
# Change into filter tests.cd tests/functional/filter/
# Create metadata without a date column.
cut -f 1-3,5- metadata.tsv > metadata_without_date.tsv
# Run filter requesting one column that does not exist and one that does.
../../../bin/augur filter --metadata metadata_without_date.tsv --group-by region month --sequences-per-group 1 --output-strains test.txt
WARNING: A 'date' column could not be found to group-by year or month.
Filtering by group may behave differently than expected!
8 strains were dropped during filtering
8 of these were dropped because of subsampling criteria
3 strains passed all filters
# Run filter requesting one column that does not exist.
../../../bin/augur filter --metadata metadata_without_date.tsv --group-by date --sequences-per-group 1 --output-strains test.txt
WARNING: The specified group-by categories (['date']) were not found. No sequences-per-group sampling will be done.
0 strains were dropped during filtering
11 strains passed all filters
Possible solution
In the check for group column, instead of populating groups with an "unknown" value, raise an exception that can be handled by the calling code and print a helpful error to the user.
Raise exceptions in filter functions that require specific columns that do not exist and properly handle those exceptions in the calling code.
huddlej
changed the title
Throw an error when any requested group-by columns don't exist
Throw an error when any requested filter or group-by columns don't exist
Jul 28, 2021
Following #794, this now (unintentionally) raises a KeyError similar to #795:
WARNING: The specified group-by categories (['invalid']) were not found. No sequences-per-group sampling will be done.
Sampling at 10 per group.
Traceback (most recent call last):
File "/opt/homebrew/Caskroom/miniconda/base/envs/nextstrain-native/bin/augur", line 33, in <module>
sys.exit(load_entry_point('nextstrain-augur', 'console_scripts', 'augur')())
File "/Users/vlin/repos/augur/augur/__main__.py", line 10, in main
return augur.run( argv[1:] )
File "/Users/vlin/repos/augur/augur/__init__.py", line 75, in run
return args.__command__.run(args)
File "/Users/vlin/repos/augur/augur/filter.py", line 1523, in run
queues_by_group[group].add(
KeyError: ('_dummy',)
In #809 the behavior gets reverted back to the original description of this bug.
Current Behavior
Augur's filter command allows filter or group-by columns to be completely missing and (in most cases) will only print a warning when this happens and continue with regular filtering behavior.
The exception is the
--query
filter for which a missing column will trigger a pandas KeyError that does not get properly handled by the calling code: #939Expected behavior
Augur should throw an error when users request group-by any columns that do not exist, to prevent unexpected output from propagating through a workflow.
How to reproduce
Possible solution
In the check for group column, instead of populating groups with an "unknown" value, raise an exception that can be handled by the calling code and print a helpful error to the user.
Raise exceptions in filter functions that require specific columns that do not exist and properly handle those exceptions in the calling code.
Additional context
#749 (comment)
The text was updated successfully, but these errors were encountered: