Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move filter date parsing logic to dates, use in frequencies #890

Merged
merged 5 commits into from
Apr 15, 2022

Conversation

victorlin
Copy link
Member

@victorlin victorlin commented Apr 12, 2022

Description of proposed changes

See commit messages

Related issue(s)

Testing

  • Added functional tests for frequencies

The multi-line string is an unnecessary artifact of a longer string used while working on 873a70a.
@victorlin victorlin self-assigned this Apr 12, 2022
@victorlin victorlin force-pushed the victorlin/frequencies/relative-dates branch from d2be0ba to e12e941 Compare April 12, 2022 18:48
@codecov
Copy link

codecov bot commented Apr 12, 2022

Codecov Report

Merging #890 (542bca5) into master (b464692) will not change coverage.
The diff coverage is 94.28%.

@@           Coverage Diff           @@
##           master     #890   +/-   ##
=======================================
  Coverage   34.64%   34.64%           
=======================================
  Files          41       42    +1     
  Lines        6003     6003           
  Branches     1536     1536           
=======================================
  Hits         2080     2080           
  Misses       3850     3850           
  Partials       73       73           
Impacted Files Coverage Δ
augur/frequencies.py 10.25% <33.33%> (-4.38%) ⬇️
augur/dates.py 100.00% <100.00%> (ø)
augur/filter.py 70.37% <100.00%> (-1.24%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b464692...542bca5. Read the comment docs.

@victorlin
Copy link
Member Author

@huddlej I think this should work but unsure what's the best way to test the --min-date/--max-date feature of frequencies. I've never used it directly and not familiar with the math going on within.

Are you able to come up with a small snippet for test_frequencies.py? I can expand from that.

@victorlin victorlin force-pushed the victorlin/frequencies/relative-dates branch 2 times, most recently from d12f0b5 to 5df2749 Compare April 12, 2022 20:29
victorlin added a commit that referenced this pull request Apr 13, 2022
to be more in line with #890

- define supported date formats via exhaustive list of regular expressions
- add TODO for relative dates
- factor out get_date_min/max() into any_to_numeric()/iso_to_numeric()
    - do sqlite3 UDF handling in try_get_numeric_date_min/max()
- remove date_type()/valid_date() functions, replace with any_to_numeric_type_min/max()
    - this ensures args.min_date/max_date are numeric
Copy link
Contributor

@huddlej huddlej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look great, @victorlin. I like how simple and reusable the new date_parsing.py module. I'll work on some tests for the frequencies module that rely on the this new interface. In the mean time, I had a comment threaded below where I propose a different organization of this new date-handling logic. We can chat in person about this and/or as a broader group, though, to see how others feel about the idea of a top-level dates module. No need to change anything until I get tests written.

augur/filter.py Outdated Show resolved Hide resolved
@huddlej
Copy link
Contributor

huddlej commented Apr 14, 2022

@victorlin Since this PR only changes the command line interface to the augur frequencies command and not the internal interface of any frequencies classes, it seemed most appropriate to implement functional tests for the new relative date functionality. ea17e7b adds Cram tests in frequencies.t that test both fixed and relative date ranges. Let me know what you think, or if you have questions...

victorlin and others added 3 commits April 14, 2022 18:15
With #740, filter's numeric_date() provides support for 3 date formats.
However, supporting various date formats is not specific to filter (e.g. frequencies has a separate numeric_date() which is now out-of-sync with this filter's numeric_date()).

This commit:

1. Moves numeric_date() to a new submodule augur.dates
2. Moves the related SUPPORTED_DATE_HELP_TEXT to augur.dates
3. Updates numeric_date() to raise a TypeError rather than argparse.ArgumentTypeError so it can be generalized to non-argparse usage
4. Adds a new function numeric_date_type() which wraps numeric_date() and raises an argparse.ArgumentTypeError per #740 (comment)
Reasons:

1. Sync-up with new filter --min-date/--max-date features (support for relative dates)
2. Less code duplication
Copies a functional test for augur frequencies from the Zika functional
tests and expands on this test with two additional tests of the min/max
date range arguments using fixed and relative dates.
@victorlin victorlin force-pushed the victorlin/frequencies/relative-dates branch from 3413c81 to d650f04 Compare April 15, 2022 01:16
@victorlin victorlin marked this pull request as ready for review April 15, 2022 02:11
@victorlin victorlin added this to the Next release X.X.X milestone Apr 15, 2022
@huddlej
Copy link
Contributor

huddlej commented Apr 15, 2022

Just tested this with the ncov workflow using the following build config and all worked as expected:

inputs:
  - name: botswana-data
    metadata: data/gisaid_auspice_input_hcov-19_2022_03_29_17.tar
    sequences: data/gisaid_auspice_input_hcov-19_2022_03_29_17.tar
  - name: references
    metadata: data/references_metadata.tsv
    sequences: data/references_sequences.fasta

builds:
  botswana:
    subsampling_scheme: recent

filter:
  skip_diagnostics: true
  min_length: 5000

subsampling:
  recent:
    recent_focal:
      min_date: --min-date 1M
    older_focal:
      min_date: --min-date 2M
      max_date: --max-date 1M

frequencies:
  min_date: 2M

@victorlin victorlin merged commit 8c54817 into master Apr 15, 2022
@victorlin victorlin deleted the victorlin/frequencies/relative-dates branch April 15, 2022 23:10
@victorlin victorlin changed the title Move filter date parsing logic to utils, use in frequencies Move filter date parsing logic to dates, use in frequencies Apr 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

frequencies: Support relative dates for --min-date and --max-date
4 participants