-
Notifications
You must be signed in to change notification settings - Fork 130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support treetime ambiguous date format #1305
Conversation
Add support for the treetime ambiguous date format `[float_min:float_max]`, e.g. `[2004.43:2005.58]` This is useful for when ambiguity is not limited to month/year boundaries. Resolves #1304
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1305 +/- ##
==========================================
- Coverage 69.90% 69.89% -0.01%
==========================================
Files 67 67
Lines 7167 7172 +5
Branches 1748 1750 +2
==========================================
+ Hits 5010 5013 +3
- Misses 1851 1853 +2
Partials 306 306 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for implementing this! It's a simple solution to something we've discussed (I forget where exactly) and seems to work well for you.
@@ -121,6 +121,12 @@ def get_numerical_date_from_value(value, fmt=None, min_max_year=None): | |||
except InvalidDate as error: | |||
raise AugurError(str(error)) from error | |||
return [treetime.utils.numeric_date(d) for d in ambig_date] | |||
if value.startswith('[') and value.endswith(']') and len(value[1:-1].split(':'))==2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For added context, this is the corresponding logic in TreeTime: treetime/utils.py#L305-L312
@@ -121,6 +121,12 @@ def get_numerical_date_from_value(value, fmt=None, min_max_year=None): | |||
except InvalidDate as error: | |||
raise AugurError(str(error)) from error | |||
return [treetime.utils.numeric_date(d) for d in ambig_date] | |||
if value.startswith('[') and value.endswith(']') and len(value[1:-1].split(':'))==2: | |||
# Treetime ambiguous date format, e.g. [2019.5:2020.5] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious how you get the numerical dates in metadata. Is this a standard format outside of TreeTime, or do you run ISO dates through something like treetime.utils.numeric_date
to get the float representation?
Asking because I think a good next step after this gets merged is to generalize for any [A:B]
date pair where A
and B
can be anything that's currently supported as an exact scalar date (e.g. [2023-07-31:2023-08-06]
). I did some prototyping of this a while back but never got back to creating a proper PR for it.
try: | ||
return [float(x) for x in value[1:-1].split(':')] | ||
except ValueError as error: | ||
raise AugurError(str(error)) from error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: customize the error message.
When given something like [2023-07-31:2023-08-06]
, the ValueError
message looks something like:
could not convert string to float: '2023-07-31'
It's obvious what this means to us as developers, but maybe not so obvious to users. Something like this might be more clear:
The date range format must be `[A:B]` where both `A` and `B` are numerical dates.
Shifting the design decision a bit in the motivating issue. I'll plan to update this PR with that and address the conversations above. |
Add support for the treetime ambiguous date format
[float_min:float_max]
, e.g.[2004.43:2005.58]
This is useful for when ambiguity is not limited to month/year boundaries, e.g. in case of Danish SARS-CoV-2 sequences which have their dates binned rounded down to the most recent Monday.
Resolves #1304
With this PR, we can use cross-month ambiguous dates that aren't whole year, and within-month ambiguous dates as well.

E.g. this wouldn't have been possible previously without unconstraining everything but the year:
Testing
What steps should be taken to test the changes you've proposed?
If you added or changed behavior in the codebase, did you update the tests, or do you need help with this?
ncov-simplest
, turning Danish dates into ranges here and running refine with the new parsing feature hereChecklist