Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ISO time intervals #1770

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

victorlin
Copy link
Member

@victorlin victorlin commented Mar 10, 2025

Description of proposed changes

Previously, the only way an interval could be specified was through an ambiguous date, which has limited flexibility. Conveniently, the aniso8601 library provides a function to parse the various ISO interval formats¹.

Note that this supports time information where all other date format support and logic is based on date without time information. This is fine because it is trivial to ignore the time information and have everything work as expected. It seems far-fetched for now, but maybe someday we will end up using time information, since date is inherently ambiguous without at least a time zone.

¹ https://en.wikipedia.org/wiki/ISO_8601#Time_intervals

Related issue(s)

Closes #1304
Closes #882

Checklist

  • Automated checks pass
  • Check if you need to add a changelog message
  • Check if you need to add tests
  • Check if you need to update docs

@victorlin victorlin self-assigned this Mar 10, 2025
@@ -121,6 +122,25 @@ def get_numerical_date_from_value(value, fmt=None, min_max_year=None):
except InvalidDate as error:
raise AugurError(str(error)) from error
return [treetime.utils.numeric_date(d) for d in ambig_date]
if '/' in value:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to ask about ambiguity in date range bounds, but I don't think this would provide any advantage. E.g. 2025-01-XX/... vs 2025-01-01/.... Did this come up anywhere?

Copy link
Member Author

@victorlin victorlin Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be useful in the case where the collection date is year-only (2025) while submission date is exact (2025-01-23). This is supported by the current implementation.

A few more notes:

  1. It doesn't work the other way around when the upper bound is ambiguous. I'm not sure why, since I would expect it to work by taking the maximum similar to how it worked for the lower bound by taking the minimum. Maybe an oversight by aniso8601. I've opened an issue: nielsenb / aniso8601 / issues / #32 - Inconsistent support for reduced precision dates in intervals — Bitbucket
  2. You can get nonsense intervals such as 2025/2024-01-23. This is equivalent to 2025-01-01/2024-01-23, which is also valid. Current logic simply flips it and continues, which was only meant for the duration>/<end> syntax, but in hindsight could mask issues with other situations.
  3. I don't think we should support the XX syntax here. That's been used to represent intervals in absence of this more precise syntax. I think it's fine to support the incomplete syntax (i.e. "reduced precision" in ISO terms - 2025, 2025-01) simply because it's already supported by the ISO standard.

if start > end:
start, end = end, start

return list(map(treetime.utils.numeric_date, (start, end)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[not for this PR, just a question]

For ambiguous dates or date ranges, do you know if we ever compare the end date to the current date? For a strain where all we know is 2025-XX-XX, augur should be smart enough to know to bound this to today and thus not allow dates in the future. The plan (for augur curate, I guess?) is to replace such ambiguous dates with a date bounded by the submission date.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ambiguous dates are limited to current date:

# Limit the min and max dates to be no later than today's date.
min_date = min(min_date, datetime.date.today())
max_date = min(max_date, datetime.date.today())

I don't think there are any errors when using an exact date in the future (e.g. 2026-01-01).

Copy link

codecov bot commented Mar 10, 2025

Codecov Report

Attention: Patch coverage is 87.50000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 73.23%. Comparing base (033c73b) to head (ed382a0).
Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
augur/dates/__init__.py 85.71% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1770      +/-   ##
==========================================
+ Coverage   73.22%   73.23%   +0.01%     
==========================================
  Files          79       79              
  Lines        8391     8406      +15     
  Branches     1712     1717       +5     
==========================================
+ Hits         6144     6156      +12     
- Misses       1959     1960       +1     
- Partials      288      290       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@victorlin victorlin force-pushed the victorlin/support-iso-time-intervals branch from 6e8ba3b to 9aa117f Compare March 11, 2025 22:49
Previously, the only way an interval could be specified was through an
ambiguous date, which has limited flexibility. Conveniently, the
aniso8601 library provides a function to parse the various ISO interval
formats¹.

Note that this supports time information where all other date format
support and logic is based on date without time information. This is
fine because it is trivial to ignore the time information and have
everything work as expected. It seems far-fetched for now, but maybe
someday we will end up using time information, since date is inherently
ambiguous without at least a time zone.

¹ <https://en.wikipedia.org/wiki/ISO_8601#Time_intervals>
@victorlin victorlin force-pushed the victorlin/support-iso-time-intervals branch from 9aa117f to c8b8005 Compare March 11, 2025 22:59
Copy link
Contributor

@joverlee521 joverlee521 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM. My only suggestion is to clearly document the supported formats for date intervals since that will dictate the outputs for augur curate format-dates in #1494

This allows the error class to directly terminate the program without
rerouting the error message to AugurError.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow precise date ranges Add documentation page on supported date formats
3 participants