Improved algorithm for index 'dry_spell_total_length' with new option… · Pull Request #884 · Ouranosinc/xclim

ghost · 2021-10-25T17:51:03Z

…s (operator 'max', possibility to specify start/end dates) and calculation for days near dataset boundaries.

Pull Request Checklist:

This PR addresses an already opened issue (for bug fixes / features)
- This PR adds features and fixes problems described in Proposal of a drought duration climate index #843
Tests for the changes have been added (for bug fixes / features)
- 1 test has been added in test_precip.py.
- 1 test has been added in test_indices.py.
- 30 additional tests have been done outside of xclim to test the algorithm in detail (see scen_workflow_afr.unit_tests).
Documentation has been added / updated (for bug fixes / features)
HISTORY.rst has been updated (with summary of main changes)
- Link to issue (:issue:number) and pull request (:pull:number) has been added.
bumpversion patch has been called on this branch
The relevant author information has been added to .zenodo.json

What kind of change does this PR introduce?

A new argument allows to specify the resampling operator. The initial version of the algorithm was resampling according to the sum operator. The improved algorithm gives the possibility to resample using the max operator, which means that a sequence of window dry days can happen if there is a single day with precipitation reaching or exceeding a threshold (tresh).
It is now possible to specify a start_date and end_date. The days comprised in a sequence of window dry days are first identified. A mask is then applied to this result using the provided start and end dates. Resampling at the requested frequency (freq) is finally done. The implication of this change is that it is possible to have fewer than window days in a year, because some of these days could have been removed by the mask.
The algorithm allows a start_date (ex: "12-01") to be larger than end_date (ex: "02-28"). In this example, the dry days between December and February are not considered.
Calculation will be correct when specifying a window size that is an even number. The previous version of the algorithm was pushing a sequence of dry days to the right by one day with an even window size, assigning a wet label to the first day of a dry sequence, and a dry label to the day following the sequence. It can be an issue if one dry day is transferred to the following year (when a dry period overlaps two years). This situation seems to have been caused by the centre-based window, which is asymmetric if it's size is an even number.

Does this PR introduce a breaking change?

With the previous algorithm, the fist window / 2 days and last (window - 1) / 2 days of the input precipitation dataset were assigned nan value, and thus were assumed to be wet (because they were not dry), which will result in unrealistic indices in certain climates, e.g., West Africa (there is no precipitation between December and February). Ideally, the first and last years of the computed indices should be excluded, but the user may be unaware of this situation, which implies a risk of misusing the function. The combination of a left-based window and a bidirectional scanning (start to end of the input dataset, and the opposite), allows to calculate indices that consider the first and last days, without risk.

Other information:

It is possible that climate indices behave in a similar way as the current dry_spell_total_length algorithm (indices based on a window, centre-based or not). A similar solution, or an improved one, could be implemented for other indices.
This is certainly the case with dry_spell_frequency, which should work exactly the same way as dry_spell_total_length.

…s (operator 'max', possibility to specify start/end dates) and calculation for days near dataset boundaries.

ghost · 2021-11-05T00:27:17Z

@aulemahal I updated the algorithm to account for bisextile years. I removed 3 instances of xarray.DataArray casting, which should work better with dask. I also added 2 test cases (see test_indices.py:test_dry_spell). Overall, there are now 6 cases (3 cases x 2 operators). The current (xclim) and proposed algorithms give different results.

aulemahal · 2021-12-09T16:10:00Z

@yrouranos Comme je n'ai pas accès en écriture à ta branche, j'ai fait une 3e branche que je veux fusionner dans celle-ci pour ensuite fusionner dans le master. Ma PR est donc ouverte dans ton repo.
On peut aussi faire autrement, si tu veux : je pourrais fermer cette PR et la remplacer par une nouvelle à partir de ma tierce branche.

coveralls · 2022-01-20T20:21:15Z

Coverage increased (+0.01%) to 91.47% when pulling 522a03f on yrouranos:dry-spell-total-length-new-options into 39eab0a on Ouranosinc:master.

aulemahal · 2022-01-20T22:54:42Z

@yrouranos J'ai fusionné mes changements et le master! J'ai aussi ajusté la docstring et ajouté une entrée au "history".

@huard J'ai ajouté une note pour soulever le problème qui survient aux bords. Je ne pense pas par contre qu'il soit nécessaire de mettre une référence ici? Ça reste un calcul simple?

ghost · 2022-01-21T17:06:21Z

J'ai 2 commentaires et une suggestion:

À propos de la note, je crois qu'on assume plutôt que les jours précédant ou suivant la série sont secs. S'il pleut le jour no. 1 de la série, les jours qui précèdent (pour lesquels les données ne sont pas disponibles) n'ont aucune importance étant donné que nous savons que le jour no. 1 ne fait pas partie d'une séquence de jours secs étant donné qu'il a plu cette journée; considérer les jours qui précèdent la série ne changerait rien. Par contre, si le jour no. 1 est sec, il pourrait ne pas faire partie d'une séquence de jours secs en réalité s'il avait plu lors du jour précédent le jour no. 1. Dans ce cas, ça voudrait dire que la condition est moins stricte près des frontières (nous avons besoin de 3 jours consécutifs près des frontières, au lieu de 5 ailleurs).
C'est une bonne idée d'avoir ajouté une note étant donné que ça laisse la possibilité à l'utilisateur d'exclure la première et dernière année si l'hypothèse est problématique pour le contexte d'une étude. Pour cet indice, je crois que l'hypothèse sera rarement problématique. Par contre, si un indice similaire visait à calculer la période sèche la plus longue par année, un utilisateur intéressé au climat d'une région de l'hémisphère sud (ex: Afrique de l'Ouest) pourrait avoir à exclure la première et la dernière année (la période sèche débute en novembre et se termine au mois de mars). Les indices calculés n'auraient pas de sens pour les années 1 et n.
J'ai essayé le nouvel algorithme, et les cas ont tous fonctionné, sauf les no. 20-21. J'ai réalisé que c'était causé par une date de début non spécifiée (cas no. 20) ou une date de fin non spécifiée (cas no. 21). L'algorithme que j'avais développé, assumais que le premier jour était le 1er janvier (si non spécifié) et que le dernier jour était le 31 décembre (si non spécifié). Je me demandais si ça vaudrait la peine d'intégrer ce comportement dans select_time(), même si ce n'est pas indispensable. L'utilisateur spécifiera normalement les deux dates.

aulemahal · 2022-01-25T20:47:00Z

J'ai l'impression que ma note est plus proche du code. Dans mon exemple, window = 3. Donc les "périodes sèches" sont par définition de longueur 3. Une période plus longue est simplement une superposition de périodes de longueur 3. Le jour 0 (allons-y à la numérotation python) fait partie de 3 périodes candidates : (-2, -1, 0), (-1, 0, 1) et (0, 1, 2).

Ligne 698 : Il n'y pas de données pour -2 et -1, et le drapeau booléen de "période sèche" est attribué au dernier jour de la période, donc 0 et 1 sont NaN.
Ligne 699 : On refait un rolling pour voir si l'une de ces 3 périodes est sèche (et donc le jour fait partie d'une période sèche). Comme on fait un shift de 2 vers la gauche, c'est la valeur à l'indice 2 qui ramenée au jour 0. On aura True, si on a au moins un True dans 0, 1, 2. Comme 0 = 1 = NaN, il faut absolument que (0, 1, 2) soit une période sèche. Ce qui insinue que (-2, -1, 0) et (-1, 0, 1) ne sont pas sèches, et donc c'est pourquoi je dis que nous assumons implicitement que -2 et -1 sont mouillés.

En effet, select_time s'attend à ce que les deux bornes soient données. Comme la fonction ne prend pas freq en entrée, donc elle ne peut pas savoir qu'elle date/doy serait le plus logique à mettre. Mais ça pourrait être modifié, cette information existe.

huard

J'ai un peu l'impression que les effets de bords de la méthode sont indésirables, et non un feature.

Il n'y a pas d'indicator associé ?

…anos/xclim into dry-spell-total-length-new-options

yrouranos added 2 commits October 25, 2021 12:29

Improved algorithm for index 'dry_spell_total_length' with new option…

b1fbbf7

…s (operator 'max', possibility to specify start/end dates) and calculation for days near dataset boundaries.

Improved algorithm for index 'dry_spell_total_length' with new option…

74d3a3b

…s (operator 'max', possibility to specify start/end dates) and calculation for days near dataset boundaries.

ghost mentioned this pull request Oct 25, 2021

Proposal of a drought duration climate index #843

Closed

aulemahal self-requested a review October 29, 2021 14:23

Improved algorithm for index 'dry_spell_total_length' with new option…

21733d0

…s (operator 'max', possibility to specify start/end dates) and calculation for days near dataset boundaries.

huard requested changes Nov 19, 2021

View reviewed changes

Comment thread xclim/indices/_agro.py Outdated

Comment thread xclim/indices/_agro.py Outdated

aulemahal mentioned this pull request Nov 24, 2021

Indexing within indicators #934

Merged

7 tasks

aulemahal added 3 commits December 1, 2021 10:56

merge master

5908548

Adapt to new indexer - modifications to simplify code

a540a79

Add indexing tests

0348d50

merge master

f49a914

aulemahal added 2 commits January 20, 2022 17:40

Docstring ajustment

9a11bf5

upd hist

0d58d99

Merge branch 'master' into dry-spell-total-length-new-options

050bde1

aulemahal requested a review from huard January 25, 2022 20:48

huard reviewed Jan 26, 2022

View reviewed changes

Comment thread xclim/indices/_agro.py

Comment thread xclim/testing/tests/test_indices.py Outdated

Zeitsperre and others added 5 commits January 27, 2022 10:09

Merge branch 'master' into dry-spell-total-length-new-options

0e777c9

edit indicator metadata

3ed1239

Merge branch 'master' into dry-spell-total-length-new-options

aafa6eb

parametrize tests

e19a0d6

Merge branch 'dry-spell-total-length-new-options' of github.com:yrour…

cbb3c79

…anos/xclim into dry-spell-total-length-new-options

aulemahal requested a review from huard January 28, 2022 21:41

Merge branch 'master' into dry-spell-total-length-new-options

522a03f

Zeitsperre mentioned this pull request Jan 28, 2022

Prepare version 0.33.0 #999

Merged

3 tasks

aulemahal approved these changes Jan 28, 2022

View reviewed changes

Zeitsperre merged commit 422a44c into Ouranosinc:master Jan 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improved algorithm for index 'dry_spell_total_length' with new option…#884

Improved algorithm for index 'dry_spell_total_length' with new option…#884
Zeitsperre merged 16 commits intomasterfrom
unknown repository

ghost commented Oct 25, 2021 •

edited by aulemahal

Loading

Uh oh!

ghost commented Nov 5, 2021 •

edited by ghost

Loading

Uh oh!

Uh oh!

Uh oh!

aulemahal commented Dec 9, 2021

Uh oh!

coveralls commented Jan 20, 2022 •

edited

Loading

Uh oh!

aulemahal commented Jan 20, 2022

Uh oh!

ghost commented Jan 21, 2022

Uh oh!

aulemahal commented Jan 25, 2022

Uh oh!

huard left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

ghost commented Oct 25, 2021 • edited by aulemahal Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Checklist:

What kind of change does this PR introduce?

Does this PR introduce a breaking change?

Other information:

Uh oh!

ghost commented Nov 5, 2021 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

aulemahal commented Dec 9, 2021

Uh oh!

coveralls commented Jan 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aulemahal commented Jan 20, 2022

Uh oh!

ghost commented Jan 21, 2022

Uh oh!

aulemahal commented Jan 25, 2022

Uh oh!

huard left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ghost commented Oct 25, 2021 •

edited by aulemahal

Loading

ghost commented Nov 5, 2021 •

edited by ghost

Loading

coveralls commented Jan 20, 2022 •

edited

Loading