feat(jinja): add advanced temporal filter functionality #30142

Merged
merged 10 commits into from
Sep 6, 2024

villebro (Member) commented Sep 3, 2024

SUMMARY

Currently the from_dttm and to_dttm Jinja variables make it possible to reference the time range endpoints from within the Jinja context. However, they have some shortcomings:

  • to_dttm always points to the current day, even if there's no time range present in the query (this appears to be a quirk of how the get_since_until helper function works).
  • They are always formatted using the .isoformat() method. This means that it's difficult to use them for database-specific formatting that's readily available in BaseEngineSpec.convert_dttm.
  • It's impossible to access a specific time filter if there are multiple time filters present.
  • In the case of virtual datasets, it's not possible to remove the time filter from the outer query (this is possible for regular filter types using the filter_value and get_filters macros).
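To make the second point concrete: `.isoformat()` yields one generic string regardless of the target database or column type, whereas most databases expect a typed literal. The sketch below uses a hypothetical, simplified converter (`to_sql_literal` is not Superset code, and the ANSI-style `CAST` output is an assumption). Real engine specs implement dialect-specific logic in `BaseEngineSpec.convert_dttm`.

```python
from __future__ import annotations

from datetime import datetime


def to_sql_literal(dttm: datetime, target_type: str) -> str | None:
    """Hypothetical, simplified stand-in for an engine-specific converter.

    Returns a SQL expression matching the target column type, or None for
    types it does not know about (a caller would then need a fallback).
    """
    upper = target_type.upper()
    if upper == "DATE":
        return f"CAST('{dttm.date().isoformat()}' AS DATE)"
    if upper in ("DATETIME", "TIMESTAMP"):
        ts = dttm.isoformat(sep=" ", timespec="seconds")
        return f"CAST('{ts}' AS TIMESTAMP)"
    return None


dttm = datetime(2024, 9, 3)
print(dttm.isoformat())                   # 2024-09-03T00:00:00
print(to_sql_literal(dttm, "DATE"))       # CAST('2024-09-03' AS DATE)
print(to_sql_literal(dttm, "TIMESTAMP"))  # CAST('2024-09-03 00:00:00' AS TIMESTAMP)
```

This gap appears to be what returning ready-made SQL expressions (rather than ISO strings) is meant to close.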

This PR adds a new Jinja macro called get_time_filter, which returns a dataclass with the following properties:

  • from_expr: The SQL expression representing the start of the time range, if available
  • to_expr: The SQL expression representing the end of the time range, if available
  • time_range: The (usually) human readable raw value of the time filter (e.g. "No filter", "Last week" etc)
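A minimal sketch of the returned object and of how a virtual dataset's inner query might consume it. The field names follow the list above; the `str | None` field types, the `build_where` helper and the half-open `>= start AND < end` convention are assumptions for illustration, not Superset code. Because either endpoint can be missing, each bound is emitted only when present.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TimeFilter:
    """Container mirroring the three properties described above."""

    from_expr: str | None
    to_expr: str | None
    time_range: str | None


def build_where(column: str, time_filter: TimeFilter) -> str:
    """Hypothetical helper: render a WHERE clause for an inner query.

    Open-ended ranges are possible (e.g. only a start), so each bound is
    added independently; with no bounds at all, no clause is emitted.
    """
    conditions = []
    if time_filter.from_expr:
        conditions.append(f"{column} >= {time_filter.from_expr}")
    if time_filter.to_expr:
        conditions.append(f"{column} < {time_filter.to_expr}")
    return "WHERE " + " AND ".join(conditions) if conditions else ""


week = TimeFilter(
    from_expr="CAST('2024-08-27' AS DATE)",
    to_expr="CAST('2024-09-03' AS DATE)",
    time_range="Last week",
)
print(build_where("dttm", week))
print(repr(build_where("dttm", TimeFilter(None, None, "No filter"))))  # ''
```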

Please refer to the added docs in docs/docs/configuration/sql-templating.mdx for how the feature works in practice.

TESTING INSTRUCTIONS

  • Run examples in added docs
  • Added unit tests (100% coverage for added logic)

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

github-actions bot added the doc label Sep 3, 2024
villebro changed the title from "Villebro/jinja temporal" to "feat(jinja): add advanced temporal functionality" Sep 3, 2024
dosubot bot added the global:jinja label Sep 3, 2024

codecov bot commented Sep 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.67%. Comparing base (76d897e) to head (0212fa9).
Report is 692 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master   #30142       +/-   ##
===========================================
+ Coverage   60.48%   83.67%   +23.18%     
===========================================
  Files        1931      529     -1402     
  Lines       76236    38334    -37902     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    32075    -14039     
+ Misses      28017     6259    -21758     
+ Partials     2105        0     -2105     
Flag Coverage Δ
hive 48.85% <37.83%> (-0.31%) ⬇️
javascript ?
mysql 76.60% <37.83%> (?)
postgres 76.70% <37.83%> (?)
presto 53.39% <37.83%> (-0.41%) ⬇️
python 83.67% <100.00%> (+20.18%) ⬆️
sqlite 76.16% <37.83%> (?)
unit 60.45% <100.00%> (+2.82%) ⬆️

Flags with carried forward coverage won't be shown.

nytai (Member) commented Sep 4, 2024

So something that comes up often in templatized datasets that use these filter values is having defaults, so that users can run the "sync columns from source" action and have their query execute. The dataset modal does have a "Template parameters" setting, but it actually ends up overriding anything set at the dashboard or chart level, which is kind of unexpected.

I've been advising users to always use if/else to check whether the filter values are defined and otherwise fall back to a reasonable default (e.g. last week or last day), so that their query is valid even when not run in a dashboard or chart context. It would be good to bake this default behavior into the Jinja expression so that it's a little less verbose to express. Something like:

get_time_filter(..., default_time_range="Last Week") and maybe even get_time_filter(..., default_from_expr="2024-09-04", default_to_expr="2024-09-05")

villebro (Member Author) commented Sep 4, 2024

That's a really good idea 👍 To keep things simple, I would prefer only having default_time_range, as the explicit dates in the latter example can be expressed with the time range syntax as follows: "2024-09-04 : 2024-09-05". I'd also simplify the naming further to keep it in line with the existing filter_values macro, and just call it default:

def get_time_filter(
    self,
    column: str | None = None,
    default: str | None = None,
    target_type: str | None = None,
    remove_filter: bool = False,
) -> TimeFilter:
    ...
    # logic to populate time_range from filter value
    ...
    if time_range == NO_TIME_RANGE and default:
        time_range = default
    ...
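The fallback in the snippet above can be exercised in isolation. This is a standalone sketch, not the merged implementation: `resolve_time_range` and `split_explicit_range` are hypothetical names, and `NO_TIME_RANGE = "No filter"` is assumed from the raw value quoted in the summary. It also illustrates why a single `default` suffices, since explicit dates fit the `"start : end"` syntax.

```python
from __future__ import annotations

# Assumed sentinel value, matching the "No filter" raw value quoted in the summary.
NO_TIME_RANGE = "No filter"


def resolve_time_range(filter_value: str | None, default: str | None = None) -> str:
    """Use the applied time range, falling back to `default` when none is set."""
    time_range = filter_value or NO_TIME_RANGE
    if time_range == NO_TIME_RANGE and default:
        time_range = default
    return time_range


def split_explicit_range(time_range: str) -> tuple[str, str] | None:
    """Split an explicit "start : end" range into its two endpoints.

    Relative expressions such as "Last week" are not parsed here and
    return None.
    """
    if " : " not in time_range:
        return None
    start, end = time_range.split(" : ", 1)
    return start.strip(), end.strip()


print(resolve_time_range(None, default="Last week"))          # Last week
print(resolve_time_range("Last month", default="Last week"))  # Last month
print(split_explicit_range(resolve_time_range(None, default="2024-09-04 : 2024-09-05")))
# ('2024-09-04', '2024-09-05')
```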

villebro (Member Author) commented Sep 5, 2024

@nytai I've added the default option in the latest commit: 288e6af

villebro requested a review from sfirke September 5, 2024 17:12

nytai (Member) left a comment

LGTM

villebro changed the title from "feat(jinja): add advanced temporal functionality" to "feat(jinja): add advanced temporal filter functionality" Sep 5, 2024

sfirke (Member) left a comment

Docs & comments: I made a few documentation language suggestions and pointed out a place where a sentence stops halfway.

Code: I glanced at the code; it's mostly beyond me, so I'll defer to others there.

Overall: seems like a nice improvement in functionality, I appreciate the complete documentation to go with it 🙏

docs/docs/configuration/sql-templating.mdx (outdated, resolved)
docs/docs/configuration/sql-templating.mdx (outdated, resolved)
docs/docs/configuration/sql-templating.mdx (outdated, resolved)
docs/docs/configuration/sql-templating.mdx (outdated, resolved)
superset/jinja_context.py (outdated, resolved)

villebro (Member Author) commented Sep 5, 2024

Thanks for the review @sfirke! All very good callouts; I've addressed the comments in 0212fa9

giftig (Contributor) left a comment

LGTM

villebro (Member Author) commented Sep 6, 2024

I added a self-assigned task for removing docs for from_dttm and to_dttm in favor of this macro in 5.0: https://github.com/orgs/apache/projects/345?pane=issue&itemId=78905534

villebro merged commit 601e556 into apache:master Sep 6, 2024
40 of 41 checks passed
villebro deleted the villebro/jinja-temporal branch September 6, 2024 18:15

Vitor-Avila (Contributor) commented

hey @villebro,

This is amazing! Thank you so much for working on this new macro. 🙏 It's super powerful -- can't wait to test it. 🙌

One question: you mentioned that one of the limitations of the from_dttm and to_dttm macros is the inability to apply multiple temporal filters. Could you confirm whether this is now possible with this macro? I can see how it might be, since it supports specifying a column; however, the dashboard temporal filter doesn't specify a column, so I assume it's still impossible to add two distinct/independent temporal filters at the dashboard level. Is that correct?

villebro (Member Author) commented Sep 24, 2024

Thanks for the feedback, super happy if this is useful to someone else, too ❤️

With this it's indeed possible to extract the temporal filter for multiple different columns. However, in the context of dashboard filtering, unfortunately the dashboard time range filter doesn't yet support applying to a specific temporal column (when the time range filter applies a value, I believe it attaches to all temporal filters currently 🙁). What I'd like to see is adding an optional dataset and column dropdown in the time range filter, and if that's selected, then it will apply a temporal filter only to that column. I've heard lots of people ask for that, and it shouldn't be that difficult to implement IMO.

Vitor-Avila (Contributor) commented

that totally makes sense! 🙏 I also believe this new macro would already allow you to pull distinct temporal filters from the chart context to apply to the inner query, which is super cool.

I 100% support and agree with those changes to the temporal filter. Ideally we should support selecting which temporal column is being filtered, to allow multiple distinct filters in a dashboard (similar to the filter value). I think the only additional consideration is that right now, when you create a chart, it automatically adds the dataset's main dttm column to the filter as a "placeholder", so that dashboard time range filters are applied to it. If it becomes possible to control/differentiate the column in the filter configuration, we would just need to make sure that this filter doesn't get applied to the "placeholder" columns too.
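The column-scoped behavior proposed in this exchange could be sketched roughly as below. `TemporalFilter` and `apply_time_range` are hypothetical names, not Superset code, and the model ignores how dashboards actually store and propagate filter state. With no target column it mimics the current behavior described above (the value attaches to every temporal filter); with a target column, filters on other columns (such as a placeholder on the main dttm column) are left untouched.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TemporalFilter:
    """Hypothetical model of one temporal filter on a chart."""

    column: str
    time_range: str


def apply_time_range(
    filters: list[TemporalFilter],
    time_range: str,
    target_column: str | None = None,
) -> list[TemporalFilter]:
    """Return new filters with the dashboard time range applied.

    target_column=None: attach the value to every temporal filter.
    Otherwise: only filters on target_column receive the value.
    The input list is not modified.
    """
    return [
        replace(f, time_range=time_range)
        if target_column is None or f.column == target_column
        else f
        for f in filters
    ]


chart_filters = [
    TemporalFilter("created_at", "No filter"),  # e.g. an auto-added placeholder
    TemporalFilter("shipped_at", "No filter"),
]
print(apply_time_range(chart_filters, "Last week"))
print(apply_time_range(chart_filters, "Last week", target_column="shipped_at"))
```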

Labels: doc, global:jinja, size/L

5 participants