Implement schedule on @asset #42315

Closed
2 tasks done
Lee-W opened this issue Sep 18, 2024 · 5 comments · Fixed by #46851
Assignees
Labels
AIP-75 Asset-Centric Syntax area:datasets Issues related to the datasets feature kind:feature Feature Requests

Comments

@Lee-W
Member

Lee-W commented Sep 18, 2024

Description

Schedules

This will probably be achieved by using AssetRef to access an Asset by name.
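
As a rough illustration of what "access by name" could mean, here is a minimal sketch. `ToyAsset`, `ToyAssetRef`, `resolve`, and the sample URI are all hypothetical stand-ins made up for this example; they are not Airflow's actual classes or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyAsset:
    """Stand-in for an asset definition; not Airflow's Asset class."""
    name: str
    uri: str


@dataclass(frozen=True)
class ToyAssetRef:
    """Stand-in for a by-name reference; it carries no URI of its own."""
    name: str


def resolve(ref: ToyAssetRef, registry: dict[str, ToyAsset]) -> ToyAsset:
    """Look the referenced asset up by name, failing loudly if it is unknown."""
    try:
        return registry[ref.name]
    except KeyError:
        raise LookupError(f"no asset registered under name {ref.name!r}") from None


# Hypothetical registry with a single asset, keyed by name.
registry = {a.name: a for a in [ToyAsset("raw_orders", "s3://bucket/raw_orders")]}
resolved = resolve(ToyAssetRef("raw_orders"), registry)
```

The point of the indirection is that a schedule can name an asset without importing or redefining it; resolution is deferred until the full set of assets is known.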

This one could be deferred to "Data Completeness"

Use case/motivation

Rationale

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@Lee-W Lee-W added kind:feature Feature Requests AIP-75 Asset-Centric Syntax labels Sep 18, 2024
@Lee-W Lee-W added the area:datasets Issues related to the datasets feature label Sep 19, 2024
@phanikumv phanikumv moved this from Backlog to In progress in AIP-75 New Asset-Centric Syntax Dec 4, 2024
@uranusjr
Member

PR to add asset reference support #45028

@uranusjr
Member

There’s a second part to come later to implement modern schedule dates (logical date at the end, no data intervals).

@uranusjr
Member

PR for asset ref merged.

@uranusjr
Member

uranusjr commented Jan 6, 2025

There’s still this part of the AIP left: https://cwiki.apache.org/confluence/x/RA2TEg

The key difference is that, since we are decoupling partitioning from scheduling, the schedule parameter no longer controls the interval, i.e. it is not designed around the logical/execution date. It simply controls when the next round should happen.

It is expected that scheduling of non-asset workflows (DAGs) will also be changed in a similar way to match the behavior for assets. Existing operators must be reviewed to ensure they account for the new scheduling semantics, but we should provide a transition interface to assist rewrites and to make it easier for providers to develop an implementation compatible with both 2 and 3. The timetable protocol will also require new methods to implement the new semantics, but both the old and the new methods should be implementable on the same class, with each major version only calling the methods it uses.
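
To make the "old and new methods on the same class" idea concrete, here is a minimal sketch. The class and every method name in it (`next_data_interval`, `next_run_after`, `plan_old`, `plan_new`) are hypothetical and chosen only for illustration; they are not the actual timetable protocol.

```python
from datetime import datetime, timedelta, timezone


class ToyHourlyTimetable:
    """Toy timetable exposing both semantics; all method names are hypothetical."""

    period = timedelta(hours=1)

    def next_data_interval(self, last_interval_end: datetime) -> tuple[datetime, datetime]:
        # Old semantics: the schedule defines an interval and the run covers it.
        return last_interval_end, last_interval_end + self.period

    def next_run_after(self, last_run_after: datetime) -> datetime:
        # New semantics: the schedule only says when the next round happens.
        return last_run_after + self.period


def plan_old(timetable: ToyHourlyTimetable, last_interval_end: datetime) -> dict:
    """Caller standing in for the older major version: uses only the interval method."""
    start, end = timetable.next_data_interval(last_interval_end)
    return {"data_interval_start": start, "data_interval_end": end, "run_after": end}


def plan_new(timetable: ToyHourlyTimetable, last_run_after: datetime) -> dict:
    """Caller standing in for the newer major version: no data interval is produced."""
    return {"run_after": timetable.next_run_after(last_run_after)}


t0 = datetime(2025, 1, 6, tzinfo=timezone.utc)
old_plan = plan_old(ToyHourlyTimetable(), t0)
new_plan = plan_new(ToyHourlyTimetable(), t0)
```

Each caller touches only its own method, so a single class can serve both versions without either one needing to know the other method exists.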

This behaviour is partially in line with the amendments proposed for AIP-83, and will be implemented together with changes to the DAG class. https://cwiki.apache.org/confluence/x/Ngv0Ew

Specifically:

  • A DAG run backing an @asset materialisation will have its logical date and data interval values set to null in the database.
  • The @asset function’s execution context dict will not contain keys logical_date (and derived values such as ds), data_interval_start, and data_interval_end.
  • The run will be represented in user-facing interfaces by the run_after date (in line with all logical-date-less DAG runs in the amendment).
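
For code that has to cope with contexts both with and without `logical_date`, a transition helper might look roughly like the sketch below. The helper name `reference_time` is hypothetical, the context is modelled as a plain mapping, and the assumption that `run_after` is reachable under a `run_after` key is mine; the list above only says the run is represented by that date in user-facing interfaces.

```python
from datetime import datetime, timezone
from typing import Any, Mapping


def reference_time(context: Mapping[str, Any]) -> datetime:
    """Return logical_date when the context has one, else fall back to run_after."""
    logical_date = context.get("logical_date")
    if logical_date is not None:
        return logical_date
    # Assumption: the run's run_after value is available under this key.
    return context["run_after"]


moment = datetime(2025, 1, 6, tzinfo=timezone.utc)

# Context shaped like a run that still carries a logical date.
with_logical_date = {"logical_date": moment, "run_after": moment}
# Context shaped like an @asset run, where the logical_date key is absent.
without_logical_date = {"run_after": moment}

legacy_time = reference_time(with_logical_date)
asset_time = reference_time(without_logical_date)
```

Using `.get()` rather than indexing is the important part: code that reads `context["logical_date"]` directly would raise `KeyError` once the key is absent.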

@phanikumv
Contributor

phanikumv commented Feb 10, 2025

Depends on #46192

@phanikumv phanikumv moved this from Todo to In Progress in AIP-83 amendment Feb 17, 2025
@github-project-automation github-project-automation bot moved this from In progress to Done in AIP-75 New Asset-Centric Syntax Feb 20, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in AIP-83 amendment Feb 20, 2025
Projects
Status: Done
3 participants