Add blog post for 2.10.0 #1052
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,174 @@ | ||
| --- | ||
| title: "Apache Airflow 2.10.0 is here" | ||
| linkTitle: "Apache Airflow 2.10.0 is here" | ||
| author: "Utkarsh Sharma" | ||
| github: "utkarsharma2" | ||
| linkedin: "utkarsh-sharma-5791ab8a" | ||
| description: "Apache Airflow 2.10.0 is a game-changer, with powerful Dataset improvements and the groundbreaking Hybrid Executor, set to redefine your workflow capabilities!" | ||
| tags: [Release] | ||
| date: "2024-08-08" | ||
| --- | ||
|
|
||
| I'm happy to announce that Apache Airflow 2.10.0 is now available, bringing an array of noteworthy enhancements and new features that will greatly serve our community. | ||
|
|
||
| Apache Airflow 2.10.0 contains over 135 commits, which include 43 new features, 85 improvements, 43 bug fixes, and 26 documentation changes. | ||
|
|
||
| **Details**: | ||
|
|
||
| 📦 PyPI: https://pypi.org/project/apache-airflow/2.10.0/ \ | ||
| 📚 Docs: https://airflow.apache.org/docs/apache-airflow/2.10.0/ \ | ||
| 🛠 Release Notes: https://airflow.apache.org/docs/apache-airflow/2.10.0/release_notes.html \ | ||
| 🐳 Docker Image: "docker pull apache/airflow:2.10.0" \ | ||
| 🚏 Constraints: https://github.com/apache/airflow/tree/constraints-2.10.0 | ||
|
|
||
|
|
||
| ## Hybrid Execution | ||
|
Contributor
I'd also like to write a blog post entirely for multi exec config, since there is some nuance and details there that's worth spelling out and having it on the website blog like this example. I have not created one before, should I just follow this PR as an example of how to do that? I have the content written already but would need to translate it into this formatting.
Member
Sorry, missed this. Yeah. Totally open to that and we can link to it here! We did that for setup/teardown in 2.7, if you need another example. |
||
|
|
||
| Each executor comes with its unique set of strengths and weaknesses, typically balancing latency, isolation, and compute efficiency. Traditionally, an Airflow environment is limited to a single executor, requiring users to make trade-offs, as no single executor is perfectly suited for all types of tasks. | ||
|
|
||
| We are introducing a new feature that allows for the concurrent use of multiple executors within a single Airflow environment. This flexibility enables users to take advantage of the specific strengths of different executors for various tasks, improving overall efficiency and mitigating weaknesses. Users can set a default executor for the entire environment and, if necessary, assign particular executors to individual DAGs or tasks. | ||
|
|
||
| To configure multiple executors, pass a comma-separated list in the Airflow configuration. The first executor in the list is the default executor for the environment. | ||
|
|
||
| ``` | ||
| [core] | ||
| executor = 'LocalExecutor,CeleryExecutor' | ||
| ``` | ||
| To make things easier for DAG authors, you can also define aliases for executors in the executor configuration: | ||
| ```commandline | ||
| [core] | ||
| executor = 'LocalExecutor,my.custom.module.ExecutorClass:ShortName' | ||
| ``` | ||
|
|
||
| DAG authors can specify which executor to use at the task level: | ||
| ```python | ||
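| from airflow.decorators import task | ||
| from airflow.operators.bash import BashOperator | ||
|
|
||
| # Classic operator: pass the executor name as an argument | ||
| BashOperator( | ||
|     task_id="hello_world", | ||
|     executor="LocalExecutor", | ||
|     bash_command="echo 'hello world!'", | ||
| ) | ||
|
|
||
| # TaskFlow API: pass the executor name to the decorator | ||
| @task(executor="LocalExecutor") | ||
| def hello_world(): | ||
|     print("hello world!") | ||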
| ``` | ||
|
|
||
| Executors can also be specified at the DAG level: | ||
|
|
||
| ```python | ||
| from airflow import DAG | ||
| from airflow.decorators import task | ||
|
|
||
| @task | ||
| def hello_world(): | ||
|     print("hello world!") | ||
|
|
||
| @task | ||
| def hello_world_again(): | ||
|     print("hello world again!") | ||
|
|
||
| with DAG( | ||
|     dag_id="hello_worlds", | ||
|     default_args={"executor": "LocalExecutor"},  # Applies to all tasks in the DAG | ||
| ) as dag: | ||
|     # All tasks will use the executor from default args automatically | ||
|     hw = hello_world() | ||
|     hw_again = hello_world_again() | ||
| ``` | ||
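The fallback order described in this section can be summarized in a few lines of plain Python. This is only an illustrative model, not Airflow's scheduler code: `resolve_executor` is a hypothetical helper, and it assumes that an explicit task-level `executor` takes precedence over `default_args`, which in turn takes precedence over the first entry in the `[core] executor` list.

```python
from typing import Optional


def resolve_executor(
    configured: str,
    default_args: Optional[dict] = None,
    task_executor: Optional[str] = None,
) -> str:
    """Return the executor a task would run on under the assumed precedence.

    Hypothetical helper for illustration only; not part of Airflow.
    """
    # The first entry of the comma-separated setting is the environment default.
    available = [name.strip() for name in configured.split(",") if name.strip()]
    if not available:
        raise ValueError("No executor configured")
    environment_default = available[0]

    # Task-level setting wins, then DAG-level default_args, then the environment default.
    chosen = task_executor or (default_args or {}).get("executor") or environment_default

    if chosen not in available:
        raise ValueError(f"Executor {chosen!r} is not configured: {available}")
    return chosen


configured = "LocalExecutor,CeleryExecutor"
print(resolve_executor(configured))
print(resolve_executor(configured, default_args={"executor": "CeleryExecutor"}))
print(
    resolve_executor(
        configured,
        default_args={"executor": "CeleryExecutor"},
        task_executor="LocalExecutor",
    )
)
```

The three calls mirror the three levels in the post: no override, a DAG-level `default_args` override, and a task-level override on top of that. The sketch rejects an executor name that is missing from the configured list; how Airflow itself reports that case is not covered here.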
|
|
||
| ## Dynamic Dataset scheduling through DatasetAlias | ||
|
|
||
| Airflow 2.10 comes with a `DatasetAlias` class, which can be passed as a value in the `outlets` and `inlets` of a task and in the `schedule` of a DAG. An instance of `DatasetAlias` is resolved dynamically to a real dataset. Downstream DAGs can depend on either the resolved dataset or the alias itself. | ||
|
|
||
| `DatasetAlias` has one argument `name` that uniquely identifies the dataset. The task must first declare the alias as an outlet, and use `outlet_events` or `yield Metadata` to add events to it. | ||
|
|
||
| ### Emit a dataset event during task execution through outlet_events | ||
| ```python | ||
| from airflow.datasets import Dataset, DatasetAlias | ||
| from airflow.decorators import task | ||
|
|
||
| @task(outlets=[DatasetAlias("my-task-outputs")]) | ||
| def my_task_with_outlet_events(*, outlet_events): | ||
|     outlet_events["my-task-outputs"].add(Dataset("s3://bucket/my-task"), extra={"k": "v"}) | ||
| ``` | ||
| ### Emit a dataset event during task execution by yielding Metadata | ||
| ```python | ||
| from airflow.datasets import Dataset, DatasetAlias | ||
| from airflow.datasets.metadata import Metadata | ||
| from airflow.decorators import task | ||
|
|
||
| @task(outlets=[DatasetAlias("my-task-outputs")]) | ||
| def my_task_with_metadata(): | ||
|     s3_dataset = Dataset("s3://bucket/my-task") | ||
|     yield Metadata(s3_dataset, extra={"k": "v"}, alias="my-task-outputs") | ||
| ``` | ||
|
|
||
| There are two options for scheduling based on dataset aliases: schedule on the `DatasetAlias` itself, or on the real datasets it resolves to. | ||
|
|
||
| ```python | ||
| from airflow import DAG | ||
| from airflow.datasets import Dataset, DatasetAlias | ||
| from airflow.decorators import task | ||
|
|
||
| with DAG(dag_id="dataset-producer"): | ||
|     @task(outlets=[Dataset("example-alias")]) | ||
|     def produce_dataset_events(): | ||
|         pass | ||
|
|
||
|     produce_dataset_events() | ||
|
|
||
| with DAG(dag_id="dataset-alias-producer"): | ||
|     @task(outlets=[DatasetAlias("example-alias")]) | ||
|     def produce_dataset_events(*, outlet_events): | ||
|         outlet_events["example-alias"].add(Dataset("s3://bucket/my-task")) | ||
|
|
||
|     produce_dataset_events() | ||
|
|
||
| with DAG(dag_id="dataset-consumer", schedule=Dataset("s3://bucket/my-task")): | ||
|     ... | ||
|
|
||
| with DAG(dag_id="dataset-alias-consumer", schedule=DatasetAlias("example-alias")): | ||
|     ... | ||
| ``` | ||
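To make the difference between the two scheduling options concrete, here is a toy model in plain Python. It does not use or reproduce Airflow's internals: `AliasRegistry` and its methods are hypothetical names. It assumes that an alias only resolves to a dataset once a task adds an event through it, and that a `Dataset` and a `DatasetAlias` that happen to share the same string are unrelated.

```python
from collections import defaultdict
from typing import Dict, List, Set


class AliasRegistry:
    """Toy stand-in for alias resolution (hypothetical; not an Airflow class)."""

    def __init__(self) -> None:
        self.alias_to_datasets: Dict[str, Set[str]] = defaultdict(set)
        self.dataset_consumers: Dict[str, List[str]] = defaultdict(list)
        self.alias_consumers: Dict[str, List[str]] = defaultdict(list)

    def schedule_on_dataset(self, dag_id: str, uri: str) -> None:
        """Register a DAG whose schedule is a real dataset."""
        self.dataset_consumers[uri].append(dag_id)

    def schedule_on_alias(self, dag_id: str, alias: str) -> None:
        """Register a DAG whose schedule is a dataset alias."""
        self.alias_consumers[alias].append(dag_id)

    def emit_dataset_event(self, uri: str) -> List[str]:
        """A task with a plain dataset outlet finished: return the DAGs to trigger."""
        return sorted(self.dataset_consumers[uri])

    def emit_alias_event(self, alias: str, uri: str) -> List[str]:
        """A task added `uri` to `alias` at runtime: return the DAGs to trigger."""
        self.alias_to_datasets[alias].add(uri)
        return sorted(self.dataset_consumers[uri] + self.alias_consumers[alias])


registry = AliasRegistry()
registry.schedule_on_dataset("dataset-consumer", "s3://bucket/my-task")
registry.schedule_on_alias("dataset-alias-consumer", "example-alias")

# "dataset-producer" emits a plain dataset that merely shares the alias's name.
print(registry.emit_dataset_event("example-alias"))
# "dataset-alias-producer" resolves the alias to s3://bucket/my-task.
print(registry.emit_alias_event("example-alias", "s3://bucket/my-task"))
```

In this model the first call triggers no consumer, while the second triggers both consumer DAGs, because the alias has resolved to the dataset that `dataset-consumer` is scheduled on.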
| ### Dataset aliases UI Enhancements | ||
|
|
||
| Users can now see Dataset Aliases in the legend of each cross-DAG dependency graph, with a corresponding icon and color. | ||
|
|
||
|  | ||
|
|
||
| ## Dark mode for Airflow UI | ||
|
|
||
| Airflow 2.10 comes with a new Dark Mode, an alternative visual theme that is easier on the eyes, especially in low-light conditions. Click the crescent icon on the right side of the navigation bar to switch between light and dark mode. | ||
|
|
||
|  | ||
|
|
||
|  | ||
|
|
||
|
|
||
|
|
||
| ## Task Instance History | ||
|
|
||
| In Apache Airflow 2.10.0, when a task instance is retried or cleared, its execution history is maintained. You can view this history by clicking on the task instance in the Grid view, allowing you to access information about each attempt, such as logs, execution durations, and any failures. This feature improves transparency into the task's execution process, making it easier to troubleshoot and analyze your DAGs. | ||
|
|
||
|  | ||
|
|
||
| The history displays the final values of the task instance attributes for each specific run. On the log page, you can also access the logs for each attempt of the task instance. This information is valuable for debugging purposes. | ||
|
|
||
|  | ||
|
|
||
| ## Dataset UI Enhancements | ||
|
|
||
| ### Dataset Events list | ||
| A new Dataset Events list shows all dataset events across all datasets. | ||
|  | ||
|
|
||
| ### Dataset Detail | ||
| Users can now see more details about a dataset, such as its extra, consuming DAGs, and producing tasks. | ||
|  | ||
|
|
||
|
|
||
| ### Toggle datasets in Graph | ||
|
|
||
| Datasets can now be toggled on and off in the DAG graph. | ||
|
|
||
|  | ||
|  | ||
|
|
||
| ### Dataset Conditions in DAG Graph view | ||
| We now display the graph view with logical gates. Datasets with actual events are highlighted with a different border, making it easier to see what triggered the selected run. | ||
|
|
||
|  | ||
|
|
||
| ### Dataset event info in DAG Graph | ||
| For a DAG run, users can now view the dataset events connected to it directly in the graph view. | ||
|
|
||
|  | ||
|
|
||
| ## Contributors | ||
| Thanks to everyone who contributed to this release, including Amogh Desai, Andrey Anshin, Brent Bovenzi, Daniel Standish, Ephraim Anierobi, Hussein Awala, Jarek Potiuk, Jed Cunningham, Jens Scheffler, Tzu-ping Chung, Vincent Beck, Wei Lee, and over 120 others! | ||
|
|
||
|
|
||
| I hope you enjoy using Apache Airflow 2.10.0! | ||
The numbers seem off. I need to figure out how to get the proper data.
We normally just get the counts from the release notes. Close enough.
I should clarify, we get the feature/improvement/etc numbers from the release notes. The commit count comes directly from git, sorta like the query to find the top contributors. 135 is definitely low.
@jedcunningham I updated the count using git compare. PTAL
My gut tells me you've swung too far in the other direction. Trying to get just core commits, roughly.