[AIRFLOW-3607] fix scheduler bug related to task concurrency and depends on past #7402

Merged on Feb 14, 2020 (1 commit)

Conversation

houqp
Member

@houqp houqp commented Feb 12, 2020

Commit 50efda5 introduced a bug that prevents the scheduler from
scheduling tasks with the following properties:

  • depends_on_past is set to True
  • a custom task concurrency limit is set
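
To make the affected case concrete, here is a minimal sketch in plain Python of how a scheduler might decide between a batched dependency check and a per-task one. `FakeTask` and `needs_slow_path` are hypothetical names, not Airflow APIs, and the rule that either property alone forces the per-task check is an assumption rather than something stated in this PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeTask:
    """Hypothetical stand-in for a task definition; not an Airflow class."""
    task_id: str
    depends_on_past: bool = False
    task_concurrency: Optional[int] = None  # None means no custom limit


def needs_slow_path(tasks: List[FakeTask]) -> bool:
    """Return True if any task needs an individual dependency check.

    Assumption: a batched check is only safe when no task depends on its
    previous run and no task carries its own concurrency limit.
    """
    return any(
        t.depends_on_past or t.task_concurrency is not None for t in tasks
    )


plain = [FakeTask("extract"), FakeTask("load")]
affected = [
    FakeTask("extract"),
    FakeTask("load", depends_on_past=True, task_concurrency=1),
]

print(needs_slow_path(plain))     # False
print(needs_slow_path(affected))  # True
```

In this sketch, a single task with either setting is enough to route the whole run through the per-task check.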

Issue link: AIRFLOW-3607

Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Commit message/PR title starts with [AIRFLOW-NNNN]. AIRFLOW-NNNN = JIRA ID*
  • Unit tests coverage for changes (not needed for documentation changes)
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

* For document-only changes commit message can start with [AIRFLOW-XXXX].


In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Feb 12, 2020
@houqp houqp requested a review from ashb February 12, 2020 01:44
@houqp houqp changed the title from "[AIRFLOW-6779] fix scheduler bug related to concurrency and depends on past" to "[AIRFLOW-6779] fix scheduler bug related to task concurrency and depends on past" Feb 12, 2020
@houqp
Member Author

houqp commented Feb 12, 2020

@ashb updated PR

@codecov-io

Codecov Report

Merging #7402 into master will decrease coverage by 0.45%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #7402      +/-   ##
==========================================
- Coverage   86.61%   86.15%   -0.46%     
==========================================
  Files         873      874       +1     
  Lines       40757    40873     +116     
==========================================
- Hits        35300    35213      -87     
- Misses       5457     5660     +203

Impacted Files                                           Coverage Δ
airflow/models/dagrun.py                                 96.44% <100%> (+0.08%) ⬆️
...w/providers/apache/hive/operators/mysql_to_hive.py    35.84% <0%> (-64.16%) ⬇️
airflow/kubernetes/volume_mount.py                       44.44% <0%> (-55.56%) ⬇️
airflow/kubernetes/volume.py                             52.94% <0%> (-47.06%) ⬇️
airflow/security/kerberos.py                             30.43% <0%> (-45.66%) ⬇️
airflow/kubernetes/pod_launcher.py                       47.18% <0%> (-45.08%) ⬇️
airflow/providers/mysql/operators/mysql.py               55% <0%> (-45%) ⬇️
...viders/cncf/kubernetes/operators/kubernetes_pod.py    69.38% <0%> (-25.52%) ⬇️
airflow/kubernetes/refresh_config.py                     50.98% <0%> (-23.53%) ⬇️
airflow/config_templates/airflow_local_settings.py       65.38% <0%> (-6.36%) ⬇️
... and 8 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ashb
Member

ashb commented Feb 14, 2020

I've re-targeted this PR against AIRFLOW-3607 as it's a bug fix to a commit that has not yet been included in any release.

@ashb ashb changed the title from "[AIRFLOW-6779] fix scheduler bug related to task concurrency and depends on past" to "[AIRFLOW-3607] fix scheduler bug related to task concurrency and depends on past" Feb 14, 2020
@ashb ashb merged commit edcad79 into apache:master Feb 14, 2020
@houqp houqp deleted the scheduler_fix branch February 14, 2020 20:12
else:
    # slow path
    for ti in scheduleable_tasks:
        if ti.are_dependencies_met(
Contributor

why not use get_ready_tis?

galuszkak pushed a commit to FlyrInc/apache-airflow that referenced this pull request Mar 5, 2020
…n past (apache#7402)

turbaszek pushed a commit to PolideaInternal/airflow that referenced this pull request Oct 1, 2020
…n past (apache#7402)

turbaszek pushed a commit to PolideaInternal/airflow that referenced this pull request Oct 1, 2020
[AIRFLOW-3607] Only query DB once per DAG run for TriggerRuleDep (apache#4751)

This decreases the scheduler delay between tasks by about 20% for larger DAGs,
and sometimes more for larger or more complex DAGs.

The delay between tasks can be a major issue, especially for DAGs with many
subdags. We found that the scheduling process spends a lot of time in
dependency checking. The trigger rule dependency used to call the DB once for
each task instance; it now calls the DB just once for each dag_run.
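
The idea of querying once per DAG run instead of once per task instance can be sketched in plain Python. This is only an illustration under stated assumptions: `CountingDB`, `check_per_task`, and `check_per_dag_run` are hypothetical names, the "database" is an in-memory dict, and the dependency rule is simplified to "all upstream tasks succeeded". It is not the Airflow implementation.

```python
from typing import Dict, List


class CountingDB:
    """Hypothetical in-memory 'database' that counts how often it is queried."""

    def __init__(self, states: Dict[str, str]):
        self._states = states
        self.query_count = 0

    def finished_task_ids(self) -> List[str]:
        # Each call stands in for one round trip to the metadata DB.
        self.query_count += 1
        return [tid for tid, state in self._states.items() if state == "success"]


def upstream_done(upstream: List[str], finished: List[str]) -> bool:
    """Simplified trigger rule: every upstream task must have succeeded."""
    return all(u in finished for u in upstream)


def check_per_task(db: CountingDB, deps: Dict[str, List[str]]) -> Dict[str, bool]:
    # One query for every task instance being checked.
    return {tid: upstream_done(up, db.finished_task_ids()) for tid, up in deps.items()}


def check_per_dag_run(db: CountingDB, deps: Dict[str, List[str]]) -> Dict[str, bool]:
    # One query for the whole run; the result is shared by all checks.
    finished = db.finished_task_ids()
    return {tid: upstream_done(up, finished) for tid, up in deps.items()}


states = {"a": "success", "b": "success", "c": "running", "d": "none"}
deps = {"b": ["a"], "c": ["a", "b"], "d": ["c"]}

slow_db = CountingDB(states)
fast_db = CountingDB(states)
assert check_per_task(slow_db, deps) == check_per_dag_run(fast_db, deps)
print(slow_db.query_count, fast_db.query_count)  # 3 1
```

Both functions return the same answers here; only the number of simulated queries differs, and the per-task count grows with the number of task instances checked.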

[AIRFLOW-3607] fix scheduler bug related to concurrency and depends on past (apache#7402)

commit 50efda5 introduced a bug that
prevents scheduler from scheduling tasks with the following properties:

* has depends on past set to True
* has custom concurrency limit

[AIRFLOW-3607] Optimize dep checking when depends on past set and concurrency limit (apache#7503)
Labels
area:Scheduler including HA (high availability) scheduler
5 participants