Scheduled_duration metric is rarely sent, and with too many tags #45285

Open
1 of 2 tasks
noamst-monday opened this issue Dec 30, 2024 · 4 comments
Labels
area:core area:metrics kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet

Comments

@noamst-monday

noamst-monday commented Dec 30, 2024

Apache Airflow version

2.10.4

If "Other Airflow 2 version" selected, which one?

No response

What happened?

Hello!
We are running Airflow 2.10.4 on EKS version 1.30.6.
I set up statsd metric collection as described here.
A Datadog agent runs as a DaemonSet on each node to collect the metrics.
I am able to receive and search all metrics successfully in Datadog, but task.scheduled_duration is sent very rarely, and when it is sent, the values and tags don't make sense to me.
For example, over the last week I have only a single datapoint, at 3.26k, and it appears to be tagged with several task_ids and dag_ids.

What you think should happen instead?

I expect to receive the metric every time a task is scheduled, and tagged correctly with only the relevant task_id and dag_id.

How to reproduce

Deployment details are listed below. Please let me know if any other relevant information is missing.

Operating System

Debian GNU/Linux 12 (bookworm)

Versions of Apache Airflow Providers

apache-airflow==2.10.4
apache-airflow-providers-amazon==9.1.0
apache-airflow-providers-celery==3.8.5
apache-airflow-providers-cncf-kubernetes==10.0.1
apache-airflow-providers-common-compat==1.2.2
apache-airflow-providers-common-io==1.4.2
apache-airflow-providers-common-sql==1.20.0
apache-airflow-providers-dbt-cloud==3.11.2
apache-airflow-providers-docker==3.14.1
apache-airflow-providers-elasticsearch==5.5.3
apache-airflow-providers-fab==1.5.1
apache-airflow-providers-ftp==3.11.1
apache-airflow-providers-google==11.0.0
apache-airflow-providers-grpc==3.6.0
apache-airflow-providers-hashicorp==3.8.0
apache-airflow-providers-http==4.13.3
apache-airflow-providers-imap==3.7.0
apache-airflow-providers-microsoft-azure==11.1.0
apache-airflow-providers-mysql==5.7.4
apache-airflow-providers-odbc==4.8.1
apache-airflow-providers-openlineage==1.14.0
apache-airflow-providers-postgres==5.14.0
apache-airflow-providers-redis==3.8.0
apache-airflow-providers-sendgrid==3.6.0
apache-airflow-providers-sftp==4.11.1
apache-airflow-providers-slack==8.9.2
apache-airflow-providers-smtp==1.8.1
apache-airflow-providers-snowflake==5.8.1
apache-airflow-providers-sqlite==3.9.1
apache-airflow-providers-ssh==3.14.0

Deployment

Official Apache Airflow Helm Chart

Deployment details

Airflow 2.10.4 on EKS 1.30, using official helm chart version 1.13.1.

Datadog version - datadog/agent:7.56.2

Airflow metrics configuration:

[metrics]
metrics_use_pattern_match = False
metrics_allow_list =
metrics_block_list =
statsd_on = true
statsd_host = 10.128.31.107
statsd_port = 8125
statsd_prefix = airflow
stat_name_handler =
statsd_datadog_enabled = True
statsd_datadog_tags = project:dbt
statsd_datadog_metrics_tags = True
statsd_disabled_tags = job_id,run_id
statsd_influxdb_enabled = False
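
To check whether the agent accepts a tagged timer at all, independently of Airflow, one option is to send a hand-built DogStatsD datagram to the same host and port. The following is only a minimal sketch: the metric name assumes the `airflow` prefix is joined to `task.scheduled_duration` with a dot, and the tag values, the 1500 ms value, and the helper names `build_timer_datagram` and `send_datagram` are made up for illustration.

```python
import socket


def build_timer_datagram(name: str, millis: int, tags: dict) -> str:
    """Build a DogStatsD timer datagram: <name>:<value>|ms|#<tag>:<value>,..."""
    tag_part = ",".join(f"{key}:{value}" for key, value in tags.items())
    return f"{name}:{millis}|ms|#{tag_part}"


def send_datagram(datagram: str, host: str, port: int = 8125) -> None:
    """Send one datagram over UDP (fire-and-forget, no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


# Hypothetical tag values; only project:dbt comes from the configuration above.
datagram = build_timer_datagram(
    "airflow.task.scheduled_duration",
    1500,
    {"dag_id": "example_dag", "task_id": "example_task", "project": "dbt"},
)
print(datagram)

# Uncomment to actually send to the statsd_host from the configuration above.
# send_datagram(datagram, "10.128.31.107")
```

If a test datapoint sent this way shows up in Datadog with exactly one dag_id and one task_id, the agent side is probably handling tags correctly and the question moves to how often Airflow emits the metric.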

Anything else?

Thank you, and please let me know if there are any additional details I can provide to help triage or reproduce this issue.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@noamst-monday noamst-monday added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Dec 30, 2024

boring-cyborg bot commented Dec 30, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@dosubot dosubot bot added the area:metrics label Dec 30, 2024
@noamst-monday noamst-monday changed the title Scheduled_duration metric is rarely sent Scheduled_duration metric is rarely sent, and with too many tags Dec 30, 2024
@ferruzzi
Contributor

ferruzzi commented Jan 9, 2025

I'm not really familiar with datadog, but what are you using to view the emitted metric? Is it possible to filter the metric by those tags? I suspect that the 3.26k you are seeing is the aggregate and you should be able to drill down into that using the tags?

Do you see other tagged timers working the way you expect them to?

@noamst-monday
Author

Hey, thanks for replying.
Other timers are working well, and I am able to filter them by tags.
For example, here is a graph showing scheduled_duration and queued_duration.
scheduled_duration is broken down by task_id, and queued_duration is shown without a breakdown (due to the large number of different tasks/DAGs).
[Image: Datadog graph of scheduled_duration (broken down by task_id) and queued_duration]

You can see there's only a single data point for scheduled_duration, while queued_duration is graphed continuously.
Also, the data is available for just a single task and dag.

@ferruzzi
Contributor

Hm. Both of those are defined and emitted together, I think (here), so it is odd that one would work but not the other.
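
A rough sketch of what "defined and emitted together" could look like, written from memory and not copied from the Airflow source, so treat every detail as an assumption to verify against `taskinstance.py`. `FakeTaskInstance` and `state_change_metric` are hypothetical names. The sketch assumes one function picks the metric from the new state, measures queued_duration from `queued_dttm` and scheduled_duration from `start_date`, and emits nothing when the reference timestamp is missing or the task already has an end date.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class FakeTaskInstance:
    """Hypothetical stand-in for a task instance; not Airflow's real model."""

    dag_id: str
    task_id: str
    queued_dttm: Optional[datetime] = None
    start_date: Optional[datetime] = None
    end_date: Optional[datetime] = None


def state_change_metric(ti: FakeTaskInstance, new_state: str, now: datetime):
    """Return (metric_name, duration, tags), or None when nothing would be emitted."""
    if ti.end_date is not None:
        # Assumed rule: only the first attempt reports a transition time.
        return None
    if new_state == "running":
        name, since = "queued_duration", ti.queued_dttm
    elif new_state == "queued":
        name, since = "scheduled_duration", ti.start_date
    else:
        raise ValueError(f"no metric defined for state {new_state!r}")
    if since is None:
        # No reference timestamp, so the metric is silently skipped.
        return None
    tags = {"dag_id": ti.dag_id, "task_id": ti.task_id}
    return f"task.{name}", now - since, tags


now = datetime(2024, 12, 30, 12, 0, tzinfo=timezone.utc)
ti = FakeTaskInstance("example_dag", "example_task", queued_dttm=now - timedelta(seconds=5))

print(state_change_metric(ti, "queued", now))   # start_date is unset here
print(state_change_metric(ti, "running", now))  # queued_dttm is set here
```

If the real logic is shaped like this, the two timers share an emit path but depend on different timestamps, which would be one way for queued_duration to report continuously while scheduled_duration is almost always skipped. That is a hypothesis to check, not a confirmed cause.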
