Make Open Telemetry the default instead of StatsD for monitoring #40800
Comments
Hi @kaxil, I'm interested in this issue. May I give it a try? Thanks! |
Awesome, assigned it to you |
Hi @josix |
Also providing a simple Grafana dashboard in the example stack would help a lot, because the main open-source Airflow Grafana dashboard is based on StatsD. |
@howardyoo is also interested in working on this. He led AIP-49, so @josix, if you need help, please let Howard know. @howardyoo is equally interested in leading this effort too. |
Yeah, I just checked out the AIP and the talks from the Airflow Summit these days, and I am currently studying the codebase around StatsD and OTel. I believe it would be a better choice for @howardyoo to lead this topic 🙂 Please feel free to unassign me and assign sub-items to me if possible. Thanks for the coordination. |
I am interested in this too. Please assign me any tasks that need to be done. |
@howardyoo Could you add a comment, please? GitHub doesn't allow me to assign you an issue until you have commented on it since you aren't part of the Apache org |
Ok, done! |
Awesome, assigned it to you. |
Oh, I guess the above mentioned PR draft got released, which is nice - but looks like it may be on hold for now. |
Will StatsD support be dropped with AF 3 or will OTEL become primary for AF 3 with StatsD as an alternative/backup for N versions to ensure there is a smooth and safe migration path to OTEL? |
My opinion is not to drop StatsD support immediately, but to have OTel take over from StatsD as the primary, with StatsD still available as an alternative/backup. |
Ok but what are your arguments in favor of this policy? |
My reasons for favoring this policy are:
|
Yes, OTel is better, but why keep StatsD? |
I was originally promoting the idea of attempting to use I think there are certain limitations (metrics name length for one) that make it either difficult or impossible, or at the very least imperfect.

So I am in favour of making OTEL the "default" interface, but leaving statsd (at least for a long while) as an optional legacy mechanism, available to those who already monitor their Airflow with statsd. The thing is that it's not very difficult to keep both. Metrics change rarely, but most importantly, keeping statsd metrics in the code is not a big overhead, since those are only metrics and we keep all the data needed to produce them for OTEL anyway. Yes, it introduces some duplication, and possibly the OTEL and statsd metrics will diverge even more over time, but that's quite fine.

If we focus on improving OTEL metrics and making them more useful, while treating statsd as a legacy, read-only mechanism that we do not actively maintain, this will make OTEL even more attractive over time. We could even recommend to people who are still using statsd: if you want better metrics, go OTEL. But we should give people more time to do so and let them do it independently from the decision to migrate to Airflow 3.

I see keeping statsd as a way to improve adoption of Airflow 3. If we expect future Airflow 3 users to migrate quickly, adding "yet-another-thing-to-learn-and-configure" has a compounding effect on the difficulty of migration and might be a factor in deciding to defer it. And since it does not cost us a lot of maintenance, this is a "cheap" overhead. |
@potiuk beat me into responding! |
My apologies for misrepresenting your stand on that. I shouldn't have spoken for you. I have mentioned my solution in the past, but perhaps I need to codify it, or open an Issue and let someone else tackle it. Currently there are three metrics backends (called loggers): StatsD, OTel, and Datadog (which, as far as I can tell, is a variant of StatsD that supports tagging). The biggest compatibility issue is that OTel has a much shorter max name length (63 characters) but supports tagging, while StatsD has a much longer name length limit (300 characters?) but does not support tagging. Solution: Add a "generate name" abstract method to the base metrics logger, something like:
Each logger will then need to implement this and call it before actually emitting the metric. For OTel, it would just return the name. For StatsD, it would return the name with the tags dict concatenated onto it; something more or less like this should work, but there's likely a better way
Then we can stop double-emitting, remove the "we have to truncate this name" name length warning, and let each logger handle naming errors however it sees fit. [EDIT]
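A minimal sketch of the naming idea described above, not code from the Airflow codebase. The class names (`BaseMetricsLogger`, `OtelLikeLogger`, `StatsdLikeLogger`), the `generate_name` signature, and the dot-joined tag format are all assumptions made for illustration; the real loggers would call their backend clients where this sketch just returns the final name.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class BaseMetricsLogger(ABC):
    """Hypothetical base logger; names here are illustrative only."""

    @abstractmethod
    def generate_name(self, name: str, tags: Optional[Dict[str, str]] = None) -> str:
        """Return the metric name this backend should actually emit."""

    def incr(self, name: str, tags: Optional[Dict[str, str]] = None) -> str:
        # Each logger resolves the final name before emitting the metric.
        # A real logger would hand this to its backend client; the sketch
        # returns it so the behaviour is easy to inspect.
        return self.generate_name(name, tags)


class OtelLikeLogger(BaseMetricsLogger):
    """Backend with tag support: the name is returned unchanged."""

    def generate_name(self, name: str, tags: Optional[Dict[str, str]] = None) -> str:
        return name


class StatsdLikeLogger(BaseMetricsLogger):
    """Backend without tag support: tags are folded into the name."""

    def generate_name(self, name: str, tags: Optional[Dict[str, str]] = None) -> str:
        if not tags:
            return name
        # Assumed format: key/value pairs sorted by key and joined with dots.
        suffix = ".".join(f"{key}.{value}" for key, value in sorted(tags.items()))
        return f"{name}.{suffix}"
```

With this shape, a caller emits one metric with a short name plus tags, and each backend decides how the tags end up in what it sends, which is what would remove the need for double-emitting.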
|
Yup, I agree with this |
Hi @ferruzzi, I don't know if you have seen this, but the OpenTelemetry specification raised the name length limit from 63 to 255 characters: open-telemetry/opentelemetry-specification#3648.
|
Thanks for pointing it out. We have a fix for it coming in #43340 |
Currently, Apache Airflow uses StatsD for metrics collection and monitoring. To modernize our observability stack and align with industry standards, we should adopt OpenTelemetry as the primary metrics collection and monitoring tool for Airflow 3 while keeping StatsD as an option. This is made easier by AIP-49 Open Telemetry Support.
Backward Compatibility / Migration
Most enterprise Airflow deployments rely on StatsD for monitoring and alerting, so we should make sure that there is a smooth migration path. Ideally, all the StatsD metrics mentioned in https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/metrics.html#metric-descriptions should have a like-for-like replacement.
High-level Objectives: