-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optionally return error if two metrics would collide under escaping schemes #1638
Comments
Can agree, if that do not impact users which do not use advance characters.
Why not ONLY check for escaped duplicates? The non escaped collision will be caught when escaped, no? Also can I double check UTF-8 behaviour here: The escaped mechanism is only on scrape time, right? Like client would never escape for you, it's a scrape logic only, right?
Sorry I don't get it. What would collide in Japanese exactly? If any Japanese metrics/labels will cause collision it will cause scrape to literally ignore rest and take first one OR fail scrape anyway, no? |
I also don't get #1641 goal. I thought this issue is to fail on registration, which is generally best-effort (e.g. there are cases of unchecked collectors etc) and that's fine. However this PR is to fail /metrics call on this? Isn't that... wrong? I would rather have duplicate than your application fail to give any metric? Do we have a precedense? Do we fail handling /metrics in such collisions.. on scrape? |
This is on me, I've added feedback asking to not to this on registration because, as you said, this is a scrape problem and I wanted to move the solution closer to the scrape.
I had an understanding the we want to provide a scrape as a transaction. So we provide everything that was registered or we provide nothing, there's no partial result here.
I believe we started failing since Prometheus 2.52 |
yes, this only applies to scraping. I don't believe there is any content negotiation for remote write so it doesn't apply there.
I'm not sure what you mean... non-escaped collisions are caught at registration time in the existing code. But furthermore, the actual looking for collisions happens at registration time to avoid the CPU hit of checking for collisions on every scrape.
If the metric names are in Japanese characters, all the characters will be converted to underscores. Any two metrics of the same length of characters would collide because they will be all underscores. This applies to any non-english character set. I also think maybe we should go back to failing at registration time since that's when the other collisions are caught. The trouble is that creates error messages for people who don't care about these collisions. This is an edge case and I'm not sure how often this will happen. One option is we could only check for these collisions when an option is turned on, and mention that option in the documentation for users setting up a system where there is a mix of UTF-8 and legacy. At the end of the day maybe we should have skipped escaping and just refused to serve UTF-8 metrics to legacy systems :/. |
Thanks, makes sense!
Yea, sounds like some option to registry or considering this only for PedenaticRegistry or something.
Yea, I would love to check the impact of not doing this collision work on escaping, e.g. who will actually use escaping mechanism. |
I'd like to know more about the pedanticregistry and where it gets used. |
Hypothetically, if a system defined two metrics that are different in UTF-8 but the same when escaped, the escaped text exposition would show two entries for the "same" metric. (In general users shouldn't do this, but you know what they say about users.) We should help the user avoid doing that.
At registration time, we should check for not only exact duplicates but also escaped duplicates. Since the default UnderscoreEscaping scheme is the most collision-prone, we will always use that method for testing equivalency. If a collision is found, return an error by default. We can add an optional way to skip this check in case it is intended.
Why not always use this method of detection? If users create metrics in, say, Japanese, many metrics would escape to "____" and collide. So this needs to be optional.
The text was updated successfully, but these errors were encountered: