feat: Client awareness/telemetry: reject invalid values for library name/version#8934

Merged
BoD merged 13 commits into dev from
telemetry-restrict-values-client-library-name-version
Mar 20, 2026

Conversation

@BoD
Contributor

@BoD BoD commented Mar 2, 2026

We noticed a lot of 'bogus' values in the dashboard with library names/versions - people or tools trying injection hacks. While this isn't actually a security concern, these values are annoying as they pollute the dashboard. So let's restrict what's allowed for these values and ignore invalid ones.

Using the regex ^[ a-zA-Z0-9.@/_\-]{1,60}$ for now but open to suggestions if you think it's too restrictive or permissive (for instance no strong opinion on whether space should be allowed).
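
For illustration, the check that regex expresses can be sketched with the standard library alone. This is a minimal sketch, not the code from this PR: the function name `is_valid_library_value` and the constant `MAX_LEN` are hypothetical, and the character set is transcribed by hand from the regex above.

```rust
/// Maximum accepted length, taken from the `{1,60}` quantifier in the regex.
const MAX_LEN: usize = 60;

/// Returns true when `value` would match `^[ a-zA-Z0-9.@/_\-]{1,60}$`.
/// Hypothetical helper name; not taken from the PR's `valid_value.rs`.
pub fn is_valid_library_value(value: &str) -> bool {
    // Byte length equals character count whenever every byte is ASCII,
    // and any non-ASCII byte is rejected by the character check below.
    let len_ok = (1..=MAX_LEN).contains(&value.len());
    len_ok
        && value.bytes().all(|b| {
            b.is_ascii_alphanumeric() || matches!(b, b' ' | b'.' | b'@' | b'/' | b'_' | b'-')
        })
}
```

With this sketch, values such as `apollo-kotlin`, `@apollo/client`, or `4.1.0-beta.2` pass, while an empty string, a 61-character string, or `<script>alert(1)</script>` do not.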

https://apollographql.atlassian.net/browse/GRAPHOS-124


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • PR description explains the motivation for the change and relevant context for reviewing
  • PR description links appropriate GitHub/Jira tickets (creating when necessary)
  • Changeset is included for user-facing changes
  • Changes are compatible [1]
  • Documentation [2] completed
  • Performance impact assessed and acceptable
  • Metrics and logs are added [3] and documented
  • Tests added and passing [4]
    • Unit tests
    • Integration tests
    • Manual tests, as necessary

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. A lot of (if not most) features benefit from built-in observability and debug-level logs. Please read this guidance on metrics best-practices.

  4. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@apollo-librarian
Contributor

apollo-librarian bot commented Mar 2, 2026

✅ Docs preview has no changes

The preview was not built because there were no changes.

Build ID: 650226006a6cce00b42f7fdc
Build Logs: View logs


✅ AI Style Review — No Changes Detected

No MDX files were changed in this pull request.

Review Log: View detailed log

This review is AI-generated. Please use common sense when accepting these suggestions, as they may not always be accurate or appropriate for your specific context.


@BoD BoD force-pushed the telemetry-restrict-values-client-library-name-version branch from 66df8dc to 203f9a3 Compare March 2, 2026 15:55
@BoD BoD marked this pull request as ready for review March 2, 2026 16:12
@BoD BoD requested a review from a team as a code owner March 2, 2026 16:12
Member

@calvincestari calvincestari left a comment


Looks good, with some questions.

Comment thread apollo-router/src/plugins/telemetry/valid_value.rs Outdated
Comment thread apollo-router/tests/apollo_reports.rs Outdated
Comment thread apollo-router/src/plugins/telemetry/valid_value.rs Outdated
@BoD BoD marked this pull request as draft March 3, 2026 10:27
@BoD
Contributor Author

BoD commented Mar 3, 2026

Moving back to draft. After some discussions, rather than silently ignoring the values, it makes more sense to disallow them and fail with a 400. That way we'll get some traces.

@carodewig
Contributor

I'm curious about the idea that 'it makes more sense to disallow them and fail with a 400'. I would think it's better to allow them and sanitize them (i.e. strip the characters that don't match the regex)? Rejecting a request because it is nonsense to our telemetry (not to the user running the router) seems like overkill to me - but willing to be overruled

@BoD
Contributor Author

BoD commented Mar 3, 2026

@carodewig The idea was that all the 'bad values' we see in the data are malicious (injecting scripts and such) - which we never want? So why fulfill the request?
Also when failing we can log the reason to later have an idea of how much this is happening (although we could also do that even if we don't fail the requests?)

@phryneas
Member

phryneas commented Mar 3, 2026

The only outliers not matching these regular expressions were attempts at persisted XSS attacks like this:

image

If we fail those, the resulting 400 might be used as a good signal to find these attacks in logs, and to maybe trigger a WAF on a user triggering hundreds or thousands of failed requests in a very short amount of time.
Silently ignoring them would just mean more work on the server caused by malicious requests.

It also reduces the risk that other malicious payloads that might come in e.g. in the request body cause any harm.

@BoD BoD force-pushed the telemetry-restrict-values-client-library-name-version branch 2 times, most recently from 566a084 to ed5ad00 Compare March 9, 2026 17:47
@BoD BoD changed the title from "feat: Client awareness/telemetry: ignore invalid values for client/library name/version" to "feat: Client awareness/telemetry: reject invalid values for library name/version" Mar 9, 2026
@BoD BoD marked this pull request as ready for review March 9, 2026 20:43
@BoD
Contributor Author

BoD commented Mar 10, 2026

I've updated this PR:

  • only look at library name/version - after looking at prod values for client name/version, there is a wide variety and we shouldn't assume anything
  • fail early with 400 for invalid values
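
The "fail early with 400" behavior described above could look roughly like the following. This is a sketch under stated assumptions, not the PR's implementation: `Rejection`, `check_library_metadata`, and `is_allowed` are hypothetical names, and treating absent values as acceptable is an assumption the thread does not confirm.

```rust
/// Outcome for a request whose library metadata fails validation.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
pub struct Rejection {
    pub status: u16,
    pub reason: String,
}

/// Hand-written equivalent of `^[ a-zA-Z0-9.@/_\-]{1,60}$`.
fn is_allowed(value: &str) -> bool {
    (1..=60).contains(&value.len())
        && value.bytes().all(|b| {
            b.is_ascii_alphanumeric() || matches!(b, b' ' | b'.' | b'@' | b'/' | b'_' | b'-')
        })
}

/// Checks the library name and version before any further processing.
/// Assumption: a missing value is fine; only a present, invalid one is rejected.
pub fn check_library_metadata(
    name: Option<&str>,
    version: Option<&str>,
) -> Result<(), Rejection> {
    for (field, value) in [("library name", name), ("library version", version)] {
        if let Some(v) = value {
            if !is_allowed(v) {
                // The raw value is deliberately left out of the reason so that
                // hostile input is not echoed back or copied into logs.
                return Err(Rejection {
                    status: 400,
                    reason: format!("invalid {field}"),
                });
            }
        }
    }
    Ok(())
}
```

A caller would run this check first and turn an `Err` into the HTTP response, so an invalid value never reaches the telemetry pipeline.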

@BoD BoD force-pushed the telemetry-restrict-values-client-library-name-version branch from 9654cd5 to a996ef8 Compare March 10, 2026 08:41
@calvincestari
Member

> I've updated this PR:
>
>   • only look at library name/version - after looking at prod values for client name/version, there is a wide variety and we shouldn't assume anything
>   • fail early with 400 for invalid values

Worth remembering:

  • client name/version can be set by users; yes, we should expect a wide variety and assume nothing. Since they can set it they have a valid expectation to see any value they set there.
  • library name/version is intentionally difficult to change. So wild variations of these are good candidates for controlling.

Something else worth asking is - do we want to disallow third-party clients that identify themselves? We don't document the library name/version extensions but there's nothing stopping other clients from sending them. Do we really want to fail these requests? I'm not arguing either way, just wanting to make sure we consider that question.

@BoD
Contributor Author

BoD commented Mar 11, 2026

> Something else worth asking is - do we want to disallow third-party clients that identify themselves? We don't document the library name/version extensions but there's nothing stopping other clients from sending them. Do we really want to fail these requests? I'm not arguing either way, just wanting to make sure we consider that question.

I don't think we want to disallow that - but we do want to disallow the 'hacky' values we see there. IMO the regex is permissive enough for most reasonable values for a library name/version.

Contributor

@carodewig carodewig left a comment


I am still concerned about this. I think we are well within our rights to sanitize information that comes into our own telemetry collectors - but this PR outright rejects requests because of a header in a way that users can't control.

People might spoof the client library name against our infrastructure to attempt a DDoS, but I think it's reasonable for other enterprises to have their own differing standards for this field.

My preference would be to:

  • Use this regex at the telemetry exporter layer to send only the characters we accept to studio
  • Document that behavior so that it's understood that what is in studio might be a filter of what was actually provided
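
The sanitizing alternative proposed in this list could be sketched as below. This illustrates the idea only and is not something the PR implements: `sanitize_library_value` is a hypothetical name, and truncating to 60 characters and dropping values that end up empty are assumptions added for the example.

```rust
/// Keeps only the characters allowed by `^[ a-zA-Z0-9.@/_\-]{1,60}$`,
/// truncates to 60 characters, and returns `None` if nothing is left.
/// Hypothetical helper for the "sanitize at the exporter" proposal.
pub fn sanitize_library_value(raw: &str) -> Option<String> {
    let kept: String = raw
        .chars()
        .filter(|c| c.is_ascii_alphanumeric() || matches!(c, ' ' | '.' | '@' | '/' | '_' | '-'))
        .take(60)
        .collect();
    if kept.is_empty() {
        None
    } else {
        Some(kept)
    }
}
```

One trade-off shows up immediately: `<script>alert(1)</script>` is reduced to `scriptalert1/script`, which is harmless but still clutters the dashboard, whereas rejecting the request keeps such values out entirely.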

@abernix
Member

abernix commented Mar 11, 2026

One other thing brought up in conversations was: the 4xx gets out ahead of something that is more difficult to back-pedal on later. Systems have limits, and relaxing limits is less destructive than putting them in later (in which cases they manifest as breaking changes and config flags). We can always relax it later when someone opens an issue; blocking things that seem ornery early drives feedback

Random example, should we let someone send a 1MB value in this field? Not for now! Let's cross that bridge with a use case.

@carodewig
Contributor

@abernix I completely agree that it's better to relax limits than put them in later - the main reason I'm worried about this is because I thought this was 'adding a limit later'? If it's not, I withdraw my objection!

@phryneas
Member

@carodewig I believe we can check this regex against the values that have been sent to the service so far, and spot whether it would have denied any legitimate usages. Going forward, this would just be a ground rule for any new usages that is "already in place".

Would that suffice?
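
The vetting step suggested here can be sketched as a filter over previously observed values. This is a minimal sketch: `would_be_rejected` and `is_allowed` are hypothetical names, and the sample values in the usage note are made up for illustration, not taken from production data.

```rust
/// Hand-written equivalent of `^[ a-zA-Z0-9.@/_\-]{1,60}$`.
fn is_allowed(value: &str) -> bool {
    (1..=60).contains(&value.len())
        && value.bytes().all(|b| {
            b.is_ascii_alphanumeric() || matches!(b, b' ' | b'.' | b'@' | b'/' | b'_' | b'-')
        })
}

/// Returns the previously seen values that the new rule would reject,
/// so they can be reviewed by hand for legitimate usages.
pub fn would_be_rejected<'a>(seen: &[&'a str]) -> Vec<&'a str> {
    seen.iter().copied().filter(|v| !is_allowed(v)).collect()
}
```

Given the made-up list `["apollo-kotlin", "@apollo/client", "<img src=x onerror=alert(1)>", "4.0.0"]`, only the third entry is returned; if every returned entry is an injection attempt, the rule denies no legitimate usage in that sample.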

@BoD
Contributor Author

BoD commented Mar 12, 2026

> Document that behavior so that it's understood that what is in studio might be a filter of what was actually provided

Currently library names/versions are not visible in Studio, only the client names/versions are. (Right? I'm actually not 100% sure but I think that's the case 😅).

@abernix
Member

abernix commented Mar 12, 2026

> @abernix I completely agree that it's better to relax limits than put them in later - the main reason I'm worried about this is because I thought this was 'adding a limit later'? If it's not, I withdraw my objection!

@carodewig You're absolutely taking the right angle here. And you're right, this is adding a limit later — something we should avoid. My "what can we shift-left" thoughts here are that this pattern is a good thing to look for when we're reviewing new additions, rather than finding out about it later.

So, for right now? I think at this point it's early enough in this feature's lifetime to get this in place before it gets worse. This is a particularly comfy place for us right now because this particular feature produces data we can analyse — in fact, that's the entire motivation for this PR, which is a set of data that shows what the upper bounds of the current system are. If we tighten up those upper bounds right now, we do ourselves the service of stopping the bleeding.

If we didn't have that data, we'd actually probably have to be a bit more defeated and conservative with how we fix this. (Honestly, I think this ticket is more of a fix than a feat.)

@abernix
Member

abernix commented Mar 12, 2026

> Currently library names/versions are not visible in Studio,

I'm not sure if there's a perfect place in our docs to put this, but we should still document these limits somewhere. Folks will want to debug why they are getting a particular code, and — for example — having a short-link (go.apollo.dev/o/blah) in the server error log or attached to an error code that is emitted in metrics that points to the docs would be mint.

@BoD
Contributor Author

BoD commented Mar 12, 2026

I could add something like this to the header configuration description: "Valid values must match the regex <regex>. Invalid values result in a 400 response." And emit a tracing::warn pointing to it when we return the 400.

@abernix abernix requested a review from rohan-b99 March 17, 2026 09:38
Contributor

@rohan-b99 rohan-b99 left a comment


Approved based on the implementation and the earlier discussion, but I think adding a line to the docs, as you mentioned above, is worth doing.

@carodewig
Contributor

Appreciate the responses!

Since we can vet this against data we've already collected, I'm on board now (provided the docs are updated per the discussion above).

@BoD BoD force-pushed the telemetry-restrict-values-client-library-name-version branch from a996ef8 to 3d31407 Compare March 19, 2026 16:49
@BoD
Contributor Author

BoD commented Mar 20, 2026

Added a note to the configuration description and a warn trace. Merging now.

Thanks all! 🙏

@BoD BoD merged commit 053562b into dev Mar 20, 2026
15 checks passed
@BoD BoD deleted the telemetry-restrict-values-client-library-name-version branch March 20, 2026 09:12
@abernix abernix mentioned this pull request Mar 31, 2026
