Split Apollo trace/metrics exporter configs #8258

Merged
bonnici merged 18 commits into dev from njm/P-1488/split-apollo-telemetry-config
Sep 15, 2025

@bonnici bonnici commented Sep 11, 2025

The config for exporting Apollo metrics and traces has been split so that batch settings can be tuned independently for each Apollo exporter. Apollo OTLP traces, Apollo usage report traces, Apollo OTLP metrics, and Apollo usage report metrics now each have their own config. The old telemetry.apollo.batch_processor config is used as a fallback when the new config values are not specified. The configuration in use is written to an info-level log on router startup.
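The fallback described above can be sketched as follows. This is a minimal illustration, not the router's implementation: `BatchConfig` and `resolve` are hypothetical names, the default values are copied from the legacy example in this PR's changeset, and falling back on the whole block (rather than field by field) is an assumption.

```rust
use std::time::Duration;

/// Hypothetical batch settings; field names mirror the legacy
/// `telemetry.apollo.batch_processor` keys.
#[derive(Clone, Debug, PartialEq)]
pub struct BatchConfig {
    pub scheduled_delay: Duration,
    pub max_export_timeout: Duration,
    pub max_export_batch_size: usize,
    pub max_concurrent_exports: usize,
    pub max_queue_size: usize,
}

impl Default for BatchConfig {
    fn default() -> Self {
        // Values copied from the legacy example in this PR's changeset;
        // treating them as built-in defaults is an assumption.
        Self {
            scheduled_delay: Duration::from_secs(5),
            max_export_timeout: Duration::from_secs(30),
            max_export_batch_size: 512,
            max_concurrent_exports: 1,
            max_queue_size: 2048,
        }
    }
}

/// Resolve the config for one exporter: the exporter-specific block wins,
/// then the legacy shared block, then the defaults above.
pub fn resolve(specific: Option<&BatchConfig>, legacy: Option<&BatchConfig>) -> BatchConfig {
    specific.or(legacy).cloned().unwrap_or_default()
}
```

Under these assumptions, calling `resolve(None, Some(&legacy))` for an exporter with no dedicated block returns the legacy settings unchanged, so existing configs keep their behaviour.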


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • PR description explains the motivation for the change and relevant context for reviewing
  • PR description links appropriate GitHub/Jira tickets (creating when necessary)
  • Changeset is included for user-facing changes
  • Changes are compatible [1]
  • Documentation [2] completed
  • Performance impact assessed and acceptable
  • Metrics and logs are added [3] and documented
  • Tests added and passing [4]
    • Unit tests
    • Integration tests
    • Manual tests, as necessary

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. A lot of (if not most) features benefit from built-in observability and debug-level logs. Please read this guidance on metrics best-practices.

  4. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

apollo-librarian bot commented Sep 11, 2025

✅ Docs preview ready

The preview is ready to be viewed.

File Changes

0 new, 1 changed, 0 removed
* graphos/routing/(latest)/graphos-reporting.mdx

Build ID: a7520cde1acd5e890e62f11d
Build Logs: View logs

URL: https://www.apollographql.com/docs/deploy-preview/a7520cde1acd5e890e62f11d

@bonnici bonnici marked this pull request as ready for review September 12, 2025 00:05
@bonnici bonnici requested a review from a team September 12, 2025 00:05
@bonnici bonnici requested a review from a team as a code owner September 12, 2025 00:05
@bonnici bonnici requested a review from a team September 12, 2025 00:05
Comment on lines +4 to +17
```
telemetry:
  apollo:
    batch_processor:
      scheduled_delay: 5s
      max_export_timeout: 30s
      max_export_batch_size: 512
      max_concurrent_exports: 1
      max_queue_size: 2048
```

To:

(The new example opens here with a bare ``` fence; the commented excerpt ends at this line.)

Contributor

Can you add yaml to the right of the opening triple ticks? That'll let GitHub format it as YAML, e.g.

here:
  is:
    number: 15

vs.

here:
  is:
    number: 15

# Config for Apollo OTLP metrics. Note that some metrics like config values have a non-configurable scheduled_delay.
otlp:
  exporter:
    scheduled_delay: 13s # only applies to realtime metrics (not config metrics)

Contributor

Should we make this more obvious in the config, e.g. metrics.otlp.realtime.exporter.scheduled_delay?

Contributor Author

I don't want to split this config into 2 separate configs, since we're already at 4 and that's a lot. Also, the max_export_timeout config applies to both realtime and non-realtime metrics. I think I should just update the comment to be more definite, so it's obvious what scheduled_delay controls and what it doesn't.
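A rough sketch of the behaviour described in this exchange: `scheduled_delay` only drives realtime metrics, config metrics use a fixed interval, and `max_export_timeout` applies to both. All type and function names here are hypothetical, and the fixed interval value is a placeholder because the thread does not state it.

```rust
use std::time::Duration;

/// Placeholder for the non-configurable interval used by config metrics.
/// The real value is not given in this thread; 60s is an assumption.
pub const CONFIG_METRICS_DELAY: Duration = Duration::from_secs(60);

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum MetricKind {
    Realtime,
    Config,
}

/// Hypothetical view of the Apollo OTLP metrics exporter settings.
#[derive(Clone, Debug)]
pub struct OtlpMetricsExporterConfig {
    pub scheduled_delay: Duration,
    pub max_export_timeout: Duration,
}

#[derive(Debug, PartialEq)]
pub struct ExportSchedule {
    pub interval: Duration,
    pub timeout: Duration,
}

/// Pick the export schedule for a metric kind.
pub fn schedule_for(kind: MetricKind, cfg: &OtlpMetricsExporterConfig) -> ExportSchedule {
    let interval = match kind {
        // Only realtime metrics honour the configured delay.
        MetricKind::Realtime => cfg.scheduled_delay,
        // Config metrics ignore it and use the fixed interval.
        MetricKind::Config => CONFIG_METRICS_DELAY,
    };
    // The export timeout is shared by both kinds.
    ExportSchedule { interval, timeout: cfg.max_export_timeout }
}
```

With `scheduled_delay: 13s` as in the snippet under review, only the `Realtime` schedule changes; the `Config` schedule keeps the fixed interval while still using the configured timeout.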

},
"max_queue_size": {
  "default": 2048,
  "description": "The maximum queue size to buffer spans for delayed processing. If the queue gets full it drops the spans. The default value of is 2048.",

Contributor

typo: The default value of is --> The default value is

Ok(builder.with_span_processor(
    BatchSpanProcessor::builder(exporter, NamedTokioRuntime::new("apollo-tracing"))
        .with_batch_config(self.batch_processor.clone().into())
        .with_batch_config(self.traces.otlp.exporter.clone().into()) // todo?? which one to pick

Contributor

TODO here needs resolution

Contributor

Tough question. Given that most folks on Router 2.x will be using OTLP tracing by default, I'd lean in that direction. You could have some logic that decides which one wins based on the sampler (e.g. if 50% or more is OTLP, then choose that one). Can't wait to sunset the old Apollo trace exporter.
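The sampler-based rule suggested in the comment above could look roughly like this. It is only a sketch of the suggestion, not what was merged (later replies in this thread settle on a different approach), and `TraceConfigSource`, `pick_trace_config`, and the `otlp_sampling_ratio` parameter are hypothetical names.

```rust
/// Which tracing batch config would drive a shared span processor.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TraceConfigSource {
    Otlp,
    UsageReport,
}

/// Choose the OTLP config when at least half of sampled traces go out
/// over OTLP; otherwise keep the usage-report config.
/// `otlp_sampling_ratio` is assumed to be a fraction in 0.0..=1.0.
pub fn pick_trace_config(otlp_sampling_ratio: f64) -> TraceConfigSource {
    if otlp_sampling_ratio >= 0.5 {
        TraceConfigSource::Otlp
    } else {
        // Also reached for NaN, since NaN comparisons are false.
        TraceConfigSource::UsageReport
    }
}
```

A threshold like this makes the effective settings depend on a second, unrelated option, which is one reason a single explicit tracing config can be easier to reason about.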

Contributor

Actually, do we currently need to distinguish between OTLP and legacy for this config? Maybe we can just have one Apollo tracing configuration that covers traces for both OTLP and usage reporting.

Contributor Author

Oops - I did decide that the traces OTLP setting is the right one to go with, since this is used only for OTLP, but I just didn't remove the comment.

Contributor Author

I've combined the two tracing configs now, so there shouldn't be any strange behaviour here.

Contributor

@timbotnik timbotnik left a comment

Aside from tests failing and one small nit, LGTM.

bonnici and others added 2 commits September 15, 2025 17:49
Co-authored-by: timbotnik <tim@apollographql.com>

Contributor

@mabuyo mabuyo left a comment

Docs LGTM!

@bonnici bonnici merged commit 1d95f86 into dev Sep 15, 2025
15 checks passed
@bonnici bonnici deleted the njm/P-1488/split-apollo-telemetry-config branch September 15, 2025 23:16
@abernix abernix mentioned this pull request Sep 22, 2025