Skip to content

[processor/elasticapmprocessor] align ECS OTLP metrics with apm-data#1122

Merged
lanre-ade merged 9 commits into
elastic:mainfrom
lanre-ade:fix-otlp-metric-discrepancies
Apr 10, 2026
Merged

[processor/elasticapmprocessor] align ECS OTLP metrics with apm-data#1122
lanre-ade merged 9 commits into
elastic:mainfrom
lanre-ade:fix-otlp-metric-discrepancies

Conversation

@lanre-ade

@lanre-ade lanre-ade commented Mar 30, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Align elasticapmprocessor ECS OTLP metric handling with apm-data by translating datapoint attributes to labels.* / numeric_labels.*, defaulting missing attributes (e.g telemetry.sdk.language), matching truncation behavior for preserved resource and metric fields etc
  • Refactor metric enrichment to follow the log-style enrichment flow
  • Preserve service.framework.name and service.framework.version for raw OTLP ECS metrics

Test plan

  • go test ./... in processor/elasticapmprocessor

@lanre-ade lanre-ade changed the title [processor/elasticapmprocessor] align ECS OTLP metric fallback with apm-data [processor/elasticapmprocessor] align ECS OTLP metrics with apm-data Apr 1, 2026
@lanre-ade lanre-ade marked this pull request as ready for review April 1, 2026 22:47
@lanre-ade lanre-ade requested review from a team as code owners April 1, 2026 22:47
@coderabbitai

coderabbitai Bot commented Apr 1, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cf86a8d7-b299-4111-bd82-c55f5a0f6441

📥 Commits

Reviewing files that changed from the base of the PR and between 7fad27d and bbf7f5f.

📒 Files selected for processing (1)
  • processor/elasticapmprocessor/internal/ecs/ecs_translation.go

📝 Walkthrough

Walkthrough

Reworks ECS attribute translation from predicate/allowlist helpers to explicit per-key switch logic for resources and logs, and adds exported TranslateMetricDataPointAttributes for metric datapoints. Adds EnrichMetricDataPoints and updates EnrichMetrics to apply datapoint-level enrichment. Introduces config field Metric.TranslateUnsupportedAttributes and enables it by default for the non-intake ECS metric enricher (disabled for intake). Changes sanitize existing label keys in-place, truncate selected preserved string attributes, and move remaining keys into labels.* / numeric_labels.*. Tests and testdata expanded to cover translation, truncation, sanitization, and batch-origin behavior.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • 🛠️ Update Documentation: Commit on current branch
  • 🛠️ Update Documentation: Create PR

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@processor/elasticapmprocessor/processor.go`:
- Around line 216-222: The code creates two different configs (ecsEnricherConfig
and intakeECSEnricherConfig) but still selects a single enricher from
resourceMetrics.At(0) and applies it to the whole pmetric.Metrics batch; change
the enrichment flow so you either (A) select and apply the correct enricher per
ResourceMetrics (iterate over each ResourceMetrics and pick the enricher using
its Resource/Schema/Origin) or (B) partition the incoming pmetric.Metrics by
origin (OTLP vs intake), build an enricher per partition from
ecsEnricherConfig/intakeECSEnricherConfig, and run enrichment separately on each
partition; update the code paths that currently call the single-batch enricher
(the selection that reads resourceMetrics.At(0) and the apply step that operates
on pmetric.Metrics) so enrichment is done per-resource or per-partition,
ensuring enrichments/enricher.go is invoked with the matching config for each
ResourceMetrics.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5c4554d2-48c3-4c1b-a05d-957225a3b617

📥 Commits

Reviewing files that changed from the base of the PR and between 5a7856c and 416e868.

📒 Files selected for processing (12)
  • processor/elasticapmprocessor/internal/ecs/ecs_translation.go
  • processor/elasticapmprocessor/internal/ecs/ecs_translation_test.go
  • processor/elasticapmprocessor/internal/enrichments/config/config.go
  • processor/elasticapmprocessor/internal/enrichments/enricher.go
  • processor/elasticapmprocessor/internal/enrichments/metric.go
  • processor/elasticapmprocessor/internal/enrichments/metric_test.go
  • processor/elasticapmprocessor/internal/enrichments/resource.go
  • processor/elasticapmprocessor/processor.go
  • processor/elasticapmprocessor/processor_test.go
  • processor/elasticapmprocessor/testdata/ecs/elastic_hostname/metrics_output.yaml
  • processor/elasticapmprocessor/testdata/skip_enrichment/metrics_false_ecs_output.yaml
  • processor/elasticapmprocessor/testdata/skip_enrichment/metrics_true_ecs_output.yaml

Comment on lines +216 to +222
ecsEnricherConfig.Resource.DefaultServiceLanguage.Enabled = true
ecsEnricherConfig.Metric.TranslateUnsupportedAttributes.Enabled = true

intakeECSEnricherConfig := ecsEnricherConfig
intakeECSEnricherConfig.Resource.HostOSType.Enabled = false
intakeECSEnricherConfig.Resource.DefaultServiceLanguage.Enabled = false
intakeECSEnricherConfig.Metric.TranslateUnsupportedAttributes.Enabled = false

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

The new ECS metric config split is unsafe with batch-wide enricher selection.

With Lines 216-222, OTLP and intake ECS metrics now use different enrichment behavior. Lines 241-247 still choose one enricher from resourceMetrics.At(0), and Lines 271-272 apply it to the entire pmetric.Metrics; processor/elasticapmprocessor/internal/enrichments/enricher.go:92-112 confirms the full batch is processed with that single config. Any mixed ECS batch will therefore mis-enrich later resources (OTLP fallback skipped, or intake datapoints relabeled). Please split by origin or enrich per ResourceMetrics.

As per coding guidelines, "Highlight only issues that could cause runtime errors, data loss, or severe maintainability issues."

Also applies to: 241-247, 271-272

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@processor/elasticapmprocessor/processor.go` around lines 216 - 222, The code
creates two different configs (ecsEnricherConfig and intakeECSEnricherConfig)
but still selects a single enricher from resourceMetrics.At(0) and applies it to
the whole pmetric.Metrics batch; change the enrichment flow so you either (A)
select and apply the correct enricher per ResourceMetrics (iterate over each
ResourceMetrics and pick the enricher using its Resource/Schema/Origin) or (B)
partition the incoming pmetric.Metrics by origin (OTLP vs intake), build an
enricher per partition from ecsEnricherConfig/intakeECSEnricherConfig, and run
enrichment separately on each partition; update the code paths that currently
call the single-batch enricher (the selection that reads resourceMetrics.At(0)
and the apply step that operates on pmetric.Metrics) so enrichment is done
per-resource or per-partition, ensuring enrichments/enricher.go is invoked with
the matching config for each ResourceMetrics.

if isAggregatedMetricDataPointAttributes(attributes) {
return
}
if cfg.Metric.TranslateUnsupportedAttributes.Enabled {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of TranslateUnsupportedAttributes, when will we use this?

return true
}

switch k {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see telemetry.sdk.elastic_export_timestamp (ref), is it intentionally not handled?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lahsivjar! I missed this one

string(semconv.NetworkCarrierNameKey),
string(semconv.NetworkConnectionSubtypeKey),
string(semconv.NetworkConnectionTypeKey),
string(semconv.ProcessExecutableNameKey),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apm-data doesn't seem to handle process executable name and process parent pid, any reason we are handling them here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was already preserved by the previous ECS allowlist so I kept that behavior

string(semconv.ProcessExecutableNameKey),
string(semconv.ProcessParentPIDKey),
string(semconv.ProcessPIDKey),
string(semconv.ServiceNameKey),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think this is also handled differently here, shouldn't we be sanitizing this as we do in apm-data: https://github.com/elastic/apm-data/blob/7da222dcc0320f9c812c5d72f65f830c838aae11/input/otlp/metadata.go#L54

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sanitization still happens, but in EnrichResource() (with sanitizeServiceName) rather than in TranslateResourceMetadata()

A lot of the implementation differences are because we mutate the original OTLP resource in place, instead of translating it into a separate APM model like apm-data does. Using config flags to gate the behaviour also makes the location of the logic different even though the outcome for service.name is still the same.

elasticattr.CloudProjectName,
elasticattr.DataStreamDataset,
elasticattr.DataStreamNamespace,
elasticattr.DataStreamType,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Datastream type doesn't seem to be handled in apm-data(ref), any reason we are handling it here?

@lanre-ade lanre-ade Apr 10, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this predates my refactor, it was already part of the existing ECS allowlist so I kept that behavior but I think it's inclusion is intentional.apm-data can derive stream type later in its event pipeline but for the collector/exporter ECS path we rely on data_stream.* in the attributes for routing/index selection before the final document is emitted.

@lahsivjar lahsivjar left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving it to unblock. We might need to revisit some stuff later.

@lanre-ade lanre-ade merged commit 39abcbb into elastic:main Apr 10, 2026
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants