Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,8 @@ You can use the following arguments with `otelcol.connector.servicegraph`:
| `latency_histogram_buckets` | `list(duration)` | Buckets for latency histogram metrics. | `["2ms", "4ms", "6ms", "8ms", "10ms", "50ms", "100ms", "200ms", "400ms", "800ms", "1s", "1400ms", "2s", "5s", "10s", "15s"]` | no |
| `metrics_flush_interval` | `duration` | The interval at which metrics are flushed to downstream components. | `"60s"` | no |
| `store_expiration_loop` | `duration` | The time to expire old entries from the store periodically. | `"2s"` | no |
| `virtual_node_extra_label` | `bool` | Adds an extra `virtual_node` label with an optional value of `client` or `server`, indicating which node is the uninstrumented one. | `false` | no |
| `virtual_node_peer_attributes` | `list(string)` | The list of attributes used to identify virtual node peer. | `["peer.service", "db.name", "db.system"]` | no |

Service graphs work by inspecting traces and looking for spans with parent-children relationship that represent a request.
`otelcol.connector.servicegraph` uses OpenTelemetry semantic conventions to detect a myriad of requests.
Expand Down Expand Up @@ -114,6 +116,19 @@ When `metrics_flush_interval` is set to `0s`, metrics will be flushed on every r

The attributes in `database_name_attributes` are tried in order, selecting the first match.

`virtual_node_peer_attributes` is useful when an OTel-instrumented client sends a request to a service that isn't OTel-instrumented.
Normally, `otelcol.connector.servicegraph` can't pair the client span with the server span.
When an edge expires, `otelcol.connector.servicegraph` checks if it has peer attributes listed in `virtual_node_peer_attributes`.
If it finds an attribute, `otelcol.connector.servicegraph` aggregates the metrics with a virtual node.

If no client span is found and `virtual_node_peer_attributes` is not an empty list,
then the service span will be paired with a virtual node called `client="user"`.
This is useful when a client that isn't OTel-instrumented (like a web browser) sends a request to an OTel-instrumented service.
Without a virtual node, the client span is missing, and the server span expires without being paired.

Attributes configured in the `virtual_node_peer_attributes` argument are ordered by priority, with earlier attributes having higher priority.
An empty list disables the creation of a virtual node.

[Span Kind]: https://opentelemetry.io/docs/concepts/signals/traces/#span-kind

## Blocks
Expand Down
31 changes: 11 additions & 20 deletions internal/component/otelcol/connector/servicegraph/servicegraph.go
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ import (
"github.com/open-telemetry/opentelemetry-collector-contrib/connector/servicegraphconnector"
otelcomponent "go.opentelemetry.io/collector/component"
"go.opentelemetry.io/collector/pipeline"
semconv "go.opentelemetry.io/otel/semconv/v1.25.0"
)

func init() {
Expand Down Expand Up @@ -50,9 +51,9 @@ type Arguments struct {
// StoreExpirationLoop defines how often to expire old entries from the store.
StoreExpirationLoop time.Duration `alloy:"store_expiration_loop,attr,optional"`
// VirtualNodePeerAttributes the list of attributes need to match, the higher the front, the higher the priority.
//TODO: Add VirtualNodePeerAttributes when it's no longer controlled by
// the "processor.servicegraph.virtualNode" feature gate.
// VirtualNodePeerAttributes []string `alloy:"virtual_node_peer_attributes,attr,optional"`
VirtualNodePeerAttributes []string `alloy:"virtual_node_peer_attributes,attr,optional"`
// VirtualNodeExtraLabel enables the `virtual_node` label to be added to the spans.
VirtualNodeExtraLabel bool `alloy:"virtual_node_extra_label,attr,optional"`

// MetricsFlushInterval is the interval at which metrics are flushed to the exporter.
// If set to 0, metrics are flushed on every received batch of traces.
Expand Down Expand Up @@ -115,20 +116,11 @@ func (args *Arguments) SetToDefault() {
Dimensions: []string{},
CacheLoop: 1 * time.Minute,
StoreExpirationLoop: 2 * time.Second,
DatabaseNameAttributes: []string{"db.name"},
MetricsFlushInterval: 60 * time.Second,
//TODO: Add VirtualNodePeerAttributes when it's no longer controlled by
// the "processor.servicegraph.virtualNode" feature gate.
// VirtualNodePeerAttributes: []string{
// semconv.AttributeDBName,
// semconv.AttributeNetSockPeerAddr,
// semconv.AttributeNetPeerName,
// semconv.AttributeRPCService,
// semconv.AttributeNetSockPeerName,
// semconv.AttributeNetPeerName,
// semconv.AttributeHTTPURL,
// semconv.AttributeHTTPTarget,
// },
DatabaseNameAttributes: []string{string(semconv.DBNameKey)},
VirtualNodePeerAttributes: []string{
string(semconv.PeerServiceKey), string(semconv.DBNameKey), string(semconv.DBSystemKey),
},
MetricsFlushInterval: 60 * time.Second,
}
args.Store.SetToDefault()
args.DebugMetrics.SetToDefault()
Expand Down Expand Up @@ -170,12 +162,11 @@ func (args Arguments) Convert() (otelcomponent.Config, error) {
},
CacheLoop: args.CacheLoop,
StoreExpirationLoop: args.StoreExpirationLoop,
VirtualNodePeerAttributes: args.VirtualNodePeerAttributes,
VirtualNodeExtraLabel: args.VirtualNodeExtraLabel,
MetricsFlushInterval: &args.MetricsFlushInterval,
DatabaseNameAttributes: args.DatabaseNameAttributes,
ExponentialHistogramMaxSize: args.ExponentialHistogramMaxSize,
//TODO: Add VirtualNodePeerAttributes when it's no longer controlled by
// the "processor.servicegraph.virtualNode" feature gate.
// VirtualNodePeerAttributes: args.VirtualNodePeerAttributes,
}, nil
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -50,22 +50,12 @@ func TestArguments_UnmarshalAlloy(t *testing.T) {
MaxItems: 1000,
TTL: 2 * time.Second,
},
CacheLoop: 1 * time.Minute,
StoreExpirationLoop: 2 * time.Second,
DatabaseNameAttributes: []string{"db.name"},
MetricsFlushInterval: ptr(60 * time.Second),
//TODO: Add VirtualNodePeerAttributes when it's no longer controlled by
// the "processor.servicegraph.virtualNode" feature gate.
// VirtualNodePeerAttributes: []string{
// "db.name",
// "net.sock.peer.addr",
// "net.peer.name",
// "rpc.service",
// "net.sock.peer.name",
// "net.peer.name",
// "http.url",
// "http.target",
// },
CacheLoop: 1 * time.Minute,
StoreExpirationLoop: 2 * time.Second,
VirtualNodePeerAttributes: []string{"peer.service", "db.name", "db.system"},
VirtualNodeExtraLabel: false,
DatabaseNameAttributes: []string{"db.name"},
MetricsFlushInterval: ptr(60 * time.Second),
},
},
{
Expand All @@ -79,6 +69,8 @@ func TestArguments_UnmarshalAlloy(t *testing.T) {
}
cache_loop = "55m"
store_expiration_loop = "77s"
virtual_node_peer_attributes = ["attr1", "attr2"]
virtual_node_extra_label = true
metrics_flush_interval = "5s"
exponential_histogram_max_size = 160
output {}
Expand All @@ -96,12 +88,11 @@ func TestArguments_UnmarshalAlloy(t *testing.T) {
},
CacheLoop: 55 * time.Minute,
StoreExpirationLoop: 77 * time.Second,
VirtualNodePeerAttributes: []string{"attr1", "attr2"},
VirtualNodeExtraLabel: true,
DatabaseNameAttributes: []string{"db.name"},
MetricsFlushInterval: ptr(5 * time.Second),
ExponentialHistogramMaxSize: 160,
//TODO: Ad VirtualNodePeerAttributes when it's no longer controlled by
// the "processor.servicegraph.virtualNode" feature gate.
// VirtualNodePeerAttributes: []string{"attr1", "attr2"},
},
},
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -61,17 +61,26 @@ func toServicegraphConnector(state *State, id componentstatus.InstanceID, cfg *s
metricsFlushIntervalValue = *metricsFlushInterval
}

// TODO: Some default values upstream are not picked up correctly - fix this.
// Change the upstream code to set the default values in createDefaultConfig() in factory.go.
// Currently, some defaults are set in newConnector() in connector.go.
// For example, Alloy thinks the default for virtual_node_peer_attributes should be an empty list because that's what's in factory.go.
// For now the servicegraph converter tests are configured to explicitly set some values so that we don't see the wrong default value.

return &servicegraph.Arguments{
LatencyHistogramBuckets: cfg.LatencyHistogramBuckets,
Dimensions: cfg.Dimensions,
Store: servicegraph.StoreConfig{
MaxItems: cfg.Store.MaxItems,
TTL: cfg.Store.TTL,
},
CacheLoop: cfg.CacheLoop,
StoreExpirationLoop: cfg.StoreExpirationLoop,
MetricsFlushInterval: metricsFlushIntervalValue,
DatabaseNameAttributes: cfg.DatabaseNameAttributes,
CacheLoop: cfg.CacheLoop,
StoreExpirationLoop: cfg.StoreExpirationLoop,
MetricsFlushInterval: metricsFlushIntervalValue,
DatabaseNameAttributes: cfg.DatabaseNameAttributes,
VirtualNodeExtraLabel: cfg.VirtualNodeExtraLabel,
VirtualNodePeerAttributes: cfg.VirtualNodePeerAttributes,
ExponentialHistogramMaxSize: cfg.ExponentialHistogramMaxSize,
Comment thread
blewis12 marked this conversation as resolved.
Output: &otelcol.ConsumerArguments{
Metrics: ToTokenizedConsumers(nextMetrics),
},
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,10 +30,11 @@ otelcol.connector.servicegraph "default" {
max_items = 10
ttl = "1s"
}
cache_loop = "2m0s"
store_expiration_loop = "5s"
metrics_flush_interval = "3m0s"
database_name_attributes = ["db_name3", "db_name4"]
cache_loop = "2m0s"
store_expiration_loop = "5s"
virtual_node_peer_attributes = ["example_attribute"]
metrics_flush_interval = "3m0s"
database_name_attributes = ["db_name3", "db_name4"]

output {
metrics = [otelcol.exporter.otlp.default.input]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ connectors:
store_expiration_loop: 5s
database_name_attributes: [db_name3, db_name4]
metrics_flush_interval: 3m
virtual_node_peer_attributes: [example_attribute]

exporters:
otlp:
Expand Down
Loading