Support Tail Based Sampling Processor From OTEL Collector Extension #5878

Merged (30 commits) on Aug 31, 2024. The diff below shows the changes from 23 of these commits.

Commits:
e55ddca Add Dependencies (mahadzaryab1, Aug 22, 2024)
822da09 Include Load Balancing Exporter And Tail Sampling Processor (mahadzaryab1, Aug 22, 2024)
1d08303 Remove Load Balancing Exporter (mahadzaryab1, Aug 23, 2024)
29d64f0 Set Up Tail Sampling Example (mahadzaryab1, Aug 24, 2024)
191252c Add Makefile To Example (mahadzaryab1, Aug 24, 2024)
2cde4aa Use Tracegen And Simplify Config (mahadzaryab1, Aug 24, 2024)
f5d5d54 Add Makefile Target For Running Tail Sampling Integration Test (mahadzaryab1, Aug 30, 2024)
776e054 Add Jaeger Config Files For Tail Sampling Integration Test (mahadzaryab1, Aug 30, 2024)
15a5eaf Flip Policies In Config Files (mahadzaryab1, Aug 30, 2024)
c7e29d1 Add E2E Integration Test For Tail Sampling Processor (mahadzaryab1, Aug 30, 2024)
1708da5 Add Github Workflow For Tail Sampling Processor Integration Test (mahadzaryab1, Aug 30, 2024)
3c8ba97 Make Tracegen Generate Traces Based on Time in Compose Setup (mahadzaryab1, Aug 30, 2024)
65d8275 Run Formatter (mahadzaryab1, Aug 30, 2024)
f5fdec9 Address Feedback From PR Review (mahadzaryab1, Aug 31, 2024)
3bb6791 Format YAML Files (mahadzaryab1, Aug 31, 2024)
3255541 Clean Up Makefile (mahadzaryab1, Aug 31, 2024)
e2f5e99 Add Healthcheck Extension To Config Files (mahadzaryab1, Aug 31, 2024)
9d7ff20 Address Feedback From PR Review (mahadzaryab1, Aug 31, 2024)
aee60fc Bump Default To Latest Version of OTEL Collector Contrib Image (mahadzaryab1, Aug 31, 2024)
9fa0de4 undo go.mod (yurishkuro, Aug 31, 2024)
44eb90e Merge branch 'main' into tail-based-sampling (yurishkuro, Aug 31, 2024)
124ca49 update go.mod (yurishkuro, Aug 31, 2024)
bd6fd79 Merge branch 'main' into tail-based-sampling (yurishkuro, Aug 31, 2024)
91ae10c Make Test Logging Unconditional (mahadzaryab1, Aug 31, 2024)
dc9ab9b Fix Docker Compose Setup (mahadzaryab1, Aug 31, 2024)
1fb1d65 Remove Endpoints From Config That Are Not Needed (mahadzaryab1, Aug 31, 2024)
6a00d65 Get Combined Output For Tracegen (mahadzaryab1, Aug 31, 2024)
49a9c03 More Cleanup For Docker Setup (mahadzaryab1, Aug 31, 2024)
603fb63 Address Feedback From PR Review (mahadzaryab1, Aug 31, 2024)
c50ded3 Update README Documenting Setup Of Tail Sampling Processor (mahadzaryab1, Aug 31, 2024)
41 changes: 41 additions & 0 deletions .github/workflows/ci-e2e-tailsampling-processor.yml
@@ -0,0 +1,41 @@
name: Test Tail Sampling Processor

on:
  push:
    branches: [main]

  pull_request:
    branches: [main]

concurrency:
  group: ${{ github.workflow }}-${{ (github.event.pull_request && github.event.pull_request.number) || github.ref || github.run_id }}
  cancel-in-progress: true

# See https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions
permissions: # added using https://github.com/step-security/secure-workflows
  contents: read

jobs:
  tailsampling-processor:
    runs-on: ubuntu-latest
    steps:
    - name: Harden Runner
      uses: step-security/harden-runner@0d381219ddf674d61a7572ddd19d7941e271515c # v2.9.0
      with:
        egress-policy: audit # TODO: change to 'egress-policy: block' after couple of runs

    - uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7

    - uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32 # v5.0.2
      with:
        go-version: 1.23.x

    - name: Run Tail Sampling Processor Integration Test
      run: |
        make tail-sampling-integration-test

    - name: Upload coverage to codecov
      uses: ./.github/actions/upload-codecov
      with:
        files: cover.out
        flags: tailsampling-processor
4 changes: 4 additions & 0 deletions Makefile
@@ -160,6 +160,10 @@ index-cleaner-integration-test: docker-images-elastic
index-rollover-integration-test: docker-images-elastic
	$(MAKE) storage-integration-test COVEROUT=cover-index-rollover.out

.PHONY: tail-sampling-integration-test
tail-sampling-integration-test:
	SAMPLING=tail $(MAKE) jaeger-v2-storage-integration-test
Member:

This runs go test, but when do you start the docker compose environment?

All other e2e tests have a driver script that orchestrates all components of the test, e.g. scripts/es-integration-test.sh.

Collaborator Author:

I'm not using the docker-compose environment for my test. Calling e2eInitialize is enough to start the Jaeger collector, so you can run this test simply with make tail-sampling-integration-test. Let me know if you want to change any of this setup, though.

Member:

Fair enough. However, it means that the new docker compose file will begin to rot, since it's not being exercised by CI, something we have tried to avoid (e.g. see the e2e SPM test). So it would be good to actually combine docker compose with the e2e test.

Collaborator Author (mahadzaryab1, Aug 31, 2024):

Ah, I see! What would you want this to look like? The current docker-compose setup generates load using tracegen, which we ideally wouldn't want in the integration test, so that we can generate the traces manually. Also, the existing E2E test setup does some nice things for us, like flushing the storage between tests.

Member:

Well, that's why I was asking from the start what your plan would be. The way you are using the e2e_integration framework is very lightweight, and I could easily see an alternative setup where everything is just orchestrated from a shell script:

  • run docker-compose with one config
    • maybe don't include tracegen in compose; run it manually
  • curl the query service to retrieve the service names as JSON (trivial to write)
  • shut down docker-compose (to clear the storage) and run it again with a different config

If you are interested in pursuing this approach, I would suggest still merging this PR first so that we already have something in place. Can you finish the README?

Collaborator Author (mahadzaryab1, Aug 31, 2024):

@yurishkuro That sounds good to me, and I can pursue that approach in a follow-up PR. And yes, I'm working on the README now and will push it up soon.


.PHONY: cover
cover: nocover
	bash -c "set -e; set -o pipefail; STORAGE=memory $(GOTEST) -timeout 5m -coverprofile $(COVEROUT) ./... | tee test-results.json"
38 changes: 38 additions & 0 deletions cmd/jaeger/config-tail-sampling-always-sample.yaml
@@ -0,0 +1,38 @@
service:
  extensions: [jaeger_storage, jaeger_query, healthcheckv2]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [jaeger_storage_exporter]
  telemetry:
    logs:
      level: DEBUG

extensions:
  healthcheckv2:
    use_v2: true
    http:
  jaeger_query:
    trace_storage: some_storage
  jaeger_storage:
    backends:
      some_storage:
        memory:
          max_traces: 100000

receivers:
  otlp:
    protocols:
      grpc:
      http:
        endpoint: "0.0.0.0:4318"

processors:
  tail_sampling:
    decision_wait: 5s
    policies: [{ name: test-policy-1, type: always_sample }]

exporters:
  jaeger_storage_exporter:
    trace_storage: some_storage
46 changes: 46 additions & 0 deletions cmd/jaeger/config-tail-sampling-service-name-policy.yaml
@@ -0,0 +1,46 @@
service:
  extensions: [jaeger_storage, jaeger_query, healthcheckv2]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [jaeger_storage_exporter]
  telemetry:
    logs:
      level: DEBUG

extensions:
  healthcheckv2:
    use_v2: true
    http:
  jaeger_query:
    trace_storage: some_storage
  jaeger_storage:
    backends:
      some_storage:
        memory:
          max_traces: 100000

receivers:
  otlp:
    protocols:
      grpc:
      http:
        endpoint: "0.0.0.0:4318"

processors:
  tail_sampling:
    decision_wait: 5s
    policies:
      [
        {
          name: filter-by-attribute,
          type: string_attribute,
          string_attribute:
            { key: service.name, values: [tracegen-00, tracegen-03] },
        },
      ]

exporters:
  jaeger_storage_exporter:
    trace_storage: some_storage
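The string_attribute policy above keeps only traces whose service.name is tracegen-00 or tracegen-03, which matches expectedServicesB in the integration test. Below is a simplified sketch of that matching rule with hypothetical types (span, stringAttributePolicy); it flattens all attributes into one map, whereas in real OTLP data service.name is carried as a resource attribute.

```go
package main

import "fmt"

// span is a hypothetical, minimal span: just the attributes the policy inspects.
type span struct {
	attrs map[string]string
}

// stringAttributePolicy models `type: string_attribute`: keep the trace if any
// of its spans has attribute `key` set to one of the configured values.
type stringAttributePolicy struct {
	key    string
	values map[string]bool
}

func newStringAttributePolicy(key string, values ...string) stringAttributePolicy {
	set := make(map[string]bool, len(values))
	for _, v := range values {
		set[v] = true
	}
	return stringAttributePolicy{key: key, values: set}
}

// sample reports whether the trace should be kept.
func (p stringAttributePolicy) sample(trace []span) bool {
	for _, s := range trace {
		if v, ok := s.attrs[p.key]; ok && p.values[v] {
			return true
		}
	}
	return false
}

func main() {
	p := newStringAttributePolicy("service.name", "tracegen-00", "tracegen-03")
	var kept []string
	// One single-span trace per tracegen service, tracegen-00 .. tracegen-04.
	for i := 0; i < 5; i++ {
		svc := fmt.Sprintf("tracegen-%02d", i)
		trace := []span{{attrs: map[string]string{"service.name": svc}}}
		if p.sample(trace) {
			kept = append(kept, svc)
		}
	}
	fmt.Println(kept) // prints [tracegen-00 tracegen-03]
}
```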
2 changes: 2 additions & 0 deletions cmd/jaeger/internal/components.go
@@ -8,6 +8,7 @@ import (
	"github.com/open-telemetry/opentelemetry-collector-contrib/exporter/kafkaexporter"
	"github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter"
	"github.com/open-telemetry/opentelemetry-collector-contrib/extension/healthcheckv2extension"
	"github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor"
	"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/jaegerreceiver"
	"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kafkareceiver"
	"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/zipkinreceiver"
@@ -104,6 +105,7 @@ func (b builders) build() (otelcol.Factories, error) {
		// standard
		batchprocessor.NewFactory(),
		memorylimiterprocessor.NewFactory(),
		tailsamplingprocessor.NewFactory(),
		// add-ons
		adaptivesampling.NewFactory(),
	)
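Adding tailsamplingprocessor.NewFactory() to this list is what makes the tail_sampling key usable in the config files above. As we understand it, the collector indexes processor factories by type name and refuses duplicate types; the sketch below models only that lookup, with a hypothetical factory struct and makeFactoryMap function rather than the real otelcol API.

```go
package main

import (
	"fmt"
	"sort"
)

// factory is a hypothetical stand-in for a collector processor factory;
// only its type name matters in this sketch.
type factory struct {
	typ string
}

// makeFactoryMap indexes factories by type name and rejects duplicates.
func makeFactoryMap(factories ...factory) (map[string]factory, error) {
	m := make(map[string]factory, len(factories))
	for _, f := range factories {
		if _, dup := m[f.typ]; dup {
			return nil, fmt.Errorf("duplicate factory for type %q", f.typ)
		}
		m[f.typ] = f
	}
	return m, nil
}

func main() {
	procs, err := makeFactoryMap(
		factory{typ: "batch"},
		factory{typ: "memory_limiter"},
		factory{typ: "tail_sampling"}, // the entry this PR adds
	)
	if err != nil {
		fmt.Println("registration failed:", err)
		return
	}
	// A config's `processors: [tail_sampling]` only resolves if the type is registered.
	_, ok := procs["tail_sampling"]
	names := make([]string, 0, len(procs))
	for name := range procs {
		names = append(names, name)
	}
	sort.Strings(names)
	fmt.Println(names, "tail_sampling registered:", ok)
}
```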
97 changes: 97 additions & 0 deletions cmd/jaeger/internal/integration/tailsampling_test.go
@@ -0,0 +1,97 @@
// Copyright (c) 2024 The Jaeger Authors.
// SPDX-License-Identifier: Apache-2.0

package integration

import (
	"context"
	"os"
	"os/exec"
	"sort"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"

	"github.com/jaegertracing/jaeger/plugin/storage/integration"
)

// TailSamplingIntegration contains the test components to perform an integration test
// for the Tail Sampling Processor.
type TailSamplingIntegration struct {
	E2EStorageIntegration

	// expectedServices contains a list of services that should be sampled in the test case.
	expectedServices []string
}

// TestTailSamplingProcessor_EnforcesPolicies runs an A/B test to perform an integration test
// for the Tail Sampling Processor.
//   - Test A uses a Jaeger config file with a tail sampling processor that has a policy for sampling
//     all traces. In this test, we check that all services that are sampled are stored.
//   - Test B uses a Jaeger config file with a tail sampling processor that has a policy to sample
//     traces based on the `service.name` attribute. In this test, we check that only the services
//     listed as part of the policy in the config file are stored.
func TestTailSamplingProcessor_EnforcesPolicies(t *testing.T) {
	if env := os.Getenv("SAMPLING"); env != "tail" {
		t.Skip("This test requires environment variable SAMPLING=tail")
	}

	expectedServicesA := []string{"tracegen-00", "tracegen-01", "tracegen-02", "tracegen-03", "tracegen-04"}
	tailSamplingA := &TailSamplingIntegration{
		E2EStorageIntegration: E2EStorageIntegration{
			ConfigFile: "../../config-tail-sampling-always-sample.yaml",
			StorageIntegration: integration.StorageIntegration{
				CleanUp: purge,
			},
		},
		expectedServices: expectedServicesA,
	}

	expectedServicesB := []string{"tracegen-00", "tracegen-03"}
	tailSamplingB := &TailSamplingIntegration{
		E2EStorageIntegration: E2EStorageIntegration{
			ConfigFile: "../../config-tail-sampling-service-name-policy.yaml",
			StorageIntegration: integration.StorageIntegration{
				CleanUp: purge,
			},
		},
		expectedServices: expectedServicesB,
	}

	t.Run("sample_all", tailSamplingA.testTailSamplingProcessor)
	t.Run("sample_some", tailSamplingB.testTailSamplingProcessor)
}

// testTailSamplingProcessor performs the following steps:
//  1. Initialize the test case by starting the Jaeger V2 collector
//  2. Generate 5 traces using `tracegen` with one service per trace
//  3. Read the stored services from the memory store
//  4. Check that the sampled services match what is expected
func (ts *TailSamplingIntegration) testTailSamplingProcessor(t *testing.T) {
	ts.e2eInitialize(t, "memory")
	ts.generateTraces(t)

	var actual []string
	found := assert.Eventually(t, func() bool {
		var err error
		actual, err = ts.SpanReader.GetServices(context.Background())
		// assert instead of require: the condition runs in a separate goroutine,
		// where require's t.FailNow must not be called.
		if !assert.NoError(t, err) {
			return false
		}
		sort.Strings(actual)
		return assert.ObjectsAreEqualValues(ts.expectedServices, actual)
	}, 100*time.Second, 15*time.Second)

	if !found {
		t.Log("\t Expected:", ts.expectedServices)
		t.Log("\t Actual  :", actual)
	}
}

// generateTraces generates 5 traces using `tracegen` with one service per trace
func (*TailSamplingIntegration) generateTraces(t *testing.T) {
	tracegenCmd := exec.Command("go", "run", "../../../../cmd/tracegen", "-traces", "5", "-services", "5")
	stdout, err := tracegenCmd.Output()
	require.NoError(t, err)
	t.Logf("tracegen completed: %s", stdout)
}
30 changes: 30 additions & 0 deletions docker-compose/tail-sampling/Makefile
@@ -0,0 +1,30 @@
# Copyright (c) 2024 The Jaeger Authors.
# SPDX-License-Identifier: Apache-2.0

BINARY ?= jaeger

.PHONY: build
build: clean-jaeger
	cd ../../ && make build-$(BINARY) GOOS=linux
	cd ../../ && make create-baseimg PLATFORMS=linux/$(shell go env GOARCH)
	cd ../../ && docker buildx build --target release \
		--tag jaegertracing/$(BINARY):dev \
		--build-arg base_image=localhost:5000/baseimg_alpine:latest \
		--build-arg debug_image=not-used \
		--build-arg TARGETARCH=$(shell go env GOARCH) \
		--load \
		cmd/$(BINARY)

.PHONY: dev
dev: export JAEGER_IMAGE_TAG = dev
dev: build
	docker compose -f docker-compose.yml up $(DOCKER_COMPOSE_ARGS)

.PHONY: clean-jaeger
clean-jaeger:
	# Also cleans up intermediate cached containers.
	docker system prune -f

.PHONY: clean-all
clean-all: clean-jaeger
	docker rmi -f otel/opentelemetry-collector-contrib:latest ;
35 changes: 35 additions & 0 deletions docker-compose/tail-sampling/docker-compose.yml
@@ -0,0 +1,35 @@
services:
  jaeger:
    networks:
      backend:
    image: jaegertracing/jaeger:${JAEGER_IMAGE_TAG:-latest}
    volumes:
      - "./jaeger-v2-config.yml:/etc/jaeger/config.yml"
    command: ["--config", "/etc/jaeger/config.yml"]
    ports:
      - "16686:16686"

  otel_collector:
    networks:
      backend:
    image: otel/opentelemetry-collector-contrib:${OTEL_IMAGE_TAG:-0.108.0}
    volumes:
      - ${OTEL_CONFIG_SRC:-./otel-collector-config-connector.yml}:/etc/otelcol/otel-collector-config.yml
    command: --config /etc/otelcol/otel-collector-config.yml
    depends_on:
      - jaeger
    ports:
      - "8889:8889"

  tracegen:
    networks:
      - backend
    image: jaegertracing/jaeger-tracegen:latest
    environment:
      - OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://jaeger:4318/v1/traces
    command: ["-workers", "3", "-pause", "250ms", "-services", "5", "-duration", "10s"]
    depends_on:
      - jaeger

networks:
  backend:
46 changes: 46 additions & 0 deletions docker-compose/tail-sampling/jaeger-v2-config.yml
@@ -0,0 +1,46 @@
service:
  extensions: [jaeger_storage, jaeger_query, healthcheckv2]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [jaeger_storage_exporter]
  telemetry:
    logs:
      level: DEBUG

extensions:
  healthcheckv2:
    use_v2: true
    http:
  jaeger_query:
    trace_storage: some_storage
  jaeger_storage:
    backends:
      some_storage:
        memory:
          max_traces: 100000

receivers:
  otlp:
    protocols:
      grpc:
      http:
        endpoint: "0.0.0.0:4318"

processors:
  tail_sampling:
    decision_wait: 15s
    policies:
      [
        {
          name: filter-by-attribute,
          type: string_attribute,
          string_attribute:
            { key: service.name, values: [tracegen-02, tracegen-04] },
        },
      ]

exporters:
  jaeger_storage_exporter:
    trace_storage: some_storage
@@ -0,0 +1,29 @@
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

exporters:
  loadbalancing:
    routing_key: "traceID"
    protocol:
      otlp:
        timeout: 1s
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - 0.0.0.0:4317

service:
  pipelines:
    traces:
      receivers:
        - otlp
      processors: []
      exporters:
        - loadbalancing
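The loadbalancing exporter in this OTel Collector config routes spans with routing_key: "traceID", so all spans of one trace reach the same backend; this matters for tail sampling, because a policy can only judge a trace that one instance sees in full. The sketch below shows trace-ID routing with a hypothetical backendFor function that uses FNV-1a modulo hashing for brevity. To our understanding, the real exporter uses a consistent-hash ring instead, which remaps fewer traces when the backend list changes.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// backendFor picks a backend for a trace ID. Because the choice depends only
// on the trace ID, every span of a trace lands on the same backend.
// Simplification: plain modulo hashing, not a consistent-hash ring.
func backendFor(traceID string, backends []string) string {
	h := fnv.New32a()
	h.Write([]byte(traceID)) // hash.Hash writes never return an error
	return backends[h.Sum32()%uint32(len(backends))]
}

func main() {
	backends := []string{"collector-a:4317", "collector-b:4317"} // hypothetical hostnames
	spans := []struct{ traceID, name string }{
		{"trace-1", "GET /"},
		{"trace-2", "GET /"},
		{"trace-1", "db.query"}, // same trace as the first span, so same backend
	}
	for _, s := range spans {
		fmt.Printf("%s/%s -> %s\n", s.traceID, s.name, backendFor(s.traceID, backends))
	}
}
```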