[SDK] Better control of threads executed by opentelemetry-cpp #3175

marcalff · 2024-11-26T22:06:08Z

Changes

This feature provides a way for applications, when configuring the SDK and exporters,
to participate in the execution path of internal opentelemetry-cpp threads.

The opentelemetry-cpp library provides the following:

a new ThreadInstrumentation interface,
new runtime options structures, to optionally configure the SDK:
- BatchSpanProcessorRuntimeOptions
- PeriodicExportingMetricReaderRuntimeOptions
- BatchLogRecordProcessorRuntimeOptions
new runtime options structures, to optionally configure the OTLP HTTP exporters:
- OtlpHttpExporterRuntimeOptions
- OtlpHttpMetricExporterRuntimeOptions
- OtlpHttpLogRecordExporterRuntimeOptions
new ThreadInstrumentation parameters, to optionally configure the CURL HttpClient
new runtime options structures, to optionally configure the OTLP FILE exporters:
- OtlpFileExporterRuntimeOptions
- OtlpFileMetricExporterRuntimeOptions
- OtlpFileLogRecordExporterRuntimeOptions
new runtime options structure, to optionally configure the OTLP FILE client:
- OtlpFileClientRuntimeOptions

Using the optional runtime options structures,
an application can subclass the ThreadInstrumentation interface,
and be notified of specific events of interest during the execution of an internal opentelemetry-cpp thread.

This allows an application to call, for example:

pthread_setaffinity_np(), see related Add BatchSpanProcessor option to set CPU affinity on the worker thread #1822
setns(), to control the network namespace used by HTTP CURL connections
pthread_setname_np(), for better observability from the operating system
many more specific apis, as needed

See the documentation for ThreadInstrumentation for details.

A new example program, example_otlp_instrumented_http, shows how to use the feature,
and add application logic in the thread execution code path.

Note that this feature is experimental, protected by a WITH_THREAD_INSTRUMENTATION_PREVIEW flag in CMake.
Various runtime options structures, as well as the thread instrumentation interface, may change without notice before this feature is declared stable.

For significant contributions please make sure you have completed the following items:

CHANGELOG.md updated for non-trivial changes
Unit tests have been added
Changes in public API reviewed

Fixes open-telemetry#3174

netlify · 2024-11-26T22:06:28Z

✅ Deploy Preview for opentelemetry-cpp-api-docs canceled.

Name	Link
🔨 Latest commit	`2cd0fb4`
🔍 Latest deploy log	https://app.netlify.com/sites/opentelemetry-cpp-api-docs/deploys/677d0d50ca45dd0008bff906

codecov · 2024-11-26T22:18:19Z

Codecov Report

Attention: Patch coverage is 29.54545% with 31 lines in your changes missing coverage. Please review.

Project coverage is 87.72%. Comparing base (3b89346) to head (2cd0fb4).

Files with missing lines	Patch %	Lines
sdk/src/logs/batch_log_record_processor.cc	18.75%	13 Missing ⚠️
sdk/src/trace/batch_span_processor.cc	28.58%	10 Missing ⚠️
...metrics/export/periodic_exporting_metric_reader.cc	42.86%	8 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3175      +/-   ##
==========================================
- Coverage   88.16%   87.72%   -0.44%     
==========================================
  Files         198      198              
  Lines        6224     6258      +34     
==========================================
+ Hits         5487     5489       +2     
- Misses        737      769      +32

Files with missing lines	Coverage Δ
...pentelemetry/sdk/logs/batch_log_record_processor.h	`100.00% <ø> (ø)`
...ude/opentelemetry/sdk/trace/batch_span_processor.h	`100.00% <ø> (ø)`
...metrics/export/periodic_exporting_metric_reader.cc	`74.57% <42.86%> (-7.16%)`	⬇️
sdk/src/trace/batch_span_processor.cc	`87.84% <28.58%> (-6.27%)`	⬇️
sdk/src/logs/batch_log_record_processor.cc	`82.90% <18.75%> (-6.39%)`	⬇️

... and 1 file with indirect coverage changes

owent · 2024-12-18T03:01:30Z

Do you think we can implement a worker pool to share threads among multiple exporters and processors? Since most of them simply wait for a timeout, there is no need to create a separate thread for each component.

marcalff · 2024-12-18T09:57:46Z

Do you think we can implement a worker pool to share threads among multiple exporters and processors? Since most of them simply wait for a timeout, there is no need to create a separate thread for each component.

For some applications, running in a multi threaded environment, it is critical to control how a thread executes in finer details.

For example, I have:

(a) an OTLP HTTP trace exporter
(b) several OTLP HTTP metric exporters
(c) an OTLP HTTP log exporter

When the exporter uses CURL to post to the endpoint, it makes a TCP/IP connection.

(a), (b) and (c) do need to use different TCP/IP stacks, or different named networks, to connect to their respective endpoints.

This is achieved by calling setns(fd, CLONE_NEWNET) in each thread, where fd is a file descriptor describing the named network to use.

Having dedicated threads (as currently) makes this easier.

If execution of each exporter was to be multiplexed and executed in the same worker thread, the context will need to change for each exporter, calling setns() many more time, and introducing more complexity to counter the effect of a pool worker thread, so it won't help but instead get in the way.

setns() is just an example, there can be other use cases, like binding to a CPU, etc.

Overall, the need is for the main application to inject arbitrary code in the opentelemetry-cpp execution code path, for opentelemetry threads.

ext/src/http/client/curl/http_client_curl.cc

owent · 2024-12-19T08:25:28Z

Could please also add this instrumentation for OTLP file exporters?

marcalff · 2024-12-19T17:39:26Z

Could please also add this instrumentation for OTLP file exporters?

Sure, implemented.

Every place where opentelemetry-cpp starts a new std::thread is now instrumented.

marcalff · 2024-12-20T11:06:27Z

An example of using this feature, to set thread names to the operating system on Linux:

ps -Lp 199922 -o pid,tid,comm

    PID     TID COMMAND
 199922  199922 (redacted)
 ...
 199922  201152 otel_bsp-1
 199922  201153 otel_pmr-1
 199922  201154 otel_pmr-2
 199922  201157 otel_blrp-1
 199922  201163 otlp_traces
 199922  201165 otlp_logs
 199922  201179 otlp_metrics

In order, threads are:

the application main thread
the batch span processor
the periodic exporting meter reader (2 instances)
the batch log record processor
a trace otlp http curl thread
a logs otlp http curl thread
a metrics otlp http curl thread

owent

LGTM and thanks.

lalitb · 2025-01-07T19:23:45Z

sdk/include/opentelemetry/sdk/common/thread_instrumentation.h

+  virtual void BeforeWait() {}
+  virtual void AfterWait() {}
+  virtual void BeforeLoad() {}
+  virtual void AfterLoad() {}


Wondering if there's a way to avoid virtual calls for these methods to reduce their associated cost. One potential approach is to allow users to provide a callback that the SDK invokes during thread lifecycle events. For example:

class ThreadInstrumentation { ThreadInstrumentation( std::function<> callback = DefaultCallback) : callback_(std::move(callback)) {} bool OnOperation(ThreadOperation opr, ThreadContext &context) { return callback_(operation, context); } private: std::function<bool(ThreadOperation, const ThreadContext&)> callback_; }

ThreadContext could carry information like thread_id or be omitted if unnecessary. This approach lets the SDK call OnOperation internally while allowing the callback to define behavior for specific stages.

Not a blocker as I don't see these virtual methods to be called in the hot-path, but if this is something we need to consider?

Most of the time, the instrumentation will be absent (nullptr), so that:

if (worker_thread_instrumentation_ != nullptr) { worker_thread_instrumentation->OnXXX(); }

will not even enter the if block, so it is definitively not in the hot path.

When some instrumentation is provided, the virtual function will land in user code, or in the noop default implementation.

I expect that using a virtual function or a callback std::function to be equivalent, in term of overhead: the call will be dynamic (using a function pointer), and not static with the target known at compile time. A virtual method versus a callback std::function versus plain old C function pointers is just a different coding style, they all boil down to the same thing (a function pointer).

About passing thread context:

The subclass of ThreadInstrumentation is free to hold state, and can even keep per thread instance state in thread local storage if it really wants to, for complex cases.

This pushes complexity to the user (and disclosure, this is how I use it).

For calls that are paired, we could help a bit to propagate state like this:

void *opaque_wait_context = nullptr; if (worker_thread_instrumentation_ != nullptr) { opaque_wait_context = worker_thread_instrumentation->BeforeWait(); } // do something if (worker_thread_instrumentation_ != nullptr) { worker_thread_instrumentation->AfterWait(opaque_wait_context); }

and likewise for OnStart/OnEnd, and BeforeLoad/AfterLoad.

With this opaque pointer, the user code is free to use it (or not), and it avoids complexity with thread local storage.

@lalitb

I can implement this if you agree with it, or we can revise it later, let me know what you prefer.

[POC] Better control of threads executed by opentelemetry-cpp

3975855

Fixes open-telemetry#3174

marcalff added 12 commits November 27, 2024 09:58

warning cleanup

b594ee0

cleanup initializers in structs

88bd0bf

typo

abb9674

Merge branch 'main' into feat_thread_instrumentation

0214d95

cleanup

24816a8

Fix curl thread detach

d5f1075

Merge branch 'main' into feat_thread_instrumentation

eedde5c

Merge branch 'main' into feat_thread_instrumentation

809ade7

Moved runtime options to a dedicated structure.

2d223d8

Merge branch 'main' into feat_thread_instrumentation

9ea4223

cleanup

738f95f

bazel build

8650478

Merge branch 'main' into feat_thread_instrumentation

b8dbe39

marcalff added 4 commits December 18, 2024 17:37

Added example

bae36b8

fix bazel

941dd20

cleanup

85be1ca

Fixed bad merge

a2fcfc9

marcalff changed the title ~~[POC] Better control of threads executed by opentelemetry-cpp~~ [SDK] Better control of threads executed by opentelemetry-cpp Dec 18, 2024

marcalff marked this pull request as ready for review December 18, 2024 17:58

marcalff requested a review from a team as a code owner December 18, 2024 17:58

Merge branch 'main' into feat_thread_instrumentation

aec98dc

marcalff added the enhancement New feature or request label Dec 18, 2024

marcalff added 2 commits December 18, 2024 21:56

Add changelog.

fb3d60b

cleanup

715b76d

marcalff added the pr:please-review This PR is ready for review label Dec 18, 2024

owent reviewed Dec 19, 2024

View reviewed changes

ext/src/http/client/curl/http_client_curl.cc Show resolved Hide resolved

marcalff added 7 commits December 19, 2024 10:25

Merge branch 'main' into feat_thread_instrumentation

df04b03

Instrumented the OTLP FILE exporter.

e9b09a0

fix bazel

22048a8

Merge branch 'main' into feat_thread_instrumentation

943f079

iwyu

bf4e9ff

Document OTLP FILE in changelog

564fbad

Merge branch 'main' into feat_thread_instrumentation

3b6699f

owent approved these changes Dec 20, 2024

View reviewed changes

marcalff added 2 commits January 7, 2025 10:56

Merge branch 'main' into feat_thread_instrumentation

71d6535

Implement CMake WITH_THREAD_INSTRUMENTATION_PREVIEW flag.

2cd0fb4

marcalff mentioned this pull request Jan 7, 2025

[RELEASE] Prepare release opentelemetry-cpp 1.19.0 #3229

Open

lalitb reviewed Jan 7, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SDK] Better control of threads executed by opentelemetry-cpp #3175

[SDK] Better control of threads executed by opentelemetry-cpp #3175

marcalff commented Nov 26, 2024 •

edited

Loading

netlify bot commented Nov 26, 2024 •

edited

Loading

codecov bot commented Nov 26, 2024 •

edited

Loading

owent commented Dec 18, 2024

marcalff commented Dec 18, 2024

owent commented Dec 19, 2024

marcalff commented Dec 19, 2024

marcalff commented Dec 20, 2024 •

edited

Loading

owent left a comment

lalitb Jan 7, 2025 •

edited

Loading

marcalff Jan 7, 2025

marcalff Jan 7, 2025

[SDK] Better control of threads executed by opentelemetry-cpp #3175

Are you sure you want to change the base?

[SDK] Better control of threads executed by opentelemetry-cpp #3175

Conversation

marcalff commented Nov 26, 2024 • edited Loading

Changes

netlify bot commented Nov 26, 2024 • edited Loading

✅ Deploy Preview for opentelemetry-cpp-api-docs canceled.

codecov bot commented Nov 26, 2024 • edited Loading

Codecov Report

owent commented Dec 18, 2024

marcalff commented Dec 18, 2024

owent commented Dec 19, 2024

marcalff commented Dec 19, 2024

marcalff commented Dec 20, 2024 • edited Loading

owent left a comment

Choose a reason for hiding this comment

lalitb Jan 7, 2025 • edited Loading

Choose a reason for hiding this comment

marcalff Jan 7, 2025

Choose a reason for hiding this comment

marcalff Jan 7, 2025

Choose a reason for hiding this comment

marcalff commented Nov 26, 2024 •

edited

Loading

netlify bot commented Nov 26, 2024 •

edited

Loading

codecov bot commented Nov 26, 2024 •

edited

Loading

marcalff commented Dec 20, 2024 •

edited

Loading

lalitb Jan 7, 2025 •

edited

Loading