Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SDK] Better control of threads executed by opentelemetry-cpp #3175

Open
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

marcalff
Copy link
Member

@marcalff marcalff commented Nov 26, 2024

Fixes #3174

Changes

This feature provides a way for applications, when configuring the SDK and exporters,
to participate in the execution path of internal opentelemetry-cpp threads.

The opentelemetry-cpp library provides the following:

  • a new ThreadInstrumentation interface,
  • new runtime options structures, to optionally configure the SDK:
    • BatchSpanProcessorRuntimeOptions
    • PeriodicExportingMetricReaderRuntimeOptions
    • BatchLogRecordProcessorRuntimeOptions
  • new runtime options structures, to optionally configure the OTLP HTTP exporters:
    • OtlpHttpExporterRuntimeOptions
    • OtlpHttpMetricExporterRuntimeOptions
    • OtlpHttpLogRecordExporterRuntimeOptions
  • new ThreadInstrumentation parameters, to optionally configure the CURL HttpClient
  • new runtime options structures, to optionally configure the OTLP FILE exporters:
    • OtlpFileExporterRuntimeOptions
    • OtlpFileMetricExporterRuntimeOptions
    • OtlpFileLogRecordExporterRuntimeOptions
  • new runtime options structure, to optionally configure the OTLP FILE client:
    • OtlpFileClientRuntimeOptions

Using the optional runtime options structures,
an application can subclass the ThreadInstrumentation interface,
and be notified of specific events of interest during the execution of an internal opentelemetry-cpp thread.

This allows an application to call, for example:

See the documentation for ThreadInstrumentation for details.

A new example program, example_otlp_instrumented_http, shows how to use the feature,
and add application logic in the thread execution code path.

Note that this feature is experimental, protected by a WITH_THREAD_INSTRUMENTATION_PREVIEW flag in CMake.
Various runtime options structures, as well as the thread instrumentation interface, may change without notice before this feature is declared stable.

For significant contributions please make sure you have completed the following items:

  • CHANGELOG.md updated for non-trivial changes
  • Unit tests have been added
  • Changes in public API reviewed

Copy link

netlify bot commented Nov 26, 2024

Deploy Preview for opentelemetry-cpp-api-docs canceled.

Name Link
🔨 Latest commit 2cd0fb4
🔍 Latest deploy log https://app.netlify.com/sites/opentelemetry-cpp-api-docs/deploys/677d0d50ca45dd0008bff906

Copy link

codecov bot commented Nov 26, 2024

Codecov Report

Attention: Patch coverage is 29.54545% with 31 lines in your changes missing coverage. Please review.

Project coverage is 87.72%. Comparing base (3b89346) to head (2cd0fb4).

Files with missing lines Patch % Lines
sdk/src/logs/batch_log_record_processor.cc 18.75% 13 Missing ⚠️
sdk/src/trace/batch_span_processor.cc 28.58% 10 Missing ⚠️
...metrics/export/periodic_exporting_metric_reader.cc 42.86% 8 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3175      +/-   ##
==========================================
- Coverage   88.16%   87.72%   -0.44%     
==========================================
  Files         198      198              
  Lines        6224     6258      +34     
==========================================
+ Hits         5487     5489       +2     
- Misses        737      769      +32     
Files with missing lines Coverage Δ
...pentelemetry/sdk/logs/batch_log_record_processor.h 100.00% <ø> (ø)
...ude/opentelemetry/sdk/trace/batch_span_processor.h 100.00% <ø> (ø)
...metrics/export/periodic_exporting_metric_reader.cc 74.57% <42.86%> (-7.16%) ⬇️
sdk/src/trace/batch_span_processor.cc 87.84% <28.58%> (-6.27%) ⬇️
sdk/src/logs/batch_log_record_processor.cc 82.90% <18.75%> (-6.39%) ⬇️

... and 1 file with indirect coverage changes

@owent
Copy link
Member

owent commented Dec 18, 2024

Do you think we can implement a worker pool to share threads among multiple exporters and processors? Since most of them simply wait for a timeout, there is no need to create a separate thread for each component.

@marcalff
Copy link
Member Author

Do you think we can implement a worker pool to share threads among multiple exporters and processors? Since most of them simply wait for a timeout, there is no need to create a separate thread for each component.

For some applications, running in a multi threaded environment, it is critical to control how a thread executes in finer details.

For example, I have:

  • (a) an OTLP HTTP trace exporter
  • (b) several OTLP HTTP metric exporters
  • (c) an OTLP HTTP log exporter

When the exporter uses CURL to post to the endpoint, it makes a TCP/IP connection.

(a), (b) and (c) do need to use different TCP/IP stacks, or different named networks, to connect to their respective endpoints.

This is achieved by calling setns(fd, CLONE_NEWNET) in each thread, where fd is a file descriptor describing the named network to use.

Having dedicated threads (as currently) makes this easier.

If execution of each exporter was to be multiplexed and executed in the same worker thread, the context will need to change for each exporter, calling setns() many more time, and introducing more complexity to counter the effect of a pool worker thread, so it won't help but instead get in the way.

setns() is just an example, there can be other use cases, like binding to a CPU, etc.

Overall, the need is for the main application to inject arbitrary code in the opentelemetry-cpp execution code path, for opentelemetry threads.

@marcalff marcalff changed the title [POC] Better control of threads executed by opentelemetry-cpp [SDK] Better control of threads executed by opentelemetry-cpp Dec 18, 2024
@marcalff marcalff marked this pull request as ready for review December 18, 2024 17:58
@marcalff marcalff requested a review from a team as a code owner December 18, 2024 17:58
@marcalff marcalff added the enhancement New feature or request label Dec 18, 2024
@marcalff marcalff added the pr:please-review This PR is ready for review label Dec 18, 2024
@owent
Copy link
Member

owent commented Dec 19, 2024

Could please also add this instrumentation for OTLP file exporters?

@marcalff
Copy link
Member Author

Could please also add this instrumentation for OTLP file exporters?

Sure, implemented.

Every place where opentelemetry-cpp starts a new std::thread is now instrumented.

@marcalff
Copy link
Member Author

marcalff commented Dec 20, 2024

An example of using this feature, to set thread names to the operating system on Linux:

ps -Lp 199922 -o pid,tid,comm
    PID     TID COMMAND
 199922  199922 (redacted)
 ...
 199922  201152 otel_bsp-1
 199922  201153 otel_pmr-1
 199922  201154 otel_pmr-2
 199922  201157 otel_blrp-1
 199922  201163 otlp_traces
 199922  201165 otlp_logs
 199922  201179 otlp_metrics

In order, threads are:

  • the application main thread
  • the batch span processor
  • the periodic exporting meter reader (2 instances)
  • the batch log record processor
  • a trace otlp http curl thread
  • a logs otlp http curl thread
  • a metrics otlp http curl thread

Copy link
Member

@owent owent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and thanks.

virtual void BeforeWait() {}
virtual void AfterWait() {}
virtual void BeforeLoad() {}
virtual void AfterLoad() {}
Copy link
Member

@lalitb lalitb Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if there's a way to avoid virtual calls for these methods to reduce their associated cost. One potential approach is to allow users to provide a callback that the SDK invokes during thread lifecycle events. For example:

    class ThreadInstrumentation {
        ThreadInstrumentation( std::function<> callback = DefaultCallback)
            : callback_(std::move(callback)) {}
        bool OnOperation(ThreadOperation opr, ThreadContext &context) {
             return callback_(operation, context);
        }
   
        private:
        std::function<bool(ThreadOperation, const ThreadContext&)> callback_;
    }

ThreadContext could carry information like thread_id or be omitted if unnecessary. This approach lets the SDK call OnOperation internally while allowing the callback to define behavior for specific stages.

Not a blocker as I don't see these virtual methods to be called in the hot-path, but if this is something we need to consider?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the time, the instrumentation will be absent (nullptr), so that:

if (worker_thread_instrumentation_ != nullptr)
{
  worker_thread_instrumentation->OnXXX();
}

will not even enter the if block, so it is definitively not in the hot path.

When some instrumentation is provided, the virtual function will land in user code, or in the noop default implementation.

I expect that using a virtual function or a callback std::function to be equivalent, in term of overhead: the call will be dynamic (using a function pointer), and not static with the target known at compile time. A virtual method versus a callback std::function versus plain old C function pointers is just a different coding style, they all boil down to the same thing (a function pointer).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About passing thread context:

The subclass of ThreadInstrumentation is free to hold state, and can even keep per thread instance state in thread local storage if it really wants to, for complex cases.

This pushes complexity to the user (and disclosure, this is how I use it).

For calls that are paired, we could help a bit to propagate state like this:

void *opaque_wait_context = nullptr;

if (worker_thread_instrumentation_ != nullptr)
{
  opaque_wait_context = worker_thread_instrumentation->BeforeWait();
}

// do something

if (worker_thread_instrumentation_ != nullptr)
{
  worker_thread_instrumentation->AfterWait(opaque_wait_context);
}

and likewise for OnStart/OnEnd, and BeforeLoad/AfterLoad.

With this opaque pointer, the user code is free to use it (or not), and it avoids complexity with thread local storage.

@lalitb

I can implement this if you agree with it, or we can revise it later, let me know what you prefer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request pr:please-review This PR is ready for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feat] Better control of threads executed by opentelemetry-cpp
3 participants