-
Notifications
You must be signed in to change notification settings - Fork 438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Jaeger exporter #534
Add Jaeger exporter #534
Conversation
Codecov Report
@@ Coverage Diff @@
## main #534 +/- ##
==========================================
+ Coverage 94.75% 94.76% +0.01%
==========================================
Files 217 217
Lines 9923 9923
==========================================
+ Hits 9403 9404 +1
+ Misses 520 519 -1
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need help figuring out a bazel build? If so, ping me on gitter, happy to help get this out the door!
exporters/jaeger/include/opentelemetry/exporters/jaeger/jaeger_exporter.h
Outdated
Show resolved
Hide resolved
|
||
private: | ||
std::unique_ptr<AgentClient> agent_; | ||
std::shared_ptr<TTransport> socket_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these shared_ptr?
There's ref-counting overhead vs. unique_ptr. Are these actually shared?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Josh. Yes, I also prefer unique_ptr
if possible. For the cases of using shared_ptr
, that would be needed as it is required to be passed to other functions as shared_ptr
, so I just created it as shared_ptr
in the first place.
Are there plans to continue with this draft PR? We might need this for nginx and httpd instrumentation, would gladly help. |
@seemk thanks. I'll clean and push my remaining changes this week. Your help will be appreciated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for adding Jaeger support :)
core::SystemTimestamp timestamp, | ||
const common::KeyValueIterable &attributes) noexcept | ||
{ | ||
// TODO: convert event to Jaeger Log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can have tickets for the TODOs if they are not planned in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll log issues for TODOs when this PR is ready to merge.
{ | ||
if (code == trace::StatusCode::kUnset) | ||
{ | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't return here, as the description should still be needed to set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conversion doc says the status (Jaeger span status?) should not be set for UNSET
, let me know if this is still needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, looks fine in that case.
|
||
using namespace jaegertracing; | ||
|
||
class Recordable final : public sdk::trace::Recordable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Not something which needs immediate attention, but probably we should have recordable names after transport types - eg TUDPRecordable
. This is an issue with Zipkin exporter too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. I logged the same comment for ETW exporter. I think we could create a task for the cleanup for all the exporters? I'll try the renaming anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created issue #735 for this.
struct JaegerExporterOptions | ||
{ | ||
// The endpoint to export to. | ||
std::string server_addr = "localhost"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps future proofing, but ff Thrift over HTTP is supported, perhaps rename it to endpoint and parse host, port from it? With HTTP it will be necessary to specify the full endpoint, e.g. localhost:1234/api/traces
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I suggest we refactor the option in later PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
const uint32_t span_size = CalcSizeOfSerializedThrift(jaeger_span); | ||
if (span_size > max_span_bytes) | ||
{ | ||
// TODO, log too large span error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be nice to have, but how does otel-cpp log in general? I.e. I don't think it should write to stdout or stderr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use otel-cpp logger, and let the application decide where to export it. There is ticket #408
to handle error logging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, let's unify the SDK internal logging once we have it.
{ | ||
kThriftUdp, | ||
kThriftUdpCompact, | ||
kThriftHttp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps only the currently supported transports should be listed here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that works too. But as this list is expected to be stable, I just leave it there to make the header file stable (later extension should not need to update the header file, just update the .cc
file). But let me know if removing the unimplemented formats is preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Sorry for keeping it waiting for long. I think we can move it to main-stream faster, and then evolve for any changes.
|
||
uint32_t TUDPTransport::read(uint8_t *buf, uint32_t len) | ||
{ | ||
uint32_t num_read = recvfrom(socket_, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this call block the exporter (and so the application) till any response is received from the server/agent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a block call but it is not supposed to be called in the export process (only sendto
is called). It is here because it is required to implement the full thrift Jaeger class.
{ | ||
if (code == trace::StatusCode::kUnset) | ||
{ | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, looks fine in that case.
const uint32_t span_size = CalcSizeOfSerializedThrift(jaeger_span); | ||
if (span_size > max_span_bytes) | ||
{ | ||
// TODO, log too large span error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use otel-cpp logger, and let the application decide where to export it. There is ticket #408
to handle error logging.
return 0; | ||
} | ||
|
||
std::lock_guard<std::mutex> guard{lock_}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to lock here? Also, can we use common::SpinLockMutex ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the mutex guarded operation is a little heavy (network IO is involved), I am not preferring using SpingLockMutex
to perform some busy wait here. Let me know your thoughts on switching to common::SpinLockMutext
.
return FlushWithLock(); | ||
} | ||
|
||
int ThriftSender::FlushWithLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does FlushWithLock() and Append() run on separate threads ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, Append()
could be called from a different thread than the one calling FlushWithLock()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the spec (see below excerpt), I removed the lock on the thrift_sender which belong to the exporter instance.
Export() will never be called concurrently for the same exporter instance. Export() can be called again only after the current call returns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice.
span_buffer_.push_back(jaeger_span); | ||
if (byte_buffer_size_ < max_span_bytes) | ||
{ | ||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean if there are very few spans created, they may keep waiting for a long time in the queue. Should there be some timeout, or we somehow ensure that every Export() results in the export over the network?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. The few events could be there till the tracer object is destructed. We probably could create a separate thread to do flush periodically. But I haven't seen other implementations have the timeout mechanism. How about filing a ticket to follow up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lalitb great question!
@ThomsonTan this is what OpenTelemetry .NET is doing https://github.com/open-telemetry/opentelemetry-dotnet/blob/98055871511d0b56df75cb7443a1ed9a49918014/src/OpenTelemetry.Exporter.Jaeger/JaegerExporter.cs#L84.
This is the semantic we need to achieve https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#exportbatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably could create a separate thread to do flush periodically.
This doesn't sound correct to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sharing the spec and OpenTelemetry-DotNet link. The spec clearly requires each batch of spans to be serialized and sent to destination. Add a call to Flush()
at the end of Export(batch)
for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that seems a good approach.
Looks good. @jsuereth has requested a change. We can pull this in once he re-review it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good for the working first copy.
Agree with most comments to open follow-on bugs. Do you need help getting bazel working here? As for installation instructions for bazel, it'll likely be "Just add this dependency in your bazel build". I'm a bit swamped with stuff right now, but I'm happy to help with the time I have on bazel-related build issue if needed.
) | ||
|
||
# TODO: enable bazel build | ||
# cc_binary( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: anything I can do to help get this working?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jsuereth I will send out another PR with bazel build script then will need your help to make sure it is installing thrift dependence in the right way.
|
||
using namespace jaegertracing; | ||
|
||
class Recordable final : public sdk::trace::Recordable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
struct JaegerExporterOptions | ||
{ | ||
// The endpoint to export to. | ||
std::string server_addr = "localhost"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@ThomsonTan - Seems this is good to merge now. As I understand, you and/or @jsuereth will take care of bazel build as separate PR. |
Implementation of Jaeger exporter based on spec https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/sdk_exporters/jaeger.md.