-
Notifications
You must be signed in to change notification settings - Fork 867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable strict context check and fix some context issues. #2637
Conversation
@@ -168,6 +164,10 @@ public void onExecutionFailure( | |||
} | |||
|
|||
private void clearAttributes(ExecutionAttributes executionAttributes) { | |||
Scope scope = executionAttributes.getAttribute(SCOPE_ATTRIBUTE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😱
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it. Both why this change is significant and your reaction to it :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a real context leak with big damage - if sync AWS SDK request failed and a subsequent client request of some form was made, it would parent to the AWS SDK request (well it'd be suppressed as a result usually)! Hooray for strict context checking
Context traceContext = tracer.startSpan(Context.current(), context.getMsgList()); | ||
ContextAndScope contextAndScope = new ContextAndScope(traceContext, traceContext.makeCurrent()); | ||
context.setMqTraceContext(contextAndScope); | ||
Context otelContext = tracer.startSpan(Context.current(), context.getMsgList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like similar netty libraries, there's no thread guarantee for before and end. Since there's no downstream instrumentation it's not a big deal anyways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about e.g. user-provided message hooks that want to augment existing span?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will need to provide something like RocketMqTracing.getOpenTelemetryContext(ConsumeMessageContext)
to let them extract it I guess. In the meantime, if someone had such code using Context.current()
in consumeMessageAfter
it would generally be buggy with the chance of Context.current()
not actually corresponding to the hook's message due to context leakage so this fix is important first.
I hope such hooks will be able to use our instrumentation-api instead of that though.
return false; | ||
} | ||
|
||
// Wrapper of tasks for dispatch - the wrapped task should have context already and this doesn't |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea what I'm talking about! But seems like a reasonable hypothesis for now, how akka flakes.
@@ -62,3 +62,7 @@ compileVersion101TestGroovy { | |||
classpath = classpath.plus(files(compileVersion101TestScala.destinationDir)) | |||
dependsOn compileVersion101TestScala | |||
} | |||
|
|||
tasks.withType(Test) { | |||
jvmArgs "-Dio.opentelemetry.javaagent.shaded.io.opentelemetry.context.enableStrictContext=false" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice to see where this fails
@@ -168,6 +164,10 @@ public void onExecutionFailure( | |||
} | |||
|
|||
private void clearAttributes(ExecutionAttributes executionAttributes) { | |||
Scope scope = executionAttributes.getAttribute(SCOPE_ATTRIBUTE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
@@ -14,6 +14,43 @@ | |||
/** Utils for concurrent instrumentations. */ | |||
public class ExecutorInstrumentationUtils { | |||
|
|||
private static final ClassValue<Boolean> NOT_INSTRUMENTED_RUNNABLE_ENCLOSING_CLASS = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, i'm trying to remember to look out for opportunities to use also 😄
Sorry for the churn, couldn't help but fix some of the leaks. The remaining will take some deeper dives so I think this is it for now |
@@ -168,6 +164,10 @@ public void onExecutionFailure( | |||
} | |||
|
|||
private void clearAttributes(ExecutionAttributes executionAttributes) { | |||
Scope scope = executionAttributes.getAttribute(SCOPE_ATTRIBUTE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it. Both why this change is significant and your reaction to it :D
Context traceContext = tracer.startSpan(Context.current(), context.getMsgList()); | ||
ContextAndScope contextAndScope = new ContextAndScope(traceContext, traceContext.makeCurrent()); | ||
context.setMqTraceContext(contextAndScope); | ||
Context otelContext = tracer.startSpan(Context.current(), context.getMsgList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about e.g. user-provided message hooks that want to augment existing span?
Unfortunately not able to repro CI failures well but it's nice that it finds them. As far as I can tell there might be something fundamentally wrong with our netty instrumentation, will examine separately first. |
@@ -36,6 +37,13 @@ abstract class InstrumentationSpecification extends Specification { | |||
testRunner().clearAllExportedData() | |||
} | |||
|
|||
def cleanup() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, perhaps we should consider adding an afterTest()
method to InstrumentationTestRunner
and calling it here and in InstrumentationExtension
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out InstrumentationExtension - adding afterTest seemed a bit weird since it's the same code for agent and library. For now I just copied the code since it isn't so much and I think we can revisit later.
@@ -27,20 +27,18 @@ public void consumeMessageBefore(ConsumeMessageContext context) { | |||
if (context == null || context.getMsgList() == null || context.getMsgList().isEmpty()) { | |||
return; | |||
} | |||
Context traceContext = tracer.startSpan(Context.current(), context.getMsgList()); | |||
ContextAndScope contextAndScope = new ContextAndScope(traceContext, traceContext.makeCurrent()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can ContextAndScope
be removed now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, thanks
protected Boolean computeValue(Class<?> enclosingClass) { | ||
// Avoid context leak on jetty. Runnable submitted from SelectChannelEndPoint is used to | ||
// process a new request which should not have context from them current request. | ||
if (enclosingClass.getName().equals("org.eclipse.jetty.io.nio.SelectChannelEndPoint")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea, but maybe those class names should be specified by instrumentation modules? Each instrumented library may bring its own set of excluded classes (also see AdditionalLibraryIgnoresMatcher
) and maybe it'd better to define them in the lib instrumentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah! That's the approach Datadog has implemented, it would be nice for us to follow it at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice - I originally thought about adding some void excludeClasses(ExcludeBuilder)
method to InstrumentationModule
@@ -20,3 +20,7 @@ dependencies { | |||
testImplementation group: 'com.sun.activation', name: 'jakarta.activation', version: '1.2.2' | |||
} | |||
} | |||
|
|||
tasks.withType(Test) { | |||
jvmArgs "-Dio.opentelemetry.javaagent.shaded.io.opentelemetry.context.enableStrictContext=false" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this mean that our ratpack instrumentation actually leaks context, or is it an artifact of something else? If we are really leaking contexts, should we create issues to fix all of these, so we make sure to circle back and fix them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, filed an issue
No description provided.