Skip to content

Define tracing API #36

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Oct 17, 2022
Merged

Define tracing API #36

merged 21 commits into from
Oct 17, 2022

Conversation

iRevive
Copy link
Contributor

@iRevive iRevive commented Jul 29, 2022

The point of the PR is to describe the tracing API and simplify the review process. The API mimics OpenTelemetry with slight differences

* the set of attributes to associate with the link
*/
def addLink(
spanContext: SpanContext,
Copy link
Contributor Author

@iRevive iRevive Jul 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, Span[F] can be used here. No, it should be SpanContext otherwise an external context cannot be linked.

build.sbt Outdated
@@ -33,6 +33,7 @@ lazy val core = crossProject(JVMPlatform, JSPlatform)
libraryDependencies ++= Seq(
"org.typelevel" %%% "cats-core" % "2.7.0",
"org.typelevel" %%% "cats-effect" % "3.3.13",
"co.fs2" %%% "fs2-core" % "3.2.11",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you only need ByteVector you can include scodec-bits directly. I just mentioned its in fs2 as a proof of stability/wide adoption 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Highly likely we will need fs2 in the classpath to provide a proper tracing of streams.

@iRevive
Copy link
Contributor Author

iRevive commented Jul 29, 2022

@janstenpickle could you have a look, please?

/** Returns the trace identifier associated with this [[SpanContext]] as
* 16-byte vector.
*/
def getTraceIdBytes: ByteVector
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method name claims it returns an array of bytes but it returns a ByteVector 🤔

What if we tweak names?

def getTraceId: ByteVector
def getTraceIdHex: String

@janstenpickle
Copy link

@janstenpickle could you have a look, please?

Thanks, will do. Do you mind if I look on Monday? I've got quite a busy weekend 😅

@iRevive
Copy link
Contributor Author

iRevive commented Jul 29, 2022

@janstenpickle could you have a look, please?

Thanks, will do. Do you mind if I look on Monday? I've got quite a busy weekend 😅

All good, there is no rush

This was referenced Jul 30, 2022
@iRevive iRevive requested a review from rossabaker July 31, 2022 08:24
Copy link

@janstenpickle janstenpickle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iRevive I've added some initial thoughts!

*
* Returns `None` if the span is invalid or no-op.
*/
def context: Option[SpanContext]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In t4c we have an invalid instance of SpanContext. I think I like using an Option here better, but we might find that the Java implementation needs an invalid instance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm scratching my head about this topic.

Official spec https://opentelemetry.io/docs/reference/specification/compatibility/opentracing/#get-the-active-span.

According to the OpenTelemetry specification, the invalid context can be described as

SpanContext { 
  traceId = "00000000000000000000000000000000", 
  spanId = "0000000000000000"
} 

If I get it right, invalid and noop spans are identical from the OpenTelemetry perspective. They should not produce child spans at all.

I see three states of the SpanContext:

  1. Valid
  2. Invalid (i.e. cannot create a span from remote traceId and spanId)
  3. Noop

Perhaps it's better to stick with OpenTelemetry design and always return SpanContext because, well, the context is always there:

def context: SpanContext

Then a library user should verify the validity of the context if necessary.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I like the Option, you're right it's probably a good idea to stay close to the spec.

Then a library user should verify the validity of the context if necessary.

This hasn't hurt us in t4c as far as I'm aware. No one has needed to differentiate between a no-op and invalid context.

Comment on lines 21 to 56
trait SpanContext {

/** Returns the trace identifier associated with this [[SpanContext]] as
* 16-byte vector.
*/
def traceId: ByteVector

/** Returns the trace identifier associated with this [[SpanContext]] as 32
* character lowercase hex String.
*/
def traceIdHex: String

/** Returns the span identifier associated with this [[SpanContext]] as 8-byte
* vector.
*/
def spanId: ByteVector

/** Returns the span identifier associated with this [[SpanContext]] as 16
* character lowercase hex String.
*/
def spanIdHex: String

/** Returns the sampling strategy of this [[SpanContext]]. Indicates whether
* the span in this context is sampled.
*/
def samplingDecision: SamplingDecision

/** Returns `true` if this [[SpanContext]] is valid.
*/
def isValid: Boolean

/** Returns `true` if this [[SpanContext]] was propagated from a remote
* parent.
*/
def isRemote: Boolean
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be a good idea to make this sealed to provde "safe" and "unsafe" constructors for SpanContexts with ByteVectors. There should probably also be a constructor that safely generates a context from Random.

This might make isValid redundant of course. I suppose the question is do we want to pass through invalid contexts parsed/converted from upstream system and whether we want to differentiate between a no-op and invalid data. My hunch is that it's probably a good idea to represent a no-op as a None and invalid as false here, to aid debugging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about a simple wrapper for java implementation.

Safe and unsafe constructors make sense to me. They are useful for third-party integrations (i.e. creating SpanContext from HTTP request headers). I will add them.

Comment on lines 25 to 58
object SpanFinalizer {

type Strategy = PartialFunction[Resource.ExitCase, SpanFinalizer]

object Strategy {
def empty: Strategy = PartialFunction.empty

def reportAbnormal: Strategy = {
case Resource.ExitCase.Errored(e) =>
SpanFinalizer.multiple(
SpanFinalizer.RecordException(e),
SpanFinalizer.SetStatus(Status.Error, None)
)

case Resource.ExitCase.Canceled =>
SpanFinalizer.SetStatus(Status.Error, Some("canceled"))
}
}

final case class RecordException(throwable: Throwable) extends SpanFinalizer

final case class SetStatus(status: Status, description: Option[String])
extends SpanFinalizer

final case class SetAttributes(attributes: Seq[Attribute[_]])
extends SpanFinalizer

final case class Multiple(finalizers: NonEmptyList[SpanFinalizer])
extends SpanFinalizer

def multiple(head: SpanFinalizer, tail: SpanFinalizer*): Multiple =
Multiple(NonEmptyList.of(head, tail: _*))

}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! 😄

}
}

final case class RecordException(throwable: Throwable) extends SpanFinalizer
Copy link
Contributor Author

@iRevive iRevive Aug 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the case classes be hidden? Is it less painful in terms of bincompat?

private final class RecordException(...) extends SpanFinalizer

def recordException(throwable: Throwable): SpanFinalizer = 
  RecordException(throwable)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you avoid the case and make the constructor private it should be fine. Unless in the future you may want to remove the RecordException type, then better not to expose it.


/** A decision on whether a span should be recorded or dropped.
*/
sealed abstract class SamplingDecision(val isSampled: Boolean)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Java client there is a RECORD_ONLY type as well for client based sampling it looks like. Is that something we wanted to have here? https://github.com/open-telemetry/opentelemetry-java/blob/50408d499f85d5761d0a5ed9bf9d77d5ff01fff5/sdk/trace/src/main/java/io/opentelemetry/sdk/trace/samplers/SamplingDecision.java#L17

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look at those.

*
* @param attributes
* the set of attributes to add to the span
*/
Copy link
Member

@zmccoy zmccoy Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting pretty specific to the spec, but the Otel spec requires the API to provide a single attribute setter, and then optionally a setter for multiple attributes at once. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#set-attributes

Would it make sense to make two different methods to separate those use cases out rather than just a vararg solution for both?

Copy link
Contributor Author

@iRevive iRevive Aug 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I'm fine with varargs.
span.setAttributes(Attribute(...), Attribute(...)) is handier than span.setAttributes(Seq(Attribute(...), Attribute(...)))

We still can add def setAttribute[A](attribute: Attribute[A]): F[Unit], it will not hurt for sure.

Copy link
Member

@zmccoy zmccoy Aug 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think adding setAttribute just to fulfill the requirement and make it easy to communicate about api shapes to other folks using Otel would be great.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the method

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks great.

@iRevive
Copy link
Contributor Author

iRevive commented Sep 25, 2022

@rossabaker could you review this one, please?

Copy link
Member

@rossabaker rossabaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic work.

* @see
* default finalization strategy [[SpanFinalizer.Strategy.reportAbnormal]]
*/
def createManual: Resource[F, Span.Manual[F]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a Resource instead of an F[Span.Manual[F]]?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this even need to exist at all? Is it different from createAuto.allocated? Only thing I can think of is that it gives us a named end.

Copy link
Contributor Author

@iRevive iRevive Sep 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resource is needed to apply the finalization strategy during release.

def withFinalizationStrategy(strategy: SpanFinalizer.Strategy): SpanBuilder[F]

Also, a manual span is not ended upon resource release. This should be done manually.

The question is the following: do we have a scenario where a span should outlive the scope and be ended somewhere after? Nothing comes to my head right away.

We can do the following:

  1. Replace Span.Auto and Span.Manual with Span
  2. Add the end method to Span
  3. Rename SpanBuilder#createAuto to create

That way it is possible to end a span earlier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After end,

Implementations SHOULD ignore all subsequent calls to End and any other Span methods, i.e. the Span becomes non-recording by being ended

For the finalizers to be useful, they need to be sequenced before end. And I don't know why we'd want to call the finalizers, allow other work, and then end. If that's all true, I think all we need is a resource, and "manual" is just an allocated resource.

What gives me pause is that in the spec, end can take:

(Optional) Timestamp to explicitly set the end timestamp.

I like the idea of hiding end, because it changes the state of the span and what's still legal on it, and embedding it in the resource finalizer makes it clear where this happens. But I'm not sure how to do that and support the explicit end.

What if we had end that triggered the finalizer? "Manual" just returns an F[Span[F]], and automatic returns a Resource[F, Span[F]] that invokes end? Then there's still just one kind of Span.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. There is no sense in invoking finalizers when the span has been terminated. I updated API definitions.

@iRevive iRevive requested a review from rossabaker October 17, 2022 09:27
Copy link
Member

@rossabaker rossabaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me!

@iRevive iRevive merged commit 6cabd1b into typelevel:main Oct 17, 2022
chr12c pushed a commit to chr12c/otel4s that referenced this pull request Jun 26, 2023
@iRevive iRevive deleted the tracing-api branch November 29, 2024 09:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants