Proposed Attribute Conventions for CI/CD #915
Comments
Is the idea that these are just for describing the CD events themselves? Or is there a way that we'll be able to use these to also describe the built artefact in, say, production logs? I would love to have semantics for things like branch, sha, and build-time for everything relating to the deployed artefact, and it strikes me that there's a lot of overlap. |
a possible tie-in to the OpenTelemetry profiling work could be to define CI/CD semantic conventions for sending symbols for native frames, which backends could then use when rendering profiles |
@jundai-godaddy That's the current state of CDEvents as I understand them, yes. The intent here is to leverage the current state, and iterate/add such that this metadata would be included in metrics, logs, and traces within OpenTelemetry so that you would have those semantics in place.
@trask that's good to know! thanks for that callout. |
What is the strategy for existing tools as listed on the CICD WG project page (GitHub Actions Receiver, Jenkins Opentelemetry plugin, …)? In any case when OTel SemConv defines CICD conventions, tools that want to follow OTel SemConv will most likely have to adapt. The question is how much effort this will be. |
@christophe-kamphaus-jemmic that's a great question that I just don't have a complete answer to. I would think that part of that answer would have to come from the maintainers of each of the tools based on their decision to support the convention. I'm sure this migration wouldn't be trivial, and the CDEvents conventions aren't all-encompassing. Questions like yours are the reason I wanted to open this as an issue first instead of just pushing a pull request with attributes.
For example, the githubactionsreceiver attributes mirror what GitHub provides in terms of event metadata, which doesn't match CDEvents. Most of those attributes would be translatable, but CDEvents doesn't have an attribute for
Another example would be the Jenkins work. From a cursory glance at their code base, their semantic attributes could be updated to reflect these attributes.
I think adoption by existing tools isn't going to be easy, but I don't think it's going to be ridiculously hard either. What I'd like to see is an easy way to translate/map, add missing attributes (either via extension or directly in OTel), and extend (I think both CDEvents & CloudEvents data/customData fields already hit the mark).
On the note of translation, I'm concerned about explicitly adopting CDEvents and excluding Eiffel. From a mapping perspective, I'd love to see interoperability and support made easy between the two so a common language can be found. I'd be curious to hear your thoughts on that as well @e-backmark-ericsson @afrittoli |
@adrielp, I'm thrilled to read this issue and the conversation in it! I'm a maintainer both of the Eiffel event protocol and of the CDEvents event protocol. We've been looking for a way to interact/connect between those event protocols and OTel for some time now. Both Eiffel and CDEvents aim at solving mostly the same needs, and those include both interoperability and observability. I believe that an event-driven architecture, with a distributed and broadcasting event system with publish/subscribe functionality, is superior for achieving interoperability between components/services in a CI/CD setup compared to a point-to-point oriented architecture. For the observability use case, on the other hand, I believe that solutions like OTel could play a crucial role. And that's the use case that I think most people involved in OTel are primarily concerned about. Eiffel and CDEvents can provide observability of a CI/CD system, but as we know that many of the tools involved in CI/CD can also propagate data over OTLP, it would be great to find a way to connect these two worlds. I envision that Eiffel and/or CDEvents would provide observability of the top level of a CI/CD pipeline, or the full SDLC, which could include events notifying about new requirements or bugs, through source change commits, PRs, pre-merge builds and tests, production builds, component tests, system tests, deployments, rolling upgrades and beyond. And for the tools/systems involved in this process that are capable of emitting OTel data, there should be a way to relate that OTel data to Eiffel/CDEvents data and vice versa. One important aspect of observability is the ability to observe the full pipeline live. One use case is that developers pushing code to CI/CD would like to know how far their changes have come in the full CI/CD flow, for example using a live 'follow-your-commit' visualization. 
To handle that I believe it's crucial to be able to notify about activities (pipeline steps) being put in queue or started, and not just about finished activities. That also enables the overall system to take action on certain steps that take longer than expected, even if the service executing those steps has no timeout functionality in itself. I currently don't see that OTel can handle that use case, but I'd be glad to be informed otherwise. My knowledge and understanding of OTel is still too limited to clearly see the right way forward when it comes to how to relate OTel data with Eiffel/CDEvents data, but I hope we'll manage to sort that out as part of this OTel CI/CD observability initiative. |
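To make the "started but not yet finished" use case above concrete, here is a minimal sketch of how a consumer could flag overdue steps purely from events. The event shape (`id`, `predicate`, `timestamp`) and the function name `find_overdue` are hypothetical and not taken from CDEvents, Eiffel, or OTel; the sketch also assumes events are processed in the order they were emitted.

```python
from datetime import datetime, timedelta, timezone


def find_overdue(events, now, max_duration):
    """Return ids of activities that started but have not finished in time."""
    started = {}
    for event in events:
        if event["predicate"] == "started":
            started[event["id"]] = event["timestamp"]
        elif event["predicate"] == "finished":
            # A finished event closes the activity, so stop tracking it.
            started.pop(event["id"], None)
    return sorted(
        activity_id
        for activity_id, started_at in started.items()
        if now - started_at > max_duration
    )


t0 = datetime(2024, 5, 16, 17, 0, 0, tzinfo=timezone.utc)
events = [
    {"id": "build-1", "predicate": "started", "timestamp": t0},
    {"id": "test-1", "predicate": "started", "timestamp": t0 + timedelta(minutes=1)},
    {"id": "build-1", "predicate": "finished", "timestamp": t0 + timedelta(minutes=3)},
]
overdue = find_overdue(
    events,
    now=t0 + timedelta(minutes=20),
    max_duration=timedelta(minutes=10),
)
print(overdue)
```

The point of the sketch is that the watcher needs the "started" notification to exist at all; a system that only reports finished activities gives it nothing to time out on.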
Some of the attributes, IMO, need to be more generic. They are too tied to how Tekton defines a CI/CD workflow; if you are not familiar with Tekton, that naming is odd.
For example, Jenkins uses The attributes are flat; I mean, all are at the same level. There is no grouping (pipeline, test, scm, ...). We saw in other OpenTelemetry conventions the benefits of grouping the attributes by categories; it makes searching, and implementing transformers to other kinds of data, graphs, and so on, easier. If the attributes are not well categorized at the beginning, it causes continuous refactoring and breaking changes. I suggest changing to something like:
Finally, I miss attributes referencing the Agent/runner/worker where the build takes place. Most of that info would go as system info not related to CI/CD, but we need a way to relate an Agent/runner/worker with a build |
Names are different all over the place indeed; we collected a lot of different names from different tools as part of the CDF SIG Interoperability work. The We took a similar approach for tests and test suites - events for their executions are Regardless of the name we pick for the standard, to help with adoption we could document how that name maps to the tool-specific name.
I'm not sure I understand this, but I may be lacking OpenTelemetry context. We have groups of events in CDEvents, each group hosts several subjects, each subject has several predicates, each subject/predicate combination has several attributes - however those attributes are also grouped at subject level and can sometimes be referenced across events too. For example
There is definitely space for adding more attributes to the existing schema. We took the minimalistic approach of adding new attributes as needed, when we have a use case for them. I can definitely see info about the agent/runner/worker to belong to the build events. Feel free to create an issue about that, we could include the new field in the next release if we agree on the format. |
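As an illustration of documenting how a standard name maps to tool-specific names, a translation table could look roughly like the sketch below. The common prefixes (`pipeline`, `task`) and the per-tool entries are hypothetical choices for this example only; they are not the names proposed in this issue or published by CDEvents.

```python
# Hypothetical mapping from tool-specific attribute prefixes to common ones.
TOOL_TO_COMMON = {
    "github_actions": {"workflow": "pipeline", "job": "task"},
    "jenkins": {"pipeline": "pipeline", "stage": "task"},
    "tekton": {"pipelineRun": "pipeline", "taskRun": "task"},
}


def to_common(tool, attributes):
    """Rename the first dotted segment of each key using the tool's table."""
    mapping = TOOL_TO_COMMON[tool]
    translated = {}
    for key, value in attributes.items():
        prefix, _, rest = key.partition(".")
        # Prefixes without a mapping are passed through unchanged.
        common_prefix = mapping.get(prefix, prefix)
        new_key = f"{common_prefix}.{rest}" if rest else common_prefix
        translated[new_key] = value
    return translated


result = to_common(
    "github_actions",
    {"workflow.name": "build", "job.name": "unit-tests", "runner.os": "linux"},
)
print(result)
```

Keeping the table as plain data means each tool maintainer can review only their own row, which may help with the adoption concern raised earlier in the thread.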
CDEvents includes several of what we call "subjects", such as We can have individual discussions about each specific proposed attribute. From your list:
|
+1
It may be possible to map CDEvents to corresponding Eiffel events (and vice-versa), maybe an adapter SDK would be sufficient to solve the dilemma?
|
I'm not very familiar with the cost of storing attribute names, but if reducing these
Furthermore, IIUC,
That's similar to what was identified here. |
They are not really generic to me. The
I am talking about changes in fields; these changes usually break all the historical data you have in some way. As an example, here you have one of the latest changes in the JVM fields for Java instrumentation. I am trying to say that choosing the right fields in the right hierarchy is critical; on those decisions, people will build all their apps, graphs, UI, processes, ... |
One might want to build automation associated with changes to the definitions of
Yeah, I totally agree, changes to fields may break historical data. We've been doing our best to pick the right fields and hierarchy on CDEvents side, however there is no perfect one as some structures might fit better for certain tools than others. We expect there'll be a bit of churn in the beginning and plan to switch to 1.x releases once initial adoption gives us confidence that we won't have to do backwards incompatible changes. |
I feel I need to go back to the issue description that started all this.
This, together with a few other things being said, sounds like OTel spans are basically made into carriers of CDEvents events. If so, why?
I'm not sure we'd have to make that choice. Shouldn't we aim at defining attributes that stand on their own, taking mere inspiration from prior art? Regardless of what choice we make, mappings to e.g. event protocols will have to be made and they will be imperfect. |
A few thoughts here:
As such my recommendation would be to do the following in order:
|
From the experience we have using the OpenTelemetry Jenkins plugin for the last three years or so, distributed tracing fits well to represent the execution of CI/CD pipelines. It helps to chain spans from different tools (CI, Maven, pytest, Ansible, ...) and see the whole picture of the execution in a CI/CD context.
I am unsure if it is relevant; we are talking about CI/CD in general, not a particular implementation of how to represent the execution of a CI pipeline. CDEvents is nice as a starting point for getting ideas, but what matters is the representation of the information received more than how it is sent.
There is a long discussion about how to propagate the context. Most implementations use an environment variable to pass the context between applications; that context is then used to configure distributed tracing in OTel in each tool. |
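A minimal sketch of the environment-variable approach described above, using only the standard library. The variable name `TRACEPARENT` is an assumption here (the comment above does not name one), and the helper names are hypothetical. The value layout follows the W3C Trace Context `traceparent` header for version 00: version, 32-hex trace id, 16-hex parent span id, and 2-hex flags.

```python
import os
import re
import secrets

# Four-field layout of a version-00 W3C traceparent value.
TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(value):
    """Return the parsed fields, or None if the value is not usable."""
    match = TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # All-zero ids and version "ff" are invalid per the W3C spec.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {"trace_id": trace_id, "parent_id": parent_id, "flags": flags}


def child_traceparent(context):
    """Build the value a child process would receive: same trace, new span id."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return f"00-{context['trace_id']}-{span_id}-{context['flags']}"


def context_from_env(environ=os.environ, name="TRACEPARENT"):
    """Read the parent context from the (assumed) environment variable."""
    return parse_traceparent(environ.get(name, ""))


env = {"TRACEPARENT": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
ctx = context_from_env(env)
child = child_traceparent(ctx)
print(ctx["trace_id"], child)
```

A CI step would call `context_from_env()` at startup and export `child_traceparent(...)` into the environment of any tool it launches, so spans from each tool join the same trace.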
The representation of the information is the key aspect of CDEvents as well. The CDEvent spec defines how CDEvents are transported over the network in the CloudEvents binding, which describes how a CDEvent can be transported in a CloudEvents payload. The CloudEvent binding is separate from the core CDEvent spec by design.
|
Semantic Conventions define a common set of (semantic) attributes which provide meaning to data when collecting, producing and consuming it. At this point, we are not talking about how information is transported or the rules for interoperating between systems. |
@kuisathaverat Maybe I'm missing something here, but I would expect the specification to allow me to associate an event in the CI/CD with the execution of a |
Recently, OpenTelemetry added the concept of an event; it is experimental. The specification to map an event already exists. We have to enrich the CI/CD part. To trigger CDEvents you must define how to fill the event fields, but I think that is something particular to a CDEvents mapping/implementation in OpenTelemetry more than to semantic conventions about entities and information related to CI/CD in general. OpenTelemetry is a vendor- and implementation-agnostic solution, or at least tries to be. |
Thanks for all the feedback here, this has been a really good conversation. Based on the feedback I think the following things should be true:
I think there's going to be a balance between the use of events -> converting to spans vs emitting spans with SpanEvents. Some attributes defined may more appropriately show up in Events instead of Spans. I think a good example of this would be the
In both scenarios, it would be impractical to try and build a span across what might be multiple systems. Especially since it would be an unknown amount of time between the I'm thinking the general attributes themselves should not be beholden to the Signal or means of propagation, though it's certainly important to think about. Based on the above conversation, I think the new set of common attributes might look like this, if we want them to be OpenTelemetry-defined and specific, yet mappable.
Underneath this set of attributes, others could come.
{
"body": {
"issue_id": 1243,
"issue_name": "Rebels attacking the death star",
"issue_url": "https://example.com/issue/1243"
},
"attributes": {
"incident.name": "Rebels attacking the death star",
"incident.id": 1243,
"incident.severity": "Critical",
"incident.created": "<created timestamp>",
"incident.closed": "<resolved timestamp>"
}
...
}
In this case the Additionally, we'll be able to identify where these attributes map to CDEvents/Eiffel attributes as well as pointing directly to some of them as supplementary in the context of Events, etc. That could be in some cases events within Spans, or Events emitted outside of spans due to the nature of the system.
This leaves me with three questions:
Also, if you're curious, I'm already leveraging Events from GitHub for DORA metrics. The events come into the WebHook OTEL receiver, run through the transform processor and end up looking like this (for deployments).
{
"body": {
"deployment": {
"created_at": "2024-05-16T17:00:36Z",
"environment": "development",
"id": 1518950571,
"ref": "my-feature-branch",
"sha": "f29f5f4b306dbc961ea3ce76ff884931471ec4b6",
"task": "deploy",
"updated_at": "2024-05-16T17:02:37Z",
"url": "https://api.github.com/repos/org/repo/deployments/1518950571"
},
"deployment_status": {
"environment": "development",
"state": "success",
"url": "https://api.github.com/repos/org/repo/deployments/1518950571/statuses/3744913701"
},
"repository": {
"full_name": "org/repo",
"name": "repo",
"owner": {
"login": "org"
}
},
"workflow": {
"name": "Build/Push/Test Repo Docker Image",
"path": ".github/workflows/build-test-push.yml",
"url": "https://api.github.com/repos/org/repo/actions/workflows/96982907"
}
},
"attributes": {
"repository.name": "repo",
"repository.owner": "org"
},
"instrumentation_scope": {
"name": "otlp/webhookevent",
"version": "1.0.0",
"attributes": {
"receiver": "webhookevent",
"source": "webhookevent"
}
}
}
Right now the attributes you're not seeing are turned into labels (because of Loki conventions), but I think the byproduct of having a common convention, with signal specifics, is going to enable wider adoption with methods already in play. |
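As a small worked example of what such an event makes possible, the sketch below computes how long the deployment above took from its `created_at` and `updated_at` fields. Treating `updated_at - created_at` as the deployment duration is an assumption made for illustration, not something the comment states, and the function names are hypothetical.

```python
from datetime import datetime


def parse_github_time(value):
    """Parse a GitHub-style UTC timestamp such as 2024-05-16T17:00:36Z."""
    # Older Python versions of fromisoformat reject the trailing "Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def deployment_duration_seconds(body):
    """Seconds between deployment creation and its last update."""
    deployment = body["deployment"]
    created = parse_github_time(deployment["created_at"])
    updated = parse_github_time(deployment["updated_at"])
    return (updated - created).total_seconds()


# Trimmed copy of the event body shown above.
body = {
    "deployment": {
        "created_at": "2024-05-16T17:00:36Z",
        "updated_at": "2024-05-16T17:02:37Z",
        "environment": "development",
    },
    "deployment_status": {"state": "success"},
}
seconds = deployment_duration_seconds(body)
print(seconds)  # 121.0
```

With a shared convention for these fields, the same calculation would not need to know that the event came from GitHub.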
At first pass I assumed that |
@jundai-godaddy - My intent was the checksum of the artifact. I |
Hello, this is Cyrille, I co-maintain the Jenkins OTel Plugin with @kuisathaverat and I also maintain the OTel Maven Extension.
Regarding the question of modeling pipeline executions as events or as traces, @jsuereth is right that OTel traces are not designed for long-running processes, and it causes CI/CD pipeline traces to sometimes look a bit weird. But on balance, using traces for CI/CD has proven to be extremely positive for both the Jenkins and Maven use cases, and we see a growing number of CI/CD tools that embrace OTel traces, so I think we should continue in this direction. |
Thanks @adrielp this looks very good. Edit: I posted these questions as part of the PR review |
Great discussion, thanks for starting @adrielp . Some ideas on fields, using draft taxonomy, based on past implementations of pipeline emitted events I found helpful:
|
A couple of thoughts from my side would be:
Obviously I wouldn't do a one-to-one mapping of the events but instead leverage a span to record the entire action and use attributes to signify the result ie Happy to go through and do more of a mapping of these CD events to attributes. |
Area(s)
area:new, area:cloudevents, area:deployment
Relates to #832 #833
Is your change request related to a problem? Please describe.
This is an issue being opened for broader discussion within the CI/CD Working Group and Semantic Conventions WG to gauge direction on the addition of conventions for CI and CD. This proposal details, at a moderate level, the direction to evolve our support of CloudEvents and leverage extensions in order to define the exact attributes we support, where they come from in the community, etc.
Describe the solution you'd like
Below is an incomplete set of extended attributes for subject.type for CI/CD. These attributes primarily come from v0.3.0 of CDEvents.
taskRun # from Core Events
pipelineRun # from Core Events
build # from CI
artifact # from CI
repository # from Source Code
branch # from Source Code
change # from Source Code
environment # from CD
service # from CD
incident # from CO
testCaseRun # from Tests
testSuiteRun # from Tests
testOutput # from Tests
An example of what CDEvents bound to CloudEvents looks like can be found here and is copied below.
Each one of these subjects would be associated with a predicate, which is what happens to the subject in an occurrence. For example, taskRun would be followed by started. This does need more conversation around timestamps. In one of the WG discussions, one of the key questions was surrounding start & stop times. Because of the nature of event predicates in CDEvents and the event definitions for Eiffel, events denote what type they are (i.e. started / finished) and have corresponding timestamps for when the event was created. Due to the nature of distributed tracing with regards to the CloudEvents specification, this shouldn't conflict with the current tracing specification.
An example event workflow within a CI system may look like this:
The example CI system above would send event data over OTLP with the attribute examples listed above.
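To illustrate the timestamp question, here is a rough sketch of how a consumer could pair started and finished events for the same subject into span-like records with a start and stop time. The field names (`subject_type`, `subject_id`, `predicate`, `timestamp`) and the output shape are hypothetical and only loosely modelled on the subject/predicate structure described above.

```python
from datetime import datetime, timedelta, timezone


def pair_events(events):
    """Match each finished event with the earlier started event of the same subject."""
    open_starts = {}
    spans = []
    for event in sorted(events, key=lambda e: e["timestamp"]):
        key = (event["subject_type"], event["subject_id"])
        if event["predicate"] == "started":
            open_starts[key] = event["timestamp"]
        elif event["predicate"] == "finished" and key in open_starts:
            start = open_starts.pop(key)
            spans.append({
                "name": f"{key[0]}/{key[1]}",
                "start": start,
                "end": event["timestamp"],
                "duration_s": (event["timestamp"] - start).total_seconds(),
            })
    return spans


t0 = datetime(2024, 5, 16, 17, 0, 0, tzinfo=timezone.utc)
events = [
    {"subject_type": "pipelineRun", "subject_id": "p1", "predicate": "started",
     "timestamp": t0},
    {"subject_type": "taskRun", "subject_id": "t1", "predicate": "started",
     "timestamp": t0 + timedelta(seconds=10)},
    {"subject_type": "taskRun", "subject_id": "t1", "predicate": "finished",
     "timestamp": t0 + timedelta(seconds=70)},
    {"subject_type": "pipelineRun", "subject_id": "p1", "predicate": "finished",
     "timestamp": t0 + timedelta(seconds=90)},
]
spans = pair_events(events)
for span in spans:
    print(span["name"], span["duration_s"])
```

Records are emitted in the order their finished events arrive, so the inner taskRun appears before the enclosing pipelineRun; a started event with no matching finished event simply stays open.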
I've leaned towards these attributes for these reasons:
Describe alternatives you've considered
Eiffel could be made to extend CloudEvents just like CDEvents, which would enable choice and interoperability between conventions. Trace propagation would then occur as per the CloudEvents spec defined in OpenTelemetry, with the addition of attributes aligning with CI/CD.
Additional context
The one currently identified divergence between CloudEvents Distributed Tracing and CI/CD systems is the method of propagation. This is for the traceparent, which can be propagated within CI/CD systems to provide inter-process context propagation. Environment carriers as context and baggage propagators are going to be key for batch systems like CI to be able to emit events with correct lineage.
Current outstanding thoughts and concerns: