Definition for a basic end-user app crash event by bidetofevil · Pull Request #3448 · open-telemetry/semantic-conventions

bidetofevil · 2026-02-17T21:40:56Z

Changes

Add a new event in the app namespace for a crash of an end-user facing app. Further specification of what a crash is defined as are provided in the event definition. There are implementations abound (async: Embrace native crash impl, synchronous: OTel Android JVM crash impl), but none strictly follow the convention as stated.

Merge requirement checklist

[x ] CONTRIBUTING.md guidelines followed.
[x ] Change log entry added, according to the guidelines in When to add a changelog entry.
- If your PR does not need a change log, start the PR title with [chore]
[x ] Links to the prototypes or existing instrumentations (when adding or changing conventions)

LikeTheSalad

Thanks @bidetofevil, I think it's a great start! I just have one comment regarding app.crash.id, which, to me, is the only attribute that's not too clear what it's for. Apart from that, I think it's good to go.

LikeTheSalad · 2026-02-18T06:02:42Z

+      by those attributes caused the crash. When used, these attributes SHOULD be
+      usable for debugging without additional data not specified in the event
+      (e.g. stacktrace already deobfuscated). If that is not the case, additional
+      information like `app.build.id` SHOULD be provided to make the data useful for


Suggested change

information like `app.build.id` SHOULD be provided to make the data useful for

information like `app.build_id` SHOULD be provided to make the data useful for

LikeTheSalad · 2026-02-18T06:18:10Z

+        requirement_level:
+          conditionally_required: Recommended if `app.async` is set, opt_in otherwise.
+      - ref: app.crash.id
+        requirement_level: required


I'm curious why this attribute is required? Based on its description, it seems like it could be used to reference it in other telemetry data. Is that for context purposes, similar to what the trace id value does?

Yeah, this should probably be more "opt_in". It's meant to be a canonical ID for that crashed instance used for deduping and cross-referencing, like if you were getting this data from a source outside the app and don't want to report this more than once. But if the instrumentation is just capturing this live and creating an event for it, it's not strictly necessary.

Got it. I agree it shouldn't be required, so thanks for updating it. I also think it could be very useful for cases like grouping crashes, as mentioned by @thompson-tomo in another comment, but I want to make sure I fully understand it first. The description summarised reads that:

It's a unique identifier for a crash.

It could come from outside of the instrumentation.

It "COULD be meaningful" and referenced elsewhere.

It can facilitate deduplication.

That info seems too generic for me to get a grasp of its usage. For example, regarding the "uniqueness" of it, does it mean that it must be globally unique or unique for the same or similar type of crash occurrences? If it's the latter, then it sounds like it could be used to group occurrences of the same crash?

Regarding "It COULD come from a source external to the instrumentation", can you mention an example of an external source?

What do you mean when you say that its value "COULD be meaningful"? For example, meaningful to group similar crashes as mentioned above? Or is it mostly meaningful to avoid exporting the same crash more than once? (in reference to the "facilitate deduplication" point)? Or both?

thompson-tomo · 2026-02-18T00:41:09Z

+      - ref: app.async
+        requirement_level: opt_in


Suggested change

- ref: app.async

requirement_level: opt_in

I think it would be cleaner & simpler if we removed this attribute. I would then focus app.crash on the sync use case including removing the unnecessary attributes. We could then introduce a process.crash to handle the async use case.

The async use case is very common, like with native crashes on Android. I think any app crash event needs to include this use case, as it compries a large swath of them.

And the reason it's in an attribute of its own is because while it's most prominent for crashes, other types of async-report telemetry exists, so we should define a framework for it outside of crashes.

I have no problem with the use case of async capturing hence the proposal of using app.* events for sync use case which we would eventually associate with an app entity and potentially the process. Then for async we use process.* events which we would eventually associate with a procees entity. This way the async events have access to a reproducible identity etc and we reduce the need for the conditionally attributes.

This gives you a framework of:

app.* events to describe the inner workings of a process providing user access to an app by including app specific attributes

process.* events to describe something related to a process running within a system

system.* events to describe the something related to the system

This continues what we already have for metrics between system & process.

So the async crash may not have to do with the process at all, just a different mode of information-getting and telemetry recording. Making this an attribute that cuts across all events make more sense to me than carving out a special case in another namespace for which other async events have to duplicate as well.

I get that and i understand that it is a different mode of information-getting and telemetry recording which is why I am thinking about how this telemetry can be associated with entities.

This also reduces the need for so many conditional attributes based on the use case by having dedicated events

Hi, native is indeed overloaded, if you talk to a react native developer who develops the code in ts or js then for them native is java iteself

Sorry @LikeTheSalad and @atulgpt - just getting back to this.

"Native", in this case, means anything (e.g. C++, Rust, etc.) compiled into a shared library file that is "native" to the device architecture that can be called through JNI (Java-Native Interface) and developed using the Android NDK (Native Development Kit). This is why the word native is used (on Android): it's literally what we call it.

But this is just the most common example: this is a boolean flag because it doesn't talk about the specific contents of the crash (e.g. C++ stacktraces), but rather the manner from which the data for the crash is obtained.

JVM crashes are the exception, where you can catch a crash before it happens (i.e. live), and record it synchronously with the same OTel SDK and app instance that is initialized with the CURRENT state of the app when the crash is taking place. Whereas async is the more common case, where you obtain a tombstone of information AFTER the app instance where the crash has taken place died, so you can't just use the reporting OTel SDK instance to provide the details of the app during the crash because it will be different, as the reporting is happening some time after.

Here's an example:

App version 1.1 produces a JVM crash, which is cause with the UncaughtExceptionHandler and written to an OTel event before the app crashes. In the OTLP, it will report that app.version of the crash is 1.1 because that's what the Resources say.

But say the same app version produced a C++ crash. The process dies, and the tombstone is written asynchronously to disk before the OTel event can be recorded. Before the app starts up next time, it is upgrade to 1.2, so when it starts up again, app.version of the new OTel instance will be 1.2.

Having this flag will allow us to distinguish between these two types of crashes when they are processed in the backend, not only for crashes, but for other types of telemetry as well.

Thanks for the thorough explanation, @bidetofevil. After going back through the thread, I think the main concern raised earlier is not about whether async crashes are a valid use case, but about whether this is the right design for them.

Your latest comment, combined with the rest of the thread, helped me get a better idea of the overall proposal and the reason for the questions and suggestions on the async flag. The issue is that this proposal seems to combine two different use cases into a single event. The question for me is whether app.crash should carry a boolean flag plus 5 conditional attributes that become recommended when it is enabled. That starts to feel like two schemas inside one event, which is harder to reason about.

It's also hard to evaluate confidently at this stage because we haven't implemented this flow in OTel Android yet. So far, we've focused on JVM stacktraces, which are still the most common type of crash Android developers tend to deal with. From my perspective, that makes this feel a bit early to standardize, especially as part of a "basic" crash event that doesn't otherwise need these attributes.

The more I look at it, the more it seems like the async case may need its own event. That would keep app.crash (or even something like app.crash.sync) simple and focused, while giving the async use case a dedicated schema that can evolve independently.

Let me know if you have concerns with the approach of having a separate event for each use case.

As discussed at the Android SIG meeting, I'll move the definition of app.async to another PR. I've modified the wording of the description to remove references to the attribute, but kept in the description the possibly of the event being logged async and included the opt_in attributes that are relevant in that case.

Let me know if this look OK @LikeTheSalad. We can have a discussion about the async stuff in a different PR :-)

Thank you, Hanson!

thompson-tomo · 2026-02-19T12:27:05Z

+          Further, any related attributes not specified in the signal MUST be taken from the Resource (e.g.
+          if `service.version` is specified but `service.name` is not, it is assume it will be in the Resource).
+          For telemetry that omits this attribute, it SHOULD be assumed that it was NOT recorded asynchronously.
+      - id: app.crash.id


What about

Suggested change

- id: app.crash.id

- id: exception.occurance.id

That way it can be better reused for things such as Grouping nested exceptions into a group using a shared attribute.

A few reasons I want this in the app namespace:

I want to restrict this to end-user facing apps, which the namespaces does. Crash has a special meaning in this space and I want to keep the semantics

Exceptions aren't the only causes of crashes -- at least not ones that we can know the details of.

There are additional things that will be put on this. When part 2 of this PR comes out with additional details about the crash, it will become more apparent

github-actions · 2026-03-07T03:52:39Z

This PR has been labeled as stale due to lack of activity. It will be automatically closed if there is no further activity over the next 7 days.

bidetofevil · 2026-03-10T18:21:08Z

So who can approve this or give me guidance on what is blocking it from approval? @open-telemetry/semconv-client-approvers?

LikeTheSalad · 2026-03-11T07:36:50Z

So who can approve this or give me guidance on what is blocking it from approval? @open-telemetry/semconv-client-approvers?

There are a couple of pending questions, at least from my side:

Some regarding the crash.id attribute: Definition for a basic end-user app crash event #3448 (comment)
Some regarding the async flag, here: Definition for a basic end-user app crash event #3448 (comment)

Can you take a look at those, @bidetofevil ?

bidetofevil · 2026-03-11T16:13:07Z

Oh shoot. Sorry I missed these @LikeTheSalad. I'll take a look.

github-actions · 2026-03-26T04:08:00Z

This PR has been labeled as stale due to lack of activity. It will be automatically closed if there is no further activity over the next 7 days.

linux-foundation-easycla · 2026-04-14T16:10:45Z

bidetofevil · 2026-04-14T16:21:27Z

I eff the commit history. Will fix this.

bidetofevil · 2026-04-14T16:35:54Z

Reopen with proper rebasing and updates per discussion

LikeTheSalad

Thanks!

LikeTheSalad · 2026-04-15T06:50:32Z

@bidetofevil, I think you need to regenerate the markdown files to sort the CI check error.

bidetofevil · 2026-04-21T14:45:17Z

Ugh, I thought I checked already. Will try again.

bidetofevil · 2026-04-21T16:18:05Z

Removed service.name as an optional attribute for now given that the markdown generated refers to a markdown file that doesn't exist. Having a different service name is really truly an async use case that needs to be called out, though, so removing it for now isn't a bad idea anyway.

thompson-tomo · 2026-04-30T04:54:39Z

+      - ref: service.version
+        requirement_level: opt_in


Should this also be removed given we removed service.name via ff0fe1c & we already have app.build_id?

Build IDs aren't often used to track different version. There's also no sem-ver so you can only tell if one is different than the other, whereas version gives you a lot richer information. This is almost always preferred when folks are tagging telemetry from mobile apps as a point of code differentiation.

Yes but that meaning/understanding is only possible when you know what it corresponds to. In the case here we already removed service.name hence we don't know what it is for.

The removal of service.name was because it was relatively low value and there's an unrelated issue that prevented the MD from being generated property (it contains text with a broken reference link when ported to this event). But the idea that there could be a different service.version still exists even if we don't explicitly say it was from an async recording of the event. In fact, it is still an important attribute - to know a crash came in a version that is different than the one producing the event.

fractalwrench

LGTM

LikeTheSalad · 2026-05-04T11:59:01Z

@lmolkova is there anyone specifically that we should ask for approval?

bidetofevil requested review from a team as code owners February 17, 2026 21:40

github-project-automation Bot added this to Semantic Conventions Triage Feb 17, 2026

github-project-automation Bot moved this to Untriaged in Semantic Conventions Triage Feb 17, 2026

github-actions Bot added enhancement New feature or request area:app labels Feb 17, 2026

bidetofevil mentioned this pull request Feb 17, 2026

Mobile crash event #3441

Closed

3 tasks

LikeTheSalad reviewed Feb 18, 2026

View reviewed changes

thompson-tomo reviewed Feb 18, 2026

View reviewed changes

thompson-tomo mentioned this pull request Feb 19, 2026

Add support for exception cause #3458

Open

thompson-tomo reviewed Feb 19, 2026

View reviewed changes

breedx-splk mentioned this pull request Feb 20, 2026

Rename the crash in CrashReporterInstrumentation.name from crash to java_crash as in future we might support native crash from android open-telemetry/opentelemetry-android#1484

Open

lmolkova moved this from Untriaged to Awaiting codeowners approval in Semantic Conventions Triage Feb 20, 2026

github-actions Bot added the Stale label Mar 7, 2026

bidetofevil removed the Stale label Mar 10, 2026

LikeTheSalad mentioned this pull request Mar 24, 2026

Android SDK (alpha) production readiness & crash tracking open-telemetry/opentelemetry-android#1666

Closed

github-actions Bot added the Stale label Mar 26, 2026

github-actions Bot closed this Apr 3, 2026

bidetofevil reopened this Apr 7, 2026

github-actions Bot removed the Stale label Apr 8, 2026

bidetofevil requested review from a team as code owners April 14, 2026 16:10

bidetofevil requested review from a team and tigrannajaryan as code owners April 14, 2026 16:10

bidetofevil closed this Apr 14, 2026

bidetofevil force-pushed the basic-crash-event branch from cdb06b4 to f4fdde1 Compare April 14, 2026 16:22

bidetofevil added 6 commits April 14, 2026 09:33

SemConv for a basic app crash event

0ac5ff9

Strip down basic attributes for app.crash

8f50d0f

Add changelog yaml

4a00390

Add guidence of how to use exception attributes

1e6d5e0

Update after review comments

bbe47d7

Remove app.async attribute definition for now

4a1a5b7

bidetofevil reopened this Apr 14, 2026

LikeTheSalad approved these changes Apr 14, 2026

View reviewed changes

LikeTheSalad mentioned this pull request Apr 20, 2026

Renaming crash reporter event open-telemetry/opentelemetry-android#1691

Open

bidetofevil and others added 3 commits April 21, 2026 07:53

Merge branch 'open-telemetry:main' into basic-crash-event

c2f8cb4

Update markdown

6b6413c

Remove service.name as an optional attribute for now

ff0fe1c

breedx-splk approved these changes Apr 21, 2026

View reviewed changes

lmolkova moved this from Awaiting codeowners approval to Needs More Approval in Semantic Conventions Triage Apr 27, 2026

thompson-tomo reviewed Apr 30, 2026

View reviewed changes

$fractalwrench$

fractalwrench approved these changes Apr 30, 2026

View reviewed changes

	information like `app.build.id` SHOULD be provided to make the data useful for
	information like `app.build_id` SHOULD be provided to make the data useful for

Conversation

bidetofevil commented Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

Merge requirement checklist

Uh oh!

LikeTheSalad left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Mar 7, 2026

Uh oh!

bidetofevil commented Mar 10, 2026

Uh oh!

LikeTheSalad commented Mar 11, 2026

Uh oh!

bidetofevil commented Mar 11, 2026

Uh oh!

github-actions Bot commented Mar 26, 2026

Uh oh!

linux-foundation-easycla Bot commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bidetofevil commented Apr 14, 2026

Uh oh!

bidetofevil commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

LikeTheSalad left a comment

Choose a reason for hiding this comment

Uh oh!

LikeTheSalad commented Apr 15, 2026

Uh oh!

bidetofevil commented Apr 21, 2026

Uh oh!

bidetofevil commented Apr 21, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fractalwrench left a comment

Choose a reason for hiding this comment

bidetofevil commented Feb 17, 2026 •

edited

Loading

linux-foundation-easycla Bot commented Apr 14, 2026 •

edited

Loading

bidetofevil commented Apr 14, 2026 •

edited

Loading

$@fractalwrench$ fractalwrench left a comment