Log record receipt timestamp #1875
@bogdandrutu, following discussion during the specs SIG, here's a bunch of cases
Consider the following log:
While this is a common format, let's assume that the parsing rule missed the timezone information and the timestamp was parsed as:
If we had a receipt timestamp, its value could have been e.g.
Some edge cases can produce even bigger timestamp parsing errors, e.g. when the raw log does not include a real date/time but the regex still matches something and attempts to parse it, or when the date present in the message describes something other than the event timestamp. This can be blamed to some extent on misconfigured parsing rules, but being able to compare the parsed timestamp against the receipt timestamp could help detect such issues.
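The timezone case above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raw log line, the receipt time, the five-minute tolerance and all function names are hypothetical (the sample values from the original comment are not reproduced here), and no collector API is involved.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical raw log line; its timestamp carries a +02:00 offset.
RAW_LOG = "2021-08-20T14:30:05+02:00 INFO request handled"


def parse_ignoring_offset(line: str) -> datetime:
    """Faulty rule: reads only the date and time and assumes UTC."""
    naive = datetime.strptime(line[:19], "%Y-%m-%dT%H:%M:%S")
    return naive.replace(tzinfo=timezone.utc)


def parse_with_offset(line: str) -> datetime:
    """Correct rule: keeps the UTC offset present in the log line."""
    return datetime.strptime(line[:25], "%Y-%m-%dT%H:%M:%S%z")


def looks_misparsed(parsed: datetime, receipt: datetime,
                    tolerance: timedelta = timedelta(minutes=5)) -> bool:
    """Flag a parsed timestamp that is far away from the receipt timestamp."""
    return abs(receipt - parsed) > tolerance


# Hypothetical receipt time: the record arrives two seconds after the event.
receipt = datetime(2021, 8, 20, 12, 30, 7, tzinfo=timezone.utc)

if __name__ == "__main__":
    for rule in (parse_ignoring_offset, parse_with_offset):
        parsed = rule(RAW_LOG)
        print(rule.__name__, parsed.isoformat(), looks_misparsed(parsed, receipt))
```

With only the parsed value stored, the two-hour shift produced by the faulty rule cannot be detected afterwards; with the receipt timestamp kept alongside it, a simple comparison is enough to flag the record.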
Additionally, in certain cases the user might prefer to leave the parsing to the backend (vendor). This is currently not well supported, since we do not know whether the timestamp associated with the record is:
This might be solved by the second of the alternatives I listed (storing a flag that records how the timestamp was associated).
Sorry for posting this late. We discussed this in last SIG meeting but the discussion was not captured anywhere.
I think adding a second timestamp field can be justified if we have strong evidence that it will be populated in a significant portion of cases. I am not certain that is going to be the case. Also, it is possible to imagine more than 2 types of timestamps. Are we going to add all those possible types of timestamps as fields and have them almost always unspecified?
This does sound interesting. We could define any number of semantic conventions for interesting timestamps and they could be set as optional attributes on the log record.
Would this flag be an enumeration, a choice from a fixed list, or something more complicated? How would a parsing rule be described?
I believe that it depends on the log source. I have prepared a summary below.
Yes, that serves the same goal as the proposed solution. It's just that in certain workloads all logs might have this field set, so it could be more efficient to keep it as a field rather than an attribute (especially since it would be present for each record).
I think it would be an enumeration. Some possible values:
I am in favor of adding
@pmm-sumo It is not clear why this is so. If the parsing failed with an error, then we will not set the timestamp field, so it will be empty. If the parsing failed silently and produced some value, how will we know it is incorrect even if it is recorded in a different field? What is "incorrect" in this context?
Let's consider today's date, which I would write down as Having the receipt timestamp (and observing that it is so far away from the parsed date) would make it much easier to spot such instances of incorrect parsing.
A slightly different approach is to re-define the current This
One possible downside of this approach is that it is not possible to tell if the timestamp was natively generated at the source or was "assigned" by the Otel component that first observed the event (which can be much later than the original generation time). Question: does it matter? And if it matters, is it not solved by having the
The other timestamp field can be called
Question: is there any timestamp kind other than these 2 that we want?
--
The reason I suggest this alternate approach is that otherwise I am having a hard time defining the semantics of the
Side note: we have
I like the alternative approach. I think it is an elegant solution. I don't have any other common timestamp fields in mind. We can ask for those during the SIG. If those were present, they could also be made part of the semantic conventions.
From Log SIG today: We probably want 2 fields: ObservedTimestamp and SourceTimestamp (names preliminary). ObservedTimestamp is the time when the event was first observed in Otel (generated by the Otel SDK or collected by the Otel Collector). SourceTimestamp is what we could extract from the source (e.g. parse from a file, copy from another protocol, etc). For backends the best timestamp to use would be:
If SourceTimestamp and ObservedTimestamp differ significantly, it can be an indication that something went wrong (lagging collection, incorrect parsing, etc). (Perhaps we keep the current name of the field
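A minimal Python sketch of the two preliminary fields from the SIG notes above, assuming a simplified in-memory record. The `LogRecord` class, the `observe` and `timestamps_diverge` helpers and the ten-minute threshold are all hypothetical and are not part of the data model or of any OpenTelemetry API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class LogRecord:
    """Hypothetical record carrying the two preliminary timestamp fields."""
    body: str
    source_timestamp: Optional[datetime] = None    # extracted from the source
    observed_timestamp: Optional[datetime] = None  # first observed in Otel


def observe(record: LogRecord, now: datetime) -> LogRecord:
    """Set the observed timestamp only once; later components pass it through."""
    if record.observed_timestamp is None:
        record.observed_timestamp = now
    return record


def timestamps_diverge(record: LogRecord,
                       threshold: timedelta = timedelta(minutes=10)) -> bool:
    """Report a significant gap between the source and observed timestamps."""
    if record.source_timestamp is None or record.observed_timestamp is None:
        return False  # nothing to compare
    return abs(record.observed_timestamp - record.source_timestamp) > threshold


if __name__ == "__main__":
    first_seen = datetime(2021, 9, 1, 10, 0, 0, tzinfo=timezone.utc)
    record = LogRecord(body="disk full",
                       source_timestamp=first_seen - timedelta(hours=3))
    observe(record, first_seen)
    observe(record, first_seen + timedelta(minutes=1))  # second hop, no change
    print(record.observed_timestamp == first_seen, timestamps_diverge(record))
```

The check only reports that the two values are far apart. Whether the cause is lagging collection or incorrect parsing still has to be investigated separately.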
In addition to what Tigran captured, I think it's important to note that first-party applications would directly populate the SourceTimestamp as the log is created.
If we are to keep the current name of the
@pmm-sumo @djaglowski should we move forward with this? I think we are virtually in agreement, except perhaps on the naming of the fields. If we want to change the name and clarify the meaning of the
@tigrannajaryan this solves the original problem I mentioned in the issue, so green light from my end. I don't have a strong opinion on names. Would be nice to keep
BTW, I am looking at ECS and I see three kinds of timestamps listed there. They seem to follow the model used for file log collection (which simplifies the options):
What do you think about these definitions for our data model:
We can also provide guidance to systems which need to receive OpenTelemetry logs but support only one timestamp (also when translating to a non-OTLP format):
@tigrannajaryan, I agree we should resolve this before the data model is declared stable, and I support your latest proposed design.
@pmm-sumo what do you think?
Plus one on the recent proposal. I would actually be fine with any variation that solves the above issue. I think it fits the use cases we have.
Contributes to open-telemetry#1875 This is part 1 of the change. Part 2 will add the observed timestamp field. See the issue for the discussion and the description of the source vs observed timestamps.
Resolves open-telemetry#1875 See the issue for the discussion and the description of the source vs observed timestamps.
* Add ObservedTimestamp to the Log Data Model Resolves #1875 See the issue for the discussion and the description of the source vs observed timestamps. * Fixed based on PR comments Co-authored-by: Joshua MacDonald <[email protected]>
What are you trying to achieve?
Currently, the logs data model describes a single (optional) Timestamp field which describes when the event occurred.
Unlike e.g. spans, a log record can have a timestamp associated with it in at least three ways:
While the first two can be considered mutually exclusive (if the record already had a timestamp assigned, it does not make much sense to attempt parsing the log body for it), this is not true of the last one. However, there is only one field where all of them can fit.
This brings several problems, such as:
E.g. currently filelogreceiver sets Timestamp to the receipt time by default, or to the parsed timestamp when the timestamp operator is used. This makes it impossible to tell later whether the log timestamp was parsed or not.
Proposed solution
Perhaps we could add another field to the log record which would store the receipt timestamp (ReceiptTimestamp). It would be filled in by the first receiver that retrieved the record and then passed through. Use of the Timestamp field would be limited to storing the result of timestamp parsing or the value set explicitly in an OTLP-compatible library.
Alternatives