Log record receipt timestamp #1875

pmm-sumo · 2021-08-18T18:58:45Z

What are you trying to achieve?

Currently, logs data model describes a single (optional) Timestamp field which describes when the event occurred

Unlike for e.g. spans, there are at least three ways a log record can have timestamp associated:

It can be set by a library supporting OTLP natively
It can be parsed out from the log Body or Attributes (e.g. using timestamp operator for filelogreceiver)
It can be set to receipt timestamp (which is a good fallback when the others are not available or fail)

While the first two can be considered mutually exclusive (if the record had timestamp already assigned, it does not make much sense to attempt parsing log body for it), this is not true with the last one. However, there's only one field where all of them can fit.

This brings several problems, such as:

Inability to fallback to receipt timestamp when parsing yielded incorrect value (also, no easy ability to find records with potential timestamp parsing errors)
No way to to tell using what method the timestamp was assigned to the record
Lack of data to measure delay between log occurrence and log being collected

E.g. currently filelogreceiver sets Timestamp to receipt time - by default, or parsed timestamp - when timestamp operator is used. This makes it impossible to tell later if the log timestamp was parsed or not.

Proposed solution

Perhaps we could add another field to the log record which would store the receipt timestamp (ReceiptTimestamp). It would be filled in the record on the first receiver that retrieved it and then passed through.

Timestamp field use would be limited only to storing the result of timestamp parsing or the value set explicitly in an OTLP-compatible library.

Alternatives

the receipt timestamp could be instead stored as an optional attribute
we could keep single Timestamp but add flags that would describe how it was created (also, attributes describing which parsing rule/template/regex was picked)

The text was updated successfully, but these errors were encountered:

pmm-sumo · 2021-08-24T16:22:56Z

@bogdandrutu, following discussion during the specs SIG, here's a bunch of cases

Timestamp parsing mistakes

Consider following log:

2021-08-19T11:55:15.498-0600	info	service/collector.go:242	Loading configuration..

While this is a common format, lets assume that the parsing rule missed the timezone information and the timestamp was parsed as: 2021-08-19 11:55:15.498 UTC (while it should have been 2021-08-19 17:55:15.498 UTC).

If we would have receipt timestamp, its value could have been e.g. 2021-08-19 17:55:16.200 UTC. Having such data, the user could run a query to find records where both values differ significantly and would notice there must be some issue with the parsing rules, making it easier to spot such errors and fix them.

Logs without date/time

It is possible to get into even bigger timestamp parsing errors in some edge cases. E.g. when the raw log does not include real date/time but the regex could still catch it and attempt parsing it or when the date present in message does not describe the event timestamp but something else. This can be blamed to some extent on the ill-configuration of parsing rules, but having an ability to match the parsed timestamp against the receipt timestamp could be helpful in detecting such issues

Last successful login at Aug 12th, 8:45:11

Understanding how the timestamp was assigned to know if it could be overwritten with (re)parsing

Additionally, in certain cases the user might prefer to leave the parsing to the backend (vendor). This is currently not well supported since we do not have information if the timestamp associated with the record is either:

a receipt timestamp (set as default and acting as a fallback)
a parsed timestamp
timestamp coming from a first-party application (which should be never overwritten)

This might be solved by the second of the alternatives I listed (storing info/flag how the timestamp was associated)

tigrannajaryan · 2021-09-15T00:16:30Z

Sorry for posting this late. We discussed this in last SIG meeting but the discussion was not captured anywhere.

Proposed solution

Perhaps we could add another field to the log record which would store the receipt timestamp (ReceiptTimestamp). It would be filled in the record on the first receiver that retrieved it and then passed through.

Timestamp field use would be limited only to storing the result of timestamp parsing or the value set explicitly in an OTLP-compatible library.

I think adding a second timestamp field can be justified if we have a strong evidence that it will be populated in significant portion of cases. I am not certain that is going to be the case. Also, it is possible to imagine more than 2 types of timestamps. Are we going to add all those possible types of timestamps as fields and have them almost always unspecified?

Alternatives

the receipt timestamp could be instead stored as an optional attribute

This does sound interesting. We could define any number of semantic conventions for interesting timestamps and they could be set as optional attributes on the log record.

we could keep single Timestamp but add flags that would describe how it was created (also, attributes describing which parsing rule/template/regex was picked)

Would this flag be an enumeration, a choice from a fixed list or something more complicated? How would a parsing rule be described?

pmm-sumo · 2021-09-15T07:36:35Z

Proposed solution
(...)
Timestamp field use would be limited only to storing the result of timestamp parsing or the value set explicitly in an OTLP-compatible library.

I think adding a second timestamp field can be justified if we have a strong evidence that it will be populated in significant portion of cases. I am not certain that is going to be the case. Also, it is possible to imagine more than 2 types of timestamps. Are we going to add all those possible types of timestamps as fields and have them almost always unspecified?

I believe that it depends on the log source. I have prepared a summary below. ReceiptTimestamp would come largely from file receiver whenever there's a chance that timestamp might be not accurately determined (the parsed value could be empty or incorrect)

	Timestamp	ReceiptTimestamp
System Logs via File	parsed	yes
System Logs via wire (e.g. Windows Event Logs)	set	maybe
Infra Logs (e.g. Kubernetes Events)	set	maybe
3rd-party App Logs via File/Stdout	parsed	yes
3rd-party App Logs via structured format (e.g. Windows Event Log)	set	maybe
Legacy 1st-party App Logs via File/Stdout	parsed	yes
Legacy 1st-party App Logs via Direct to Collector	set	no
New First-Party Application Logs	set	no

Alternatives

the receipt timestamp could be instead stored as an optional attribute

This does sound interesting. We could define any number of semantic conventions for interesting timestamps and they could be set as optional attributes on the log record.

Yes, that's serving the same goal as the proposed solution. It's just that in certain workloads all logs might have this field set so perhaps it could be more efficient to keep it as a field rather than attribute (especially since it would be present for each record).

we could keep single Timestamp but add flags that would describe how it was created (also, attributes describing which parsing rule/template/regex was picked)

Would this flag be an enumeration, a choice from a fixed list or something more complicated? How would a parsing rule be described?

I think it would be an enumeration. Some possible values:

Set (e.g. for new first-party log source or transforming from other structured format)
Parsed (e.g. when the value was parsed by filelogreceiver)
Receipt (e.g. when timestamp parsing didn't match anything)

djaglowski · 2021-09-15T18:32:56Z

I am in favor of adding ReceiptTimestamp as a top level field on the log data model, but I propose the following:

OTel components should generally consider both Timestamp and ReceiptTimestamp to be immutable.
ReceiptTimestamp is required. It should represent the first time that an OTel component interacted with the log.
Timestamp is optional. It may be initialized by a first-party source, or by any component in the pipeline that can extract a meaningful timestamp from the log.

tigrannajaryan · 2021-11-10T16:16:47Z

• Inability to fallback to receipt timestamp when parsing yielded incorrect value (also, no easy ability to find records with potential timestamp parsing errors)

@pmm-sumo It is not clear why this is so. If the parsing failed with an error then we will not set the timestamp field, so it will be empty. If the parsing failed silently and produced some value how will we know it is incorrect even if it is recorded in a different field? What is “incorrect” in this context?

pmm-sumo · 2021-11-10T16:35:38Z

• Inability to fallback to receipt timestamp when parsing yielded incorrect value (also, no easy ability to find records with potential timestamp parsing errors)

@pmm-sumo It is not clear why this is so. If the parsing failed with an error then we will not set the timestamp field, so it will be empty. If the parsing failed silently and produced some value how will we know it is incorrect even if it is recorded in a different field? What is “incorrect” in this context?

Let's consider today's date, which I would write down as 10/11/21. In some countries it could be written as 11/10/21. If the parser is set incorrectly, it might yield October 11th rather than November 10th. This will not produce error but would store incorrect date otherwise.

Having the receipt timestamp (and observing it's so far away from the parsed date) would make it much easier to spot such incorrect parsing instances

tigrannajaryan · 2021-11-10T16:38:19Z

A slightly different approach is to re-define the current Timestamp field to mean "the time when Otel first observed this event". For Otel SDK-based sources this will be the generation time. For other source this will be the receipt/collection time by Collector. This is similar to what @djaglowski describes as "It should represent the first time that an OTel component interacted with the log".

This Timestamp field will be always present in any OTLP data, in any log records coming out of Otel SDK and any log records coming out of Otel Collector. With such re-definition we can also entertain the idea to make this field mandatory. Essentially anyone who creates an OTLP record MUST populate this field to some value.

One possible downside of this approach is that it is not possible to tell if the timestamp was natively generated at the source or was "assigned" by the Otel component that first observed the event, which can be much later than the original generation time). Question: does it matter? And if it matters is it not solved by having the ParsedTimestamp (see next paragraph)?

The other timestamp field can be called ParsedTimestamp and will be optional. It can be set by the Collector when reading from filelog or by anyone who sends OTLP if they happen to do parsing. So, this is a field that is specifically there for better support of legacy log formats, which is an explicit goal that we have, so it sounds justified to me.

Question: is there any other timestamp kind other than these 2 that we want?

--

The reason I suggest this alternate approach is because otherwise I am having a hard time defining the semantics of the Timestamp field precisely and in a concise manner. We should not expect to provide a table like the one @pmm-sumo posted above so that we can have a precise of definition of what Timestamp field represents.

tigrannajaryan · 2021-11-10T16:43:06Z

Side note: we have flags field in OTLP which has a lot of reserved bits, so if we wanted to record any indicators about the timestamps that would be fairly cheap to do. For example if we need to know whether the Timestamp is generated by Otel SDK or populated by Collector when it saw the event the first time that would be a single bit.

pmm-sumo · 2021-11-10T16:44:51Z

I like the alternative approach. I think it is an elegant solution. I don't have any other common timestamp fields on my mind. We can ask for those during the SIG. I those would be present, they could be also made part of semantic conventions.

tigrannajaryan · 2021-11-10T19:22:40Z

From Log SIG today:

We probably want 2 fields: ObservedTimestamp and SourceTimestamp (names preliminary). ObservedTimestamp is the time when event was first observed in Otel (generated by Otel SDK or collected by Otel Collector). SourceTimestamp is what we could extract from the source (e.g. parse from file, copy from another protocol, etc).

For backends the best timestamp to use would be:

SourceTimestamp if present.
ObservedTimestamp otherwise.

If SourceTimestamp and ObservedTimestamp differ significantly it can be an indication that something went wrong (lagging collection, incorrect parsing, etc).

(Perhaps we keep the current name of the field Timestamp and it is the ObservedTimestamp described above).

djaglowski · 2021-11-10T19:59:20Z

In addition to what Tigran captured, I think it's important to note that first party applications would directly populate the SourceTimestamp as the log is created.

Perhaps we keep the current name of the field Timestamp and it is the ObservedTimestamp described above

If we are to keep the current name of the Timestamp field, I think it should represent the time the log/event was emitted, to the best of our knowledge. (e.g. it should mean SourceTimestamp). My rationale is that the SourceTimestamp is fundamentally associated with the event itself, but that the ObservedTimestamp is only context about how the event was processed.

tigrannajaryan · 2021-12-02T16:05:26Z

@pmm-sumo @djaglowski should we move forward with this? I think we are in virtually in agreement, except perhaps the naming of the fields. If we want to change the name and clarify the meaning of the Timestamp field I believe we should do it before we declare the data model stable.

pmm-sumo · 2021-12-02T17:58:48Z

@tigrannajaryan this solves my original problem I mentioned in the issue so green light from my end.

I don't have strong opinion on names. Would be nice to keep Timestamp, but I am not sure which of the two is a better fit to be honest.

BTW, I am looking at ECS and I see three kinds of timestamp are listed there. They seem to follow model used for file log collection (which simplifies the options):

@timestamp - (...) typically representing when the event was generated by the source. If the event source has no original timestamp, this value is typically populated by the first time the event was received by the pipeline

event.created - (...) when the event was first read by an agent, or by your pipeline (...) This can be used to monitor your agent’s or pipeline’s ability to keep up with your event source.

event.ingested - when an event arrived in the central data store

Assuming no tampering, the timestamps should chronologically look like this:
@timestamp < event.created < event.ingested.

tigrannajaryan · 2021-12-02T18:20:48Z

What do you think about these definitions for our data model:

Timestamp - Time when the event occurred measured by the origin clock, i.e. the time at the source. This field is optional, it may be missing if the source timestamp is unknown.
ObservedTimestamp - Time when the event was observed by the collection system. For events that originate in OpenTelemetry this timestamp is typically set at the generation time and is equal to Timestamp. For events originating externally and collected by OpenTelemetry (e.g. using Collector) this is the time when any of OpenTelemetry's code observed the event measure by the clock of the OpenTelemetry code. This field SHOULD be set once the event is observed by OpenTelemetry.

We can also provide guidance to systems which need to receive OpenTelemetry logs but support only one timestamp (also when translating to a non-OTLP format):

Use Timestamp if it is present, otherwise use ObservedTimestamp.

djaglowski · 2021-12-02T18:24:42Z

@tigrannajaryan, I agree we should resolve this before the data model is declared stable, and I support your latest proposed design.

tigrannajaryan · 2021-12-02T19:10:43Z

@pmm-sumo what do you think?

pmm-sumo · 2021-12-02T20:51:27Z

@pmm-sumo what do you think?

Plus one on the recent proposal. I would be actually good with any variation that solves the above issue. I think it fits the use cases we're having

Contributes to open-telemetry#1875 This is part 1 of the change. Part 2 will add the observed timestamp field. See the issue for the discussion and the description of the source vs observed timestamps.

Resolves open-telemetry#1875 See the issue for the discussion and the description of the source vs observed timestamps.

* Add ObservedTimestamp to the Log Data Model Resolves #1875 See the issue for the discussion and the description of the source vs observed timestamps. * Fixed based on PR comments Co-authored-by: Joshua MacDonald <[email protected]>

Contributes to #1875 This is part 1 of the change. Part 2 will add the observed timestamp field. See the issue for the discussion and the description of the source vs observed timestamps.

* Add ObservedTimestamp to the Log Data Model Resolves open-telemetry/opentelemetry-specification#1875 See the issue for the discussion and the description of the source vs observed timestamps. * Fixed based on PR comments Co-authored-by: Joshua MacDonald <[email protected]>

Contributes to open-telemetry/opentelemetry-specification#1875 This is part 1 of the change. Part 2 will add the observed timestamp field. See the issue for the discussion and the description of the source vs observed timestamps.

pmm-sumo added the spec:logs Related to the specification/logs directory label Aug 18, 2021

github-actions bot assigned carlosalberto Aug 18, 2021

tigrannajaryan unassigned carlosalberto Oct 13, 2021

tigrannajaryan added the help wanted Extra attention is needed label Oct 13, 2021

tigrannajaryan added the release:required-logdatamodel-ga Required for declaring log data model stable label Dec 2, 2021

tigrannajaryan mentioned this issue Dec 2, 2021

Clarify the semantics of the Log Timestamp field #2183

Merged

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this issue Dec 2, 2021

Add ObservedTimestamp to the Log Data Model

f14d635

Resolves open-telemetry#1875 See the issue for the discussion and the description of the source vs observed timestamps.

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this issue Dec 2, 2021

Add ObservedTimestamp to the Log Data Model

1a8751c

Resolves open-telemetry#1875 See the issue for the discussion and the description of the source vs observed timestamps.

tigrannajaryan mentioned this issue Dec 2, 2021

Add ObservedTimestamp to the Log Data Model #2184

Merged

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this issue Dec 2, 2021

Add ObservedTimestamp to the Log Data Model

ff3f5f5

Resolves open-telemetry#1875 See the issue for the discussion and the description of the source vs observed timestamps.

tigrannajaryan added a commit to tigrannajaryan/opentelemetry-specification that referenced this issue Dec 10, 2021

Add ObservedTimestamp to the Log Data Model

e7ba2e7

Resolves open-telemetry#1875 See the issue for the discussion and the description of the source vs observed timestamps.

jmacd closed this as completed in #2184 Dec 10, 2021

tigrannajaryan mentioned this issue Jan 18, 2022

Add ObservedTimestamp to LogRecord open-telemetry/opentelemetry-proto#351

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Log record receipt timestamp #1875

Log record receipt timestamp #1875

pmm-sumo commented Aug 18, 2021 •

edited

Loading

pmm-sumo commented Aug 24, 2021 •

edited

Loading

tigrannajaryan commented Sep 15, 2021

pmm-sumo commented Sep 15, 2021

djaglowski commented Sep 15, 2021

tigrannajaryan commented Nov 10, 2021

pmm-sumo commented Nov 10, 2021

tigrannajaryan commented Nov 10, 2021

tigrannajaryan commented Nov 10, 2021 •

edited

Loading

pmm-sumo commented Nov 10, 2021

tigrannajaryan commented Nov 10, 2021

djaglowski commented Nov 10, 2021

tigrannajaryan commented Dec 2, 2021 •

edited

Loading

pmm-sumo commented Dec 2, 2021

tigrannajaryan commented Dec 2, 2021

djaglowski commented Dec 2, 2021

tigrannajaryan commented Dec 2, 2021

pmm-sumo commented Dec 2, 2021

Log record receipt timestamp #1875

Log record receipt timestamp #1875

Comments

pmm-sumo commented Aug 18, 2021 • edited Loading

pmm-sumo commented Aug 24, 2021 • edited Loading

tigrannajaryan commented Sep 15, 2021

pmm-sumo commented Sep 15, 2021

djaglowski commented Sep 15, 2021

tigrannajaryan commented Nov 10, 2021

pmm-sumo commented Nov 10, 2021

tigrannajaryan commented Nov 10, 2021

tigrannajaryan commented Nov 10, 2021 • edited Loading

pmm-sumo commented Nov 10, 2021

tigrannajaryan commented Nov 10, 2021

djaglowski commented Nov 10, 2021

tigrannajaryan commented Dec 2, 2021 • edited Loading

pmm-sumo commented Dec 2, 2021

tigrannajaryan commented Dec 2, 2021

djaglowski commented Dec 2, 2021

tigrannajaryan commented Dec 2, 2021

pmm-sumo commented Dec 2, 2021

pmm-sumo commented Aug 18, 2021 •

edited

Loading

pmm-sumo commented Aug 24, 2021 •

edited

Loading

tigrannajaryan commented Nov 10, 2021 •

edited

Loading

tigrannajaryan commented Dec 2, 2021 •

edited

Loading