Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

log: Add design doc #4809

Merged
merged 90 commits into from
Feb 9, 2024
Merged
Show file tree
Hide file tree
Changes from 27 commits
Commits
Show all changes
90 commits
Select commit Hold shift + click to select a range
993ebd8
log: Add design doc
pellared Jan 2, 2024
1f35f95
[chore] Fix lint-markdown target
pellared Jan 2, 2024
524032c
[chore] Remove dead hyperlinks
pellared Jan 2, 2024
f095e0f
[chore] Remove dead hyperlink
pellared Jan 2, 2024
87f03a4
Refine docs
pellared Jan 3, 2024
6d1481c
Fix LoggerProvider snippet
pellared Jan 3, 2024
49826ff
Add hyperlink to prototype
pellared Jan 3, 2024
efe740e
Update log/DESIGN.md
pellared Jan 5, 2024
b7a4302
Merge branch 'main' into logs-bridge-part-1
pellared Jan 8, 2024
4137781
Merge branch 'main' into logs-bridge-part-1
pellared Jan 9, 2024
3113402
Add a reference to the prototype in benchmakrs section
pellared Jan 9, 2024
90b3857
Clarify sync.Pool usage
pellared Jan 9, 2024
cbc3a94
Update log/DESIGN.md
pellared Jan 9, 2024
4588071
log: Add design doc
pellared Jan 2, 2024
ba7bc62
[chore] Fix lint-markdown target
pellared Jan 2, 2024
b9c0fc9
[chore] Remove dead hyperlinks
pellared Jan 2, 2024
8cebce5
[chore] Remove dead hyperlink
pellared Jan 2, 2024
c66a91b
Refine docs
pellared Jan 3, 2024
a37b716
Fix LoggerProvider snippet
pellared Jan 3, 2024
0e82c0f
Add hyperlink to prototype
pellared Jan 3, 2024
8b45561
Update log/DESIGN.md
pellared Jan 5, 2024
621be00
Add a reference to the prototype in benchmakrs section
pellared Jan 9, 2024
2f8c6e4
Clarify sync.Pool usage
pellared Jan 9, 2024
c20bcc2
Update log/DESIGN.md
pellared Jan 9, 2024
2f0c54d
Make design decision more explicit
pellared Jan 9, 2024
c503b48
Attributes are defined as a slice
pellared Jan 9, 2024
57b5bdd
Define SeverityNumber
pellared Jan 9, 2024
61f9683
Define LoggerOptions
pellared Jan 10, 2024
fa72f4c
Define noop package
pellared Jan 10, 2024
e126afa
Define all parameters
pellared Jan 10, 2024
e525210
Clarify Record.Body
pellared Jan 10, 2024
ec43527
Document Record.StructuredBody alternative
pellared Jan 10, 2024
917a9a2
Fix typos
pellared Jan 10, 2024
8a33bb4
Merge branch 'main' into logs-bridge-part-1
pellared Jan 10, 2024
2cd7cae
Fix typo
pellared Jan 10, 2024
20ecb0e
Merge branch 'logs-bridge-part-1' of https://github.com/pellared/open…
pellared Jan 10, 2024
51a9bb7
Apply suggestions from code review
pellared Jan 10, 2024
3f25db7
Clarify trace context correlation
pellared Jan 10, 2024
b7a96bc
Merge branch 'main' into logs-bridge-part-1
pellared Jan 11, 2024
8d0da66
Merge branch 'main' into logs-bridge-part-1
pellared Jan 16, 2024
363accf
Merge branch 'main' into logs-bridge-part-1
pellared Jan 16, 2024
90dfb5e
Reference Displaying Severity recommendation
pellared Jan 16, 2024
0cd0677
Add Severity[Level]1 consts
pellared Jan 17, 2024
955ccb8
Simplify severity const block
pellared Jan 17, 2024
d3c0af9
Merge branch 'main' into logs-bridge-part-1
pellared Jan 18, 2024
35ff8f7
Record attributes based on slog.Record
pellared Jan 18, 2024
68986f6
Rejected proposal: Severity type encapsulating number and text
pellared Jan 18, 2024
33bea2e
Merge branch 'main' into logs-bridge-part-1
pellared Jan 18, 2024
60cb10a
The caller must not subsequently mutate the record passed to Emit
pellared Jan 18, 2024
544b371
Refine implementation requirements
pellared Jan 18, 2024
655a44a
Fix grammar
pellared Jan 18, 2024
23dc70d
Merge branch 'main' into logs-bridge-part-1
pellared Jan 19, 2024
bd23513
Update DESIGN.md
pellared Jan 24, 2024
516bc0c
Update DESIGN.md
pellared Jan 24, 2024
a9aeba4
Update DESIGN.md
pellared Jan 24, 2024
c570d87
Merge branch 'main' into logs-bridge-part-1
pellared Jan 24, 2024
cdb4b30
Fix typo
pellared Jan 24, 2024
1b802bd
Update footnotes
pellared Jan 24, 2024
81f9953
Merge branch 'main' into logs-bridge-part-1
pellared Jan 24, 2024
11ac173
Merge branch 'main' into logs-bridge-part-1
pellared Jan 25, 2024
249ecb6
Pass by value
pellared Jan 26, 2024
499da4e
Record with pointer receivers only
pellared Jan 26, 2024
3b526cc
Fix typo
pellared Jan 26, 2024
16a5fbf
Value has As[Kind] methods
pellared Jan 26, 2024
6a47a3c
On fmt.Stringer
pellared Jan 26, 2024
0556320
Merge branch 'main' into logs-bridge-part-1
pellared Jan 29, 2024
ce4751b
Merge branch 'main' into logs-bridge-part-1
pellared Jan 30, 2024
3744193
Update DESIGN.md
pellared Jan 30, 2024
342ebd8
Update DESIGN.md
pellared Jan 30, 2024
f18ce5c
Update DESIGN.md
pellared Jan 30, 2024
67aa37d
Merge branch 'main' into logs-bridge-part-1
pellared Feb 1, 2024
baaa596
Remove KeyValue.Invalid
pellared Feb 1, 2024
9c513c6
Add comments to value snippet
pellared Feb 1, 2024
9ab4546
Update DESIGN.md
pellared Feb 1, 2024
adaab51
Merge branch 'main' into logs-bridge-part-1
pellared Feb 1, 2024
c6e46ff
Fixes
pellared Feb 2, 2024
3f0f1bf
Rewrite Record.Body as any
pellared Feb 2, 2024
bda50d1
Use any instead of defining Value
pellared Feb 2, 2024
82d506b
Update Record attributes as slice
pellared Feb 2, 2024
98bdf81
Fix typo
pellared Feb 2, 2024
ce1dc3b
application -> bridge implementation
pellared Feb 2, 2024
0a4a05e
Merge branch 'main' into logs-bridge-part-1
pellared Feb 6, 2024
ad00cc8
Remove addressed issues
pellared Feb 7, 2024
16c316e
Add Add XYZ method to Logger rejected proposal
pellared Feb 7, 2024
fe8362c
Remove Record.Clone as it is not necessary
pellared Feb 7, 2024
7571081
Rejected proposal: Rename KeyValue to Attr
pellared Feb 7, 2024
b08fead
Fix typo
pellared Feb 7, 2024
a459c58
Merge branch 'main' into logs-bridge-part-1
pellared Feb 7, 2024
f0b9123
Merge branch 'main' into logs-bridge-part-1
pellared Feb 7, 2024
f72a6ed
Merge branch 'main' into logs-bridge-part-1
pellared Feb 8, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -315,4 +315,4 @@ add-tags: | $(MULTIMOD)

.PHONY: lint-markdown
lint-markdown:
docker run -v "$(CURDIR):$(WORKDIR)" docker://avtodev/markdown-lint:v1 -c $(WORKDIR)/.markdownlint.yaml $(WORKDIR)/**/*.md
docker run -v "$(CURDIR):$(WORKDIR)" avtodev/markdown-lint:v1 -c $(WORKDIR)/.markdownlint.yaml $(WORKDIR)/**/*.md
380 changes: 380 additions & 0 deletions log/DESIGN.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,380 @@
# Logs Bridge API

## Abstract

`go.opentelemetry.io/otel/log` provides
[Logs Bridge API](https://opentelemetry.io/docs/specs/otel/logs/bridge-api/).

The initial version of the design and the prototype
was created in [#4725](https://github.com/open-telemetry/opentelemetry-go/pull/4725).

## Background

The key challenge is to create a performant API compliant with the [specification](https://opentelemetry.io/docs/specs/otel/logs/bridge-api/)
with an intuitive and user friendly design.
Performance is seen as one of the most important characteristics of logging libraries in Go.

## Design

This proposed design aims to:

- be specification compliant,
- be similar to Trace and Metrics API,
- take advantage of both OpenTelemetry and `slog` experience to achieve acceptable performance.

### Module structure

The API is published as a single `go.opentelemetry.io/otel/log` Go module.

The module name is compliant with
[Artifact Naming](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/bridge-api.md#artifact-naming)
and the package structure is the same as for Trace API and Metrics API.

The Go module consits of the following packages:
pellared marked this conversation as resolved.
Show resolved Hide resolved

- `go.opentelemetry.io/otel/log`
- `go.opentelemetry.io/otel/log/embedded`
- `go.opentelemetry.io/otel/log/noop`
pellared marked this conversation as resolved.
Show resolved Hide resolved

Rejected alternative:

- [Reuse slog](#reuse-slog)

### LoggerProvider

The [`LoggerProvider` abstraction](https://opentelemetry.io/docs/specs/otel/logs/bridge-api/#loggerprovider)
is defined as an interface:

```go
type LoggerProvider interface {
embedded.LoggerProvider
Logger(name string, options ...LoggerOption) Logger
}
```

The specification may add new operations to `LoggerProvider`.
The interface may have methods added without a package major version bump.
The embedds `embedded.LoggerProvider` to help inform an API implementation
pellared marked this conversation as resolved.
Show resolved Hide resolved
author about this non-standard API evolution.
This approach is already used in Trace API and Metrics API.

#### LoggerProvider.Logger

The `Logger` method implements the [`Get a Logger` operation](https://opentelemetry.io/docs/specs/otel/logs/bridge-api/#get-a-logger).

Implementation requirements:

- The [specification requires](https://opentelemetry.io/docs/specs/otel/logs/bridge-api/#concurrency-requirements)
the method to be safe to be called concurrently.

- The method should use some default name if the passed name is empty
in order to meet the [specification's SDK requirement](https://opentelemetry.io/docs/specs/otel/logs/sdk/#logger-creation)
to return a working logger when an invalid name is passed
as well as to resemble the behavior of getting tracers and meters.

`Logger` can be extended by adding new `LoggerOption` options
and adding new exported fields to the `LoggerConfig` struct.
This design is already used in Trace API for getting tracers
and in Metrics API for getting meters.

Rejected alternative:

- [Passing struct as parameter to LoggerProvider.Logger](#passing-struct-as-parameter-to-loggerproviderlogger).

pellared marked this conversation as resolved.
Show resolved Hide resolved
### Logger

The [`Logger` abstraction](https://opentelemetry.io/docs/specs/otel/logs/bridge-api/#logger)
is defined as an interface:

```go
type Logger interface {
embedded.Logger
pellared marked this conversation as resolved.
Show resolved Hide resolved
Emit(ctx context.Context, record Record)
pellared marked this conversation as resolved.
Show resolved Hide resolved
}
```

The specification may add new operations to `Logger`.
The interface may have methods added without a package major version bump.
The embedds `embedded.Logger` to help inform an API implementation
pellared marked this conversation as resolved.
Show resolved Hide resolved
author about this non-standard API evolution.
This approach is already used in Trace API and Metrics API.

### Logger.Emit

The `Emit` method implements the [`Emit a LogRecord` operation](https://opentelemetry.io/docs/specs/otel/logs/bridge-api/#emit-a-logrecord).

Calls to `Emit` are supposed to be on the hot path.
Therefore, in order to reduce the number of heap allocations,
the [`LogRecord` abstraction](https://opentelemetry.io/docs/specs/otel/logs/bridge-api/#emit-a-logrecord),
is defined as a `struct`:

```go
type Record struct {
Timestamp time.Time
ObservedTimestamp time.Time
Severity Severity
SeverityText string
pellared marked this conversation as resolved.
Show resolved Hide resolved
Body string
pellared marked this conversation as resolved.
Show resolved Hide resolved
Attributes []attribute.KeyValue
pellared marked this conversation as resolved.
Show resolved Hide resolved
}
```

The API defines [`SeverityNumber`](https://opentelemetry.io/docs/specs/otel/logs/data-model/#field-severitynumber)
as a type and constants:

```go
type Severity int
pellared marked this conversation as resolved.
Show resolved Hide resolved

const (
SeverityTrace Severity = iota + 1
pellared marked this conversation as resolved.
Show resolved Hide resolved
SeverityTrace2
SeverityTrace3
SeverityTrace4
SeverityDebug
SeverityDebug2
SeverityDebug3
SeverityDebug4
SeverityInfo
SeverityInfo2
SeverityInfo3
SeverityInfo4
SeverityWarn
SeverityWarn2
SeverityWarn3
SeverityWarn4
SeverityError
SeverityError2
SeverityError3
SeverityError4
SeverityFatal
SeverityFatal2
SeverityFatal3
SeverityFatal4
)
```

[Log record attributes](https://opentelemetry.io/docs/specs/otel/logs/data-model/#field-attributes)
are defined a regular slice of `attribute.KeyValue`.
The users can use [`sync.Pool`](https://pkg.go.dev/sync#Pool)
for reducing the number of allocations when passing attributes.
pellared marked this conversation as resolved.
Show resolved Hide resolved

Implementation requirements:

- The [specification requires](https://opentelemetry.io/docs/specs/otel/logs/bridge-api/#concurrency-requirements)
the method to be safe to be called concurrently.

- The method should not interrupt the record processing if the context is canceled
per ["ignoring context cancellation" guideline](../CONTRIBUTING.md#ignoring-context-cancellation).

- The method handle the trace context passed via `ctx` argument in order to meet the
[specification's SDK requirement](https://opentelemetry.io/docs/specs/otel/logs/sdk/#readablelogrecord)
to populate the trace context fields from the resolved context.

- The method should not modify the record's attribute as it may be reused by caller.

- The method should copy the record's attributes in case of asynchronous processing
pellared marked this conversation as resolved.
Show resolved Hide resolved
as otherwise it may lead to data races as the user may modify the passed attributes
e.g. because of using `sync.Pool` to reduce the number of allocation.

- The [specification requires](https://opentelemetry.io/docs/specs/otel/logs/bridge-api/#emit-a-logrecord)
use the current time as observed timestamp if the passed is empty.

`Emit` can be extended by adding new exported fields to the `Record` struct.

Rejected alternatives:

- [Record as interface](#record-as-interface)
- [Options as parameter to Logger.Emit](#options-as-parameter-to-loggeremit)
- [Passing record as pointer to Logger.Emit](#passing-record-as-pointer-to-loggeremit)
- [Logger.WithAttributes](#loggerwithattributes)
- [Record attributes like in slog.Record](#record-attributes-like-in-slogrecord)

## Trace context correlation

The bridge implementation should do its best to pass
the `ctx` containing the trace context from the caller
so it can later passed via `Logger.Emit`.
pellared marked this conversation as resolved.
Show resolved Hide resolved
Re-constructing a `context.Context` with [`trace.ContextWithSpanContext`](https://pkg.go.dev/go.opentelemetry.io/otel/trace#ContextWithSpanContext)
and [`trace.NewSpanContext`](https://pkg.go.dev/go.opentelemetry.io/otel/trace#NewSpanContext)
would usually involve more memory allocations.
MrAlias marked this conversation as resolved.
Show resolved Hide resolved

The logging libraries which have recording methods that accepts `context.Context`,
such us [`slog`](https://pkg.go.dev/log/slog),
[`logrus`](https://pkg.go.dev/github.com/sirupsen/logrus)
[`zerolog`](https://pkg.go.dev/github.com/rs/zerolog),
makes passing the trace context trivial.

However, some libraries do not accept a `context.Context` in their recording methods.
Structured logging libraries,
such as [`logr`](https://pkg.go.dev/github.com/go-logr/logr)
and [`zap`](https://pkg.go.dev/go.uber.org/zap),
offer passing `any` type as a log attribute/field.
Therefore, their bridge implementations can define a "special" log attributes/field
that will be used to capture the trace context.

## Benchmarking

The benchmarks take inspiration from [`slog`](https://pkg.go.dev/log/slog),
because for the Go team it was also critical to create API that would be fast
and interoperable with existing logging packages.[^1][^2]
pellared marked this conversation as resolved.
Show resolved Hide resolved

The benchmark results can be found in [the prototype](https://github.com/open-telemetry/opentelemetry-go/pull/4725).

## Rejected Alternatives

### Reuse slog

The API must not be coupled to [`slog`](https://pkg.go.dev/log/slog),
nor any other logging library.

The API needs to evolve orthogonally to `slog`.

`slog` is not compliant with the [Logs Bridge API](https://opentelemetry.io/docs/specs/otel/logs/bridge-api/).
and we cannot expect the Go team to make `slog` compliant with it.

The interoperabilty can be achieved using [a log bridge](https://opentelemetry.io/docs/specs/otel/glossary/#log-appender--bridge).

You can read more about OpenTelemetry Logs design on [opentelemetry.io](https://opentelemetry.io/docs/concepts/signals/logs/).

### Record as interface

`Record` is defined as a `struct` because of the following reasons.

Log record is a value object without any behavior.
It is used as data input for Logger methods.

The log record resembles the instrument config structs like [metric.Float64CounterConfig](https://pkg.go.dev/go.opentelemetry.io/otel/metric#Float64CounterConfig).

Using `struct` instead of `interface` improves the performance as e.g.
indirect calls are less optimized,
usage of interfaces tend to increase heap allocations.[^2]

### Options as parameter to Logger.Emit

One of the initial ideas was to have:

```go
type Logger interface{
embedded.Logger
Emit(ctx context.Context, options ...RecordOption)
}
```

The main reason was that design would be similar
to the [Meter API](https://pkg.go.dev/go.opentelemetry.io/otel/metric#Meter)
for creating instruments.

However, passing `Record` directly, instead of using options,
is more performant as it reduces heap allocations.[^3]

Another advantage of passing `Record` is that API would not have functions like `NewRecord(options...)`,
which would be used by the SDK and not by the users.

At last, the definition would be similar to [`slog.Handler.Handle`](https://pkg.go.dev/log/slog#Handler)
that was designed to provide optimization opportunities.[^1]

### Passing record as pointer to Logger.Emit

So far the benchmarks do not show differences that would
favor passing the record via pointer (and vice versa).

Passing via value feels safer because of the following reasons.

It follows the design of [`slog.Handler`](https://pkg.go.dev/log/slog#Handler).

It should reduce the possibility of a heap allocation.

The user would not be able to pass `nil`.
Therefore, it reduces the possiblity to have a nil pointer dereference.

### Passing struct as parameter to LoggerProvider.Logger

Similarly to `Logger.Emit`, we could have something like:

```go
type LoggerProvider interface{
embedded.LoggerProvider
Logger(name context.Context, config LoggerConfig)
}
```

The drawback of this idea would be that this would be
a different design from Trace and Metrics API.

The performance of acquiring a logger is not as critical
as the performance of emitting a log record. While a single
HTTP/RPC handler could write hundreds of logs, it should not
create a new logger for each log entry.
The application should reuse loggers whenever possible.

### Logger.WithAttributes

We could add `WithAttributes` to the `Logger` interface.
Then `Record` could be a simple struct with only exported fields.
The idea was that the SDK would implement the performance improvements
instead of doing it in the API.
This would allow having different optimisation strategies.
pellared marked this conversation as resolved.
Show resolved Hide resolved

During the analysis[^4], it occurred that the main problem of this proposal
is that the variadic slice passed to an interface method is always heap allocated.

Moreover, the logger returned by `WithAttribute` was allocated on the heap.

At last, the proposal was not specification compliant.
pellared marked this conversation as resolved.
Show resolved Hide resolved

### Record attributes like in slog.Record

To reduce the number of allocations of the attributes,
the `Record` could be modeled similarly to [`slog.Record`](https://pkg.go.dev/log/slog#Record).
`Record` could have `WalkAttributes` and `AddAttributes` methods,
like [`slog.Record.Attrs`](https://pkg.go.dev/log/slog#Record.Attrs)
and [`slog.Record.AddAttrs`](https://pkg.go.dev/log/slog#Record.AddAttrs),
in order to achieve high-performance when accessing and setting attributes efficiently.
`Record` would have a `AttributesLen` method that returns
the number of attributes to allow slice preallocation
when converting records to a different representation.

However, during the analysis[^5] we decided that having
a simple slice in `Record` is more flexible.

It is possible to achieve better performance, by using [`sync.Pool`](https://pkg.go.dev/sync#Pool).
MrAlias marked this conversation as resolved.
Show resolved Hide resolved

Having a simple `Record` without any logic makes it possible
that the optimisations can be done in API implementation
and bridge implementations.
For instance, in order to reduce the heap allocations of attributes,
the bridge implementation can use a `sync.Pool`.
In such case, the API implementation (SDK) would need to copy the attributes
when the records are processed asynchronously,
in order to avoid use after free bugs and race conditions.

For reference, here is the reason why `slog` does not use `sync.Pool`[^2]:

> We can use a sync pool for records though we decided not to.
You can but it's a bad idea for us. Why?
Because users have control of Records.
Handler writers can get their hands on a record
and we'd have to ask them to free it
or try to free it magically at some some point.
But either way, they could get themselves in trouble by freeing it twice
or holding on to one after they free it.
That's a use after free bug and that's why `zerolog` was problematic for us.
`zerolog` as as part of its speed exposes a pool allocated value to users
if you use `zerolog` the normal way, that you'll see in all the examples,
you will never encounter a problem.
But if you do something a little out of the ordinary you can get
use after free bugs and we just didn't want to put that in the standard library.

We took a different decision, because the key difference is that `slog`
pellared marked this conversation as resolved.
Show resolved Hide resolved
is a logging library and Logs Bridge API is only a logging abstraction.
We want to provide more flexibility and offer better speed.
Unlike `zerolog`, we are not using a `sync.Pool` in the logger implementation,
which would expose to the issue described above.
Instead, `sync.Pool` is used by bridges, which are the users,
rather than implementers, of the API.

[^1]: Jonathan Amsterdam, [The Go Blog: Structured Logging with slog](https://go.dev/blog/slog)
[^2]: Jonathan Amsterdam, [GopherCon Europe 2023: A Fast Structured Logging Package](https://www.youtube.com/watch?v=tC4Jt3i62ns)
[^3]: [Emit definition discussion with benchmarks](https://github.com/open-telemetry/opentelemetry-go/pull/4725#discussion_r1400869566)
[^4]: [Logger.WithAttributes analysis](https://github.com/pellared/opentelemetry-go/pull/3)
[^5]: [Record attributes as field and use sync.Pool for reducing allocations analysis](https://github.com/pellared/opentelemetry-go/pull/4)
Loading