Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .github/workflows/rust-ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -326,6 +326,9 @@ jobs:
- name: cargo xtask ${{ matrix.folder }}
run: cargo xtask structure-check
working-directory: ./rust/${{ matrix.folder }}
- name: check telemetry macro usage
run: ./scripts/check-direct-telemetry-macros.sh
working-directory: ./rust/${{ matrix.folder }}

document-build-options:
if: github.event_name == 'pull_request'
Expand Down
8 changes: 8 additions & 0 deletions rust/otap-dataflow/CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,14 @@ project-level [CONTRIBUTING][] document.

Run `cargo xtask check` to check the structure of the project.

## Telemetry and logging

All internal logging MUST use the `otel_*` macros from `otap_df_telemetry`
(not `tracing::info!` or `println!`). See the
[Events Guide](docs/telemetry/events-guide.md) details.

TODO: Add metrics information

## Building a Docker image

Run `docker build --build-context otel-arrow=../../ -f Dockerfile -t df_engine .`
91 changes: 84 additions & 7 deletions rust/otap-dataflow/docs/telemetry/events-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,13 +36,89 @@ Exception rule (traces):
- If you are recording an actual exception on a span, the regular event name
MUST be `exception` and the standard exception attributes MUST be used.

In this project, events are preferred to unstructured logs. Event names are
codified (see below), and their attributes consist of the attributes of the
relevant entity or entities (stable context), combined with event-specific
attributes (dynamic context).
## How to emit events in code

Treat event names as schema identifiers. Evolution rules are defined in
[Stability and Compatibility Guide](stability-compatibility-guide.md).
All events MUST be emitted using the `otel_*` macros from the
`otap_df_telemetry` crate. **Do not** use `tracing::info!`, `log::info!`, or
`println!` directly. This rule is enforced by
`scripts/check-direct-telemetry-macros.sh` (run in CI).

**Why wrappers instead of raw `tracing` macros?**

- **Mandatory event name.** The first argument to every `otel_*!` macro is the
event name. Raw `tracing` macros do not require one, and their default name
includes the file path and line number -- which is not durable and breaks
filtering, alerting, and dashboards whenever code is moved or reformatted.
- **Automatic `target`.** The wrappers set the tracing `target` field to the
crate name (`env!("CARGO_PKG_NAME")`) automatically. With raw `tracing`
macros the default target is the module path, which is an internal
implementation detail and can change without notice.

### Available macros

| Macro | Severity |
|---|---|
| `otel_debug!` | DEBUG |
| `otel_info!` | INFO |
| `otel_warn!` | WARN |
| `otel_error!` | ERROR |

### Basic usage

The first argument is always the **event name** (a string literal). Optional
key-value pairs follow as structured attributes.

```rust
use otap_df_telemetry::otel_info;

// Event name only (no attributes):
otel_info!("pipeline.run.start");

// Event name with attributes:
otel_info!("receiver.grpc.start",
endpoint = %addr,
);
```

### The `message` attribute

Use an attribute named **`message`** when the event name alone is not sufficient
to convey what happened. This value is mapped to the OTel LogRecord **body**,
making it the primary text shown in log viewers, consoles, and observability UIs.

Not every event needs a `message` -- if the event name is self-explanatory,
omit it. Avoid messages that just restate the event name; they add no value.

```rust
// Bad -- message just restates the event name:
otel_info!("pipeline.run.start",
message = "Pipeline run started",
);

// Good -- event name says it all, no message needed:
otel_info!("pipeline.run.start");

// Good -- message explains consequences beyond what the event name conveys:
otel_warn!("core_affinity.set_failed",
message = "Failed to set core affinity for pipeline thread. Performance may be less predictable.",
);
```

### Attribute formatting

The macros support `tracing`-style formatting hints:

- `%value` -- Display formatting (`fmt::Display`)
- `?value` -- Debug formatting (`fmt::Debug`)
- `value` -- passed directly (integers, booleans, etc.)

```rust
otel_info!("node.connect",
endpoint = %addr,
config = ?node_config,
count = 42,
);
```
Comment on lines +111 to +121
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!


## Event naming

Expand Down Expand Up @@ -71,7 +147,8 @@ Where:
- `<verb>` is the action or occurrence (e.g. `start`, `complete`, `fail`,
`reload`, `shutdown`).

Note: OpenTelemetry Events are represented as LogRecords with an event name.
Note: OpenTelemetry Events are represented as LogRecords with an
[event name](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.50.0/specification/logs/data-model.md#field-eventname).
In OTLP, this is carried in the LogRecord `event_name` field (not in the body).

## Attributes and context
Expand Down
111 changes: 111 additions & 0 deletions rust/otap-dataflow/scripts/check-direct-telemetry-macros.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
#!/bin/bash
Copy link
Copy Markdown
Member

@albertlockett albertlockett Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this script catch the case where someone uses the macro in a way where, for example, the case tracing::warn! doesn't actually appear in the file? e.g.:

use tracing::warn;
// ...

warn!("...");

Would it be easier or more effective to just use the clippy rule for banned macros? We already do this for lazy_static:

disallowed-macros = [
{ path = "lazy_static::lazy_static", reason = "Please use `std::sync::LazyLock` instead." },
]

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started with clippy for this, but clippy wasn't able to distinguish "the developer wrote tracing::info!" from "the developer wrote otel_info! which expanded to tracing::info!". The script here was my cheap way to do some enforcement, while I can find better ways!

# Script to detect direct use of tracing/log macros instead of otel_* wrappers.
# All events MUST be emitted using the otel_* macros from otap_df_telemetry.
# See docs/telemetry/events-guide.md
#
# Usage: ./scripts/check-direct-telemetry-macros.sh

set -e

echo "🔍 Checking for direct use of tracing/log macros (should use otel_* macros instead)..."

# Function to check for a banned pattern in .rs files
check_banned_pattern() {
local pattern="$1"
local description="$2"
local files

# Find all .rs files, excluding:
# - target/ build artifacts
# - .git/ git internals
# - crates/telemetry/ implements the otel_* wrappers (legitimate tracing usage)
# - crates/quiver/ will move out of this repo (temporary exemption)
# - crates/quiver-e2e/ quiver end-to-end tests (temporary exemption)
# - benchmarks/ benchmarks may legitimately use tracing directly
files=$(find . -name "*.rs" -type f \
| grep -v target/ \
| grep -v .git/ \
| grep -v crates/telemetry/ \
| grep -v crates/quiver/ \
| grep -v crates/quiver-e2e/ \
| grep -v benchmarks/ \
|| true)

if [ -z "$files" ]; then
return 0
fi

local matches
matches=$(echo "$files" | xargs grep -n "$pattern" 2>/dev/null || true)

if [ -n "$matches" ]; then
echo "⚠️ Found direct $description:"
echo "$matches"
echo ""
return 1
fi
return 0
}

checks_passed=0

# Check for fully-qualified tracing macro calls
if ! check_banned_pattern "tracing::info!" "tracing::info! usage (use otel_info! instead)"; then
checks_passed=1
fi

if ! check_banned_pattern "tracing::warn!" "tracing::warn! usage (use otel_warn! instead)"; then
checks_passed=1
fi

if ! check_banned_pattern "tracing::error!" "tracing::error! usage (use otel_error! instead)"; then
checks_passed=1
fi

if ! check_banned_pattern "tracing::debug!" "tracing::debug! usage (use otel_debug! instead)"; then
checks_passed=1
fi

if ! check_banned_pattern "tracing::trace!" "tracing::trace! usage (use otel_debug! instead)"; then
checks_passed=1
fi

# Check for fully-qualified log crate macro calls
if ! check_banned_pattern "log::info!" "log::info! usage (use otel_info! instead)"; then
checks_passed=1
fi

if ! check_banned_pattern "log::warn!" "log::warn! usage (use otel_warn! instead)"; then
checks_passed=1
fi

if ! check_banned_pattern "log::error!" "log::error! usage (use otel_error! instead)"; then
checks_passed=1
fi

if ! check_banned_pattern "log::debug!" "log::debug! usage (use otel_debug! instead)"; then
checks_passed=1
fi

if ! check_banned_pattern "log::trace!" "log::trace! usage (use otel_debug! instead)"; then
checks_passed=1
fi

if [ $checks_passed -eq 0 ]; then
echo "✅ No direct tracing/log macro usage found!"
echo "ℹ️ All events use otel_* macros as required by docs/telemetry/events-guide.md."
else
echo "❌ Found direct tracing/log macro usage. Use otel_* macros from otap_df_telemetry instead."
echo ""
echo "How to fix:"
echo " • tracing::info!(...) → otel_info!(\"event.name\", ...)"
echo " • tracing::warn!(...) → otel_warn!(\"event.name\", ...)"
echo " • tracing::error!(...) → otel_error!(\"event.name\", ...)"
echo " • tracing::debug!(...) → otel_debug!(\"event.name\", ...)"
echo ""
echo " Import from otap_df_telemetry:"
echo " use otap_df_telemetry::otel_info;"
echo ""
echo " See docs/telemetry/events-guide.md for full details."
exit 1
fi
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a test that there are no space characters in the event name? (Is that a weaver thing?)
(Can it be checked in the macro?)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we need to enforce as much as possible in compile time, and then validate against published schema in CI time (via weaver).

This PR is one baby step towards that Goal.

Loading