diff --git a/.github/workflows/rust-ci.yml b/.github/workflows/rust-ci.yml index 59259d262b..f777d3bdd4 100644 --- a/.github/workflows/rust-ci.yml +++ b/.github/workflows/rust-ci.yml @@ -326,6 +326,9 @@ jobs: - name: cargo xtask ${{ matrix.folder }} run: cargo xtask structure-check working-directory: ./rust/${{ matrix.folder }} + - name: check telemetry macro usage + run: ./scripts/check-direct-telemetry-macros.sh + working-directory: ./rust/${{ matrix.folder }} document-build-options: if: github.event_name == 'pull_request' diff --git a/rust/otap-dataflow/CONTRIBUTING.md b/rust/otap-dataflow/CONTRIBUTING.md index f43fb27570..41ab304aa3 100644 --- a/rust/otap-dataflow/CONTRIBUTING.md +++ b/rust/otap-dataflow/CONTRIBUTING.md @@ -10,6 +10,14 @@ project-level [CONTRIBUTING][] document. Run `cargo xtask check` to check the structure of the project. +## Telemetry and logging + +All internal logging MUST use the `otel_*` macros from `otap_df_telemetry` +(not `tracing::info!` or `println!`). See the +[Events Guide](docs/telemetry/events-guide.md) details. + +TODO: Add metrics information + ## Building a Docker image Run `docker build --build-context otel-arrow=../../ -f Dockerfile -t df_engine .` diff --git a/rust/otap-dataflow/docs/telemetry/events-guide.md b/rust/otap-dataflow/docs/telemetry/events-guide.md index f096e39a67..db74fa8df6 100644 --- a/rust/otap-dataflow/docs/telemetry/events-guide.md +++ b/rust/otap-dataflow/docs/telemetry/events-guide.md @@ -36,13 +36,89 @@ Exception rule (traces): - If you are recording an actual exception on a span, the regular event name MUST be `exception` and the standard exception attributes MUST be used. -In this project, events are preferred to unstructured logs. Event names are -codified (see below), and their attributes consist of the attributes of the -relevant entity or entities (stable context), combined with event-specific -attributes (dynamic context). +## How to emit events in code -Treat event names as schema identifiers. Evolution rules are defined in -[Stability and Compatibility Guide](stability-compatibility-guide.md). +All events MUST be emitted using the `otel_*` macros from the +`otap_df_telemetry` crate. **Do not** use `tracing::info!`, `log::info!`, or +`println!` directly. This rule is enforced by +`scripts/check-direct-telemetry-macros.sh` (run in CI). + +**Why wrappers instead of raw `tracing` macros?** + +- **Mandatory event name.** The first argument to every `otel_*!` macro is the + event name. Raw `tracing` macros do not require one, and their default name + includes the file path and line number -- which is not durable and breaks + filtering, alerting, and dashboards whenever code is moved or reformatted. +- **Automatic `target`.** The wrappers set the tracing `target` field to the + crate name (`env!("CARGO_PKG_NAME")`) automatically. With raw `tracing` + macros the default target is the module path, which is an internal + implementation detail and can change without notice. + +### Available macros + +| Macro | Severity | +|---|---| +| `otel_debug!` | DEBUG | +| `otel_info!` | INFO | +| `otel_warn!` | WARN | +| `otel_error!` | ERROR | + +### Basic usage + +The first argument is always the **event name** (a string literal). Optional +key-value pairs follow as structured attributes. + +```rust +use otap_df_telemetry::otel_info; + +// Event name only (no attributes): +otel_info!("pipeline.run.start"); + +// Event name with attributes: +otel_info!("receiver.grpc.start", + endpoint = %addr, +); +``` + +### The `message` attribute + +Use an attribute named **`message`** when the event name alone is not sufficient +to convey what happened. This value is mapped to the OTel LogRecord **body**, +making it the primary text shown in log viewers, consoles, and observability UIs. + +Not every event needs a `message` -- if the event name is self-explanatory, +omit it. Avoid messages that just restate the event name; they add no value. + +```rust +// Bad -- message just restates the event name: +otel_info!("pipeline.run.start", + message = "Pipeline run started", +); + +// Good -- event name says it all, no message needed: +otel_info!("pipeline.run.start"); + +// Good -- message explains consequences beyond what the event name conveys: +otel_warn!("core_affinity.set_failed", + message = "Failed to set core affinity for pipeline thread. Performance may be less predictable.", +); +``` + +### Attribute formatting + +The macros support `tracing`-style formatting hints: + +- `%value` -- Display formatting (`fmt::Display`) +- `?value` -- Debug formatting (`fmt::Debug`) +- `value` -- passed directly (integers, booleans, etc.) + +```rust +otel_info!("node.connect", + endpoint = %addr, + config = ?node_config, + count = 42, +); +``` ## Event naming @@ -71,7 +147,8 @@ Where: - `` is the action or occurrence (e.g. `start`, `complete`, `fail`, `reload`, `shutdown`). -Note: OpenTelemetry Events are represented as LogRecords with an event name. +Note: OpenTelemetry Events are represented as LogRecords with an +[event name](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.50.0/specification/logs/data-model.md#field-eventname). In OTLP, this is carried in the LogRecord `event_name` field (not in the body). ## Attributes and context diff --git a/rust/otap-dataflow/scripts/check-direct-telemetry-macros.sh b/rust/otap-dataflow/scripts/check-direct-telemetry-macros.sh new file mode 100755 index 0000000000..c242b22758 --- /dev/null +++ b/rust/otap-dataflow/scripts/check-direct-telemetry-macros.sh @@ -0,0 +1,111 @@ +#!/bin/bash +# Script to detect direct use of tracing/log macros instead of otel_* wrappers. +# All events MUST be emitted using the otel_* macros from otap_df_telemetry. +# See docs/telemetry/events-guide.md +# +# Usage: ./scripts/check-direct-telemetry-macros.sh + +set -e + +echo "🔍 Checking for direct use of tracing/log macros (should use otel_* macros instead)..." + +# Function to check for a banned pattern in .rs files +check_banned_pattern() { + local pattern="$1" + local description="$2" + local files + + # Find all .rs files, excluding: + # - target/ build artifacts + # - .git/ git internals + # - crates/telemetry/ implements the otel_* wrappers (legitimate tracing usage) + # - crates/quiver/ will move out of this repo (temporary exemption) + # - crates/quiver-e2e/ quiver end-to-end tests (temporary exemption) + # - benchmarks/ benchmarks may legitimately use tracing directly + files=$(find . -name "*.rs" -type f \ + | grep -v target/ \ + | grep -v .git/ \ + | grep -v crates/telemetry/ \ + | grep -v crates/quiver/ \ + | grep -v crates/quiver-e2e/ \ + | grep -v benchmarks/ \ + || true) + + if [ -z "$files" ]; then + return 0 + fi + + local matches + matches=$(echo "$files" | xargs grep -n "$pattern" 2>/dev/null || true) + + if [ -n "$matches" ]; then + echo "âš ī¸ Found direct $description:" + echo "$matches" + echo "" + return 1 + fi + return 0 +} + +checks_passed=0 + +# Check for fully-qualified tracing macro calls +if ! check_banned_pattern "tracing::info!" "tracing::info! usage (use otel_info! instead)"; then + checks_passed=1 +fi + +if ! check_banned_pattern "tracing::warn!" "tracing::warn! usage (use otel_warn! instead)"; then + checks_passed=1 +fi + +if ! check_banned_pattern "tracing::error!" "tracing::error! usage (use otel_error! instead)"; then + checks_passed=1 +fi + +if ! check_banned_pattern "tracing::debug!" "tracing::debug! usage (use otel_debug! instead)"; then + checks_passed=1 +fi + +if ! check_banned_pattern "tracing::trace!" "tracing::trace! usage (use otel_debug! instead)"; then + checks_passed=1 +fi + +# Check for fully-qualified log crate macro calls +if ! check_banned_pattern "log::info!" "log::info! usage (use otel_info! instead)"; then + checks_passed=1 +fi + +if ! check_banned_pattern "log::warn!" "log::warn! usage (use otel_warn! instead)"; then + checks_passed=1 +fi + +if ! check_banned_pattern "log::error!" "log::error! usage (use otel_error! instead)"; then + checks_passed=1 +fi + +if ! check_banned_pattern "log::debug!" "log::debug! usage (use otel_debug! instead)"; then + checks_passed=1 +fi + +if ! check_banned_pattern "log::trace!" "log::trace! usage (use otel_debug! instead)"; then + checks_passed=1 +fi + +if [ $checks_passed -eq 0 ]; then + echo "✅ No direct tracing/log macro usage found!" + echo "â„šī¸ All events use otel_* macros as required by docs/telemetry/events-guide.md." +else + echo "❌ Found direct tracing/log macro usage. Use otel_* macros from otap_df_telemetry instead." + echo "" + echo "How to fix:" + echo " â€ĸ tracing::info!(...) → otel_info!(\"event.name\", ...)" + echo " â€ĸ tracing::warn!(...) → otel_warn!(\"event.name\", ...)" + echo " â€ĸ tracing::error!(...) → otel_error!(\"event.name\", ...)" + echo " â€ĸ tracing::debug!(...) → otel_debug!(\"event.name\", ...)" + echo "" + echo " Import from otap_df_telemetry:" + echo " use otap_df_telemetry::otel_info;" + echo "" + echo " See docs/telemetry/events-guide.md for full details." + exit 1 +fi