OpenTelemetry support #104

Merged
merged 8 commits into from
Jun 28, 2024

Member

@JoshMock JoshMock commented Jun 18, 2024

Adds automatic instrumentation for OpenTelemetry, tracking the lifecycle of each Elasticsearch request. Follows all documented semantic conventions for Elasticsearch, excluding db.query.text, which we do not have a simple way to sanitize.

If someone wants to start shipping request spans to an OpenTelemetry endpoint without making any code changes, they must:

  • Add @opentelemetry/api and @opentelemetry/auto-instrumentations-node as Node.js dependencies

  • Add appropriate environment variable values for OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, OTEL_RESOURCE_ATTRIBUTES and OTEL_SERVICE_NAME

  • Require the auto-instrumentation registration package at run time:

    node --require '@opentelemetry/auto-instrumentations-node/register' index.js
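As a small illustration of the second step, a startup check could confirm that the four variables are set before the process starts. This is only a sketch: `missingOtelEnv` is a hypothetical helper and is not part of @elastic/transport or any OpenTelemetry package.

```typescript
// Hypothetical helper (not part of any library): reports which of the
// OTel environment variables named in the list above are unset or blank.
const REQUIRED_OTEL_ENV = [
  "OTEL_EXPORTER_OTLP_ENDPOINT",
  "OTEL_EXPORTER_OTLP_HEADERS",
  "OTEL_RESOURCE_ATTRIBUTES",
  "OTEL_SERVICE_NAME",
] as const;

function missingOtelEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_OTEL_ENV.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js entry point this could be called as `missingOtelEnv(process.env)` and the result logged as a warning; passing the environment in as a parameter keeps the helper easy to test.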

Full documentation will be included in a separate PR to elastic/elasticsearch-js soon.

This change will depend on an improvement to elastic/elasticsearch-js to include an optional meta object when calling transport.request(...), which will include both the endpoint name (db.operation.name) and any dynamic values in the path (db.elasticsearch.path_parts.<key>).

See elastic/elasticsearch-js#2267
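To make the mapping from the meta object to span attributes concrete, here is a minimal sketch. The `RequestMeta` shape (a `name` plus optional `pathParts`) and the `buildSpanAttributes` helper are assumptions made for illustration; the real field names are defined by the linked elasticsearch-js change.

```typescript
// Assumed shape of the optional meta object; field names are illustrative only.
interface RequestMeta {
  name: string; // endpoint name, e.g. "search"
  pathParts?: Record<string, string | number | boolean>; // dynamic path values
}

type SpanAttributes = Record<string, string>;

// Hypothetical helper: turns the meta object into semantic-convention attributes.
function buildSpanAttributes(meta: RequestMeta): SpanAttributes {
  const attributes: SpanAttributes = {
    "db.system": "elasticsearch",
    "db.operation.name": meta.name,
  };
  // One attribute per dynamic path value: db.elasticsearch.path_parts.<key>
  for (const [key, value] of Object.entries(meta.pathParts ?? {})) {
    attributes[`db.elasticsearch.path_parts.${key}`] = String(value);
  }
  return attributes;
}
```

For example, `buildSpanAttributes({ name: "search", pathParts: { index: "my-index" } })` would yield an attribute `db.elasticsearch.path_parts.index` with the value `my-index`.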

@estolfo

estolfo commented Jun 20, 2024

I would suggest also asking @david-luna and @trentm for a review.

@JoshMock JoshMock requested review from trentm and david-luna June 20, 2024 16:22
Member

@david-luna david-luna left a comment

The PR looks good. One minor issue I think we should discuss: there is a package that holds attribute names and values (if they are enums). The package is @opentelemetry/semantic-conventions.

I'm hesitant to request changes, since that package is in the process of a major update and would probably break this instrumentation if we used it and then tried to update. Another option is to use hardcoded strings for now and switch to the package when it is ready.

In both cases, I think we need a tracking issue to action when the new version of the semantic conventions package is ready.

Thoughts?
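One way to picture the hardcoded-strings option is to keep the attribute names in a single module, so that a later switch to @opentelemetry/semantic-conventions touches one file. The sketch below is illustrative: the `ATTR` object and `pathPartAttr` helper are hypothetical names, not code from this PR.

```typescript
// Hard-coded attribute names, centralized so they can later be replaced by
// imports from @opentelemetry/semantic-conventions. Names here are illustrative.
const ATTR = {
  DB_SYSTEM: "db.system",
  DB_OPERATION_NAME: "db.operation.name",
  DB_ELASTICSEARCH_PATH_PARTS_PREFIX: "db.elasticsearch.path_parts.",
} as const;

// Union of the literal attribute strings above.
type AttrName = (typeof ATTR)[keyof typeof ATTR];

// Hypothetical helper: builds the per-key path-part attribute name.
function pathPartAttr(key: string): string {
  return `${ATTR.DB_ELASTICSEARCH_PATH_PARTS_PREFIX}${key}`;
}
```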

Member

@trentm trentm left a comment

This is a first pass. I will look again a bit more this afternoon.

package.json Outdated
@@ -65,10 +66,15 @@
"tslib": "^2.4.0",
"undici": "^6.12.0"
},
"peerDependencies": {
"@opentelemetry/api": "1.x",
Member

Shouldn't there be an entry in "dependencies" for @opentelemetry/api, as it is imported/required by the using code?

I suppose using peerDeps typically works for modern npm, as peerDeps are installed by default (https://docs.npmjs.com/cli/v10/configuring-npm/package-json#peerdependencies). I say "typically" because some projects unfortunately use the legacy-peer-deps npm config option (https://docs.npmjs.com/cli/v10/using-npm/config#legacy-peer-deps) which results in peerDeps being ignored. The result, in this case, is an @elastic/transport that breaks. Here is a smallish repro of that breakage:

% cat package.json
{
  "name": "asdf.20240624t094529",
  "version": "1.0.0",
  "dependencies": {
    "@elastic/transport": "git+https://github.com/elastic/elastic-transport-js.git#otel"
  }
}

% npm install --legacy-peer-deps
...

% npm ls -a
npm ERR! code ELSPROBLEMS
npm ERR! invalid: @elastic/[email protected] /Users/trentm/tmp/asdf.20240624T094529/node_modules/@elastic/transport
npm ERR! missing: @opentelemetry/[email protected], required by @elastic/[email protected]
npm ERR! missing: @opentelemetry/[email protected], required by @elastic/[email protected]
[email protected] /Users/trentm/tmp/asdf.20240624T094529
└─┬ @elastic/[email protected] invalid: "git+https://github.com/elastic/elastic-transport-js.git#otel" from the root project
  ├── UNMET DEPENDENCY @opentelemetry/[email protected]
  ├── UNMET DEPENDENCY @opentelemetry/[email protected]
  ├─┬ [email protected]
  │ └── [email protected]
  ├── [email protected]
  ├── [email protected]
  ├── [email protected]
  ├── [email protected]
  └── [email protected]
...

Here I manually copy in the built JS code from a git working copy in a separate dir:

% rsync -av ~/el/elastic-transport-js2/lib/ node_modules/@elastic/transport/lib/

Then importing the package fails:

% node
Welcome to Node.js v18.18.2.
Type ".help" for more information.
> require('@elastic/transport')
Uncaught Error: Cannot find module '@opentelemetry/api'
Require stack:
- /Users/trentm/tmp/asdf.20240624T094529/node_modules/@elastic/transport/lib/Transport.js
- /Users/trentm/tmp/asdf.20240624T094529/node_modules/@elastic/transport/index.js
- <repl>
    at Module._resolveFilename (node:internal/modules/cjs/loader:1077:15)
    at Module._load (node:internal/modules/cjs/loader:922:27)
    at Module.require (node:internal/modules/cjs/loader:1143:19)
    at require (node:internal/modules/cjs/helpers:119:18) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [
    '/Users/trentm/tmp/asdf.20240624T094529/node_modules/@elastic/transport/lib/Transport.js',
    '/Users/trentm/tmp/asdf.20240624T094529/node_modules/@elastic/transport/index.js',
    '<repl>'
  ]
}

I don't think there is a significant downside in taking the direct dependency on @opentelemetry/api, is there?

Member Author

Good call. I'd initially intended to put the SDK in as a peer dependency, but forgot to clean that up. Fixed in ec782d6.

@JoshMock
Member Author

In both cases, I think we need a tracking issue to action when the new version of the semantic conventions package is ready.

I've opened #108 and assigned it to myself to remind me to circle back to this later. Feel free to add any useful context to that issue if I missed anything.

@JoshMock JoshMock requested a review from trentm June 27, 2024 18:48
@JoshMock
Member Author

@trentm thanks so much for all the useful feedback! I'd never used the OTel API before so your expertise is much appreciated. Please take another look and let me know what you think.

Member

@trentm trentm left a comment

LGTM. Thanks!

Something to consider for later: configurability of the OTel instrumentation.
Similar to how the older diagnostics events can be configured via TransportOptions.diagnostic, there are a couple things that might be nice to be configurable:

  1. A way to disable this OTel instrumentation. Users of an OTel SDK will be somewhat used to the ability to disable particular instrumentations. Usually this config var is called enabled, in the context of an instrumentation.
  2. A way to suppress child HTTP spans that will be created under the Elasticsearch spans. There are a few existing OTel instrumentations that have this same situation, e.g.: instrumentation-aws-sdk, instrumentation-mongoose. Typically this config var is a boolean called suppressInternalInstrumentation. E.g. https://github.com/open-telemetry/opentelemetry-js-contrib/blob/main/plugins/node/opentelemetry-instrumentation-aws-sdk/README.md#aws-sdk-instrumentation-options

AFAIK there aren't any other Node.js libraries that have native OTel instrumentation like what is being added here, so there isn't prior art on exactly what to name OTel-related config. Perhaps having:

export interface TransportOptions {
  opentelemetry?: {
    enabled?: boolean;
    suppressInternalInstrumentation?: boolean;
  };
}

Should I create a separate issue for this?
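As a rough sketch of how the proposed option shape might be consumed, the helper below fills in defaults. Everything here is an assumption for illustration: `resolveOtelOptions` is a hypothetical name, the interface is renamed to avoid implying it is the real `TransportOptions`, and the defaults (`enabled: true`, `suppressInternalInstrumentation: false`) simply mirror the behavior of this PR, where spans are always created.

```typescript
// Option shape as proposed in the comment above (not a shipped API).
interface OpenTelemetryOptions {
  enabled?: boolean;
  suppressInternalInstrumentation?: boolean;
}

// Stand-in for TransportOptions, reduced to the one field under discussion.
interface TransportOptionsSketch {
  opentelemetry?: OpenTelemetryOptions;
}

// Hypothetical helper: applies assumed defaults to the optional config.
function resolveOtelOptions(
  options: TransportOptionsSketch = {}
): Required<OpenTelemetryOptions> {
  return {
    // Assumed default: instrumentation on, matching current behavior.
    enabled: options.opentelemetry?.enabled ?? true,
    // Assumed default: child HTTP spans are not suppressed.
    suppressInternalInstrumentation:
      options.opentelemetry?.suppressInternalInstrumentation ?? false,
  };
}
```

A transport could then skip span creation entirely when the resolved `enabled` value is `false`.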

@JoshMock
Member Author

Should I create a separate issue for this?

That would be fantastic, thank you! As much context and prior art as you can provide would be super helpful for me. This PR was definitely meant as a "bare minimum" OTel implementation to have something ready in time for 8.15, so there is plenty of opportunity to make enhancements down the road.

}
}

return await this[kOtelTracer].startActiveSpan(params.meta.name, { attributes, kind: SpanKind.CLIENT }, async (otelSpan: Span) => {

@JoshMock Hi there!

Why is Elastic creating an OTel span by default when OTel has not been activated or used? 🤔

I have not used --require '@opentelemetry/auto-instrumentations-node/register' but I can see while debugging that the OTel spans are getting created.

Could you please explain why Elastic is taking this approach? Thanks so much!

Member Author

According to the OpenTelemetry docs, the code will create no-op traces if the OpenTelemetry SDK has not been initialized. They'll still show up in the stack trace, but they won't go anywhere.

If you need to disable OTel collection for this process, you can set the environment variable OTEL_SDK_DISABLED=true.
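The pattern described here can be pictured with a toy model. This is not the @opentelemetry/api implementation, and every name in it (`ToySpan`, `ToyTracer`, `registerSdk`, `getTracer`, `makeRecordingTracer`) is hypothetical; it only sketches the idea that instrumented code always calls the tracer, and the tracer does nothing until an SDK has registered.

```typescript
// Toy model of "no-op until an SDK registers". All names are hypothetical.
interface ToySpan {
  isRecording(): boolean;
  end(): void;
}

interface ToyTracer {
  startActiveSpan<T>(name: string, fn: (span: ToySpan) => T): T;
}

// Default tracer: runs the callback with a span that records nothing.
const noopSpan: ToySpan = { isRecording: () => false, end: () => {} };
const noopTracer: ToyTracer = {
  startActiveSpan<T>(_name: string, fn: (span: ToySpan) => T): T {
    return fn(noopSpan);
  },
};

let registered: ToyTracer | undefined;

function registerSdk(tracer: ToyTracer): void {
  registered = tracer;
}

function getTracer(): ToyTracer {
  return registered ?? noopTracer;
}

// A tracer that "exports" by pushing finished span names into a sink.
function makeRecordingTracer(sink: string[]): ToyTracer {
  return {
    startActiveSpan<T>(name: string, fn: (span: ToySpan) => T): T {
      const span: ToySpan = {
        isRecording: () => true,
        end: () => {
          sink.push(name);
        },
      };
      return fn(span);
    },
  };
}

// Library code always goes through the tracer, whether or not an SDK exists.
function instrumentedRequest(): boolean {
  return getTracer().startActiveSpan("search", (span) => {
    const recording = span.isRecording();
    span.end();
    return recording;
  });
}
```

In this model the call to `startActiveSpan` is visible in a debugger in both cases, but spans only reach the sink after `registerSdk` has been called.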

@trentm
Member

trentm commented Aug 15, 2024

I hadn't followed up on creating issues for the things I mentioned above in #104 (review)
I'll do that now.

@kirrg001 Having a disable option in the Elasticsearch client config would provide a way to disable Elasticsearch OTel instrumentation without having to fully disable the SDK, or add a SpanProcessor or something similar to drop those spans if they are undesired.

@trentm
Member

trentm commented Aug 15, 2024

I hadn't followed up on creating issues for the things I mentioned above in #104 (review)
I'll do that now.

I take that back. I had followed up: the feature request issue is here: elastic/elasticsearch-js#2299

@trentm
Member

trentm commented Aug 15, 2024

@kirrg001 The argument for having OTel instrumentation directly in a given library (so-called "native" instrumentation) is in the top-section here: https://opentelemetry.io/docs/concepts/instrumentation/libraries/


5 participants