
OTLPMetricExporter forceFlush() doesn't flush metrics #5116

Closed
douglasg14b opened this issue Nov 6, 2024 · 3 comments · Fixed by #5126
Assignee: pichlermarc
Labels: bug (Something isn't working), pkg:sdk-metrics, priority:p2 (Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect)

Comments


douglasg14b commented Nov 6, 2024

What happened?

Steps to Reproduce

  1. Have a long exportIntervalMillis
  2. Produce metrics
  3. Try to collect/export/flush metrics

Expected Result

Metrics are flushed and exported

Actual Result

No metrics are exported at all

Additional Details

If I set up a very short exportIntervalMillis (e.g. 500 ms) and ensure the process waits a second or two before exiting, I do get metrics. This isn't a tenable solution, though, since we need to be able to force this flush for quickly running scripts.
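
For reference, a minimal sketch of that timing-based workaround (the 1500 ms sleep is illustrative, not from my real code):

    const reader = new PeriodicExportingMetricReader({
        exporter,
        exportIntervalMillis: 500, // short enough that the periodic export fires before exit
    });

    // ... record metrics ...

    // Keep the process alive past at least one export interval before exiting.
    await new Promise(resolve => setTimeout(resolve, 1500));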

OpenTelemetry Setup Code

/*instrumentation.ts*/
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-proto';
import { Resource } from '@opentelemetry/resources';
import { PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import * as opentelemetry from '@opentelemetry/sdk-node';
import { ATTR_SERVICE_NAME, ATTR_SERVICE_VERSION } from '@opentelemetry/semantic-conventions';

import { OTEL_EXPORTER_OTLP_METRICS_ENDPOINT, OTEL_METRIC_EXPORT_INTERVAL, OTEL_SERVICE_NAME } from './environment';

const exporter = new OTLPMetricExporter({
    url: OTEL_EXPORTER_OTLP_METRICS_ENDPOINT,
    headers: {}, // an optional object containing custom headers to be sent with each request
});

const metricsReader = new PeriodicExportingMetricReader({
    exportIntervalMillis: OTEL_METRIC_EXPORT_INTERVAL,
    exporter: exporter,
});

export const telemetrySdk = new opentelemetry.NodeSDK({
    resource: new Resource({
        [ATTR_SERVICE_NAME]: OTEL_SERVICE_NAME,
        [ATTR_SERVICE_VERSION]: '1.0',
    }),
    metricReader: metricsReader,
    instrumentations: [getNodeAutoInstrumentations()],
});

export function startTelemetry() {
    telemetrySdk.start();
}

export async function stopTelemetry() {
    const collectResult = await metricsReader.collect();
    const metrics = collectResult.resourceMetrics; // Metrics are collected here, this isn't empty

    exporter.export(metrics, (result) => {
        console.log(result); // {code: 0 }
    });

    await exporter.forceFlush();
    await metricsReader.forceFlush();
    await telemetrySdk.shutdown(); // I thought this would flush, but it doesn't appear to, or has the same issue as the above?
}
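
For context, the script drives these roughly like this (simplified, hypothetical driver; the names come from instrumentation.ts above):

    import { startTelemetry, stopTelemetry } from './instrumentation';

    startTelemetry();
    // ... run the short-lived script, recording metrics ...
    await stopTelemetry(); // expected to flush; in practice nothing reaches the receiver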

package.json

{
    "name": "",
    "version": "1.47.3",
    "type": "module",
    "main": "src/infra/web/index.ts",
    "scripts": {
        "start:dev": "pnpm build:routes && dotenvx run --env-file=.env.development -- npx tsx --experimental-loader=@opentelemetry/instrumentation/hook.mjs --import ./telemetry.ts src/infra/web/index.ts",
        "build:routes": "tsoa -c tsoaConfig.json spec-and-routes",
    },
    "dependencies": {
        "@aws-sdk/client-connect": "catalog:aws-sdk",
        "@aws-sdk/client-dynamodb": "catalog:aws-sdk",
        "@aws-sdk/client-pinpoint": "catalog:aws-sdk",
        "@aws-sdk/client-s3": "catalog:aws-sdk",
        "@aws-sdk/client-ssm": "catalog:aws-sdk",
        "@aws-sdk/lib-dynamodb": "catalog:aws-sdk",
        "@aws-sdk/s3-request-presigner": "catalog:aws-sdk",
        "@aws-sdk/util-dynamodb": "catalog:aws-sdk",
        "@opentelemetry/sdk-node": "^0.54.0",
        "@opentelemetry/sdk-metrics": "^1.27.0",
        "@opentelemetry/instrumentation": "^0.54.0",
        "@opentelemetry/auto-instrumentations-node": "^0.52.0",
        "@opentelemetry/exporter-prometheus": "^0.54.0",
        "@opentelemetry/semantic-conventions": "^1.27.0",
        "@opentelemetry/resources": "^1.27.0",
        "@dotenvx/dotenvx": "^0.37.1",
        "@smithy/node-http-handler": "3.0.0",
        "@tsoa/runtime": "^6.2.1",
        "cors": "^2.8.5",
        "date-holidays": "^3.23.12",
        "dotenv": "*",
        "dynamodb-toolbox": "^0.9.5",
        "env-var": "^7.5.0",
        "express": "^4.19.2",
        "file-type": "^19.1.0",
        "fluent-ffmpeg": "^2.1.3",
        "jsonwebtoken": "^9.0.2",
        "ky": "^1.3.0",
        "libphonenumber-js": "^1.11.4",
        "lodash-es": "^4.17.21",
        "luxon": "^3.5.0",
        "module-alias": "^2.2.3",
        "tsoa": "^6.2.1",
        "type-fest": "^4.18.3",
        "typia": "^6.10.4",
        "uuid": "*"
    },
    "devDependencies": {
        "@types/jsonwebtoken": "^9.0.6",
        "@faker-js/faker": "^8.4.1",
        "@hyrious/esbuild-plugin-commonjs": "^0.2.4",
        "@types/chai": "^4.3.19",
        "@types/cors": "^2.8.17",
        "@types/express": "^4.17.21",
        "@types/fluent-ffmpeg": "^2.1.24",
        "@types/lodash-es": "^4.17.12",
        "@types/luxon": "^3.4.2",
        "@types/module-alias": "^2.0.4",
        "@types/supertest": "^6.0.2",
        "@types/uuid": "*",
        "aws-sdk-client-mock": "^4.0.1",
        "copyfiles": "^2.4.1",
        "deepmerge": "^4.3.1",
        "supertest": "^7.0.0",
        "ts-add-js-extension": "^1.6.4",
        "ts-patch": "^3.2.1",
        "ts-runtime-checks": "^0.6.2",
        "tsc-alias": "^1.8.10",
        "tsx": "^4.11.2",
        "typescript": "^5.6.2",
        "vitest-tsconfig-paths": "^3.4.1"
    },
    "_moduleAliases": {
        "@": "src"
    }
}

Relevant log output

No response

@douglasg14b douglasg14b added bug Something isn't working triage labels Nov 6, 2024
@douglasg14b douglasg14b changed the title OTLPMetricExporter forceFlush() doens't flush metrics OTLPMetricExporter forceFlush() doesn't flush metrics Nov 6, 2024
pichlermarc (Member) commented

Hi @douglasg14b, thanks for reaching out. 🙂

I vaguely remember that there was some contention on this back in the day when we implemented this because it was not part of the spec, and we ended up with a MeterProvider#shutdown() that did not call MeterProvider#forceFlush().
It seems that all SDKs pretty much have this behavior but it's not in the spec yet.

The spec issue for this is still open: open-telemetry/opentelemetry-specification#2983
Reading through it, it'll likely be fine if we change the behavior to also call MeterProvider#forceFlush(). It's just waiting for someone to drive the spec issue to completion, and there seems to be consensus that calling it is the right thing to do.
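
To illustrate the pattern under discussion: since MeterProvider#shutdown() does not flush today, an explicit flush has to come first (sketch only, assuming a direct meterProvider handle rather than the NodeSDK wrapper):

    // What the spec change would make implicit: flush pending metrics,
    // then shut down. (Sketch; with NodeSDK the provider is managed internally.)
    await meterProvider.forceFlush();
    await meterProvider.shutdown();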

I'm not sure I completely understand the setup code from the issue description - are you saying that await metricReader.forceFlush() also does not export metrics in your case, or is it just NodeSDK#shutdown() that's not behaving as expected? 🤔

douglasg14b (Author) commented Nov 7, 2024

Hi, thanks for the response. Correct, none of the above code actually flushes any metrics for me. The only way I can get metrics out of a process before it exits is to wait for at least the export interval.

    const collectResult = await metricsReader.collect();
    const metrics = collectResult.resourceMetrics; // <---- metrics exist here

    exporter.export(metrics, (result) => { // <--- No metrics are exported to the receiver
        console.log(result); // {code: 0 }
    });

    await exporter.forceFlush(); // <--- No metrics are exported to the receiver
    await metricsReader.forceFlush(); // <--- No metrics are exported to the receiver

@pichlermarc pichlermarc self-assigned this Nov 7, 2024
pichlermarc (Member) commented

Ah, I think I see what's wrong now - I believe that there's a bug in the PeriodicExportingMetricReader here:

// Avoid scheduling a promise to make the behavior more predictable and easier to test
if (resourceMetrics.resource.asyncAttributesPending) {
  resourceMetrics.resource
    .waitForAsyncAttributes?.()
    .then(doExport, err =>
      diag.debug('Error while resolving async portion of resource: ', err)
    )
    .catch(globalErrorHandler);
} else {
  await doExport();
}

If async attributes are pending, the promise is not awaited by forceFlush(), so the metrics will only be exported after the exporter has already been shut down. That should be an easy fix. I'll work on a PR.
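
A sketch of the direction for the fix (illustrative only; the actual change is in #5126): await the async-attributes promise so that forceFlush() only resolves once the export has completed.

    // Sketch: resolve pending async resource attributes first, then export,
    // so callers awaiting forceFlush() see the export actually finish.
    if (resourceMetrics.resource.asyncAttributesPending) {
      try {
        await resourceMetrics.resource.waitForAsyncAttributes?.();
      } catch (err) {
        diag.debug('Error while resolving async portion of resource: ', err);
      }
    }
    await doExport();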

@pichlermarc pichlermarc added the priority:p2 (Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect) label and removed the triage, needs:author-response (waiting for author to respond), and pkg:exporter-metrics-otlp-proto labels Nov 7, 2024