
Running on multiple AppEngine instances #219

Open
mjvankampen opened this issue Dec 1, 2022 · 7 comments
Labels: priority: p3 · question (Further information is requested)

Comments

mjvankampen commented Dec 1, 2022

I've been setting up OTel metrics with Google Cloud Monitoring for our Django app running on AppEngine.

I used code that looked like this:

# Imports added for context; they weren't in the original snippet.
# `env` and GOOGLE_CLOUD_PROJECT come from the app's own settings.
from opentelemetry import metrics, trace
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

metrics_exporter = CloudMonitoringMetricsExporter(
    project_id=GOOGLE_CLOUD_PROJECT, add_unique_identifier=True
)
metric_reader = PeriodicExportingMetricReader(
    exporter=metrics_exporter, export_interval_millis=5000
)

resource = Resource.create(
    {
        "service.name": env("GAE_SERVICE", default="cx-api"),
        "service.namespace": "Our Platform",
        "service.instance.id": env("GAE_INSTANCE", default="local"),
        "service.version": env("GAE_VERSION", default="local"),
    }
)
tracer_provider = TracerProvider(resource=resource)
# trace_exporter (a span exporter) is configured elsewhere and not shown here
tracer_provider.add_span_processor(BatchSpanProcessor(trace_exporter))
trace.set_tracer_provider(tracer_provider)

metrics.set_meter_provider(
    MeterProvider(
        metric_readers=[metric_reader],
        # As GCP only allows writing to a timeseries once per second we need to make sure every instance has a different
        # series by setting a unique instance id
        resource=resource,
    )
)

I thought that by using a unique instance ID I would get around the

One or more TimeSeries could not be written: One or more points were written more frequently than the maximum sampling period configured for the metric

error. But it seems I really need to use add_unique_identifier=True. While I understand this for multithreaded applications or multiple exporters for the same resource, I don't understand why it is needed when the resource is already unique.

mjvankampen (Author) commented

Even with add_unique_identifier set to true, the error still pops up sometimes.

mjvankampen (Author) commented

I think I found why it still sometimes happens: it coincides with a scale-down of an AppEngine instance, which makes sense if metrics are flushed on shutdown.

aabmass (Collaborator) commented Jan 19, 2023

Is your app being run in multiple processes, e.g. the gunicorn pre-fork model?

If not, then the final flush on shutdown is the likely culprit. The shortest allowed interval for publishing metrics to Cloud Monitoring is 5 s. If your app flushes metrics before shutdown and the previous export ran within the last 5 s, you can see this error. Do you see any issues in your dashboards/metrics?
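A minimal sketch of that idea, assuming the standard OTel Python SDK and the GCP exporter (the 10 s interval is an arbitrary value above the 5 s minimum, not an official recommendation): spacing exports further apart makes it less likely that the final shutdown flush lands within 5 s of the previous periodic export.

from opentelemetry import metrics
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

# Export every 10 s instead of the 5 s minimum, leaving headroom for the
# one extra flush the SDK performs when the provider shuts down.
reader = PeriodicExportingMetricReader(
    CloudMonitoringMetricsExporter(add_unique_identifier=True),
    export_interval_millis=10_000,
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))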

aabmass added the question and priority: p3 labels on Jan 19, 2023
nsaef commented Apr 13, 2023

Hey, I hope it's okay to latch onto this question. What would the answer be if the app were run in gunicorn? I'm currently testing exporting metrics to Cloud Monitoring, and during local development (with a single-process Flask development server) everything works fine. But when I deploy to Cloud Run (gunicorn with multiple workers), I frequently get One or more TimeSeries could not be written: One or more points were written more frequently than the maximum sampling period configured for the metric.

I'm setting up the reader/exporter with add_unique_identifier=True, like this:

PeriodicExportingMetricReader(
    CloudMonitoringMetricsExporter(add_unique_identifier=True),
    export_interval_millis=5000,
)

Any tips on how to avoid this? Thanks a lot!

kendra-human commented May 11, 2023

Hi! Thanks for the lib. I see a similar issue:

One or more TimeSeries could not be written: Points must be written in order. One or more of the points specified had an older start time than the most recent point

This is a uvicorn app on Cloud Run with 2 containers, running 2 workers each.

A minimal repro of the code:

# Imports added for context; they weren't in the original snippet.
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.resourcedetector.gcp_resource_detector import GoogleCloudResourceDetector
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

exporter = CloudMonitoringMetricsExporter(project_id=gcp_project_id)
reader = PeriodicExportingMetricReader(exporter)
detector = GoogleCloudResourceDetector(raise_on_error=True)
provider = MeterProvider(metric_readers=[reader], resource=detector.detect())
meter = provider.get_meter(name="api")
# The meter object is injected into multiple classes, each of which creates its own instruments, e.g.:
latency = meter.create_histogram(name="api.latency", unit="ms")

aabmass (Collaborator) commented Jul 17, 2023

I'd really like to fix this in an automatic way, but I'm not sure of the best way to go about it. Are you able to defer the initialization of the OpenTelemetry MeterProvider until after the workers start (post-fork)?
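For anyone hitting this with gunicorn, a minimal sketch of that suggestion using gunicorn's post_fork server hook (an illustrative example, not the library's official recipe; the worker-{pid} resource attribute is just one way to make each worker distinct):

# gunicorn.conf.py
from opentelemetry import metrics
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import SERVICE_INSTANCE_ID, Resource

def post_fork(server, worker):
    # Runs inside each worker process after the fork, so every worker gets
    # its own exporter thread and its own per-process resource.
    resource = Resource.create({SERVICE_INSTANCE_ID: f"worker-{worker.pid}"})
    reader = PeriodicExportingMetricReader(
        CloudMonitoringMetricsExporter(add_unique_identifier=True),
        export_interval_millis=5000,
    )
    metrics.set_meter_provider(
        MeterProvider(metric_readers=[reader], resource=resource)
    )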

nioncode commented Oct 28, 2024

We run on Cloud Run and also get the error Points must be written in order. One or more of the points specified had an older start time than the most recent point.

This only goes away when we use CloudMonitoringMetricsExporter(add_unique_identifier=True). That seems odd, since our resource should already carry unique labels: we set both 1) the Cloud Run instance id and 2) the process id:

# Imports added for context; they weren't in the original snippet.
import os

from opentelemetry import metrics
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.resourcedetector.gcp_resource_detector import GoogleCloudResourceDetector
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import SERVICE_INSTANCE_ID, Resource, get_aggregated_resources

resource = get_aggregated_resources(
    [GoogleCloudResourceDetector(raise_on_error=True)],
    initial_resource=Resource.create(attributes={
        SERVICE_INSTANCE_ID: f"worker-{os.getpid()}",
    }),
)

gcp_monitoring_exporter = CloudMonitoringMetricsExporter(
    add_unique_identifier=True
)
metric_reader = PeriodicExportingMetricReader(gcp_monitoring_exporter)
meter_provider = MeterProvider(
    metric_readers=[metric_reader],
    resource=resource,
)
metrics.set_meter_provider(meter_provider)

Is it expected that you always need to set add_unique_identifier=True? If not, how do we know when it is really required, and does it do any harm?
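For what it's worth, my understanding from reading the exporter source (the label key and value format below are from memory and may differ between versions): add_unique_identifier generates one random value per exporter instance and merges it into the metric labels of every timeseries it writes, conceptually something like:

import uuid

# Generated once when the exporter is constructed. Every exported series then
# carries this extra metric label, so two processes with byte-identical
# resources still write to distinct timeseries.
UNIQUE_IDENTIFIER_KEY = "opentelemetry_id"  # assumed label key
unique_identifier = uuid.uuid4().hex[:8]    # assumed value format

labels = {UNIQUE_IDENTIFIER_KEY: unique_identifier}

The practical difference from the SERVICE_INSTANCE_ID approach may be that resource attributes get mapped onto a Cloud Monitoring monitored-resource schema, where attributes that don't fit the schema can be dropped, while this identifier is a metric label and survives the mapping.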
