
Running on multiple AppEngine instances #219

Open
mjvankampen opened this issue Dec 1, 2022 · 7 comments
Labels: priority: p3 · question (Further information is requested)

Comments

mjvankampen commented Dec 1, 2022

I've been setting up OTel metrics with Google Cloud Monitoring for our Django app running on AppEngine.

I used code that looked like this:

# Imports added for context; they weren't in the original snippet.
# `env` and GOOGLE_CLOUD_PROJECT come from the app's own settings.
from opentelemetry import metrics, trace
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

metrics_exporter = CloudMonitoringMetricsExporter(
    project_id=GOOGLE_CLOUD_PROJECT, add_unique_identifier=True
)
metric_reader = PeriodicExportingMetricReader(
    exporter=metrics_exporter, export_interval_millis=5000
)

resource = Resource.create(
    {
        "service.name": env("GAE_SERVICE", default="cx-api"),
        "service.namespace": "Our Platform",
        "service.instance.id": env("GAE_INSTANCE", default="local"),
        "service.version": env("GAE_VERSION", default="local"),
    }
)
tracer_provider = TracerProvider(resource=resource)
# trace_exporter (a span exporter) is configured elsewhere and not shown here
tracer_provider.add_span_processor(BatchSpanProcessor(trace_exporter))
trace.set_tracer_provider(tracer_provider)

metrics.set_meter_provider(
    MeterProvider(
        metric_readers=[metric_reader],
        # As GCP only allows writing to a timeseries once per second we need to make sure every instance has a different
        # series by setting a unique instance id
        resource=resource,
    )
)

I thought that by using a unique instance ID I would get around the

One or more TimeSeries could not be written: One or more points were written more frequently than the maximum sampling period configured for the metric

error. But it seems I really need to use add_unique_identifier=True. While I understand this for multithreaded applications or multiple exporters for the same resource, I don't understand why it is needed when the resource is already unique.

mjvankampen (Author) commented

Even with add_unique_identifier set to true, the error still pops up sometimes.

mjvankampen (Author) commented

I think I found why it still sometimes happens: it coincides with a scale-down of an AppEngine instance, which makes sense if metrics are flushed on shutdown.

aabmass (Collaborator) commented Jan 19, 2023

Is your app being run in multiple processes, e.g. the gunicorn pre-fork model?

If not, then the final flush on shutdown is the likely culprit. The shortest allowed interval for publishing metrics to Cloud Monitoring is 5 s. If your app flushes metrics before shutdown and the previous export ran within the last 5 s, you can see this error. Do you see any issues in your dashboards/metrics?
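A minimal sketch of that idea, assuming the standard OTel Python SDK and the GCP exporter (the 10 s interval is an arbitrary value above the 5 s minimum, not an official recommendation): spacing exports further apart makes it less likely that the final shutdown flush lands within 5 s of the previous periodic export.

from opentelemetry import metrics
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

# Export every 10 s instead of the 5 s minimum, leaving headroom for the
# one extra flush the SDK performs when the provider shuts down.
reader = PeriodicExportingMetricReader(
    CloudMonitoringMetricsExporter(add_unique_identifier=True),
    export_interval_millis=10_000,
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))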

aabmass added the question and priority: p3 labels on Jan 19, 2023
nsaef commented Apr 13, 2023

Hey, I hope it's okay to latch onto this question. What would the answer be if the app were run in gunicorn? I'm currently testing exporting metrics to Cloud Monitoring, and during local development (with a single-process Flask development server) everything works fine. But when I deploy to Cloud Run (gunicorn with multiple workers), I frequently get One or more TimeSeries could not be written: One or more points were written more frequently than the maximum sampling period configured for the metric.

I'm setting up the reader/exporter with add_unique_identifier=True, like this:

PeriodicExportingMetricReader(
    CloudMonitoringMetricsExporter(add_unique_identifier=True),
    export_interval_millis=5000,
)

Any tips on how to avoid this? Thanks a lot!

kendra-human commented May 11, 2023

Hi! Thanks for the lib. I see a similar issue:

One or more TimeSeries could not be written: Points must be written in order. One or more of the points specified had an older start time than the most recent point

This is a uvicorn app on Cloud Run with 2 containers, running 2 workers each.

A minimal repro of the code:

# Imports added for context; they weren't in the original snippet.
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.resourcedetector.gcp_resource_detector import GoogleCloudResourceDetector
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

exporter = CloudMonitoringMetricsExporter(project_id=gcp_project_id)
reader = PeriodicExportingMetricReader(exporter)
detector = GoogleCloudResourceDetector(raise_on_error=True)
provider = MeterProvider(metric_readers=[reader], resource=detector.detect())
meter = provider.get_meter(name="api")
# The meter object is injected into multiple classes, each of which creates its own instruments, e.g.:
latency = meter.create_histogram(name="api.latency", unit="ms")

aabmass (Collaborator) commented Jul 17, 2023

I'd really like to fix this in an automatic way, but I'm not sure of the best way to go about it. Are you able to defer the initialization of the OpenTelemetry MeterProvider until after the workers start (post-fork)?
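For anyone hitting this with gunicorn, a minimal sketch of that suggestion using gunicorn's post_fork server hook (an illustrative example, not the library's official recipe; the worker-{pid} resource attribute is just one way to make each worker distinct):

# gunicorn.conf.py
from opentelemetry import metrics
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import SERVICE_INSTANCE_ID, Resource

def post_fork(server, worker):
    # Runs inside each worker process after the fork, so every worker gets
    # its own exporter thread and its own per-process resource.
    resource = Resource.create({SERVICE_INSTANCE_ID: f"worker-{worker.pid}"})
    reader = PeriodicExportingMetricReader(
        CloudMonitoringMetricsExporter(add_unique_identifier=True),
        export_interval_millis=5000,
    )
    metrics.set_meter_provider(
        MeterProvider(metric_readers=[reader], resource=resource)
    )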

nioncode commented Oct 28, 2024

We run on Cloud Run and also get the error Points must be written in order. One or more of the points specified had an older start time than the most recent point.

This only goes away when we use CloudMonitoringMetricsExporter(add_unique_identifier=True). That seems odd, since our resource should already carry unique labels: we set both 1) the Cloud Run instance id and 2) the process id:

# Imports added for context; they weren't in the original snippet.
import os

from opentelemetry import metrics
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter
from opentelemetry.resourcedetector.gcp_resource_detector import GoogleCloudResourceDetector
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import SERVICE_INSTANCE_ID, Resource, get_aggregated_resources

resource = get_aggregated_resources(
    [GoogleCloudResourceDetector(raise_on_error=True)],
    initial_resource=Resource.create(attributes={
        SERVICE_INSTANCE_ID: f"worker-{os.getpid()}",
    }),
)

gcp_monitoring_exporter = CloudMonitoringMetricsExporter(
    add_unique_identifier=True
)
metric_reader = PeriodicExportingMetricReader(gcp_monitoring_exporter)
meter_provider = MeterProvider(
    metric_readers=[metric_reader],
    resource=resource,
)
metrics.set_meter_provider(meter_provider)

Is it expected that you always need to set add_unique_identifier=True? If not, how do we know when it is really required, and does it do any harm?
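For what it's worth, my understanding from reading the exporter source (the label key and value format below are from memory and may differ between versions): add_unique_identifier generates one random value per exporter instance and merges it into the metric labels of every timeseries it writes, conceptually something like:

import uuid

# Generated once when the exporter is constructed. Every exported series then
# carries this extra metric label, so two processes with byte-identical
# resources still write to distinct timeseries.
UNIQUE_IDENTIFIER_KEY = "opentelemetry_id"  # assumed label key
unique_identifier = uuid.uuid4().hex[:8]    # assumed value format

labels = {UNIQUE_IDENTIFIER_KEY: unique_identifier}

The practical difference from the SERVICE_INSTANCE_ID approach may be that resource attributes get mapped onto a Cloud Monitoring monitored-resource schema, where attributes that don't fit the schema can be dropped, while this identifier is a metric label and survives the mapping.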
