This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Synchrotron metric collection appears to fail if large numbers of background tasks are present. #7596

Closed
michaelkaye opened this issue May 28, 2020 · 1 comment · Fixed by #7597

michaelkaye commented May 28, 2020

2020-05-28 21:35:06,821 - twisted - 192 - ERROR -  - ----------------------------------------
2020-05-28 21:35:06,837 - twisted - 192 - ERROR -  - Exception happened during processing of request from ('MONITORING-HOST', remote-port)
2020-05-28 21:35:06,877 - twisted - 192 - ERROR -  - Traceback (most recent call last):
2020-05-28 21:35:06,895 - twisted - 192 - ERROR -  -   File "/usr/local/lib/python3.7/socketserver.py", line 650, in process_request_thread
2020-05-28 21:35:06,922 - twisted - 192 - ERROR -  -     self.finish_request(request, client_address)
2020-05-28 21:35:06,948 - twisted - 192 - ERROR -  -   File "/usr/local/lib/python3.7/socketserver.py", line 360, in finish_request
2020-05-28 21:35:06,968 - twisted - 192 - ERROR -  -     self.RequestHandlerClass(request, client_address, self)
2020-05-28 21:35:06,982 - twisted - 192 - ERROR -  -   File "/usr/local/lib/python3.7/socketserver.py", line 720, in __init__
2020-05-28 21:35:07,013 - twisted - 192 - ERROR -  -     self.handle()
2020-05-28 21:35:07,041 - twisted - 192 - ERROR -  -   File "/usr/local/lib/python3.7/http/server.py", line 426, in handle
2020-05-28 21:35:07,056 - twisted - 192 - ERROR -  -     self.handle_one_request()
2020-05-28 21:35:07,080 - twisted - 192 - ERROR -  -   File "/usr/local/lib/python3.7/http/server.py", line 414, in handle_one_request
2020-05-28 21:35:07,100 - twisted - 192 - ERROR -  -     method()
2020-05-28 21:35:07,107 - twisted - 192 - ERROR -  -   File "/home/synapse/src/synapse/metrics/_exposition.py", line 205, in do_GET
2020-05-28 21:35:07,114 - twisted - 192 - ERROR -  -     output = generate_latest(registry, emit_help=emit_help)
2020-05-28 21:35:07,130 - twisted - 192 - ERROR -  -   File "/home/synapse/src/synapse/metrics/_exposition.py", line 116, in generate_latest
2020-05-28 21:35:07,139 - twisted - 192 - ERROR -  -     for metric in registry.collect():
2020-05-28 21:35:07,163 - twisted - 192 - ERROR -  -   File "/home/synapse/src/synapse/metrics/__init__.py", line 59, in collect
2020-05-28 21:35:07,175 - twisted - 192 - ERROR -  -     for metric in REGISTRY.collect():
2020-05-28 21:35:07,196 - twisted - 192 - ERROR -  -   File "/home/synapse/env-py37/lib/python3.7/site-packages/prometheus_client/registry.py", line 75, in collect
2020-05-28 21:35:07,216 - twisted - 192 - ERROR -  -     for metric in collector.collect():
2020-05-28 21:35:07,238 - twisted - 192 - ERROR -  -   File "/home/synapse/src/synapse/metrics/background_process_metrics.py", line 120, in collect
2020-05-28 21:35:07,260 - twisted - 192 - ERROR -  -     process.update_metrics()
2020-05-28 21:35:07,289 - twisted - 192 - ERROR -  -   File "/home/synapse/src/synapse/metrics/background_process_metrics.py", line 155, in update_metrics
2020-05-28 21:35:07,305 - twisted - 192 - ERROR -  -     _background_process_ru_utime.labels(self.desc).inc(diff.ru_utime)
2020-05-28 21:35:07,320 - twisted - 192 - ERROR -  -   File "/home/synapse/env-py37/lib/python3.7/site-packages/prometheus_client/metrics.py", line 243, in inc
2020-05-28 21:35:07,326 - twisted - 192 - ERROR -  -     raise ValueError('Counters can only be incremented by non-negative amounts.')
2020-05-28 21:35:07,334 - twisted - 192 - ERROR -  - ValueError: Counters can only be incremented by non-negative amounts.
2020-05-28 21:35:07,341 - twisted - 192 - ERROR -  - ----------------------------------------

When this fired, the background task count for this process was through the roof (150,000+).
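The traceback shows the failure path. `update_metrics` increments `_background_process_ru_utime` by a resource-usage delta (`diff.ru_utime`), the Prometheus client's `inc` rejects a negative amount with `ValueError`, and because nothing catches it, the error propagates up through `REGISTRY.collect()` and `generate_latest`, so one bad delta fails the entire scrape. Below is a minimal, self-contained sketch of that delta-reporting pattern plus one possible defensive guard. The `Counter` and `BackgroundProcess` classes are hypothetical stand-ins (not Synapse's or prometheus_client's actual code), and skipping negative deltas is an illustrative assumption, not necessarily what #7597 did.

```python
from dataclasses import dataclass


class Counter:
    """Hypothetical stand-in for a Prometheus counter. Like the inc() in the
    traceback, it refuses negative increments."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("Counters can only be incremented by non-negative amounts.")
        self.value += amount


@dataclass
class BackgroundProcess:
    """Hypothetical tracker that reports CPU time as deltas since the last scrape."""

    desc: str
    reported_utime: float = 0.0

    def update_metrics(self, counter: Counter, current_utime: float) -> None:
        diff = current_utime - self.reported_utime
        # Re-baseline on every scrape, so the next delta is measured from this reading.
        self.reported_utime = current_utime
        # Assumed guard: drop a negative delta instead of letting inc() raise
        # and abort collection for every other metric in the registry.
        if diff < 0:
            return
        counter.inc(diff)


if __name__ == "__main__":
    utime = Counter()
    proc = BackgroundProcess("example-task")  # hypothetical description label
    proc.update_metrics(utime, 2.5)
    proc.update_metrics(utime, 1.0)   # reading went backwards: skipped, no ValueError
    proc.update_metrics(utime, 1.75)  # +0.75 measured from the re-baselined 1.0
    print(utime.value)  # 3.25
```

Dropping the negative delta keeps the counter monotonic, as Prometheus requires, at the cost of under-reporting a little CPU time. The alternative of letting the exception escape loses every metric in that scrape, which is what produced the error above.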

@erikjohnston

This causes pages (on-call alerts), so we should do something here.

I think this is likely due to the replication RDATA linearizer, so we could do something there to stop background tasks stacking up, or we could try to find a way of fixing the more general problem.
