high cpu usage with only two services managed #272
Comments
I'll add a comment here. I have several VMs with Sablier. When I restart Sablier, its CPU consumption returns to normal.
Hello @kraoc, do you have more input on that? CPU usage graphs, or specific events after which you noticed it, would help. I will investigate this issue for sure when I have time, but more input to narrow it down will help!
I'll try to add more useful debug tips :p My first thought points to a potential leak on first start (e.g. on a VM boot), because after a restart of Sablier its CPU consumption is OK. I'm not talking about the Traefik plugin but acouvreur/sablier:latest (just to be precise).
I have the same issue. I am running in Docker Desktop on Ubuntu, with the CPU limit under "Resources" set to 4. After restarting the container the CPU usage is almost zero. But about 4 minutes later it ramps up to just over 100% (the maximum available is 400%). This happens whether or not I send any requests through Sablier. The usage is coming from the main Sablier container, not the proxy (I use Caddy). It seems to work fine even when the CPU usage is high. I can't see anything useful in the logs, but let me know if there's anything else I can provide.
I still have the bug too, even with the latest version. Same behavior: after a restart it's fine, but when I hit one of the endpoints managed by Sablier, the process consumes a lot of CPU.
I'd be curious to have more information to try to reproduce the issue.
I've tried to reproduce the issue, but even after multiple calls I cannot go over 0.3% CPU usage. I'm currently running a version of the program with CPU, memory, and trace profiling. If I can better reproduce your use cases, I'll be able to better understand why it behaves this way. Thanks!
Hi @acouvreur, for me it's still the same as #272 (comment), but with the latest Sablier version. Some self-hosted services on a Synology DS918+ with an Intel(R) Celeron(R) CPU J3455 @ 1.50GHz. I have around 60 containers running, if the docker events flow matters.
I have my config in dynamic-config.toml and some labels in my docker-compose.yaml. Here are a few extracts. I have 4 endpoints managed by Sablier, and I've tried playing with the timeout and session time; sometimes the problem doesn't trigger instantly, but often it happens right away. dynamic-config.toml
docker-compose.yml
I've done the following experiment:

```shell
docker run -d -l sablier.enable=true --name whoami containous/whoami
docker run -v '/var/run/docker.sock:/var/run/docker.sock' --cpu-shares=1024 -m 256m --name sablier -p 10000:10000 acouvreur/sablier:1.7.0 start --logging.level=trace
docker stats
for i in {1..500}; do
  curl "http://localhost:10000/api/strategies/dynamic?names=whoami&session_duration=30s" &
  curl "http://localhost:10000/api/strategies/blocking?names=whoami&session_duration=30s" &
  curl "http://localhost:10000/api/strategies/blocking?group=default&session_duration=30s" &
  curl "http://localhost:10000/api/strategies/dynamic?group=default&session_duration=30s" &
done
```

This makes 2k requests, matching by names and by group, with both strategies. (Recording.2024-06-27.145655.1.mp4 attached.) Still I cannot reproduce the CPU usage. Note that I've set the same resource restrictions that you have.
I've done some testing on my setup, and I can only reproduce it when I use a docker socket proxy. Looks like there's another issue open in the same area: |
Ok, I confess I cleaned up my config a little before pasting it in the previous post, and yes, I use a cetusguard proxy too! Good catch!
No doubt it works well in your lab, so I understand I'm facing a docker socket proxy issue too.
I can try with the docker socket proxy. It might be related to Sablier listening for incoming events. Can you share your setup with a docker socket proxy?
So, I've tried with the docker socket proxy, using the following setup:

```yaml
version: "3.7"
services:
  cetusguard-sablier:
    container_name: cetusguard-sablier
    image: docker.io/hectorm/cetusguard
    read_only: true
    security_opt:
      - no-new-privileges:true
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
    environment:
      CETUSGUARD_BACKEND_ADDR: "unix:///var/run/docker.sock"
      CETUSGUARD_FRONTEND_ADDR: "tcp://:2375"
      CETUSGUARD_RULES: |
        ! Monitor events
        GET %API_PREFIX_EVENTS%
        ! Get system information
        GET %API_PREFIX_INFO%
        ! Get data usage information
        GET %API_PREFIX_SYSTEM%
        ! List containers
        GET %API_PREFIX_CONTAINERS%/json
        ! Inspect a container
        GET %API_PREFIX_CONTAINERS%/%CONTAINER_ID_OR_NAME%/json
        ! Start/stop a container
        POST %API_PREFIX_CONTAINERS%/%CONTAINER_ID_OR_NAME%/start
        POST %API_PREFIX_CONTAINERS%/%CONTAINER_ID_OR_NAME%/stop
        GET %API_PREFIX_CONTAINERS%/%CONTAINER_ID_OR_NAME%/stats
      CETUSGUARD_LOG_LEVEL: "2"
  sablier:
    image: acouvreur/sablier:1.7.0
    ports:
      - 10000:10000
    environment:
      - PROVIDER_NAME=docker
      - DOCKER_HOST=tcp://cetusguard-sablier:2375
  whoami:
    image: containous/whoami:v1.5.0
    labels:
      - sablier.enable=true
```

And still, after a lot of HTTP calls:
Docker sock config and proxy config are both commented out; uncomment one at a time. Also, docker.sock.raw is for Docker Desktop; use docker.sock otherwise.

```yaml
name: sablier2
services:
  proxy2:
    restart: always
    image: caddy:2.8.4-with-sablier # built as per sablier docs
    networks:
      - network2
    ports:
      - "8000:80" # whoami
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro
  whoami2:
    restart: "no"
    image: traefik/whoami
    networks:
      - network2
    labels:
      - sablier.enable=true
      - sablier.group=demo
  sablier2:
    restart: always
    image: acouvreur/sablier
    networks:
      - network2
      - sablier-socket-network2
    # volumes:
    #   - '/var/run/docker.sock.raw:/var/run/docker.sock'
    # environment:
    #   - DOCKER_HOST=tcp://socket-proxy-sablier2:2375
  socket-proxy-sablier2:
    image: lscr.io/linuxserver/socket-proxy:latest
    environment:
      - ALLOW_START=1
      - ALLOW_STOP=1
      - ALLOW_RESTARTS=1
      - CONTAINERS=1
      - EVENTS=1
      - INFO=1
      - NETWORKS=1
      - PING=1
      - POST=1
      - SERVICES=1
      - VERSION=1
    volumes:
      - /var/run/docker.sock.raw:/var/run/docker.sock:ro
    restart: always
    read_only: true
    tmpfs:
      - /run
    networks:
      - sablier-socket-network2
networks:
  network2:
    name: sablier-network2
  sablier-socket-network2:
    name: sablier-socket-network2
```
I can create a Discord server for the project so we can actually debug this together, what do you think? I need something with voice channels; do you have any suggestions?
Ok, thanks @luked34, I can actually reproduce the issue now!
I would have suggested Discord; it might be worth setting one up even if we don't use it for this issue.
I think I found the issue: it seems to happen when the event stream returns a nil channel. In such a case the event stream cannot be properly initialized. I can make a fix to properly break out of the faulty event stream; however, this means that there is something wrong with your setup, as we can't properly listen to the event stream. I'll publish a fix in the
So basically what happens is that the client is registered without making sure the docker connection is established. It works after a restart because, by then, the connection has been established. I've changed this behavior so Sablier will actually exit if the connection cannot be established. You've all probably hit some kind of race in which the connection to the docker host was unavailable at the time of registering the event stream. This ended up burning the CPU, for which I've also added a fail-safe.
The fix is available in v1.7.1-beta.1, let me know if it still happens.
Maybe an important point: in a VM restart situation, Sablier (which depends on the socket proxy) may be started before the socket proxy. Sorry for the late reply :p
Thanks for the quick fix, it seems to be working. The log now shows two extra messages:
I've tried adding a depends_on to the compose file, restarting Sablier while leaving the socket proxy running, and changing the socket proxy permissions to allow everything, but the messages still appear. CPU usage is consistently below 1% now, so I'm happy to ignore the messages :) |
Maybe using willfarrell/autoheal, with a setting that marks the Sablier container unhealthy when no socket proxy is available, would do the trick?
I will check again later, but at first glance it's far better! Thank you!
When Sablier cannot connect to the docker host, it should fail and exit. If you simply add the option
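Assuming the truncated option above refers to a compose restart policy, a minimal sketch of the fail-and-retry setup might look like this (service names and values are illustrative, not the thread's exact config):

```yaml
services:
  sablier:
    image: acouvreur/sablier
    restart: on-failure   # assumption: retry after a failed startup check
    depends_on:
      - socket-proxy      # start the proxy before Sablier
    environment:
      - DOCKER_HOST=tcp://socket-proxy:2375
  socket-proxy:
    image: lscr.io/linuxserver/socket-proxy:latest
    restart: always
```

Note that a bare `depends_on` only orders container startup; it does not wait for the proxy to be ready, which is why exiting on a failed connection plus a restart policy covers the VM-boot race discussed above.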
Already has always :p |
First, thanks for this cool toy 😃
Describe the bug
High CPU usage
Context
Expected behavior
Low CPU usage
Additional context
I manage 2 services in a big docker-compose file with Sablier, and when I run top I can see the Sablier process working my tiny CPU very hard.
When I restart it, the process calms down, until some unknown event makes it hammer the CPU again.
Is this a known issue? I can see there is a full rewrite PR in progress; could that help? (I haven't actually dug into the code.)