Unable to create rules when container not started #41
Comments
I think it's possible to optionally wait until containers are in a healthy state before creating rules for them, but I don't see how that would solve your issue. How can you guarantee the order in which containers are started across multiple Docker Compose files by setting health checks? I don't understand that. The reason you couldn't create container -> container rules by specifying the container's static IP is the way whalewall creates nftables rules. Packets entering the 'whalewall' chain jump to the appropriate container chain based on the source IP. For established inbound traffic belonging to an outbound rule, the packets will originate from the container specified in the outbound rule, hence nftables rules must be created in both containers' chains. |
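To make the point about source-IP dispatch concrete, here is a minimal Python model of it. This is only a sketch of the behaviour described above, not whalewall's code: the `Packet` type, the `dispatch` helper and the IP addresses are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int
    established: bool = False


def make_chain(rules):
    """Build a per-container chain: accept if any rule matches, otherwise drop."""
    def chain(pkt):
        return "accept" if any(rule(pkt) for rule in rules) else "drop"
    return chain


def dispatch(chains, pkt):
    """Model of the 'whalewall' chain: jump to the chain owning the SOURCE IP."""
    chain = chains.get(pkt.src)
    return chain(pkt) if chain else "not-managed"


# Hypothetical addresses for two containers on the same network.
WIKI, DB = "172.27.0.3", "172.27.0.2"


def wiki_out(p):
    """Outbound rule in the wiki chain: wiki -> db on tcp/5432."""
    return p.src == WIKI and p.dst == DB and p.dport == 5432


def db_reply(p):
    """Companion rule in the db chain: established replies back to wiki."""
    return p.src == DB and p.dst == WIKI and p.established


request = Packet(WIKI, DB, 5432)
reply = Packet(DB, WIKI, 47836, established=True)

only_wiki_chain = {WIKI: make_chain([wiki_out]), DB: make_chain([])}
both_chains = {WIKI: make_chain([wiki_out]), DB: make_chain([db_reply])}
```

With a rule only in the wiki chain, the request is accepted but the reply is dropped, because the reply's source IP sends it to the db chain. That is why a rule is needed in both chains.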
Because I can do a health check using curl command to check if the other container is started, if success container will be in healthy state and then whalewall will create the rules. |
Right, I understand how health checks work, I just don't see how setting a health check on a container will affect the order in which containers in different Docker Compose files are started. How are you starting/stopping the different stacks? For each compose file, 'docker compose up $file'? If a container specifies a rule mentioning a container that is not yet started, whalewall will continue handling container start events for up to 10 seconds. Basically, it waits for the mentioned container to start. If the container doesn't start within 10 seconds, whalewall will return an error. This can be changed with the '-t' flag; you may be able to solve your problem by specifying a longer timeout. I realize it isn't obvious from the description of the '-t' flag that it affects waiting for containers. I think I should either update the description or add another flag only for container waiting timeouts. |
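The waiting behaviour described above can be sketched roughly as follows. This is a simplified polling model in Python, not whalewall's implementation (which, per the comment, keeps handling start events rather than polling); `wait_for_container` and its parameters are hypothetical names, and the 10 second default mirrors the timeout mentioned above.

```python
import time


def wait_for_container(is_running, timeout=10.0, interval=0.5,
                       clock=time.monotonic, sleep=time.sleep):
    """Wait until is_running() returns True or the timeout elapses.

    Returns True if the referenced container showed up in time and False
    otherwise (the case in which an error would be reported).
    The clock and sleep arguments are injectable to make testing easy.
    """
    deadline = clock() + timeout
    while True:
        if is_running():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Raising the timeout (the equivalent of passing a larger value to '-t') simply gives slow stacks more time to start before the rule creation is abandoned.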
We use Portainer, and when you reboot the server the stacks start in no particular order. When you use the -t flag, whalewall doesn't create the db files.
Using the binary works well, but after rebooting and running whalewall from a cron job (whalewall -t 30s) I get this error:
|
Ah, because the location of the database is specified in the As for your original issue, I've been thinking about it a bit, and I think I may have come up with a solution. Try pulling the latest Docker image and omitting the If that works I'll create a release soon. |
Hello, thanks for your reply. Now, whalewall only applies the rules when you restart the whalewall container, not when you update the labels of the containers that have rules. Also when you restart the machine, no rule is applied. Rules are deleted as you can see with "nft list table ip filter" command after machine restart. It seems that the database is not working properly. |
Hmm that's weird, if anything I'd think that functionality would be working better now, not worse... Are you running whalewall as a container or a binary, and what arguments are you passing it? |
I've tried both and see the same behavior. No arguments, just running the whalewall container, or "./whalewall" if using the binary. Can you test it by restarting the computer? There are no rules after rebooting the computer, and whalewall does not create them until you run './whalewall -clear' and then './whalewall'. Also, when you add whalewall labels to a container, it does not create the rules until you restart whalewall. |
I will do some in depth testing, but note that whalewall has never acted on container label change events, whalewall only responds to container 'start' or 'die' events. If you want to change the rules of a container you'll have to update the labels and restart it. |
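As a rough illustration of that event handling, here is a Python sketch of the filtering logic. It assumes event dictionaries shaped like Docker Engine event messages (with `Type` and `Action` fields); the function name and the returned action strings are hypothetical and not taken from whalewall.

```python
def action_for_event(event):
    """Map a Docker event to a firewall action, or None if it is ignored.

    Only container 'start' and 'die' events lead to rule changes; every
    other event is skipped, so editing labels has no effect until the
    container is restarted.
    """
    if event.get("Type") != "container":
        return None
    actions = {"start": "create_rules", "die": "delete_rules"}
    return actions.get(event.get("Action"))
```

Restarting a container produces a 'die' event followed by a 'start' event, which is why updated labels are picked up only after a restart.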
Also just to confirm, if you didn't already can you confirm that the previous version of whalewall you were using didn't have this bug? If so can you post the output of 'whalewall -version' for both the working old version and the buggy new version? Thanks |
I know, but when I update the labels and restart the containers, whalewall doesn't update the rules (there is no log entry about rules in the whalewall container); it only does so if I restart the whalewall container. |
Oh, I think I know what's wrong then, thanks for clarifying |
No. In v0.2.3, when you restart the host machine, whalewall applies the rules using the database. In the newer version, whalewall doesn't apply any rules. Buggy build:
|
I think I finally found the source of the regression, can you please test again? Specifically, whalewall should modify firewall rules for containers when they are created or stopped whether it is running or not. The builds you were testing with had a bug where Docker events were ignored, so whalewall only modified firewall rules when it started up. If that's fixed, let me know if your original issue is solved as well, I think it should be. If you still have issues, I'd run |
It is working well now, but the rules still don't work after a machine reboot. Here is the whalewall log before and after the reboot:
nft table after reboot:
As you can see, all rules are flushed after the reboot and whalewall does not recreate the existing rules from the database. So I need to restart the containers twice:
|
Hmm that's very weird, haven't noticed any behavior like that before... Can you share a minimal version of your docker compose file that still causes this issue, and test again with debug logging enabled? If you set the command of the whalewall container to '-d /new-data -debug', it will both ensure you're using a new database with the new schema and enable debug logs. |
My docker compose:
version: "3.8"
services:
  whalewall:
    container_name: whalewall
    image: ghcr.io/capnspacehook/whalewall:latest
    restart: unless-stopped
    privileged: true
    command:
      - "-debug"
      - "-d"
      - "/data"
    network_mode: host
    volumes:
      - whalewall_data:/data
      - /var/run/docker.sock:/var/run/docker.sock:ro
volumes:
  whalewall_data:
    name: whalewall_data
    driver: local
    driver_opts:
      o: bind
      device: /mnt/docker/whalewall/volumes/whalewall_data
      type: none
whalewall log before reboot:
nft table before reboot:
Now after reboot, the nft table:
And whalewall log (including log entries before reboot):
|
Thanks, but to reproduce your issue I need the docker compose details for your 'wiki' and 'wiki_db' services. Redact any personal info if it exists in the config as long as it doesn't change the behavior with whalewall. |
Here you go (you can omit the traefik labels and just publish port 3000/tcp, but then you should use the mapped_ports config from whalewall):
version: "3.8"
services:
  db:
    container_name: wiki_db
    image: postgres:14.5-alpine
    restart: unless-stopped
    environment:
      POSTGRES_DB: wiki
      POSTGRES_USER: wikijs
      POSTGRES_PASSWORD: somepassword
    networks:
      wiki_network:
    volumes:
      - wiki_db_data:/var/lib/postgresql/data
    labels:
      # WHALEWALL
      whalewall.enabled: true
  wiki:
    container_name: wiki
    image: ghcr.io/dti-ceis/wiki:latest
    restart: unless-stopped
    depends_on:
      - db
    environment:
      DB_TYPE: postgres
      DB_HOST: db
      DB_PORT: 5432
      DB_NAME: wiki
      DB_USER: wikijs
      DB_PASS: somepassword
    networks:
      wiki_network:
      public:
    volumes:
      - wiki_data:/wiki/data/content
    labels:
      # WHALEWALL
      whalewall.enabled: true
      whalewall.rules: |
        output:
          # Allow access to the database
          - network: wiki_network
            container: wiki_db
            proto: tcp
            port: 5432
      # TRAEFIK
      traefik.enable: true
      traefik.http.routers.wiki.rule: "Host(`wikidev.iamruben.local`)"
      traefik.http.routers.wiki.entrypoints: "web,websecure"
      traefik.http.routers.wiki.tls: true
      traefik.http.services.wiki.loadbalancer.server.port: 3000
      traefik.http.routers.wiki.service: "wiki"
networks:
  wiki_network:
    external: true
  public:
    external: true
volumes:
  wiki_data:
    name: wiki_data
    driver: local
    driver_opts:
      o: bind
      device: /mnt/docker/wiki/volumes/wiki_data
      type: none
  wiki_db_data:
    name: wiki_db_data
    driver: local
    driver_opts:
      o: bind
      device: /mnt/docker/wiki/volumes/wiki_db_data
      type: none
|
Hmm, maybe part of the problem is that I've never tested with |
From the command line or Portainer:
docker network create -d bridge wiki_network
docker network create -d bridge public
|
Thanks for troubleshooting for so long with me, I may have finally fixed everything. Pull the latest image and try again please, whalewall should recreate rules properly even after a reboot. If you're interested as to what the problem was, I detailed it in this commit message: 6515da8 |
Thanks for fixing that. It now seems to be working, but when using a traefik container as a reverse proxy and rebooting the machine, whalewall fails to create some rules and I need to restart Seems whalewall detects
Compose file of traefik:
version: "3.8"
services:
  traefik:
    container_name: traefik
    image: traefik:latest
    restart: unless-stopped
    environment:
      - TRAEFIK_ENTRYPOINTS_WEB_ADDRESS=:80
      - TRAEFIK_ENTRYPOINTS_WEBSECURE_ADDRESS=:443
      - TRAEFIK_ENTRYPOINTS_WEB_HTTP_REDIRECTIONS_ENTRYPOINT_TO=websecure
      - TRAEFIK_ENTRYPOINTS_WEB_HTTP_REDIRECTIONS_ENTRYPOINT_SCHEME=https
      - TRAEFIK_ENTRYPOINTS_WEB_HTTP_REDIRECTIONS_ENTRYPOINT_PERMANENT=true
      - TRAEFIK_PROVIDERS_DOCKER=true
      - TRAEFIK_PROVIDERS_DOCKER_EXPOSEDBYDEFAULT=false
      - TRAEFIK_PROVIDERS_DOCKER_NETWORK=public
      - TRAEFIK_API_DASHBOARD=true
      - TRAEFIK_LOG_LEVEL=info
      - TZ=Europe/Madrid
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
      - target: 443
        published: 443
        protocol: tcp
        mode: host
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
    networks:
      public:
    labels:
      # WHALEWALL
      whalewall.enabled: true
      whalewall.rules: |
        mapped_ports:
          external:
            allow: true
        output:
          - network: public
            container: portainer
            proto: tcp
            port: 9000
          - network: public
            container: wiki
            proto: tcp
            port: 3000
          # Allow HTTP traffic
          - log_prefix: "http"
            proto: tcp
            port: 80
          # Allow HTTPS traffic
          - log_prefix: "https"
            proto: tcp
            port: 443
      # TRAEFIK
      traefik.enable: true
      traefik.http.routers.traefik.entrypoints: "web,websecure"
      traefik.http.routers.traefik.tls: true
      traefik.http.routers.traefik.rule: "Host(`traefikdev.iamruben.local`)"
      traefik.http.routers.traefik.service: "api@internal"
      traefik.http.services.traefik.loadbalancer.server.port: 8080
networks:
  public:
    external: true
Before restart
After restart
I don't know what this error means. Also, I think the -clear flag is not working properly (it does not flush the nft table). Thanks in advance. |
It looks like adding the traefik container to the database fails because one of its IP addresses is already in the database somehow. That's why whalewall always thinks it's new, because it was never added to the database. That's also why traefik's rules are cleared, because whalewall doesn't remember it as it isn't in the database. Can you send the full docker compose file with all containers that use whalewall? Good to know the previous issues are solved at least. |
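The failure mode described above can be reproduced in miniature with a uniqueness constraint on addresses. The following Python sketch uses an in-memory SQLite database with a hypothetical two-table schema; it is not whalewall's actual schema, and the container IDs and IP address are made up for illustration.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical schema."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE containers (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
    db.execute(
        "CREATE TABLE addrs ("
        "addr TEXT PRIMARY KEY, "  # each IP may belong to only one container
        "container_id TEXT NOT NULL REFERENCES containers(id))"
    )
    return db


def add_container(db, cid, name, addrs):
    """Insert a container and its addresses atomically; return True on success."""
    try:
        with db:  # commits on success, rolls back everything on an exception
            db.execute("INSERT INTO containers VALUES (?, ?)", (cid, name))
            db.executemany(
                "INSERT INTO addrs VALUES (?, ?)", [(a, cid) for a in addrs]
            )
        return True
    except sqlite3.IntegrityError:
        return False


def is_new(db, cid):
    """A container counts as new when it has no row in the database."""
    row = db.execute("SELECT 1 FROM containers WHERE id = ?", (cid,)).fetchone()
    return row is None
```

If a stale row still holds one of the container's IP addresses, the insert is rolled back, so the container keeps looking new on every start and its rules are never remembered.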
Try restarting the traefik container and rebooting the host machine to reproduce the error.
INF | msg=creating rules container.id=54ff108fcdab container.name=traefik container.is_new=false
ERR | msg=error creating output rules container.id=54ff108fcdab container.name=traefik error=conn.Receive: netlink receive: no such file or directory stacktrace=github.com/capnspacehook/whalewall.(*RuleManager).createContainerRules
github.com/capnspacehook/whalewall/create.go:288
github.com/capnspacehook/whalewall.(*RuleManager).createRules
github.com/capnspacehook/whalewall/create.go:62
github.com/capnspacehook/whalewall.(*RuleManager).Start.func1
github.com/capnspacehook/whalewall/manager.go:118
The workaround is to restart the traefik container so that the rules are recreated as for a "new" container:
INF | msg=deleting rules container.id=54ff108fcdab container.name=traefik
INF | msg=creating rules container.id=54ff108fcdab container.name=traefik container.is_new=true
Below are the environment details:
docker network create -d bridge public
docker network create -d bridge wiki_network
version: "3.8"
services:
  traefik:
    container_name: traefik
    image: traefik:latest
    restart: unless-stopped
    environment:
      - TRAEFIK_ENTRYPOINTS_WEB_ADDRESS=:80
      - TRAEFIK_ENTRYPOINTS_WEBSECURE_ADDRESS=:443
      - TRAEFIK_ENTRYPOINTS_WEB_HTTP_REDIRECTIONS_ENTRYPOINT_TO=websecure
      - TRAEFIK_ENTRYPOINTS_WEB_HTTP_REDIRECTIONS_ENTRYPOINT_SCHEME=https
      - TRAEFIK_ENTRYPOINTS_WEB_HTTP_REDIRECTIONS_ENTRYPOINT_PERMANENT=true
      - TRAEFIK_PROVIDERS_DOCKER=true
      - TRAEFIK_PROVIDERS_DOCKER_EXPOSEDBYDEFAULT=false
      - TRAEFIK_PROVIDERS_DOCKER_NETWORK=public
      - TRAEFIK_API_DASHBOARD=true
      - TRAEFIK_LOG_LEVEL=info
      - TZ=Europe/Madrid
    ports:
      - target: 80
        published: 80
        protocol: tcp
        mode: host
      - target: 443
        published: 443
        protocol: tcp
        mode: host
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:ro"
    networks:
      public:
    labels:
      # WHALEWALL
      whalewall.enabled: true
      whalewall.rules: |
        mapped_ports:
          external:
            allow: true
        output:
          - network: public
            container: wiki
            proto: tcp
            port: 3000
          - log_prefix: "http"
            proto: tcp
            port: 80
          - log_prefix: "https"
            proto: tcp
            port: 443
      # TRAEFIK
      traefik.enable: true
      traefik.http.routers.traefik.entrypoints: "web,websecure"
      traefik.http.routers.traefik.tls: true
      traefik.http.routers.traefik.rule: "Host(`traefik.iamruben.local`)"
      traefik.http.routers.traefik.service: "api@internal"
      traefik.http.services.traefik.loadbalancer.server.port: 8080
networks:
  public:
    external: true

version: "3.8"
services:
  db:
    container_name: wiki_db
    image: postgres:14.5-alpine
    restart: unless-stopped
    environment:
      POSTGRES_DB: wiki
      POSTGRES_USER: wikijs
      POSTGRES_PASSWORD: somepassword
    networks:
      wiki_network:
    volumes:
      - wiki_db_data:/var/lib/postgresql/data
    labels:
      # WHALEWALL
      whalewall.enabled: true
  wiki:
    container_name: wiki
    image: requarks/wiki
    restart: unless-stopped
    depends_on:
      - db
    environment:
      DB_TYPE: postgres
      DB_HOST: db
      DB_PORT: 5432
      DB_NAME: wiki
      DB_USER: wikijs
      DB_PASS: somepassword
    networks:
      wiki_network:
      public:
    volumes:
      - wiki_data:/wiki/data/content
    labels:
      # WHALEWALL
      whalewall.enabled: true
      whalewall.rules: |
        output:
          - network: wiki_network
            container: wiki_db
            proto: tcp
            port: 5432
      # TRAEFIK
      traefik.enable: true
      traefik.http.routers.wiki.rule: "Host(`wiki.iamruben.local`)"
      traefik.http.routers.wiki.entrypoints: "web,websecure"
      traefik.http.routers.wiki.tls: true
      traefik.http.services.wiki.loadbalancer.server.port: 3000
      traefik.http.routers.wiki.service: "wiki"
networks:
  wiki_network:
    external: true
  public:
    external: true
volumes:
  wiki_data:
  wiki_db_data:
|
I tried to reproduce with your exact Compose configs and did find and fix a bug related to what you mentioned, but I'm not sure if it's the exact issue you described. Can you test again please? Note that you'll have to update the whalewall configs slightly, since I've added and renamed port-related fields to support multiple source and destination ports per rule; see the README for details. |
Using this rule:
If the container 'test_container' isn't running, whalewall can't create all the rules for this service.
That's correct, but what happens when you have multiple stacks (Docker Compose files) and reboot the server? Whalewall won't create the rules for some services because some containers aren't running yet (all the stacks are deployed at the same time).
So we can control the start order for services in the same Docker Compose file with 'depends_on', but not for services in other compose files. We could control this with a health check instead.
Is it possible to create the rules only when the container is in a healthy state? We could use a health check to verify that the dependent container has already started, and then create the whalewall rules once the service is reported healthy. This could be configurable with a label like 'whalewall.healthy: true': if true, create the rules when the container becomes healthy; if unset, create the rules when the container starts.
Or retry 3-5 times until the container starts?
I don't know how to solve this problem, because if I use static IPs for containers I need to set rules in both containers.
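To make the proposal concrete, here is a small Python sketch of the decision it implies. The 'whalewall.healthy' label is only a suggestion from this issue and does not exist in whalewall; the function name is hypothetical, and the health status strings follow Docker's health check states ('starting', 'healthy', 'unhealthy').

```python
def ready_for_rules(labels, running, health_status=None):
    """Decide whether rules should be created for a container.

    labels: the container's labels as a dict of strings.
    running: True once the container has started.
    health_status: Docker health state, or None if no health check is defined.
    """
    if not running:
        return False
    wait_for_healthy = labels.get("whalewall.healthy", "false").lower() == "true"
    if wait_for_healthy:
        # Proposed behaviour: hold off until the health check passes.
        return health_status == "healthy"
    # Default behaviour: create rules as soon as the container starts.
    return True
```

A container without the label behaves as it does today, while a labelled one would only get its rules after its health check (for example a curl against the dependency) succeeds.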
Loki:
Promtail:
Dec 27 11:17:11 xxxxxxx kernel: [ 5433.189077] whalewall-loki-16c68c683925 drop: IN=xxxxxxxx OUT=xxxxxxxx PHYSIN=xxxxxxx PHYSOUT=xxxxxxxx MAC=xxxxxxxxxxxxxx SRC=172.27.0.2 DST=172.27.0.3 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=TCP SPT=3100 DPT=47836 WINDOW=65160 RES=0x00 ACK SYN URGP=0
nft chain:
The rule "ip saddr 172.27.0.2 ip daddr 172.27.0.3 tcp sport 3100 ct state established,related counter packets 0 bytes 0 accept" is not working because the packet is dropped first by the rule with the log prefix "whalewall-loki-16c68c683925 drop:".
Traffic is blocked in the opposite direction (I don't know the reason for this traffic, because the flow is only promtail -> loki). But when using the container label, everything works:
nft chain: