-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: [NPM] [Linux] improve iptables version detection and cleanup #3090
Conversation
Signed-off-by: Hunter Gregory <[email protected]>
d3d96ce
to
716f6f1
Compare
Signed-off-by: Hunter Gregory <[email protected]>
716f6f1
to
5f3fa4b
Compare
Signed-off-by: Hunter Gregory <[email protected]>
/azp list |
/azp run NPM Conformance Tests |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run NPM Scale Test |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run Azure Container Networking PR |
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Hunter Gregory <[email protected]>
Signed-off-by: Hunter Gregory <[email protected]>
27a8174
to
acc84e3
Compare
Signed-off-by: Hunter Gregory <[email protected]>
Signed-off-by: Hunter Gregory <[email protected]>
/azp run Azure Container Networking PR |
/azp run NPM Conformance Tests |
/azp run NPM Scale Test |
Azure Pipelines successfully started running 1 pipeline(s). |
2 similar comments
Azure Pipelines successfully started running 1 pipeline(s). |
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Hunter Gregory <[email protected]>
/azp run Azure Container Networking PR |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run NPM Conformance Tests |
/azp run NPM Scale Test |
Azure Pipelines successfully started running 1 pipeline(s). |
1 similar comment
Azure Pipelines successfully started running 1 pipeline(s). |
Just repeated manual testing for this latest version. Confirmed that this version works and that v1.5.37 still seg faults |
should this be against the |
Planning to backport to v1.5 |
) * fix: improve iptables version detection Signed-off-by: Hunter Gregory <[email protected]> * fix: redo everything and add tests Signed-off-by: Hunter Gregory <[email protected]> * fix: address comments Signed-off-by: Hunter Gregory <[email protected]> * fix: avoid segfault by only listing one chain Signed-off-by: Hunter Gregory <[email protected]> * style: log the kernel version Signed-off-by: Hunter Gregory <[email protected]> * style: fix lints Signed-off-by: Hunter Gregory <[email protected]> * fix: don't use stale chains. add comments. minor style change Signed-off-by: Hunter Gregory <[email protected]> * fix: listing kube chain. get stderr too. also add missing ut Signed-off-by: Hunter Gregory <[email protected]> * fix: log messages Signed-off-by: Hunter Gregory <[email protected]> * fix: stop checking kernel version. default nft, never crash Signed-off-by: Hunter Gregory <[email protected]> * style: fix lint Signed-off-by: Hunter Gregory <[email protected]> * style: try fixing gci/gofumpt lint Signed-off-by: Hunter Gregory <[email protected]> * test: fix unit tests referencing iptables legacy Signed-off-by: Hunter Gregory <[email protected]> * style: fix lint in iptm_test.go Signed-off-by: Hunter Gregory <[email protected]> --------- Signed-off-by: Hunter Gregory <[email protected]>
…cleanup (#3110) * fix: [NPM] [Linux] improve iptables version detection and cleanup (#3090) * fix: improve iptables version detection Signed-off-by: Hunter Gregory <[email protected]> * fix: redo everything and add tests Signed-off-by: Hunter Gregory <[email protected]> * fix: address comments Signed-off-by: Hunter Gregory <[email protected]> * fix: avoid segfault by only listing one chain Signed-off-by: Hunter Gregory <[email protected]> * style: log the kernel version Signed-off-by: Hunter Gregory <[email protected]> * style: fix lints Signed-off-by: Hunter Gregory <[email protected]> * fix: don't use stale chains. add comments. minor style change Signed-off-by: Hunter Gregory <[email protected]> * fix: listing kube chain. get stderr too. also add missing ut Signed-off-by: Hunter Gregory <[email protected]> * fix: log messages Signed-off-by: Hunter Gregory <[email protected]> * fix: stop checking kernel version. default nft, never crash Signed-off-by: Hunter Gregory <[email protected]> * style: fix lint Signed-off-by: Hunter Gregory <[email protected]> * style: try fixing gci/gofumpt lint Signed-off-by: Hunter Gregory <[email protected]> * test: fix unit tests referencing iptables legacy Signed-off-by: Hunter Gregory <[email protected]> * style: fix lint in iptm_test.go Signed-off-by: Hunter Gregory <[email protected]> --------- Signed-off-by: Hunter Gregory <[email protected]> * fix: crash NPM if unable to locate kube chain Signed-off-by: Hunter Gregory <[email protected]> --------- Signed-off-by: Hunter Gregory <[email protected]>
Reason for Change:
Requirements:
New bootup logic
Policy Manager bootup will:
Changes
detecting iptables version
Fix #3094.
avoid segmentation fault at high scale
NPM has logic to detect whether kube proxy is using nft or legacy iptables. NPM must use the same version as kube proxy. That logic is based on this article.
We need to check whether kube chains are in mangle table. We used to get the whole table's output with
iptables-nft-save
. When there exist ~20K rules in iptables-nft, running this command can result in a segmentation fault error.A fix is to only list the kube chains we mean to check for via
iptables-nft -nL <chain>
.fix non-happy path
We used to use legacy tables for any failure to run the command detecting iptables version.
Now we update the logic based on this article to check for either kube chain in NFT. If either is there, use nft. If it's not there or there is any failure, check for either kube chain in legacy. If found, use legacy. Otherwise, default to nft.
clean up other iptables version
clean up nft tables if we use legacy
Previously, we only cleaned up legacy if using nft. Support the opposite.
prevent possibility of failing to clean up other iptables version
Fixes #3088
NPM should crash if it fails to guarantee that it has cleaned up the other iptables enough so that packets won't be accepted/dropped.
The criteria for crashing are 1) failed to delete any jump to AZURE-NPM chain and 2) failed to flush AZURE-NPM chain.
first, try flushing all chains
Used to flush each chain one by one. Now try restoring all flushes. Flush one by one if restore fails.
Testing
All the below tests were performed where NPM v1.5.37 was getting segmentation fault. I added 30K iptables rules in nft, causing the segmentation fault for
iptables-nft-save
.detect nft based on hint chain. Then clean up legacy:
detect legacy based on kube canary chain. Then clean up nft:
crash on failure to clean up other iptables