Fix CNI api timeout for a long time #87
Merged
What happened:
CNI timeout on pod initializing:
The timeout error is encountered continuously for 10-30 minutes, even though CNI retries several times.
After further investigation, I found that the k8s client uses the `http2` protocol to connect to the apiserver. `http2` reuses a single TCP connection across HTTP requests. When the CNI timeout error occurred, I found that `terwayd`'s connection to the apiserver had become half-closed: the TCP state is still `ESTABLISHED`, but request packets sent on the connection get no response from the remote end. After about 10-30 minutes of TCP retries, the connection is re-established and CNI returns to normal.
How to resolve
Reconstruct the connection to the apiserver immediately when a half-closed connection is detected.
There is some community discussion in kubernetes/client-go#374,
and kubelet has already done this in kubernetes/kubernetes#78016.