@sjenning
Contributor

@sjenning sjenning commented Aug 8, 2019

When a container is created, cadvisor creates a handler for it, triggered by inotify. For a crio-managed container, the crio handler immediately queries crio for the pid of the just-created container. Due to performance concerns in crio, the pid might not yet be known, in which case crio returns 0. If the pid is 0, network stats are not collected for the lifetime of the container or the lifetime of cadvisor, whichever is shorter.

https://github.com/google/cadvisor/blob/master/container/libcontainer/handler.go#L80-L83

This PR changes the crio handler to re-query crio for the pid if the pid was still unknown after handler initialization or after the last GetStats() call. Once the pid becomes known, the libcontainer handler with the non-zero pid is cached and no further crio queries are needed.
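A rough, self-contained sketch of the retry-then-cache idea described above, not the actual patch. Everything here is a stand-in invented for illustration: `pidLookup` plays the role of the query to crio, `statsHandler` the role of the libcontainer handler, and `lazyPidHandler` the role of the crio handler. Only the `pidKnown` flag mirrors a field name that appears in the diff further down.

```go
package main

import "fmt"

// pidLookup is a hypothetical stand-in for asking crio for a container's pid.
// A returned pid of 0 means "not known yet".
type pidLookup func() (int, error)

// statsHandler is a hypothetical stand-in for the libcontainer handler.
type statsHandler struct {
	pid int
}

// networkStatsEnabled models the behaviour described above:
// with pid 0, network stats are not collected.
func (h *statsHandler) networkStatsEnabled() bool { return h.pid != 0 }

// lazyPidHandler re-queries for the pid until it is non-zero, then caches
// the resulting handler so no further lookups happen.
type lazyPidHandler struct {
	lookup   pidLookup
	pidKnown bool
	handler  *statsHandler
	queries  int // number of lookups performed, for illustration only
}

func newLazyPidHandler(lookup pidLookup) *lazyPidHandler {
	h := &lazyPidHandler{lookup: lookup}
	h.getHandler() // first attempt at initialization time
	return h
}

// getHandler would be called on every stats request.
func (h *lazyPidHandler) getHandler() *statsHandler {
	if h.pidKnown {
		return h.handler // cached; no further queries
	}
	h.queries++
	pid, err := h.lookup()
	if err != nil || pid == 0 {
		// Pid still unknown: fall back to a pid-0 handler and retry next time.
		if h.handler == nil {
			h.handler = &statsHandler{pid: 0}
		}
		return h.handler
	}
	h.pidKnown = true
	h.handler = &statsHandler{pid: pid}
	return h.handler
}

func main() {
	calls := 0
	// Simulated runtime: the pid becomes available on the third query.
	lookup := func() (int, error) {
		calls++
		if calls < 3 {
			return 0, nil
		}
		return 4242, nil
	}
	h := newLazyPidHandler(lookup)
	for i := 0; i < 4; i++ {
		sh := h.getHandler()
		fmt.Printf("pid=%d network=%v queries=%d\n", sh.pid, sh.networkStatsEnabled(), h.queries)
	}
}
```

The point of the flag is that the cost of re-querying is only paid while the pid is unknown. After the first non-zero answer the cached handler is returned directly.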

Fixes openshift/origin#23492

@mrunalp @dashpole

@mrunalp
Contributor

mrunalp commented Aug 8, 2019

👍

@sjenning
Contributor Author

sjenning commented Aug 8, 2019

fyi @Reamer

@Reamer

Reamer commented Aug 8, 2019

Hi @sjenning,
I will try this fix today in my test environment.

Reamer pushed a commit to Reamer/origin that referenced this pull request Aug 8, 2019
Reamer added a commit to Reamer/origin that referenced this pull request Aug 8, 2019
@Reamer

Reamer commented Aug 8, 2019

I think you have a small typo.

diff --git a/container/crio/handler.go b/container/crio/handler.go
index 7d0c5a83..d9eab06e 100644
--- a/container/crio/handler.go
+++ b/container/crio/handler.go
@@ -296,7 +296,7 @@ func (self *crioContainerHandler) getLibcontainerHandler() *containerlibcontaine
 	}
 
 	self.pidKnown = true
-	self.libcontainerHandler = containerlibcontainer.NewHandler(self.cgroupManager, self.rootFs, cInfo.Pid, self.ignoreMetrics)
+	self.libcontainerHandler = containerlibcontainer.NewHandler(self.cgroupManager, self.rootFs, cInfo.Pid, self.includedMetrics)
 
 	return self.libcontainerHandler
 }

@sjenning
Contributor Author

sjenning commented Aug 8, 2019

Ah yes, I made this patch against the origin 3.11 vendor tree, and that argument changed in master.

@sjenning sjenning force-pushed the crio-retry-get-pid branch from 98cb778 to 73b0de8 Compare August 8, 2019 16:21
@sjenning
Contributor Author

sjenning commented Aug 8, 2019

/assign @dashpole

@mrunalp
Contributor

mrunalp commented Aug 8, 2019

/retest

Collaborator

@dashpole dashpole left a comment

lgtm

Development

Successfully merging this pull request may close these issues.

Missing network metrics from cadvisor after some time

4 participants