AttachContainer sometimes hangs #114

Open

gravis opened this issue Jul 8, 2014 · 25 comments

gravis commented Jul 8, 2014

Hi,

I'm fighting with a nasty bug. Sometimes, but not always (it's a bit hard to reproduce), AttachContainer hangs, eating 100% of the CPU.
I profiled my program during the hang and ended up with this:

(pprof) top
Total: 3002 samples
    1906  63.5%  63.5%     3002 100.0% github.com/fsouza/go-dockerclient/utils.StdCopy
    1096  36.5% 100.0%     1096  36.5% bufio.(*Reader).Read
       0   0.0% 100.0%     3002 100.0% github.com/fsouza/go-dockerclient.func·001
       0   0.0% 100.0%     3002 100.0% runtime.gosched0

I'm using the latest go-dockerclient (master / e21b2a4), and I have the same problem with older versions (tried back to 992e6ab).

My code is pretty standard:

  var buf bytes.Buffer
  opts := docker.AttachToContainerOptions{
      Container:    w.container.ID,
      OutputStream: &buf,
      Logs:         true,
      Stdout:       true,
      Stderr:       true,
  }
  err = w.client.AttachToContainer(opts)
  if err != nil {
      log.Printf("Error while fetching logs from container: %s\n", err)
  }

It works "most of the time" (99% of runs).

I'm not sure if it's related to docker, but here's my configuration:

Containers: 0
Images: 206
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Dirs: 206
Execution Driver: native-0.2
Kernel Version: 3.2.0-4-amd64 

Any hint would be appreciated :)

fsouza commented Jul 8, 2014

Which docker version are you using? I will try to reproduce. If you're using docker>=0.11, does it happen with the method Logs as well? It uses a different endpoint in the Docker API.

Is your container running anything special? Or are you able to reproduce it with any command?

gravis commented Jul 8, 2014

I didn't know about the Logs method, thanks for pointing it out. I'll try it and report back in a few minutes.

My container doesn't run anything special, just a shell script to fetch bower dependencies, and it doesn't always hang if I restart the process with the same params, just sometimes.

I'm running docker 1.1.0. By the way, the container is stopped (Exited (0) 2 minutes ago) when I try to fetch the output (logs).

gravis commented Jul 8, 2014

OK, so with Logs it's a bit better.
It's killing our API server, though, because most of the results are invalid JSON. There's a weird character before the response from the container:

buf.String() ��{"name":"angular-resource","versions":["1.3.0-build.2893+sha.5963b5c"," [...]

From this code:

var buf bytes.Buffer
opts := docker.LogsOptions{
    Container:    w.container.ID,
    OutputStream: &buf,
    Stdout:       true,
    Stderr:       true,
}
err = w.client.Logs(opts)
if err != nil {
    log.Printf("Error while fetching logs from container: %s\n", err)
}

fmt.Printf("buf.String() %+v\n", buf.String())

Whereas docker run simply returns:

{"name":"angular-resource","versions":["1.3.0-build.2893+sha.5963b5c"," [...]

fsouza commented Jul 8, 2014

@gravis that's bizarre.

I'm able to reproduce the bug with the strange character in the Logs method; I'll have a look.

I'm now trying to reproduce the attach issue.

Thanks for the reports!

gravis commented Jul 8, 2014

That's "good" news, at least it's not related to my Docker install or my image/container :)
Thanks again for your time and patience.

fsouza commented Jul 8, 2014

@gravis the bug is with Docker's stream multiplexing capability. We need to interpret that character to find out whether the data belongs to stdout or stderr; utils.StdCopy does that for us.

I'll push a fix soon. Still trying to figure out the AttachToContainer issue.
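
For readers unfamiliar with the format: Docker's attach and logs endpoints interleave stdout and stderr on a single connection, and each payload is preceded by an 8-byte header. A minimal sketch of a demultiplexer (illustrative only, not the actual utils.StdCopy implementation) could look like this:

    package streamdemux

    import (
        "encoding/binary"
        "io"
    )

    // demux reads Docker's multiplexed attach/logs stream. Each frame is an
    // 8-byte header followed by a payload:
    //   header[0]   stream type (0 = stdin, 1 = stdout, 2 = stderr)
    //   header[1:4] unused
    //   header[4:8] payload size, big-endian uint32
    func demux(src io.Reader, stdout, stderr io.Writer) error {
        header := make([]byte, 8)
        for {
            if _, err := io.ReadFull(src, header); err != nil {
                if err == io.EOF {
                    return nil // clean end of stream
                }
                return err
            }
            size := binary.BigEndian.Uint32(header[4:8])
            dst := stdout
            if header[0] == 2 {
                dst = stderr
            }
            // Copy exactly one frame's payload to the selected stream.
            if _, err := io.CopyN(dst, src, int64(size)); err != nil {
                return err
            }
        }
    }

The stray bytes in the Logs output above are these headers being copied through without interpretation.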

gravis commented Jul 8, 2014

Great!

Regarding AttachToContainer, were you able to reproduce it at least once?

fsouza commented Jul 9, 2014

@gravis nope, not yet :(

fsouza added a commit that referenced this issue Jul 9, 2014
Docker sends the output in the following format:

    <header identifying the stream and the size of the frame>
    <frame>

It allows us to multiplex the content between the output and error
streams.

I'm wondering whether I should rewrite the StdCopy function; it's a little
messy.

Related to #114.

fsouza commented Jul 9, 2014

I fixed a bug in the StdCopy implementation. I don't think it was what caused the hang, but it may be related.

I've also fixed the Logs issue. Can you give it another try (both for Logs and AttachToContainer)?

gravis commented Jul 9, 2014

Great!

gravis commented Jul 9, 2014

First test: Logs now hangs just like AttachContainer (CPU at 100%), so the issue is probably located around hijack or StdCopy.

fsouza commented Jul 9, 2014

Logs doesn't use hijack, so it's probably StdCopy. I'm still unable to reproduce, but I think we're closer to the solution now.

Please try setting RawTerminal: true on LogsOptions (or AttachToContainerOptions); you will get the bad characters again, but StdCopy won't get called.
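
For reference, the suggested workaround looks roughly like this (adapted from the LogsOptions snippet earlier in the thread):

    var buf bytes.Buffer
    opts := docker.LogsOptions{
        Container:    w.container.ID,
        OutputStream: &buf,
        Stdout:       true,
        Stderr:       true,
        RawTerminal:  true, // copy the raw stream; StdCopy is bypassed
    }
    err = w.client.Logs(opts)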

gravis commented Jul 9, 2014

Ouch, AttachContainer is now returning weird chars from the container logs :(

d┘▒┼±⎺↑ce┌e⎼≤==2↓2↓7
d┘▒┼±⎺↑┐⎺└b┤>=▮↓9↓4
└☃└e⎻▒⎼⎽e>=▮↓1↓3
⎻≤├▒⎺┼↑d▒├e┤├☃┌==1↓5
⎻d°└☃┼e⎼==2▮11▮515
stripe==1.12.0
⎻≤c⎼≤⎻├⎺==2↓6
b⎺├⎺==2↓13↓3
d┘▒┼±⎺↑├☃└e≥⎺┼e⎽==▮↓2
d┘▒┼±⎺↑⎽e⎽==▮↓4↓1
Ge⎼▒┌d⎺==▮↓4↓16
d┘▒┼±⎺↑├▒⎽├≤⎻☃e==▮↓9↓12
┘⎺▒┼┼≤↑c▒c▒e==1↓4
d⎺c┤├☃┌⎽==▮↓9↓1
chardet==2.1.1
dj-database-url==0.2.1
psycopg2==2.4.5
gunicorn==0.17.2
django-s3-folder-storage==0.1
django-storages==1.1.5
python-memcached==1.48
W▒⎺⎺⎽▒==2↓4↓1
⎽☃└⎻┌e↑⎽▒┌e⎽°⎺⎼ce==▮↓51

gravis commented Jul 9, 2014

(we should probably create a branch to test, master is getting messy :()

fsouza commented Jul 9, 2014

@gravis with or without RawTerminal? Seems like it's working fine here. I will run some tsuru integration tests against go-dockerclient's master.

Were you able to test it with RawTerminal: true? Does it hang?

gravis commented Jul 9, 2014

Definitely voting for an issue with StdCopy. It works fine (so far) with RawTerminal: true!
Let's wait another hour to be sure.

gravis commented Jul 9, 2014

It would have hung already. Using RawTerminal fixed it. Thanks!
There's something to fix in StdCopy :(

fsouza commented Jul 9, 2014

Good news, I've just managed to reproduce the hang in an automated test. It fails whenever StdCopy needs to grow the buffer. I will push a fix soon :)
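
For context, the failing case is a frame whose payload is larger than the copier's initial buffer (32 KiB is used below purely as an assumption for illustration). A hypothetical way to build such an input for a test, not necessarily the test that was added, might be:

    // One multiplexed stdout frame whose payload exceeds the assumed 32 KiB
    // starting buffer, forcing the copier to grow it.
    payload := bytes.Repeat([]byte("x"), 64*1024)
    header := make([]byte, 8)
    header[0] = 1 // stdout
    binary.BigEndian.PutUint32(header[4:8], uint32(len(payload)))
    stream := io.MultiReader(bytes.NewReader(header), bytes.NewReader(payload))
    // Feed stream to the copier under test and assert that it returns.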

fsouza closed this as completed in 9dba2cd Jul 9, 2014

gravis commented Jul 9, 2014

You're my hero :)

kminehart commented Jun 20, 2022

I think this may still be happening.

grafana-kminehart-linux ~ » docker version
Client:
 Version:           20.10.14
 API version:       1.41
 Go version:        go1.18
 Git commit:        a224086349
 Built:             Thu Mar 24 08:56:17 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server:
 Engine:
  Version:          20.10.14
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18
  Git commit:       87a90dc786
  Built:            Thu Mar 24 08:56:03 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.6.2
  GitCommit:        de8046a5501db9e0e478e1c10cbcfb21af4c6b2d.m
 runc:
  Version:          1.1.1
  GitCommit:
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

In github.com/grafana/scribe, I have this function:

https://github.com/grafana/scribe/blob/fix-docker-mode/plumbing/pipeline/clients/docker/container.go#L102

and I'm expecting the RunContainer function to return when the container exits.

It seems to work pretty consistently for the container created here but not the one created here.

This is what docker ps -a usually reports about the container while AttachToContainer is still waiting to return something:

e9052d92db24   ubuntu:latest   "/opt/scribe/pipelin…"   About a minute ago   Exited (0) About a minute ago             scribe--esDVKqiq

I have a feeling this may be related to how quickly the container exits before "AttachToContainer" is called. I'll try adding a sleep before echoing and seeing what happens.


Edit: confirmed, it's definitely "AttachToContainer" being called right before or right after the container has exited, causing it to hang. Do you think there might be a solution to this? I'd be happy to submit a PR.
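
One possible mitigation while this is open, sketched under the assumption that the non-blocking attach variant (AttachToContainerNonBlocking, which returns a CloseWaiter) fits your flow: put a deadline on the attach so a race with the container exiting can't block the caller forever. The helper name and the timeout value below are hypothetical.

    // attachWithTimeout guards against an attach that never returns because
    // the container exited just before or after the attach was issued.
    func attachWithTimeout(client *docker.Client, opts docker.AttachToContainerOptions, d time.Duration) error {
        cw, err := client.AttachToContainerNonBlocking(opts)
        if err != nil {
            return err
        }
        done := make(chan error, 1)
        go func() { done <- cw.Wait() }()
        select {
        case err := <-done:
            return err // attach finished normally
        case <-time.After(d):
            cw.Close() // give up; fall back to Logs for an exited container
            return errors.New("attach timed out")
        }
    }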

fsouza commented Jun 26, 2022

@kminehart hey, sorry for the delayed response. Do you think we'd be able to get a minimal reproducer? If not, I can dig into the code you shared and see if we can figure out exactly what's going on.

fsouza reopened this Jun 26, 2022
@kminehart

I can try to get one today!

@kminehart

@fsouza Much appreciated, here's a link to an example I was able to reproduce it with.

https://gist.github.com/kminehart/e50ae98557adc139ba9648077271367b

stale bot commented Jul 31, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the stale label Jul 31, 2022
fsouza removed the stale label Aug 3, 2022

stale bot commented Nov 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the stale label Nov 2, 2022
fsouza added bug and removed stale labels Nov 2, 2022