Protect against hung subprocesses with inactivity timeouts #84

sdboyer · 2016-08-18T02:13:18Z

Noticed today that trying to bzr branch with the bzr:// protocol ended up causing the child process to just...sit, generating no output.

We need some basic protection against this (there's no general solution, because halting problem). Sadly, we can't just use a simple timeout, as there are times when we absolutely expect these processes to take a while (e.g., cloning kubernetes). However, if we can watch stdout/stderr for output and update a timer every time we see activity, then we should be able to set an "inactivity" timeout, where we kill the command only if a certain window has elapsed since progress was reported. That'll work for most git, hg, and bzr operations at least, I think.

The text was updated successfully, but these errors were encountered:

erizocosmico · 2017-01-28T01:14:08Z

I can take this one, as said in #159!

sdboyer · 2017-01-28T02:01:57Z

Awesome. Please don't hesitate AT ALL to ask questions. Here, or in slack - whatever. Meantime, here's some starting thoughts:

I think you're likely to run into a blocker right from the start the start, as Masterminds/vcs does not offer an interface for direct control over executing processes. I can't imagine a way of doing this without direct access to the running exec.Command object and using Start(), as everything else will just block until the command completes. So, you may need to start with a patch to Masterminds/vcs that creates a companion to RunFromDir that returns the assembled exec.Command object instead of calling CombinedOutput(). (/cc @mattfarina)
We do know we plan to be passing down cancellation channels eventually (per Incorporate context.Context into SourceManager #159) into the command loop, so design with that in mind.
Be warned, I plan to convert gps from glide to dep tonight. (This is risky, because https://github.com/golang/dep/milestone/1)
The *SourceMgr side of the code (as opposed to the solver) has been devolving towards spaghetti, and the path by which these objects get created/managed is kinda long and winding. I like grokking bigger-picture context before I work, so in case you do, here's how I recommend you go about grokking the flow:
- All the logic you'll be changing is inside source types, which are called from pretty much all the public interface methods, returned from SourceMgr.getSourceFor()
- You can mostly skip the deducer stuff there - just see how a maybeSource gets set up, and how it interweaves with baseVCSSource
- There's tons of logic that tries to "lazily" determine when it needs to do work. Much of the refactor I need to do is bringing some sanity and order to that, so try to ignore it. Just focus on the calls that are made against the baseVCSSource.crepo.r - that's the actual Masterminds/vcs object.
Tests, please! 😄 This part of gps isn't nearly as well-tested as the solver, but it's time to impose discipline.
Yes, a bunch of these commands are very racey. The solver is a strictly serial algorithm, but that's not a good excuse. Really, I just wasn't thinking about it when I did my first pass. Fixing that is my primary motivation for refactoring all of SourceMgr.

sdboyer · 2017-01-28T02:09:13Z

I'd also be happy to find time for a an hour pairing session on this, if you think it'd be helpful. Maybe once you've gotten your hands dirty a bit, and you have some questions about how the hell it all fits together.

erizocosmico · 2017-01-29T10:58:03Z

Nice, let me grok a bit with the code and I'll ping you when I have some questions!

EDIT:

There are some operations in gps that don't directly use RunFromDir, but directly use the API of vcs. These operations could not be monitored unless we migrate them to RunFromDir or add the timeout directly into vcs, which I guess is not what we want. Should I assume we'll manage the exec.Cmd always for now?

Aside from that, I've been thinking about how to implement this and I thought of the following:

In vcs: add a method to Repo interface (and implementors) CmdFromDir which does the exact same thing RunFromDir does, except it does not call CombinedOutput and returns the command. Also, modify RunFromDir so it uses CmdFromDir internally and avoid having the same code on both.
In gps: create a type MonitoredCmd (any suggestions on the name?, doesn't quite convince me), which would be created with a timeout and an exec.Cmd. We can offer a exec.Cmd-like API providing CombinedOutput/Output and Run/Start+Wait. Adding context.Context to the mix should be fairly easy. Instead of just killing the underlying cmd process when there is no activity, do it when ctx deadline is met as well.

Any thoughts about this before I start coding the proposed solution?

Also, gophers slack says I need to contact an admin to get an invite.

sdboyer · 2017-01-29T14:27:54Z

Both of these ideas sound 👍!

I don't have any name suggestions for MonitoredCmd - it seems fine. Though, unless there's some reason it needs to be exposed via the public interface, it can just be monitoredCmd for now. That'll give us more leeway in renaming it in the future.

Of course, if we're inferring from stdout/stderr outputs, it may require a bit of research to pick sane numbers for these inactivity timeouts.

sdboyer · 2017-01-29T14:34:44Z

Oh, and https://invite.slack.golangbridge.org/ should get you an invite for the gopher slack

erizocosmico · 2017-01-29T15:25:32Z

Thanks! Will start working on the implementation then

sdboyer · 2017-02-10T01:38:42Z

Just checking in on this :)

erizocosmico · 2017-02-10T10:04:21Z

I'm working on implementing the monitored cmds on all operations right now. The issue with this is that all operations that are not RunFromDir can't be trivially changed to use monitoredCmd.
For example, when vcs.Repo.Update is used, internally that cmd is not monitored. The solution to that is reimplement the Update (or any other method being used) for all possible VCSs using monitored cmds instead.

What are your thoughts about this?

sdboyer · 2017-02-10T11:53:07Z

Unfortunately, I think you're right - there's no magical way around that design of the interface. And now that you say it, I'm picturing just how much work that is ☹️ Thanks for sticking with it!

The only real thought I'd have about it is that I'd love to see the WIP as you go in a PR, as the e.g. vcs_source.go file is already poorly organized, but still bloated. I don't know exactly what that SHOULD look like, but I'd rather avoid having you get to the end of the work and finding myself wanting to ask for a bunch of reorganization.

sdboyer · 2017-04-16T15:41:04Z

I think this can be marked as done now. There are some improvements needed for handling on windows, but that's not the original scope here.

Would you agree, @erizocosmico ?

erizocosmico · 2017-04-16T19:49:06Z

👍 On Sun, 16 Apr 2017 at 17:42, sam boyer <[email protected]<mailto:[email protected]>> wrote: I think this can be marked as done now. There are some improvements needed for handling on windows, but that's not the original scope here. Would you agree, @erizocosmico<https://github.com/erizocosmico> ? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub<#84 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/ABQFF-aFI9FEkigxu8nnctVBcwv3WB8xks5rwjaQgaJpZM4JnEYL>.

sdboyer added the bug label Aug 18, 2016

sdboyer changed the title ~~Protect against inactive subprocess timeouts~~ Protect against inactive subprocesses with inactivity timeouts Aug 18, 2016

sdboyer changed the title ~~Protect against inactive subprocesses with inactivity timeouts~~ Protect against hung subprocesses with inactivity timeouts Aug 18, 2016

sdboyer mentioned this issue Dec 29, 2016

Feedback: dep ensure must not fail catastrophically if operations time out golang/dep#66

Closed

sdboyer mentioned this issue Jan 25, 2017

signal handling should happen in dep, not gps golang/dep#160

Closed

sdboyer added the help wanted label Jan 28, 2017

sdboyer mentioned this issue Jan 28, 2017

Incorporate context.Context into SourceManager #159

Open

erizocosmico mentioned this issue Jan 29, 2017

Expose CmdFromDir Masterminds/vcs#65

Merged

erizocosmico mentioned this issue Feb 10, 2017

Monitored commands #171

Merged

sdboyer mentioned this issue Feb 13, 2017

Integration tests failing? golang/dep#214

Closed

sdboyer mentioned this issue Mar 6, 2017

Improve performance golang/dep#289

Closed

sdboyer closed this as completed Apr 16, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Protect against hung subprocesses with inactivity timeouts #84

Protect against hung subprocesses with inactivity timeouts #84

sdboyer commented Aug 18, 2016 •

edited

Loading

erizocosmico commented Jan 28, 2017

sdboyer commented Jan 28, 2017 •

edited

Loading

sdboyer commented Jan 28, 2017

erizocosmico commented Jan 29, 2017 •

edited

Loading

sdboyer commented Jan 29, 2017

sdboyer commented Jan 29, 2017

erizocosmico commented Jan 29, 2017

sdboyer commented Feb 10, 2017

erizocosmico commented Feb 10, 2017

sdboyer commented Feb 10, 2017

sdboyer commented Apr 16, 2017

erizocosmico commented Apr 16, 2017 via email

Protect against hung subprocesses with inactivity timeouts #84

Protect against hung subprocesses with inactivity timeouts #84

Comments

sdboyer commented Aug 18, 2016 • edited Loading

erizocosmico commented Jan 28, 2017

sdboyer commented Jan 28, 2017 • edited Loading

sdboyer commented Jan 28, 2017

erizocosmico commented Jan 29, 2017 • edited Loading

sdboyer commented Jan 29, 2017

sdboyer commented Jan 29, 2017

erizocosmico commented Jan 29, 2017

sdboyer commented Feb 10, 2017

erizocosmico commented Feb 10, 2017

sdboyer commented Feb 10, 2017

sdboyer commented Apr 16, 2017

erizocosmico commented Apr 16, 2017 via email

sdboyer commented Aug 18, 2016 •

edited

Loading

sdboyer commented Jan 28, 2017 •

edited

Loading

erizocosmico commented Jan 29, 2017 •

edited

Loading