proposal: runtime: support "strict mode" run-time checks analogous to "go vet" #37681
Comments
A small remark: Go programs do have a main thread, and on Android and other platforms we have to use runtime.LockOSThread to be sure that rendering and similar work happens on that main thread. Apart from that, extra checks are a useful idea. |
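For readers unfamiliar with the pattern mentioned above, here is a minimal sketch of pinning work to the main thread. It relies on runtime.LockOSThread being called from init, which runs on the main thread; the names mainThreadWork and runOnMain are hypothetical helpers made up for this illustration, not part of any library.

```go
package main

import (
	"fmt"
	"runtime"
)

func init() {
	// init runs on the main OS thread; locking here keeps the main
	// goroutine wired to that thread for the life of the program.
	runtime.LockOSThread()
}

// mainThreadWork is a hypothetical queue of functions that must run on
// the main thread (for example, rendering calls).
var mainThreadWork = make(chan func())

// runOnMain hands f to the main goroutine and waits until it has run.
func runOnMain(f func()) {
	done := make(chan struct{})
	mainThreadWork <- func() {
		f()
		close(done)
	}
	<-done
}

func main() {
	go func() {
		runOnMain(func() { fmt.Println("running on the main thread") })
		close(mainThreadWork) // no more work: let main return
	}()

	// The main goroutine, locked to the main thread, drains the queue.
	for f := range mainThreadWork {
		f()
	}
}
```

Other goroutines stay free to move between threads; only the functions sent through runOnMain are guaranteed to execute on the locked thread.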
CC @bradfitz |
Another existing example of |
This is a great idea, but it seems the option name -race is already inaccurate at this point. Sorry to bikeshed, but perhaps it should be renamed to -strict? |
@beoran it's true that we've piggybacked on |
Thinking about this some more: I think the “strict mode” check should be separate from
In contrast, the “strict mode” checks would indicate something that is usually a problem — such as a memory or file-descriptor leak — but that (for example) might not cause incorrect behavior especially in a short-running program. I don't want the “strict mode” checks to undermine users' confidence in |
100% agree with Bryan. Also I'd like to see a clearer definition of what 'strict mode' means before we add a separate mode. So far it seems like just forgetting to close files? |
I have come to agree with Bryan too, after thinking about it for a few weeks. I think I should amend the proposal a bit, but it isn't clear what the new knob would be. Simply a standard around a
For the first cut, yes. In the original issue I list a couple of other goroutine-related ideas, under "more ideas". I didn't want to spend too much effort here, because it feels like we could bikeshed for a long time on what extra checks would or would not be a good idea. If we can agree on the high-level definition, I don't think the first cut of checks is that relevant. To me, the rough definition would be: the equivalent of vet, but at run-time instead of via static analysis. That is, program behavior that seems suspicious, and probably means a bug or unintended behavior of some sort. The key difference here is "suspicious" and "probably"; there is a possibility for false positives, which isn't the case for |
Randomizing the scheduler really was about races. Even with checkptr, right now -race is 100% accurate: if it crashes, you have a serious problem that needs fixing. That would not be true of forgetting to close one fd at the start of your program. The right things are happening anyway - the program is not incorrect. This sounds kind of like -lint to me, not -strict. I'm not super excited about linters in general. They tell me that my program is not what someone else wants it to be, even though it's correct. This is why I asked what else might go in. If it really is -lint, let's put our time into other things. It seems counterproductive to open the runtime to bikeshedding about linting. It's also worth asking whether the things in -strict/-lint are expected to be fast. There is more expensive cgo checking we could do, but it slows down the program quite a lot. |
The three cases that @mvdan described originally — unclosed files, permanently-blocked goroutines, and abandoned runnable goroutines that aren't relevant to the program's output — are all what I would describe as “leaks”: they all cause the program to consume unneeded resources for an unbounded amount of time. (The resource being leaked varies: it may be file descriptors, or kernel resources, or in-process memory, or CPU cycles.) So perhaps the thing to propose is a more narrowly defined “leak sanitizer”, which would cause programs to report detectably-leaked resources, and to which libraries could add their own best-effort leak detection (perhaps via a well-known build tag such as |
For every one of those three categories, I can give a completely reasonable, correct program that I'd be annoyed to have reported and "sanitized" for me. I define a lint check as something that's not about correctness. These are exactly lint checks by that definition. I don't believe we should start doing lint checks in the Go runtime. That path leads to no end of bikeshedding about exactly which checks to add, and grumbling from users like me who have perfectly fine programs getting flagged. Note that goroutine leaks show up in the pprof goroutine profile and runaway goroutines show up in pprof cpu profiles. It might be interesting to somehow flag finalized allocations in the heap profile. |
I agree with that definition of a “lint check”. However, I disagree that the existence of false-positives for these checks implies that they are “not about correctness”. A program that fails due to the system OOM killer (or due to running out of file descriptors) fails nonetheless. (Personally, I would generally consider a program with an unbounded goroutine or file-descriptor leak to be “incorrect”, and a program with a bounded leak to be “correct but inefficient”.)
So I would say that the leak checks for these categories are about correctness, but have a nonzero rate of false-positives. Moreover, those false-positives should be rare and easy to avoid by writing clearer code: for example, by making the lifetimes of goroutines and open files correlate more directly to lexical blocks and/or explicit calls in the program.
We already have a compile-time tool that detects correctness issues with a small-but-nonzero rate of false-positives, but that tool is
I suspect that what @mvdan intends here is a run-time analogue to
That is: we currently have three compile-time tools, but only two of the three corresponding run-time tools:
|
Correct - this is what I intended to say in an earlier comment, by "the equivalent of vet, but at run-time".
@bcmills already argued about doing this elsewhere, like a
I'm convinced this should be part of the toolchain for two reasons:
|
One important difference between static checks and runtime checks is that static checks can be scoped to one package. I can decide to run go vet on my code and not worry about the fact that go vet complains about stuff in your code, even if I depend on your code. At runtime that's a lot harder. So if I write a popular library that isn't "strict" the result is I'm going to get a steady stream of bug reports telling me I should make it "strict". It's not optional in the same way as go vet. The retitling still says the word "strict" but you mentioned a leak tag. Is this now scoped to be only about reporting leaks? If so probably the next thing to do is to enumerate the leaks we care about. And again we've used custom pprof profiles for leaks in the past and maybe that's the right answer here. Then the aggregate info can be inspected instead of reporting every single one. (An O(1) leak is not usually a leak; only O(n) leaks are.) |
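As a rough illustration of the "custom pprof profiles for leaks" approach mentioned above, the sketch below tracks open handles with runtime/pprof's NewProfile, Add, and Remove. The handle type and the profile name example.com/openhandles are hypothetical, chosen only for this example.

```go
package main

import (
	"fmt"
	"runtime/pprof"
)

// openHandles is a custom profile; the name is a made-up example.
var openHandles = pprof.NewProfile("example.com/openhandles")

// handle is a hypothetical resource that must be closed by its owner.
type handle struct {
	name string
}

// openHandle records the caller's stack in the profile, keyed by the handle.
func openHandle(name string) *handle {
	h := &handle{name: name}
	openHandles.Add(h, 1) // skip=1: start the stack at openHandle's caller
	return h
}

// Close removes the handle from the profile, so only unclosed handles remain.
func (h *handle) Close() {
	openHandles.Remove(h)
}

func main() {
	a := openHandle("a")
	b := openHandle("b")
	a.Close()
	_ = b // deliberately left open

	// Count reports how many handles are still outstanding.
	fmt.Println("still open:", openHandles.Count())
}
```

Because the profile aggregates by stack, an O(n) leak shows up as one stack with a large count instead of n separate reports, and the profile can be written out with the profile's WriteTo method or served through net/http/pprof.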
One possible example of such a check is #8606 (comment) |
@josharian, I think that one really is more in |
That's true, but if you write a popular library that leaks resources (even O(1) resources), you're likely to receive the same bug reports from performance-sensitive users. (Witness, for example, a few recent CLs from @bradfitz making package initialization lazy in order to prune out O(1) “leaks” induced by otherwise-unused compile-time dependencies.) |
Even if this is a -leak mode, there's still a big difference between a user asking for a performance optimization (you can say no) and a user reporting that your package doesn't work with some standard (implicitly blessed) tool. I'm still not 100% sure exactly what's being proposed, but it seems like too heavy a change - it will cause work for everyone to clean up code that was allowed as correct before. (In contrast to the race detector, there are very few harmless races but plenty of harmless O(1) leaks, like the occasional os.File opened for read that the GC closes.) |
Does anyone want to take another stab at saying precisely what is being proposed here? |
Based on the discussion above, this proposal seems like a likely decline. |
I still think there's value in such checks, and I disagree that leaks are generally not bugs. The fact that most people don't notice them, and that the GC tends to clean up after a subset of them like open files, doesn't really mean they're not a problem. Right now, if one has such leaks in a running Go program, they can be hard to find. I disagree that tools like pprof or trace can be a good solution here. For example, I routinely work with Go servers that have tens of thousands of goroutines. Sifting through those in the hope of finding a stuck goroutine feels backwards. In comparison, something like I think my latest thoughts are well summarized by @bcmills in #37681 (comment). I don't feel like adding to what he said, because it feels like the discussion would continue going in circles. In hindsight, maybe I should have spent more time figuring out what qualifies as a "leak" in Go, before filing a proposal to implement a solution. But I also want to emphasize that this proposal isn't just about leaks. For example, see Josh's #37681 (comment). Unless we are OK with adding more of those to |
Since @mvdan is deferring to my comment, I'll take a stab at a precise statement of what I would like the concrete proposal to be: Define and document a well-known build tag (perhaps
If the runtime changes for such a mode require something that cannot be achieved with just build tags, also add a corresponding Also encourage authors of third-party packages to support the same build tag, with the same meaning, in their own packages. |
See also #6705, particularly #6705 (comment) and #6705 (comment). (However, note that the |
Thanks for the attempt to state the proposal clearly, @bcmills. I think that basically matches our discussion earlier. In particular, as I noted on Apr 29, O(1) leaks that will be GC'ed don't need to be fixed, and O(N) leaks show up in profiles. No change in consensus since marking this a likely decline, so declined. |
Background
Android has had StrictMode for years, and it's very useful to catch bugs at run-time:
Go doesn't have a main thread for UI operations, so many of these sanity checks don't apply. However, a few might make sense in Go, if we can find a way to enable them appropriately.
The closest to a "strict mode" build flag that Go has today is -race, which enables the data race detector. As of Go 1.14, the flag also enables extra run-time checks, such as instrumentation to check invalid uses of unsafe pointers.
Proposed change
I propose that such "strict mode" checks should be enabled with -race. We can start by borrowing its idea to detect when closeable values are leaked without being closed.
Go does not have a way to detect if a value whose type implements io.Closer has been closed. Similarly, not all such values get closed automatically when finalized; such finalizers are set on a case-by-case basis, for types like net.netFD and os.File.
However, we can easily build on the existing code to catch "leaks" of such types. For example, the current finalizer for os.File is set up as follows, where a call to os.File.close removes the finalizer to avoid closing twice:
We could adapt it as follows:
Given that race.Enabled is a constant, this should have no extra run-time cost when the race detector isn't enabled. And as said before, since closing a file removes its finalizer, our panic wouldn't be called if a file was properly closed.
Once this first check has been added and released successfully as part of a stable Go release, more issues could be filed to add more checks. For example, some potential ideas:
select {} would be more evident and less error-prone), or when a goroutine gets blocked sending/receiving on a channel and we know that said channel isn't reachable by any other goroutine.
Alternatives considered
First, we considered static analysis, such as a go/analysis checker using syntax trees and type information. However, that would only catch the most basic cases, as it's impossible to statically know at what point an allocated object would become unreachable and garbage collected. If it were possible, run-time garbage collection wouldn't be needed.
Second, we considered an external tool to detect these cases at run-time. Let's briefly cover some options:
os.OpenFile. However, this solution would require a lot of complex code, and it would be difficult to catch all leaks (such as those originating in external libraries).
+build strictmode. Very similar to the proposed solution, given that the race detector uses +build race. However, the big advantage of reusing -race is that it's well understood and widely used already, so we don't need to teach Go users about new flags or build tags they should remember. We assume this is the same reason why -d=checkptr was added to -race.