proposal: runtime: add a pointer check code to gcWriteBarrier for helping debugging zombie pointers #51276
Comments
Go doesn't have macros, so it's not clear to me exactly what you are suggesting.
Perhaps we could do this with build tags, but I have a feeling that few people would use such a feature in practice. CC @golang/runtime |
The first example seems like something that I think that it might be possible for If |
FWIW, in the past, for debugging, I tried setting the write barrier buffer size to 1 so that it always goes through the slow path, which has at least some checks, e.g. in findObject, and it was helpful. Perhaps we can make a GODEBUG mode that sets the buffer size to 1. |
I think that you can actually get the effect of a buffer size of 1 today by using |
A vet checker looks plausible: it can strip the
Valid cases include:
The check can easily go transitively over the types, e.g. when T2 and T3 are structs with fields of named types. So the checker may narrow down the patterns it will report, e.g. only report the case when T1 = T2, which covers the examples:
and
The frequency of these patterns in real code is a question though. |
I get this point. However, my actual concern is that the two examples I gave come only from my personal knowledge; I don't know what other cases could trigger this.
Well they are kinda different actually, I think you mean when
I did find that Go rarely uses macros, however |
and another thing I think maybe should be pointed out is I am not very sure but is it ok to make |
Another "fix" for this problem is to just wait for #50860 to be accepted and implemented. Then we can have people move to using the generic |
Background
The zombie pointer problem is very difficult to debug. Currently the Go GC reports this issue only while scanning objects, and even though the GC panics there, the information is not enough for debugging: reportZombies prints the span, but does not tell the developer how the pointer got into gcw, so the developer has to find out where the memory address came from.
Currently there are two ways that pointers can be put into gcw: greyobject and gcWriteBarrier (and its cx, dx... variants). greyobject already has the debug.gccheckmark flag to validate that a pointer is OK; however, gcWriteBarrier has no corresponding way to do the validation.
For example, a misuse of
atomic.StorePointer won't report any error if developers don't pay attention to the correctness of the value (especially when the field is an ID or some other int64 field with no real-world meaning): code snippet
Moreover, if this function runs under heavy load, the wrong assignment may lead to a zombie pointer that is hard to debug (link), as the panic only tells developers which mspan contains the zombie pointer, but gives no information about where the pointer was allocated.
Another case is unsafe type conversion (playground). The same
UserInfo from different RPCs may lead developers to use unsafe.Pointer to eliminate the performance cost of a new allocation and to avoid writing long field-mapping code. But after the IDL files are updated, UserInfoFromARPC and UserInfoFromBRPC may differ, and there is a high probability that the newly added fields will be of pointer type, since new fields are often optional.
Proposal
Idea in brief
Add a macro-controlled code section to gcWriteBarrier so that gcWriteBarrier checks the pointer's validity before storing it. If the pointer is invalid, panic right there. In general, the panic's stack trace will include the frame of the function where the wrong "allocation" happened, which developers can use to find their bugs more easily.
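The idea can be pictured with a toy model. This is only a sketch under stated assumptions: toySpan, toyHeap, check, checkedStore and newToyHeap are hypothetical names invented for illustration, not runtime APIs, and the real gcWriteBarrier lives inside the runtime and consults the runtime's own span metadata. The sketch shows a store that validates the pointer first and panics at the store site when the pointer lands in a free slot.

```go
package main

import "fmt"

// toySpan is a hypothetical stand-in for a runtime span: a contiguous run of
// equally sized slots plus a record of which slots are currently allocated.
type toySpan struct {
	base      uintptr
	elemSize  uintptr
	allocated []bool
}

// toyHeap is a hypothetical heap made of a few spans.
type toyHeap struct {
	spans []toySpan
}

// check returns an error if p points into a span slot that is not allocated,
// which is the situation that would later surface as a zombie pointer.
// Addresses outside every span are accepted in this model.
func (h *toyHeap) check(p uintptr) error {
	for _, s := range h.spans {
		limit := s.base + s.elemSize*uintptr(len(s.allocated))
		if p < s.base || p >= limit {
			continue
		}
		idx := (p - s.base) / s.elemSize
		if !s.allocated[idx] {
			return fmt.Errorf("pointer %#x refers to free slot %d of span %#x", p, idx, s.base)
		}
		return nil
	}
	return nil
}

// checkedStore models the proposed barrier: validate first, then store.
// Panicking here puts the storing function on the stack trace.
func (h *toyHeap) checkedStore(slot *uintptr, p uintptr) {
	if err := h.check(p); err != nil {
		panic(err)
	}
	*slot = p
}

// newToyHeap builds one span with three 16-byte slots; slot 1 is free.
func newToyHeap() *toyHeap {
	return &toyHeap{spans: []toySpan{
		{base: 0x1000, elemSize: 16, allocated: []bool{true, false, true}},
	}}
}

func main() {
	h := newToyHeap()
	var field uintptr
	h.checkedStore(&field, 0x1000) // slot 0 is allocated: the store succeeds
	fmt.Printf("stored %#x\n", field)
	fmt.Println(h.check(0x1010)) // slot 1 is free: reported as invalid
}
```

The point of checking at store time rather than at scan time is that the failure is raised while the faulty caller is still on the stack.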
Why choose to modify gcWriteBarrier
Zombie pointers only occur when the wrong pointer has escaped to the heap. If a pointer doesn't escape to the heap, it at least won't cause a zombie pointer.
Why choose to use a macro instead of a flag
As gcWriteBarrier is a frequently called function, using a macro avoids changing the regular code. Only when developers find a reproducible zombie pointer in their program would they compile the code with -+ and this flag to recompile the runtime and enable the tracking code, then wait for the panic to find out where the bug is.
Cons
PS: should we modify reportZombies to make it not panic when invalidptr=0 is set?