Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/cgo: specify rules for passing pointers between Go and C #12416

Closed
ianlancetaylor opened this issue Aug 31, 2015 · 72 comments
Closed

cmd/cgo: specify rules for passing pointers between Go and C #12416

ianlancetaylor opened this issue Aug 31, 2015 · 72 comments

Comments

@ianlancetaylor
Copy link
Member

Issues #12116 and #8310 discuss the problem of when it is OK to pass pointers between Go and C code when using cgo. This proposal is a specific set of rules for when it is OK, plus some modifications to cgo to partially enforce the rules.

Definitions: a Go pointer is a pointer to memory allocated by the Go runtime. A C pointer is a pointer to memory not allocated by the Go runtime, such as by C.malloc.

The proposed rules are:

  • Go code may pass a Go pointer to C provided that the Go memory to which it points does not contain any Go pointers. That rule must be preserved during C execution, in that the program must not store any Go pointers into that memory.
  • C code may otherwise use the Go pointer and the Go memory to which it points freely during the call. However, C code may not keep a copy of the Go pointer after the call returns.
  • If Go code passes a Go pointer to a C function, the C function must return. While there are no documented time limits, a C function that simply blocks holding a Go pointer while other goroutines are running may eventually cause the program to run out of memory and fail.
  • A Go function called by C code may not return a Go pointer. A Go function called by C code may take C pointers as arguments, and it may store non-pointer or C pointer data through those pointers, but it may not store a Go pointer into memory pointed to by a C pointer. A Go function called by C code may take a Go pointer as an argument, but it must preserve the property that the Go memory to which it points does not contain any Go pointers.

The proposal is that we declare that all Go programs that adhere to those rules are, and will remain, valid, under the Go 1 compatibility guarantee. Go programs that do not adhere to these rules may break today or in future releases.

We further propose some modifications to cgo that will help enforce these rules. The cgo changes are not the rules; the rules are as above. The cgo changes are intended to make it easier to detect, at build time, programs that violate the rules.

  • We modify cgo to prohibit any type conversion from a Go pointer type to a C pointer type, except as a direct argument to a C function call. Any such type conversion is an error in cgo. Even within a direct function call, we prohibit type conversions if the Go pointer type points to a type that itself contains pointers.
  • A function labelled //export, which may be called from C code, must not return any Go pointer type and must not have any parameters that are Go pointer types.

These cgo rules are more strict than the rules about passing pointers. They will prohibit cases that are permitted. It is also possible to work around them, even without using unsafe, and write code that is not blocked by cgo but that violates the above rules. I will provide examples in a separate design doc.

@swetland
Copy link

This addresses pretty much all of my concerns about being able to build plumbing from Go to native libraries or syscalls. Rule #3 is potentially a little problematic (imagine calling into a USB read syscall for a device that simply does not return data for a loooong time), but can be worked around (periodic timeout and retry, two part operation w/ select + read, etc), without too much suffering.

@griesemer
Copy link
Contributor

Presumably this requires either a) a GC that doesn't move objects; or b) a form or (automatic, via CGO) pinning of Go pointers passed to C.

@dr2chase
Copy link
Contributor

@swetland - I've harassed Ian enough on this issue already, but the workaround for Rule number 3 is to allocate memory with malloc, and let that pointer leak into the Go code instead of a Go pointer leaking into C. The GC will tolerate it, does not need to worry about collecting it, and you can use a finalizer on the Go object encapsulating the state of the USB device to ensure that if the object is not freed by close, then it is freed by the finalizer. This is the idiom I've used in the past, and it's the one I recommend until you have performance problems that would justify trying something else, and my first answer for the first round of performance problems is "we need to improve the inliner" so that encapsulation of the malloc'd buffer is a zero-cost abstraction.

@mdempsky
Copy link
Contributor

@ianlancetaylor For the first rule, can you clarify what "Go code may pass a Go pointer to C provided that the Go memory to which it points does not contain any Go pointers." means and perhaps some further rationale behind this restriction? I understand why C code mustn't mutate Go pointers, but there seems to be a significant gap between those two restrictions.

For example, in the code below, is this rule satisfied or violated by passing a pointer to a pointer-free subregion of a Go struct? If it's violated, do you propose the toolchain should reject this still? If so, how?

// #include <string.h>
import "C"

func fill(x []byte) {
    C.memset(&x[0], 'x', len(x))
}

type N struct {
    Next *N
    Buf [1024]byte
}

func fillN(n *N) {
    fill(n.Buf[:])
}

Also, I assume by "Go pointers" you actually mean to include all pointer-like types (i.e., strings, slices, functions, interfaces, maps, channels)?

@minux
Copy link
Member

minux commented Aug 31, 2015 via email

@akavel
Copy link
Contributor

akavel commented Aug 31, 2015

Thanks for starting the work! That said, some aspects are still not very clear to me -- I'd be very grateful for some clarifications. I'm especially interested from the point of view of the "Windows world", i.e. interacting with WinAPI "syscalls" & some other custom DLLs:

  1. Is this understood to cover the syscall package use cases too? I.e. when somebody does import "syscall", but does not import "C". Please state this explicitly, whether it is covered or not, and to what extent. I'm aware that the underlying mechanisms are the same to some extent (or at least were a few versions ago), but I'm especially not sure if that's assumed "official" or "accidental". I believe it isn't really stated anywhere in the official docs (or is it?). This proposal could be a very good place to clarify this, and I'd be really grateful for that.
  2. (Related to 1.) Specifically, the WinAPI syscalls often take uintptr as an argument. In +/-Go 1.3 it was (AFAIU) semi-officially named as a "non-pointer" (with regards to GC). Am I right to undertand this then falls outside the current wording of the discussed proposal? And thus, back into uncharted territory...? :( Or is uintptr now blessed into pointers family?
  3. (Related to 2.) If uintptr is still not-a-pointer, is there a chance that any rules for "pinning" a pointer could become part of this proposal? Such that they'd enable taking a uintptr of a pointer in some "officially safe" way (hopefully "in Go 1.x where x>=3") and passing it into syscall/C world for some (finite?) time?
    • (I'd be specially grateful if the rules could be stated explicitly also "back in time", starting with Go 1.3, as I believe this knowledge is still not really codified anywhere oficially, and all that can be known to a "common man", not versed in all odds and ends of the compiler & GC, is currently spread between some more or less obscure threads on golang-nuts, golang-dev, and maybe some random entries on the issue tracker...))
  4. (Related to 2., 3.) Also, how about structs containing uintptr?

I'd be really very grateful if answers to those questions could be incorporated into this proposal...

@ianlancetaylor
Copy link
Member Author

@griesemer Yes. If the GC requires notification about pinned pointers, cgo will generate the calls to pin them.

@RLH
Copy link
Contributor

RLH commented Aug 31, 2015

Likewise if the GC needs to know when an object is no longer pinned, cgo
will generate the calls to unpin the object.

On Mon, Aug 31, 2015 at 3:11 PM, Ian Lance Taylor [email protected]
wrote:

@griesemer https://github.com/griesemer Yes. If the GC requires
notification about pinned pointers, cgo will generate the calls to pin them.


Reply to this email directly or view it on GitHub
#12416 (comment).

@ianlancetaylor
Copy link
Member Author

@mdempsky

For the first rule, can you clarify what "Go code may pass a Go pointer to C provided that the Go memory to which it points does not contain any Go pointers." means and perhaps some further rationale behind this restriction? I understand why C code mustn't mutate Go pointers, but there seems to be a significant gap between those two restrictions.

If at some point we have a moving GC, then the GC will need to update all Go pointers to point to their new location. We can work around that for a Go pointer explicitly passed to C code, by temporarily pinning the pointer. However, if the memory to which that Go pointer points itself holds pointers, then we must either pin those pointers too or we must modify them while the C code is running. The former is difficult and causes pinning to become a transitive operation. The latter is racy.

The only way this restriction could be a problem would be if Go code wanted to pass a complex data structure into C code, where the data structure is allocated by Go code in Go memory. It would have to use C types, of course, or the C code wouldn't be able to use it. That is not a case that seems likely to arise often, and in particular it seems often best to allocate these complex data structures using C memory via C.malloc.

So I think it's preferable to use a relatively simple rule here, rather than try to develop a more complex set of restrictions. This proposal aims to find out whether people can live with the rule.

Your code example satisfies the rule. The Go memory to which the pointer points is the [1024]byte.

Also, I assume by "Go pointers" you actually mean to include all pointer-like types (i.e., strings, slices, functions, interfaces, maps, channels)?

Yes.

@ianlancetaylor
Copy link
Member Author

@minux Your example is fine. I tried to clearly define, for purposes of this proposal, a Go pointer as a pointer into memory allocated by the Go allocator. It is a purely dynamic concept that has nothing to do with types.

The cgo part of the proposal has to do with types, but that is not the actual rules about what is permitted.

@ianlancetaylor
Copy link
Member Author

@akavel

Is this understood to cover the syscall package use cases too? I.e. when somebody does import "syscall", but does not import "C". Please state this explicitly, whether it is covered or not, and to what extent. I'm aware that the underlying mechanisms are the same to some extent (or at least were a few versions ago), but I'm especially not sure if that's assumed "official" or "accidental". I believe it isn't really stated anywhere in the official docs (or is it?). This proposal could be a very good place to clarify this, and I'd be really grateful for that.

Clearly the syscall package has to consider the same set of issues, but these rules are intended to apply to cgo. We control all the entry points to the syscall package, so if appropriate we can do something different. I think that for Windows we essentially need to define for each entry point what is permitted.

For example, currently on Unix the function syscall.ForkExec takes a pointer to a syscall.ProcAttr and that structure is permitted to contain Go pointers. This is fine, and it demonstrates that the syscall functions are not required to precisely follow the rules for cgo.

(Related to 1.) Specifically, the WinAPI syscalls often take uintptr as an argument. In +/-Go 1.4 it was (AFAIU) semi-officially named as a "non-pointer". Am I right to undertand this then falls outside the current wording of the discussed proposal? And thus, back into uncharted territory...? :( Or is uintptr now blessed into pointers family?

What matters for this proposal is whether the value is a pointer into Go memory. The actual type of the value does not matter.

(Related to 2.) If uintptr is still not-a-pointer, is there a chance that any rules for "pinning" a pointer could become part of this proposal? Such that they'd enable taking a uintptr of a pointer in some "officially safe" way (hopefully "in Go 1.x where x>=4") and passing it into syscall/C world for some (finite?) time?

One goal of this proposal is to avoid any documented rules for pinning pointers. Sure, this may be implemented by pinning internally, but I don't think we want to make pinning pointers part of the Go runtime API.

(Related to 2., 3.) Also, how about structs containing uintptr?

What matters is the values stored in this uintptr fields.

Update: some of my above comments were incorrect. A Go value with type uintptr is not going to be treated as a Go pointer. While the type of the pointer doesn't matter, it does have to be a pointer.

@minux
Copy link
Member

minux commented Aug 31, 2015

On Mon, Aug 31, 2015 at 3:22 PM, Ian Lance Taylor [email protected]
wrote:

@minux https://github.com/minux Your example is fine. I tried to
clearly define, for purposes of this proposal, a Go pointer as a pointer
into memory allocated by the Go allocator. It is a purely dynamic concept
that has nothing to do with types.

Could we add a dynamic checking to cgo to verify that no pointer points to
Go heap?
That check can be enabled with certain mechanism, for example, a special
build tag,
to ease diagnosis of cgo/GC interaction problems?

@mdempsky
Copy link
Contributor

@ianlancetaylor So it sounds like the rationale then is you want to do direct pinning without transitive pinning, and transitive pinning is necessary if C code is going to access Go pointers in Go memory. Your proposed solution seems to try to use the C type system to prevent C code from accessing Go pointers in Go memory.

For completeness, I see another solution of simply ruling that C code may not access Go pointers in Go memory. I.e., if a *N pointer is passed to C code, the C code may freely read/write Buf, but not Next. Instead, the C code would need to call into a Go helper if it wants to access the pointers. (Analogously, Go code needs to call into C helpers to access unusual C struct members like bit fields.)

@ianlancetaylor
Copy link
Member Author

@minux Yes, we could add that dynamic check. I can't decide how much it would help. To be safe, it would have to consider the type of the argument, so that we don't give an error for *float64 where the float value happens to look like a pointer. But then a Go pointer converted to uintptr will slip through. So it becomes another partial check. It might still be worth doing, of course.

@ianlancetaylor
Copy link
Member Author

@mdempsky I think I would say that we want to put the smallest possible number of restrictions on future garbage collection work. Prohibiting passing any Go pointers into C is too severe and prevents too many reasonable coding patterns. This is the minimal next step.

To be very pedantic, the proposed solution is the set of rules listed in the issue. The cgo work is an attempt to detect violations of those rules at build time.

For completeness, I see another solution of simply ruling that C code may not access Go pointers in Go memory. I.e., if a *N pointer is passed to C code, the C code may freely read/write Buf, but not Next. Instead, the C code would need to call into a Go helper if it wants to access the pointers. (Analogously, Go code needs to call into C helpers to access unusual C struct members like bit fields.)

I don't see any technical problems with this. But it seems easier to get wrong, and doesn't seem any simpler.

@sbinet
Copy link
Member

sbinet commented Aug 31, 2015

IIUC, the rule stating that one should not store pointers to go memory from C code (as the pointed at go value may be a struct holding fields pointing to go memory, would significantly complicate writing (in my case) extension modules in go, like I do in gopy and tracked there: go-python/gopy#58.

(I am not complaining, I am just trying to make sure I correctly understand the proposal)

does this mean that the "only" way to write extension modules (for CPython, C-ruby, C-shiny-interpreter) which usually involves wrapping a go struct inside a C struct for the target VM consumption, is to use a mechanism like golang.org/x/mobile/bind/seq?
ie: in-process RPC, a kind of LPC (local procedure call)?

@ianlancetaylor
Copy link
Member Author

@sbinet Yes, I think you are correct. Sorry.

Serialization is one approach, but it is not the only approach. Another approach is a map[int]interface{}. For each Go value you need to pass to Python, increment a counter to get a token value, and store the Go value in the map at that location. When calling back into Go, pass the token value. The Go side uses the token to look into the map to fetch the value. When you release the Python value, tell the Go side to delete the map entry. This mechanism is probably simplest if the Python code never needs to examine the Go value.

@sbinet
Copy link
Member

sbinet commented Aug 31, 2015

couldn't this be implemented as some kind of C-api on top of reflect ? (or a cgo-exported set of functions gathered in a cgo package?)

@dr2chase
Copy link
Contributor

For each Go value you need to pass to Python, increment a counter to get a token value, and store the Go value in the map at that location. When calling back into Go, pass the token value. The Go side uses the token to look into the map to fetch the value.

Is the map a global? That would imply contention and/or synchronization. We don't have a notion of thread-local/Goroutine-local storage in Go, do we? This would also need to survive a transition from Go to C to Go to make this work.

@minux
Copy link
Member

minux commented Aug 31, 2015

On Mon, Aug 31, 2015 at 6:02 PM, dr2chase [email protected] wrote:

For each Go value you need to pass to Python, increment a counter to get a
token value, and store the Go value in the map at that location. When
calling back into Go, pass the token value. The Go side uses the token to
look into the map to fetch the value.

Is the map a global? That would imply contention and/or synchronization.
We don't have a notion of thread-local/Goroutine-local storage in Go, do
we? This would also need to survive a transition from Go to C to Go to make
this work.

The map will be global. And no, we don't have thread-local/goroutine-local
storage,
intentionally. I don't think the contention on the map will be relevant for
this particular
case as the Python interpreter is pretty much single threaded due to its
GIL.

@ianlancetaylor
Copy link
Member Author

@dr2chase @minux Right, the map would be a global variable. Besides Minux's point about Python, contention on the map is going to be negligible compared to the overhead of calling between C and Go. In any case, it's only a suggestion. Other approaches are possible.

@ianlancetaylor
Copy link
Member Author

@sbinet

couldn't this be implemented as some kind of C-api on top of reflect ? (or a cgo-exported set of functions gathered in a cgo package?)

I'm sorry, I don't understand what you are asking.

For an example of the map approach, see https://github.com/swig/swig/blob/master/Lib/go/goruntime.swg#L408 .

@ianlancetaylor
Copy link
Member Author

Design doc at https://go-review.googlesource.com/14112 . It incorporates some of the above discussion.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/14112 mentions this issue.

@akavel
Copy link
Contributor

akavel commented Sep 1, 2015

Thanks for some clarifications, and mention of syscall in the doc.

Clearly the syscall package has to consider the same set of issues, but these rules are intended to apply to cgo. We control all the entry points to the syscall package, so if appropriate we can do something different. I think that for Windows we essentially need to define for each entry point what is permitted.

For example, currently on Unix the function syscall.ForkExec takes a pointer to a syscall.ProcAttr and that structure is permitted to contain Go pointers. This is fine, and it demonstrates that the syscall functions are not required to precisely follow the rules for cgo.

Would those per-entry-point "detailed" features be available to third-party library/package writers? I believe there's significant value in being able to call WinAPI functions from pure Go code, which then can be just "go get", without need for a whole MinGW toolchain installed and well-configured (and necessarily from some specific, this-month-fashionable fork). And I believe there are third-party libraries making use of that, and not only on Windows but I believe on Linux too. Not to mention golang.org/x/sys, I suppose?

And, right, WinAPI calls do love to receive pointers to structs with pointers (including ASCIIZ strings), that's quite true.

What matters for this proposal is whether the value is a pointer into Go memory. The actual type of the value does not matter.
(...)
What matters is the values stored in this uintptr fields.

I'm not sure I understand still. In Go ~1.4 the boundary between what to GC was a "pointer" vs. "not a pointer, just an int, even if same bit pattern" seemed to be placed quite explicitly between unsafe.Pointer and uintptr. From the above I seem to understand this may possibly change? That said, from other comments I'm starting to understand the specifics here are maybe expected to be fleshed out later (or do I misunderstand?)

@ianlancetaylor
Copy link
Member Author

@akavel

Would those per-entry-point "detailed" features be available to third-party library/package writers? I believe there's significant value in being able to call WinAPI functions from pure Go code, which then can be just "go get", without need for a whole MinGW toolchain installed and well-configured (and necessarily from some specific, this-month-fashionable fork). And I believe there are third-party libraries making use of that, and not only on Windows but I believe on Linux too. Not to mention golang.org/x/sys, I suppose?

I don't know. The syscall and x/sys packages can provide functions to handle structs with pointers, doing things like copying the pointers into non-garbage-collected memory. I think it would be fine to make something like that generally available. I think we have to be careful about what we make generally available, to avoid overly constraining future GC work.

I'm not sure I understand still. In Go ~1.4 the boundary between what to GC was a "pointer" vs. "not a pointer, just an int, even if same bit pattern" seemed to be placed quite explicitly between unsafe.Pointer and uintptr. From the above I seem to understand this may possibly change? That said, from other comments I'm starting to understand the specifics here are maybe expected to be fleshed out later (or do I misunderstand?)

There is not any intent to change the garbage collector behaviour. This proposal is about the very specific problem of how to safely pass a Go pointer to C code. You're right: I was wrong about the types. A value of type uintptr is not a pointer. You are already in potential trouble if you convert a Go pointer to uintptr. This proposal won't make that worse or better. The type of a Go pointer doesn't matter, but it does have to be some sort of pointer type.

@idavydov
Copy link

idavydov commented May 10, 2016

Is there a working example of passing callback function into a C code? It seems that example from the wiki isn't working anymore.
https://github.com/golang/go/wiki/cgo#function-variables

@ianlancetaylor
Copy link
Member Author

@idavydov please see https://golang.org/wiki/Questions

@mattn
Copy link
Member

mattn commented May 11, 2016

@idavydov is this your help? https://github.com/mattn/go-pointer/blob/master/_example/main.go

cention-sany added a commit to cention-nazri/go-deps that referenced this issue Sep 22, 2016
This reverts commit e4d361c.
The above commit is not working properly as it will convert wrongly
characters. Seem to be caused by improper use pointer because iconv
expecting pointer to pointer which it seem to allocate the memory
internally. This allocated memory is not being passed out to
golang side as the converted characters is not visible to golang side.
This is a quick fix before proper inspection should be done on iconv
pkg. However, without that commit mean that this pkg can not support
go1.6 and above as document here:
golang/go#12416
paultag added a commit to paultag/go-webkit2 that referenced this issue Oct 7, 2016
Discussion at:

    golang/go#12416
    https://github.com/golang/proposal/blob/master/design/12416-cgo-pointers.md

Issues with the Go memory management code were hacked around by adding a
new check that ensured that pointers to memory managed by Go are never
passed to cgo code, since one can not be sure how and where pointers to
Go memory are being stored in the C code.

As a result, the memory must be manually malloc'd and free'd by cgo,
even if it's a Go object in the C memory. The userdata fix makes this
change explicit, by running a malloc and free using cgo, to explicitly
control the memory management.
dsnet added a commit to dsnet/compress that referenced this issue Dec 23, 2016
Re-implement the cgo wrappers for flate, bzip2, and brotli to properly
follow the C-to-Go pointer rules outlined in (golang/go#12416).
Also, move the wrappers to seperate packages and update the bench
tool to use the new wrappers.
@golang golang locked and limited conversation to collaborators May 11, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests