-
-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: a.foo() syntax requires foo's first argument be a pointer #13249
Comments
there's plenty of valid uses to not use |
I was hoping that would work as well, but there are cases where you aren't returning anything, just calling functions on internal state. For example, in the std lib priority_queue.zig: pub fn deinit(self: Self) void {
self.allocator.free(self.items);
} This only works because self.allocator only contains pointers, and self.items is a slice which also only contains pointers, so even though it's technically deinit-ing a copy of self it just happens to work. |
if it made a difference, then in which case youd get a panic/trace and know exactly how to fix it |
This wouldn't prevent this case:
I think this type of issue is better solved by something like #7769, which doesn't impact existing ergonomics. |
@nektro I agree that in this case it happens to work, but this is not a theoretical issue. My point with that example is that the intention conveyed when you write This causes real bugs where internal state is being manipulated mistakenly on a copy. |
@InKryption Yes this proposal doesn't affect calling functions without dot syntax. I'm less concerned about that because it's less obvious that you mean to call a function on a specific object (at least to me). I haven't seen a bug myself related to that. I think #7769 would help in some regards, but there are structs with internal state (say a fixed array buffer) that don't have any internal pointers, so pinning is somewhat orthogonal to this issue. |
This fixes some rendering problems I encountered using zig stage2. See ziglang/zig#13244 which was me trying to figure out the bug. Also ziglang/zig#13249 which has more explanation.
This fixes some rendering problems I encountered using zig stage2. See ziglang/zig#13244 which was me trying to figure out the bug. Also ziglang/zig#13249 which has more explanation.
I think it's an interesting point to consider that Zig's parameter semantics give more responsibilities to programmers than other languages do. While lifetime nuances are always an additional detail to consider/ worry about, and I've written bugs like this myself, I currently wouldn't lean towards restricting code of this shape. I could see a potential path restricting one method invocation syntax, if we introduce a second one for the other remaining use cases. I would like to point out though that there are many useful instances (obviously non-modifying) of methods that take the first ("self"-) argument by value. |
@rohlem Thanks for the feedback! I don't understand the workarounds argument. Can you give an example of where someone would need to work around getting this error? I can't think of any time a person would want to temporarily copy obj during I agree that there are many examples of code that uses self by-val that work fine. I think changing them to specifically call by const pointer is not a large burden. Do you disagree? If so, how about a slightly weaker proposal where instead of a compiler error, the compiler is just forced to always pass obj by-pointer when called with dot syntax? Maybe that will help narrow down where we disagree. |
@david-vanderson My response ended up very long, sorry about that. First a motivating example for status-quo:const Minion = struct{ pub fn render(self: A, renderer: *Renderer) !void {
try renderer.render(...); // directly renders the Minion to the screen
} };
const Mountain = struct{ pub fn render(self: *const B, renderer: *Renderer) !void {
try renderer.delayedRenderList.enqueue(self); // enqueues the Mountain to be rendered later
} }; Here we can document the fact that We also document the fact that Why do this?As authors of the code, we provide this information both to:
Now if we forbid the dot-syntax for calling
Now for direct responses:
Your initial proposal forbids dot-syntax for method calls for methods with a non-pointer first ("self"-) argument.
To me a value type is helpful documentation that the address (and lifetime outside the function scope) of the object are not needed.
For me, seeing a pointer argument is a strong indication that it's relevant to understand the lifetime of the pointee. If everything I wrote so far made sense maybe it's superfluous, but as for your last point:
This would mean The If answered yes, then the goal were actually to add maximum friction within the bounds of simple-to-understand language semantics. |
Another way to make it harder to do this by accident is by forbidding automatic coercion from a |
@rohlem This is good, because I was thinking of the very same example with a different outcome. We start with this code working: const Minion = struct{ pub fn render(self: Minion, renderer: *Renderer) !void {
try renderer.render(&self); // Renders immediately
} }; This works. It gets committed. Tomorrow someone modifies it to: const Minion = struct{ pub fn render(self: Minion, renderer: *Renderer) !void {
try renderer.renderDelayed(&self); // Renders later
} }; And it works! Maybe. Most of the time. Depending on the whims of the zig compiler's choice. And generally we don't get a crash, just random memory corruption. Very hard to debug. Do you agree with this line of reasoning? My theory is that the first version gets written because of a hidden assumption that calling |
@leroycep That resolves the ambiguity but I think in the opposite direction than most people expect. |
@david-vanderson Not sure if I missed something, but in my eyes your last example doesn't really motivate your proposal. In your example, at step (1) it is semantically sound to call the method on a temporary or copy of the "originally intended" object. Let's say it is "address-agnostic" for short. With status-quo, the error that happens in step (2) is that the function is no longer address-agnostic, AND the interface wasn't updated accordingly. In fact, let's consider the scenario where the change in step (2) was written correctly:
Note: Intentional points of friction such as these, present throughout Zig's language design, are tangible benefits (even if just minor) of the parameter type changing along with the change in lifetime semantics. From my perspective, the fact that the interface would be required to change in tandem with the semantics, is a benefit of initially writing the value type in step (1), which would be demotivated by your proposal "downgrading" functions with value-type parameters. For the erroneous code of step (2) to actually result in a runtime error (such as "random memory corruption"), two things can happen:
Keep in mind that it's a valid opinion to want to trigger errors from buggy code, so that those bugs will be discovered and fixed earlier than if we didn't. If we force Also note that runtime errors of scenario (B), with your proposal in place, will still only manifest at step (2) and not at step (1), because previously invoking the method on a copy of the object was semantically sound.
Do you mean the second version? The first version, as far as I can tell, seems free of bugs. I think there may be value in making changes in lifetime semantics more prominent/explicit, which would make bugs easier to spot. P.S.:Forgive me if I'm making an incorrect assumption, but if this particular issue personally interests you, I would highly recommend you to check out Rust if you haven't yet, a language that has been set on formalizing and modelling object lifetimes to the full extent possible/feasible. |
A friend pointed out correctly that this reply itself is ambiguous, so let me try again. const Object = struct {
pub fn foo(self: Object) void {}
pub fn bar(self: *const Object) void {}
};
var o = Object{};
o.foo(); // o is either copied (I think unexpectedly), or coerced to *const Object (compiler's choice)
o.bar(); // o is coerced to *const Object If I understand, you are suggesting removing both coercions to *const Object. This proposal assumes that people expect the coercion to *const Object more than expecting the copy. That's why I wrote "opposite direction". Let me know if this is still unclear. |
At first that suggestion sounded too simplistic, because I'd want to be able to call mutating functions on declared example code:const S = struct{
fn mutate(s: *S) void {_ = s;}
fn readLater(s: *const S) void {_ = s;}
};
fn example(mut_s: *S, arg_const_s: *const S, value_s: S) void {
var var_s = S{};
const const_s = S{};
var_s.mutate(); //allowed
mut_s.mutate(); //allowed
//type system already disallows const_s.mutate(), arg_const_s.mutate() and value_s.mutate()
var_s.readLater(); //allowed
const_s.readLater(); //allowed
mut_s.readLater(); //allowed
arg_const_s.readLater(); //allowed
value_s.readLater(); // NEW ERROR: taking address of contextually declared object in method invocation, use explicit syntax (&value_s)
(&value_s).readLater(); // EXPLICIT syntax, indicating that "we thought about this"
}
Indeed, this sort of opposes the original proposal posted, but I'd be inclined to support this as a separate change proposal. |
@rohlem Definitely helping, I appreciate your taking the time to dig into this. I think our first point of disagreement is when the bug was introduced. My claim is the bug was present in step 1, and the program only worked accidentally. This is my experience with real bugs where the code never even made it to step 2. I'm trying to figure out a way to prevent the bug in step 1. I'm not suggesting changing the semantics of function calls in general, because it seems that programmers rarely confuse call by-ref and by-val using normal function call syntax. I think the heart of the bug is you look at the call site That's why I think of this as a foot-gun instead of a normal bug. You write code that works (by accident), and then starts failing at some later date. |
@david-vanderson Ah, apologies, I initially misread your example; I assumed step (1) would call the function to pass only the object's value (as I'd intended in my initial example), as But even then, the first version only has a bug if the renderer assumes a longer lifetime of its argument's pointee, right? In my opinion, if there is a lifetime bug in either step, then passing the object by value helps reveal this lifetime bug, while passing it by pointer just hides it.
While it's possible to read the code that way, in the provided example the bug in step (1) - assuming additional object lifetime is necessary - would be with the function signature, not the call site.
I agree that non-reproducible, spuriously-manifesting bugs are more annoying to find. Ideally, the compiler will include safety-checks that make code with errors like this fail reliably, every time, maybe even with a hard crash and stack trace. |
I don't yet have an opinion about the best approach, but I'm grateful to see the discussion about the foot-gun. It's an especially nasty one:
|
It feels to me like the root of the issue is the implicit object-to-pointer coercion. Zig is generally so skeptical of implicit magic that the coercion feels out of character here. Restricting the coercion feels like addressing the root of the issue, which I find appealing at a vague aesthetic level. As a point of reference, in C the difference is syntactically explicit: |
Yes, which means that it's easy for the bug to stay hidden until later when someone changes the renderer to defer things. The person doing that change could easily reason that the objects don't move due to the way they are allocated (and miss the hidden foot-gun copy). Then things continue to work for some time, and then later break on upgrading to a newer zig. (This is almost the exact situation where I first encountered this issue)
I think either position is defensible and it comes down to a practical question of what people expect when they use dot syntax method calling.
I agree in an ideal world, but I'm not sure this is possible or practical for zig. This proposal is my attempt at a minimal practical change to improve the situation. |
I hacked the zig compiler locally to make this an error. The error reporting stuff in the zig compiler is really good! After almost 1000 fixups to the standard library I gave up before getting Specific Notes:
I didn't find a smoking gun kind of bug so far. Changing the first argument to a pointer caused a lot of changes in the crypto stuff which made me worried. Also functions on enums look weirder. All around the changes were more substantial than I anticipated. I'll try a compiler change where instead of an error we just mandate passing by pointer when using dot syntax. |
rather than forcing |
I must be missing something. Isn't that how the compiler works today? It gets to choose whether to pass |
I used my hacked compiler on a bunch of zig projects (the ones listed here: #89). I also did not find any smoking gun bugs there. They had things similar to what I found in the std lib. I was wrong about how often this bug shows up in current code. It seems either rarer to make, or faster/easier to identify and fix, which is contrary to my own experience, so maybe I'm the outlier. #2646 will help cover some of the cases here like returning a slice to an internal buffer. So I'll close this issue. Thank you so much to everyone who commented and thought through this with me! |
When calling
obj.foo()
we expect foo to operate on obj, not on a copy of obj. I propose that calling a function using theobj.foo()
syntax be a compile error if foo's first argument is not a pointer.Message could be something like
error: calling function with dot syntax requires first argument of function to be pointer
Rationale
Any member function that accepts the struct by-val as the first argument is a foot-gun:
You want this instead:
I've made this mistake myself, seen it in zig libs, and now even many places in the zig std lib. I'll send PRs for the ones I've found.
Motivation
I'm using TinyVG to render icons in a gui library. When switching zig to stage2, a few of the icons started rendering nonsense. Eventually I submitted a repo of what I thought was a stage2 bug as #13244.
@nektro and @Vexu quickly pointed out the real problem was accidentally passing a copy of self, which stage2 does more often (at least for now?).
This is a place where code can be working (due to the compiler choosing to pass a const pointer) and then break on a minor compiler upgrade or struct change (which causes the compiler to switch to passing a copy).
This is also hard to figure out because passing a copy often works by accident. If the struct contains only pointers and other copyable fields (Allocator?) then operating on a copy can either work or appear to work.
I'll try to implement this and see how it goes.
Downsides
If you have a small copyable struct like a 2d Point, you have to choose either:
pub fn add(a: Point, b: Point) Point {}
and call it likePoint.add(a, b)
orpub fn add(a: *const Point, b: Point) Point {}
and call it likea.add(b) or Point.add(&a, b)
The text was updated successfully, but these errors were encountered: