RFC: Make function definitions expressions #1717

Closed
hryx opened this issue Nov 13, 2018 · 108 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Comments

@hryx
Contributor

hryx commented Nov 13, 2018

Overview

This is a proposal based on #1048 (thank you to everyone discussing in that thread). I opened this because I believe that conversation contains important ideas but addresses too many features at once.

Goals

  • Provide syntactic consistency among all statements which bind something to an identifier
  • Provide syntactic foundation for a few features: functions-in-functions (#229), passing anonymous functions as arguments (#1048)

Non-goals

  • Closures

Motivation

Almost all statements which assign a type or value to an identifier use the same syntax. Taken from today's grammar (omitting a few decorations like align for brevity):

VariableDeclaration = ("var" | "const") Symbol option(":" TypeExpr) "=" Expression

The only construct which breaks this format is a function definition. It could be argued that a normal function definition consists of:

  1. an address where the function instructions begin;
  2. the type information (signature, calling convention) of the function;
  3. a symbol binding the above to a constant or variable.

Ideally, number 3 could be decoupled from the other two.

Proposal

Make the following true:

  1. A function definition is an expression
  2. All functions are anonymous
  3. Binding a function to a name is accomplished with assignment syntax
const f = fn(a: i32) bool {
    return (a < 4);
};

Roughly speaking, assigning a function to a const would equate to existing behavior, while assigning to a var would equate to assigning a function pointer.
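For comparison, status-quo Zig already expresses both halves of that statement, only with the existing declaration syntax. This is a minimal sketch, assuming Zig 0.11 or later (where a runtime-mutable function value must be a `*const fn` pointer); `lessThanFour` and `isFiftyFour` are hypothetical names used only for illustration:

```zig
const std = @import("std");

// Ordinary named function declarations (status-quo syntax).
fn lessThanFour(a: i32) bool {
    return a < 4;
}

fn isFiftyFour(a: i32) bool {
    return a == 54;
}

test "const alias vs. var function pointer" {
    // Binding to a const: `f` is a comptime-known function, so calls through it are direct.
    const f = lessThanFour;
    try std.testing.expect(f(3));

    // Binding to a var needs an explicit function *pointer* type.
    var p: *const fn (i32) bool = &lessThanFour;
    try std.testing.expect(p(3));

    // The pointer can be re-targeted at runtime.
    p = &isFiftyFour;
    try std.testing.expect(p(54));
    try std.testing.expect(!p(3));
}
```

The proposal would keep these semantics and change only how the function body itself is written.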

Benefits

  • Consistency. There is alignment with the fact that aggregate types are also anonymous.
  • Syntactically, this paves the way for passing anonymous functions as arguments to other functions.
  • I have a suspicion that this will make things simpler for the parser, but I'd love to have that confirmed or debunked by someone who actually knows (hint: not me).
  • Slightly shrinks the grammar surface area:
- TopLevelDecl = option("pub") (FnDef | ExternDecl | GlobalVarDecl | UseDecl)
+ TopLevelDecl = option("pub") (ExternDecl | GlobalVarDecl | UseDecl)

Examples

The main function follows the same rule.

pub const main = fn() void {
    @import("std").debug.warn("hello\n");
};

The extern qualifier still goes before fn because it qualifies the function definition, but pub still goes before the identifier because it qualifies the visibility of the top level declaration.

const puts = extern fn([*]const u8) void;

pub const main = fn() void {
    puts(c"I'm a grapefruit");
};

Functions as the resulting expressions of branching constructs. As with other instances of peer type resolution, each result expression would need to be implicitly castable to the same type.

var f = if (condition) fn(x: i32) bool {
    return (x < 4);
} else fn(x: i32) bool {
    return (x == 54);
};

// Type of `g` resolves to `?fn() !void`
var g = switch (condition) {
    12...24 => fn() !void {},
    54      => fn() !void { return error.Unlucky; },
    else    => null,
};

Defining methods of a struct. Now there is more visual consistency in a struct definition: comma-separated lines show the struct members, while semicolon-terminated statements define the types, values, and methods "namespaced" to the struct.

pub const Allocator = struct.{
    allocFn:   fn(self: *Allocator, byte_count: usize, alignment: u29) Error![]u8,
    reallocFn: fn(self: *Allocator, old_mem: []u8, new_byte_count: usize, alignment: u29) Error![]u8,
    freeFn:    fn(self: *Allocator, old_mem: []u8) void,
    
    pub const Error = error.{OutOfMemory};

    pub const alloc = fn(self: *Allocator, comptime T: type, n: usize) ![]T {
        return self.alignedAlloc(T, @alignOf(T), n);
    };

    // ...
};

Advanced mode, and possibly out of scope.

Calling an anonymous function directly.

defer fn() void {
    std.debug.warn(
        \\Keep it down, I'm disguised as Go.
        \\I wonder if anonymous functions would provide
        \\benefits to asynchronous programming?
    );
}();

Passing an anonymous function as an argument.

const SortFn = fn(a: var, b: var) bool; // Name the type for legibility

pub const sort = fn(comptime T: type, arr: []T, f: SortFn) void {
    // ...
};

pub const main = fn() void {
    var letters = []u8.{'g', 'e', 'r', 'm', 'a', 'n', 'i', 'u', 'm'};

    sort(u8, letters, fn(a: u8, b: u8) bool {
        return a < b;
    });
};

What it would look like to define a function in a function.

pub const main = fn() void {
    const incr = fn(x: i32) i32 {
        return x + 1;
    };

    warn("woah {}\n", incr(4));
};

Questions

Extern?

The use of extern above doesn't seem quite right, because the FnProto evaluates to a type:

extern puts = fn([*]const u8) void;
              --------------------
                 this is a type

Maybe it's ok in the context of extern declaration, though. Or maybe it should look like something else instead:

extern puts: fn([*]const u8) void = undefined;

Where does the anonymous function's code get put?

I think this is more or less the same issue being discussed in #229.

Counterarguments

  • Instructions and data are fundamentally separated as far as both the programmer and the CPU are concerned. Because of this conceptual separation, a unique syntax for function body declaration is justifiable.
  • Status quo is perfectly usable and looks familiar to those who use C.
@bheads

bheads commented Nov 13, 2018

I love the idea overall, but wonder about the syntax a little. Defining the function and the function type is a little too close:

const A = fn(i32) void;
const B = fn(x: i32) void {};
var C: A = B;

@Hejsil just redid the stage 1 parser and could probably say whether this can be parsed correctly.

@emekoi
Contributor

emekoi commented Nov 13, 2018

given we have syntactic sugar already in the form of optional_pointer.? would it be possible to make pub fn foo() void {} syntactic sugar for pub const foo = fn() void {};?

@Hejsil
Contributor

Hejsil commented Nov 13, 2018

@bheads Parsing fn defs and fn protos already uses the same grammatical rules, so this proposal doesn't change how similar these constructs are.

@emekoi Given that Zig values "only one way", probably not. I'm pretty sure .? exists because asserting non-null is very common when calling into C. We also don't have .! (syntactic sugar for catch unreachable).

@emekoi
Contributor

emekoi commented Nov 13, 2018

@Hejsil according to this, optional_pointer.? was, and still is, syntactic sugar for optional_pointer orelse unreachable.

@Hejsil
Contributor

Hejsil commented Nov 13, 2018

@emekoi I know. We give syntactic sugar when not having it really hurts readability. try is a good example, and so is .?: ((a orelse unreachable).b orelse unreachable).c is a lot worse than a.?.b.?.c, so we give syntactic sugar here. I don't think there is real value in keeping the old fn syntax if we're going to accept this proposal.
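The desugaring described here can be checked directly in a test. A small sketch, assuming Zig 0.11 or later; the `Outer` and `Inner` types are hypothetical and exist only to give `a.?.b.?.c` something to unwrap:

```zig
const std = @import("std");

// Hypothetical nested types: an optional Outer holding an optional Inner.
const Inner = struct { c: i32 };
const Outer = struct { b: ?Inner };

test ".? is shorthand for orelse unreachable" {
    const a: ?Outer = Outer{ .b = Inner{ .c = 42 } };

    // Sugared form.
    const sugared = a.?.b.?.c;
    // Equivalent desugared form.
    const desugared = ((a orelse unreachable).b orelse unreachable).c;

    try std.testing.expectEqual(sugared, desugared);
    try std.testing.expectEqual(@as(i32, 42), sugared);
}
```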

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Nov 13, 2018
@andrewrk andrewrk added this to the 0.5.0 milestone Nov 13, 2018
@rohlem
Contributor

rohlem commented Nov 13, 2018

@bheads To me the syntax seems consistent in that curly braces after a type instantiate that type.
The only missing step towards full consistency would be parameter names. The argument list of a function introduces those variables into the function's scope.

const A = struct {a: i32}; //type expression/definition
const a = A {.a = 5}; //instantiation
const F = fn(a: i32) void; //type expression/definition
const f = F { return; }; //instantiation

When instantiating a function type (F above), I would think the parameters to be exposed via the names used in the function type definition/expression. While that might decouple their declaration from their usage, it's similar to struct definitions assigning names to their members.
Alternatively, if that seems too strange, I could see a builtin of the form @functionArg(index: comptime_int) T (or possibly @functionArgs() [...] returning a tuple (#208) / anonymous struct) to serve niche/"library" use cases.

@hryx
Contributor Author

hryx commented Nov 13, 2018

@rohlem I've contemplated that "define/instantiate a function of named type F" idea before, but it breaks down quickly for a few reasons:

  1. The parameter names are not part of the actual function type. This is fine and even useful in some cases, I think.
  2. Imagine if you wanted to write a function that implemented function type F as specified by some other library author, but you had to use the param names that that author chose. That would cause problems, including the fact that in Zig you can't shadow or otherwise repurpose any identifiers which are currently in scope. (So if this imaginary F takes an x: i32, you'd better not already have an x in scope.) In Zig, you always get to choose your var identifiers, even for imported stdlib packages.
  3. Making it possible to define the body of a function without having the parameter names/types and return type visible immediately above that body would be very harmful to readability and comprehension. Not just in 6 months, but now while you are currently writing the function. Unfortunately, a @functionArg(...) builtin wouldn't help there.

I agree that level of consistency is cool and enticing, but I think in this case it clearly works against Zig's goals.
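Point 1 can be demonstrated in status-quo Zig: two function type expressions that differ only in parameter names compare equal, and a function whose parameter has yet another name still has that type. A sketch assuming Zig 0.11 or later; `F1`, `F2`, and `implA` are hypothetical names:

```zig
const std = @import("std");

// Two function type expressions that differ only in their parameter names.
const F1 = fn (x: i32) void;
const F2 = fn (y: i32) void;

// A function whose parameter is named differently again.
fn implA(value: i32) void {
    _ = value;
}

test "parameter names are not part of a function type" {
    try std.testing.expect(F1 == F2);
    try std.testing.expect(@TypeOf(implA) == F1);
}
```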

@rohlem
Contributor

rohlem commented Nov 13, 2018

@hryx For the record, I overall agree with your stances.

  1. I agree that two function types (fn(a: i2) void) and (fn(b: i2) void) should compare equal. I think it would be possible to have the names as extra data in their type object anyway, which would require a couple of workarounds in f.e. comptime caching though, so it's not ideal.
  2. Imagine the same with a struct retrieved from a @cImport call. Status quo Zig does not (yet) feature struct member renaming (EDIT: as in aliasing), though I'd be all for a proposal akin to that idea, which could then equally apply to function types. (Defining your own struct with different names will probably work if handled carefully, but it's not 100% waterproof.) (EDIT: Now I see, I guess a function scope variable is different from a member name from a language perspective, so "shadowing" applies only to the former.)
  3. I agree that it harms readability, but in code that instantiates a generic function type you're already reasonably decoupled from the concrete type. While copying around the function head worked well enough up until now, I don't think there's a suitable replacement for defining a function instance like f.e. callbackType{trigger_update(); return @functionArg(0);} (EDIT: with callbackType being a variable, coming f.e. from a comptime type argument). I think this would be the closest alternative and Zig-iest syntax for instantiating function types.
  4. The biggest argument I currently see against it would be the fact that the value of a type T in T { } now dictates how to parse the instantiation (member list vs function code), which moves us further away from context-free grammar.

Either way, just adding to the discussion. Sorry for hijacking the thread, I definitely don't think the details about decoupling parameters should stand in the way of the original proposal.

@raulgrell
Contributor

I agree with @hryx:

Defining the function and the function type is a little too close

We could approximate the switch prong syntax and do something like the following, which also opens the door to expression-bodied functions:

const A = fn(i32) void;
const B = fn(x: i32) void => { block; };
const X = fn(x: i32) u8 => expression;
var C: A = B;

@bheads

bheads commented Nov 15, 2018

@raulgrell That would also solve the ambiguity with braces in the return type.

@raulgrell
Contributor

@bheads yep, I think it came up in the discussion. The only weird case I could come up with, from @hryx's post:

var g = switch (condition) {
    13   => fn() !void => error.Unlucky,
    else => null,
};

@williamcol3

williamcol3 commented Dec 4, 2018

What if, instead of the fat arrow (=>), we use the payload capture syntax of while and for loops?

This allows the separation of parameter names from the type specification.

Examples:

// Typical declaration
const add = fn(i32, i32)i32 |a, b| {return a + b;};

// Usable inline
const sorted = std.sort(i32, .{3, 2, 4}, fn(i32,i32)bool |lhs, rhs| {return lhs >= rhs;});

// With a predefined type.
const AddFnType = fn(i32,i32)i32;
const otherAdd = AddFnType |a, b| {return a + b;};

Additionally, in line with #585, we could infer the type of the function declaration when obvious

// Type is inferred from the argument spec of sort
// However, the function type is created from the type parameter given
// earlier in the parameters, so I'm not sure how feasible this is
const sorted = std.sort(i32, .{3, 2, 4},  .|lhs, rhs| {return lhs >= rhs;});

We could even make the definition of the function take any expression, not just a block expression, but that may be taking it too far.

I think there is a lot of potential in this feature to provide inline function definition clarity without a lot of cognitive overhead.

(Please forgive any formatting faux pas, this was typed on mobile. I'll fix them later.)

@ghost

ghost commented May 15, 2019

The following is already possible (version 0.4):

const functionFromOtherFile = @import("otherfile.zig").otherFunction;
_ = functionFromOtherFile(0.33);

I prefer the "standard" way of defining functions as it is more visually pleasing to me, but I don't see any real problems with this proposal either.

@andrewrk andrewrk added the accepted This proposal is planned. label Jul 4, 2019
@andrewrk andrewrk modified the milestones: 0.5.0, 0.6.0 Jul 4, 2019
@andrewrk
Member

andrewrk commented Jul 4, 2019

This is now accepted.

@williamcol3 interesting idea, but I'm going to stick to @hryx's original proposal. Feel free to make a case for your proposed syntax in a separate issue.

The path forward is:

  1. Update the parsers to accept both.
  2. Update zig fmt to update the syntax to the new canonical way.
  3. Wait until the release cycle is done, and release a version of zig.
  4. Delete the deprecated syntax from the parsers.

Extern can be its own syntax, or it can be downgraded to builtin function, which might actually help #1917.

@FireFox317
Contributor

Wasn't one of Zig's goals to stay close to C syntax? With this change, function definitions would differ quite a bit from C, which would make the move to Zig a bigger step for current C developers.

However, the change makes sense in the current expression system of Zig and I like it, but I think that this is one extra step to overcome for C developers moving to Zig.

@ikskuh
Contributor

ikskuh commented Jul 4, 2019

Extern can be its own syntax, or it can be downgraded to builtin function, which might actually help

Extern functions could just be variables with a function type, but no content:

// puts is a function value with the given function type
extern const puts :  fn([*]const u8) void;

// main is a function with the implicit type
const main = fn() {
    puts("Hello, World!\n");
};

// foo is a function of type `fn()`
const foo : fn() = fn() {
    puts("called foo\n");
};

This seems logical to me: if we treat functions as values, we can also declare those values extern, which gives a consistent syntax for declaring both extern and internal functions.

@daurnimator
Contributor

// Usable inline
const sorted = std.sort(i32, .{3, 2, 4}, fn(i32,i32)bool |lhs, rhs| {return lhs >= rhs;});

the type here could be inferred (similar to enum literals), making it:

const sorted = std.sort(i32, .{3, 2, 4}, |lhs, rhs| {return lhs >= rhs;});

Which isn't a bad "short function syntax" at all.... @williamcol3 please do make another issue for your proposal.

@c-cube

c-cube commented Jul 24, 2019

Could the function passed to sort be comptime, so that specialization (and inlining) can occur for each distinct function that is passed?

@SamTebbs33
Copy link
Contributor

I just noticed that this proposal has been accepted and thought I'd throw in my two cents. I don't see a way of applying the extern keyword to the function definition, as extern requires that something has a name, but with this proposal function definitions would be anonymous and only the const/var they are assigned to would have a name. Putting extern on the declaration instead would also be consistent with how it is applied to the declaration (the pub const bit) of variables and types, rather than to their definition/assignment.

@andrewrk
Member

andrewrk commented Jul 9, 2023

Allow me to address the two goals of this proposal:

Provide syntactic consistency among all statements which bind something to an identifier

This proposal does an excellent job at accomplishing this goal, and is the reason I originally accepted it. However, I will now make an argument that there is a good reason for there to not be syntax consistency between function declarations and other declarations.

Ultimately, Zig code will output to an object file format, an executable binary, or an intermediate format such as LLVM IR or C code that is ultimately destined for such a place. In those places, functions have symbol names. These symbol names show up in stack traces, performance measurements, debugging tools, and various other things. In other words, functions are not unnamed. This is different from constants and types, which may exist ephemerally and be unnamed.

So, I think the syntax inconsistency appropriately models reality, making Zig a better abstraction over the artifacts that it produces.

Provide syntactic foundation for a few features: functions-in-functions (#229), passing anonymous functions as arguments (#1048)

I have rejected both of these proposals. In Zig, using functions as lambdas is generally discouraged. It interferes with shadowing of locals, and introduces more function pointer chasing into the Function Call Graph of the compiler. Avoiding function pointers in the FCG is good for all Ahead Of Time compiled programming languages, but it is particularly important to zig for async functions and for computing stack upper bound usage for avoiding stack overflow. In particular, on embedded devices, it can be valuable to have no function pointer chasing whatsoever, allowing the stack upper bound to be statically computed by the compiler. Since one of the main goals of Zig is code reusability, it is important to encourage zig programmers to generally avoid virtual function calls. Not having anonymous function body expressions is one way to sprinkle a little bit of friction in an important place.

Finally, I personally despise the functional programming style that uses lambdas everywhere. I find it very difficult to read and maintain code that makes heavy use of inversion of control flow. By not accepting this proposal, Zig will continue to encourage programmers to stick to an imperative programming style, using for loops and iterators.
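The distinction drawn in this comment between direct calls and function-pointer chasing can be sketched in status-quo Zig (assuming 0.11 or later). `applyComptime` takes its callee as a `comptime` parameter, so each call site gets an instantiation that calls a known function; `applyRuntime` takes a `*const fn` pointer whose target is only known at runtime. The helper names are hypothetical, and the sketch shows only the two calling styles, not any stack-bound analysis:

```zig
const std = @import("std");

fn timesTwo(x: i32) i32 {
    return x * 2;
}

fn negate(x: i32) i32 {
    return -x;
}

// Callee is comptime-known: the call target is fixed per instantiation.
fn applyComptime(comptime f: fn (i32) i32, x: i32) i32 {
    return f(x);
}

// Callee is a runtime function pointer: an indirect call.
fn applyRuntime(f: *const fn (i32) i32, x: i32) i32 {
    return f(x);
}

test "direct vs. indirect calls" {
    try std.testing.expectEqual(@as(i32, 10), applyComptime(timesTwo, 5));

    var op: *const fn (i32) i32 = &timesTwo;
    try std.testing.expectEqual(@as(i32, 10), applyRuntime(op, 5));
    op = &negate; // the target can change at runtime
    try std.testing.expectEqual(@as(i32, -5), applyRuntime(op, 5));
}
```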

@andrewrk andrewrk closed this as not planned Jul 9, 2023
@andrewrk andrewrk modified the milestones: 0.12.0, 0.11.0 Jul 9, 2023
@deflock

deflock commented Jul 10, 2023

Maybe I should have re-read this proposal first, but regarding function names, functions could allow a mix of declaration forms:

pub const sort = fn(comptime T: type, arr: []T, f: SortFn) void {
    // Anonymous, auto-generate name if exported (e.g. some_module_sort_auto_blabla_1)
    // Bad for reproducible builds, but could be generated in a predictable way
};
pub const temp_sort = fn sort(comptime T: type, arr: []T, f: SortFn) void {
    // Named, exported as `sort`, if you need a cute name somewhere
};
pub fn main() void {
    // Named, exported as `main`
}

Just some spontaneous 2¢-thoughts after reading the first reason.
I can't say anything about the second statement though.

@rofrol
Contributor

rofrol commented Jul 10, 2023

Is passing compareFn to binarySearch an inversion of control flow?

@mgord9518
Contributor

Forgive me if I don't completely understand how symbol exporting works, but why can't something like this be done, or why would it be a bad idea?

// This will be exported to `file.zig.main` assuming the source file is named `file.zig`
const main = fn() !void {

    // This will be exported to `file.zig.main.addOne`
    const addOne = fn(x: i32) i32 {
        return x + 1;
    };

    // Extern disables all mangling, it's your job to ensure no symbol collisions. This will be exported to `externAddOne`
    const externAddOne = extern fn(x: i32) i32 {
        return x + 1;
    };

    _ = addOne(1);
    _ = externAddOne(1);
};

Anonymous blocks and statements would add another layer, something like file.zig.blk1.addOne

@presentfactory

presentfactory commented Aug 15, 2024

I personally don't understand the justification for closing this. Specifically these points Andrew gave:

functions have symbol names. These symbol names show up in stack traces, performance measurements, debugging tools, and various other things. In other words, functions are not unnamed. This is different from constants and types, which may exist ephemerally and be unnamed.

In most cases a first class function would have a name, it's the constant it is being assigned to, e.g.:

const foo = fn () void {};

This is the same as this to any debugging tools:

fn foo() void {}

Sure by allowing first class functions you introduce unnamed functions, but it is incorrect to say functions have names. Functions don't even exist as far as the computer is concerned, they are merely blocks of code pointed to by an address which we conceptualize in a specific way. Names are for humans and the compiler, not the computer usually (other than a few specific things like say shared library exports).

Of course from the POV of debugging tools as mentioned functions "need" a name, but there is no reason you cannot just auto-generate a name as a placeholder, the more important thing in debugging is having the source correlation and that works fine with an anonymous function regardless. C++ has lambdas for instance and they can be debugged just fine, so this clearly is not an impossible task for LLVM and similar tools.

Furthermore, by this logic one could say a struct "needs" a name as well, so why does Zig support anonymous structs if this is such a big deal? Any codegenned C code from Zig or debugging information generated around such things will likely need an auto generated name and be marginally harder to understand debugging wise, but I think we can all agree that the patterns and elegance anonymous structs give Zig are great and definitely worth this small cost of being unnamed occasionally.

It interferes with shadowing of locals

I really don't see how. Why would a lambda have shadowing issues when the de facto method of a "member function nested in an anonymous struct", or a nested scope in general, does not? It is simply a nested scope; there are no issues there unless I am missing something obvious.

In particular, on embedded devices, it can be valuable to have no function pointer chasing whatsoever, allowing the stack upper bound to be statically computed by the compiler.

Why would first class functions be any different from normal functions in this regard? Most of the time a lambda or similar nested function declaration would be either called directly shortly after declaration or passed into another function (e.g. sort) as a comptime parameter. All of that is totally statically analyzable and no different from a normal function in terms of statically calculating stack size.

It is worth noting that maybe in the past this was more of a concern when the type fn () void served as a function pointer, but as far as I know that has not been a thing since 0.10. I'd expect most things using anonymous functions to do what mem.sort does currently which should make it clear to the compiler that it is a direct call rather than having to chase a function pointer:

fn sort(comptime T: type, comptime compare: fn(a: T, b: T) bool) void {}

Function pointers I'd imagine would be reserved for runtime-dependent function behavior, which is a thing one can already do in Zig. I don't really see why first class functions specifically would encourage people to use function pointers any more than they already can with normal functions, or with the function-in-a-struct hack used in lieu of this feature.

I find it very difficult to read and maintain code that makes heavy use of inversion of control flow.

This may be true, but as others have pointed out, control flow inversion is already a thing in Zig's own standard library, and it really is just the most sane way to do something like a sorting comparator. Callbacks are a staple of C programming and also qualify as inversions of control, so this is not a foreign concept to a C-like language. I agree such control flow can be confusing to follow if abused and don't think it should become a /common/ thing, but these patterns are used occasionally to make code more readable than it would be with another approach.

At the end of the day, I do not understand why I have to resort to an ugly hack like this when I want a nested one-off function closer to relevant code using it (and more narrowly scoped to avoid polluting other scopes with these otherwise not useful helper functions):

const foo = struct {
  fn impl() void {}
}.impl;

When I could just be writing this instead:

const foo = fn () void {};

If Zig already allows me to nest function declarations like this already then why avoid a more general and elegant solution like first class functions? This proposal is not asking for features that actually introduce runtime cost like captures (unlike other proposals), so this is really just a syntax change for how functions are declared that implicitly adds a bit of flexibility.
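For reference, the workaround being described looks like this in a complete program. A sketch assuming Zig 0.11 or later, where `std.mem.sort(T, items, context, lessThanFn)` takes the comparator as a `comptime` parameter; `sortDescending` is a hypothetical helper:

```zig
const std = @import("std");

fn sortDescending(data: []i32) void {
    // Status-quo workaround: declare a one-off helper inside an anonymous
    // struct and pull it out by name, keeping it local to this function.
    const greaterThan = struct {
        fn impl(_: void, a: i32, b: i32) bool {
            return a > b;
        }
    }.impl;

    // The comparator is passed as a comptime parameter, so the call is direct.
    std.mem.sort(i32, data, {}, greaterThan);
}

test "local comparator via struct namespace" {
    var data = [_]i32{ 3, 1, 4, 1, 5 };
    sortDescending(&data);
    try std.testing.expectEqualSlices(i32, &[_]i32{ 5, 4, 3, 1, 1 }, &data);
}
```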

If this is all really to avoid the notion of anonymous functions because they "don't have a name" then that seems a bit extreme to me. It is slightly less good for debugging potentially sure, but C++ and Rust both can be debugged fine despite this. Realistically I do not expect people to even use lambdas much and to prevent abuse Zig's best practices could always simply suggest avoiding using them except in idiomatic places (e.g. sorting functions) to avoid encouraging too much functional programming craziness.

I am just tired of having to use a hack to do this, given it adds syntactically noisy boilerplate and an extra indentation level to do what should just be possible to do in the language in a cleaner way. Not having this feature is harming code readability more than I think any of the listed concerns would hurt it in the long run. There may be some unforeseen challenges in implementation, but I'm sure such things can be worked through rather than just giving up on the idea entirely.

As such I request that this decision is reconsidered as it feels like it was dismissed a bit on a whim especially given the large community desire for something like this.

Edit:
As an aside, to alleviate concerns about the fact that this would make function decls a bit more verbose, I think some of the suggestions earlier to keep the existing fn foo() void {} syntax as syntax sugar for const foo = fn() void {}; are probably a good idea since it is indeed a bit more noisy to do it with the purely first class approach. It is a bit of a 2 ways to do the same thing situation which Zig is trying to avoid I'd imagine, but there's other small syntax sugar things like that in the language already with .? and try for instance, so it's probably fine for something so commonly used.

Worst case if the whole first class function/lambda proposal is still deemed too problematic I'd still want something like the ability to define nested functions in any case to solve the code locality issue without relying on a hack. Just being able to do something like this even though it's not a lambda is still nicer than the current options of either using the struct hack or having the comp function 100 lines of code away and doesn't require anything beyond being able to nest declarations:

fn thing() void {
  // 100 lines of code here

  fn comp(a: i32, b: i32) bool { return a < b; }

  sort(i32, data, comp);

  // More code here
}

@mnemnion

I personally don't understand the justification for closing this. Specifically these points Andrew gave

It's likely that you didn't understand the full significance of what that sentence entails. True, other kinds of value have entries in the symbol table. But to quote the Linux manual:

Function symbols (those with type STT_FUNC) in shared object files have special significance. When another object file references a function from a shared object, the link editor automatically creates a procedure linkage table entry for the referenced symbol. Shared object symbols with types other than STT_FUNC will not be referenced automatically through the procedure linkage table.

Functions have names in a meaningfully different way from objects (STT_OBJECT in ELF). Only a small subset of named const/var objects in Zig code end up in the symbol table, not to be confused with DWARF data.

Also, I think you're responding to a version of this proposal which was never seriously considered. When you say this:

I think some of the suggestions earlier to keep the existing fn foo() void {} syntax as syntax sugar for const foo = fn() void {}; are probably a good idea since it is indeed a bit more noisy to do it with the purely first class approach.

There was never a reality where Zig had two different flavors of function declaration. That would split the language into two dialects, based on the sort of declaration authors prefer, and that sort of thing has always been anathema to the project.

I want to note that a lot of your post here distinguishes between two different syntaxes for writing functions, by calling your preferred one "first class". That isn't correct, writing something in a different way doesn't make it any more or less first class.

Rather, one of the motives for the proposal was making function declaration more consistent with other proposals for changes in function behavior which could be glossed as "first class", and those proposals were rejected. That fact weakens the strength of the proposal itself, which is why Andrew also mentioned the rejection of those proposals in closing the issue.

as others have pointed out control flow inversion is already a thing in Zig's own standard library, and really just it is the most sane way to do something like a sorting comparator.

I don't agree with this, the use of a sort function pointer in the standard library has measurable negative impact on performance. It's of course true that various structures using function pointers are pretty important in systems programming, it's just that this is not a great example of that. I had hoped that the poorly-named stack capturing macros proposal would get picked up, because compile-time code injection is actually what we want here. It's possible that a more focused iteration of that idea which answers the various difficulties which #6965 identified might be considered.

Your 'ugly hack' I view as simply the trivial case of something more broadly useful: struct types as a namespace. The case you make for using that pattern is a weak one from my perspective. It's your decision to want that code closer to where it's used than the language cleanly supports; there's nothing stopping you from putting all your helper functions in a utility struct without polluting the larger namespace, and then localizing one of them is just

const comp = comparisons.i32LessThan;

This is clearer, it's a cleaner separation of concerns, and it's easier to refactor.
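A minimal sketch of the utility-namespace approach described above, assuming Zig 0.11 or later (where `std.mem.sort` takes the element type, the slice, a context value, and a comptime `lessThan` function). The names `comparisons` and `i32LessThan` come from the comment; their bodies are illustrative, not taken from any real codebase.

```zig
const std = @import("std");

// Hypothetical utility namespace: a plain struct used only to group helpers,
// so they don't pollute the surrounding file's namespace.
const comparisons = struct {
    // Signature matches what std.mem.sort expects for a `void` context.
    fn i32LessThan(_: void, lhs: i32, rhs: i32) bool {
        return lhs < rhs;
    }
};

test "localizing a helper from a namespace struct" {
    // Bring the helper into local scope under a short name.
    const comp = comparisons.i32LessThan;
    var data = [_]i32{ 5, -1, 3 };
    std.mem.sort(i32, &data, {}, comp);
    try std.testing.expectEqualSlices(i32, &[_]i32{ -1, 3, 5 }, &data);
}
```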

It's a good thing when abusing a feature looks like an ugly hack. The actually-useful role of making a single-name namespace and extracting a function from it involves comptime specialization. I'm happy to agree that it isn't the prettiest construct, but that informs the reader that something weird is going on. If I saw that pattern being used just to define a function, I would be confused; if it were in a patch for a codebase I was responsible for, I would ask for it to be rewritten in a more natural way.

I don't happen to share Andrew's dislike of functional programming at all, I think it's great. I firmly agree that it's a bad fit for a language like Zig, for the same reasons: chasing function pointers is something the optimizer can rarely fix, and that demotivates making it any easier to work with than it already is.

Furthermore, by this logic one could say a struct "needs" a name as well, so why does Zig support anonymous structs if this is such a big deal?

It won't for long.

This seems to boil down to a desire to write Zig in a way you've become accustomed to from other languages, at the cost of adding a second, equivalent syntax to the language, for one of the most basic aspects of it. At no point do you describe something functional which Zig is preventing you from accomplishing, or some way that the status quo encourages bugs which the other style would reduce.

No one has a problem writing a function and then calling std.mem.eql in the middle of it, even though that function is very "far away" from its definition. To repeat a point, if I was reviewing code which made a tiny namespace in the middle of a function, just so that a comparison function could be closer to the code which calls it, I would ask them to put it where it belongs instead. Unless, of course, that function definition relies on a comptime-known parameter of the function call, in which case it's being used correctly.

But this isn't the nested function issue and it isn't the lambda/closure issue, it's just about the syntax used to define function bodies. We don't need two ways to do that, and the better of the two options was chosen.

@tmccombs

tmccombs commented Aug 24, 2024

Functions have names in a meaningfully different way from objects (STT_OBJECT in ELF). Only a small subset of named const/var objects in Zig code end up in the symbol table, not to be confused with DWARF data.

So what? In most cases the compiler can generate a name from const name = fn ....
And there isn't any reason it can't generate a name for truly "anonymous" functions either. It could, for example, use the name of the parent function/namespace suffixed with a unique identifier for the anonymous function, like anon1 for the first anonymous function, anon2 for the second, etc. Scala does something very similar to this.

I don't agree with this, the use of a sort function pointer in the standard library has measurable negative impact on performance.

It doesn't seem like that necessarily has to be the case. std.mem.sort takes a comptime function pointer. It seems like, at least in theory, the compiler should be able to compile that to use a direct function call instead of a dynamic function pointer, or even inline the function body.
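To illustrate what a comptime function parameter means here, a minimal sketch assuming Zig 0.11 or later; `countInversions` and `ascending` are hypothetical names, not standard-library functions. Because `lessThan` is a `comptime` parameter, each distinct comparator produces its own instantiation of the function, so the call inside the loop targets a function known at compile time rather than a pointer loaded at runtime. This does not by itself settle whether the optimizer inlines that call in a given build mode.

```zig
const std = @import("std");

// Hypothetical helper: counts adjacent pairs that are out of order according
// to a comparator. `lessThan` must be comptime-known, so every distinct
// comparator yields a separate specialization of this function.
fn countInversions(comptime T: type, items: []const T, comptime lessThan: fn (T, T) bool) usize {
    if (items.len < 2) return 0;
    var count: usize = 0;
    // Walk the slice pairwise: (items[0], items[1]), (items[1], items[2]), ...
    for (items[0 .. items.len - 1], items[1..]) |a, b| {
        if (lessThan(b, a)) count += 1;
    }
    return count;
}

// Hypothetical comparator passed at comptime.
fn ascending(a: i32, b: i32) bool {
    return a < b;
}

test "comptime comparator parameter" {
    const data = [_]i32{ 1, 3, 2, 4, 0 };
    try std.testing.expectEqual(@as(usize, 2), countInversions(i32, &data, ascending));
}
```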

there's nothing stopping you from using a utility struct to put all your helper functions in without any pollution of the larger namespace

That isn't always possible. In particular if you need to use comptime parameters.

Also, for your i32LessThan example, doing that makes sense. But what if you need a function that is only used in one place and doesn't make sense outside of that context? Moving it into a separate namespace makes the code harder to follow.

The actually-useful role of making a single-name namespace and extracting a function from it involves comptime specialization.

This is the biggest reason why I want this feature. If you need to create a function that depends on comptime parameters, the only way to do it now is ugly and awkward. And I disagree that it is good to have ugly syntax to indicate that "something weird is going on", especially since, in Zig, using comptime parameters isn't really all that weird.

The other reason is that I agree with others that a const = syntax for functions would make function definitions more consistent.

This seems to boil down to a desire to write Zig in a way you've become accustomed to from other languages,

No, it comes from a frustration with having to create anonymous structs that have a single function, then extract the function from them.

And if you need to define a function that depends on comptime parameters, there often isn't any way to avoid that.
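A minimal sketch of the status-quo pattern under discussion, assuming Zig 0.11 or later: a one-function struct exists only so the function body can see comptime parameters, and `.lessThan` then extracts the function. `orderBy` is a hypothetical helper, not part of the standard library.

```zig
const std = @import("std");

// Hypothetical helper: returns a comparator specialized on comptime `T` and
// `descending`. The wrapping struct is there only so `lessThan` can refer to
// those comptime parameters.
fn orderBy(comptime T: type, comptime descending: bool) fn (void, T, T) bool {
    return struct {
        fn lessThan(_: void, lhs: T, rhs: T) bool {
            return if (descending) lhs > rhs else lhs < rhs;
        }
    }.lessThan;
}

test "comptime-specialized comparator via a one-function struct" {
    var data = [_]i32{ 3, 1, 2 };
    std.mem.sort(i32, &data, {}, orderBy(i32, true));
    try std.testing.expectEqualSlices(i32, &[_]i32{ 3, 2, 1 }, &data);
}
```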

p.s.

The implementation of std.mem.sort (which just forwards to std.sort.block) has this chunk of code at the beginning:

const lessThan = if (builtin.mode == .Debug) struct {
    fn lessThan(ctx: @TypeOf(context), lhs: T, rhs: T) bool {
        const lt = lessThanFn(ctx, lhs, rhs);
        const gt = lessThanFn(ctx, rhs, lhs);
        std.debug.assert(!(lt and gt));
        return lt;
    }
}.lessThan else lessThanFn;

which, IMHO, would be a lot clearer if it could be written like

const lessThan = if (builtin.mode == .Debug) fn (ctx: @TypeOf(context), lhs: T, rhs: T) bool {
    const lt = lessThanFn(ctx, lhs, rhs);
    const gt = lessThanFn(ctx, rhs, lhs);
    std.debug.assert(!(lt and gt));
    return lt;
} else lessThanFn;

@mnemnion

It seems like, at least in theory

I was referring to practice, not theory. That's why I said 'measurable' negative performance impact.

We disagree on how ugly the single-reference pattern is, I think. Which is fine.
But nesting functions inside other functions is unrelated to the expression syntax which is the main event on this proposal. You'll want to register your objections to that already-made decision in #229, but I suggest reading it before you do so.

Debates about how to sugar the cereal rarely lead to anything productive. My main point stands: you've offered aesthetic objections, but not practical reasons why your subjective preference will lead to better code. I expect you don't see it that way, which is also fine, but I won't find it worth litigating in further detail, and want to suggest that it isn't a good use of your time either.

@rohlem
Contributor

rohlem commented Aug 25, 2024

Furthermore, by this logic one could say a struct "needs" a name as well, so why does Zig support anonymous structs if this is such a big deal?

It won't for long.

@mnemnion I disagree with that interpretation of the mentioned proposal. Direct quote from the comments:

"anonymous struct literal syntax" .{.a = 3} stays, but instead of being of an ad-hoc created unnamed "anonymous struct type" it is now of an ad-hoc created unnamed "regular struct type".

The only user-facing language change you will see [...] is certain coercions no longer working. These coercions are the defining property of anonymous struct types, and the only difference between them and concrete struct types.

The types are still unnamed. Further, type literals struct{...}, union{...} etc. can still be immediately used as part of expressions, without being assigned to a const or var declaration.

It seems like, at least in theory

I was referring to practice, not theory. That's why I said 'measurable' negative performance impact.

You were referring to a practical/measurable impact without any indication that you made any measurements, which makes it difficult for others (including me) to judge how you measured, how big the impact was, etc.
It also makes it difficult (if not impossible) for others to refute or further discuss that statement.
Besides, we tell the compiler what code to generate. If we know of a performance deficiency and know of a way to address it, we can implement such an optimization: in this case, inlining calls through a comptime-known function pointer.

@presentfactory

presentfactory commented Aug 25, 2024

To address the stuff @mnemnion has said:

It's likely that you didn't understand the full significance of what that sentence entails.

This is irrelevant; again, C++ has anonymous functions and they work fine on Linux. Again, auto-generating function names for the purposes of things that require a name is trivial. This is a non-issue. Functions do not need names; names purely exist for humans. It is good to have a descriptive name for a function, yes, but the readability benefit anonymous functions provide can justify some auto-generated naming here and there, in a place few will ever look.

That would split the language into two dialects, based on the sort of declaration authors prefer, and that sort of thing has always been anathema to the project.

This is already the case, as I explained with try, .? and the member function syntax sugar. I agree that having two ways to do a thing is generally undesirable, but for common use cases like this, syntax sugar that makes code more readable is usually an acceptable tradeoff for the slight bit of "two ways to do the same thing" it introduces.
Really anything more complex than a Turing machine already has a near infinite number of redundant ways to express the same thing anyways, programming languages just need to make a tradeoff in this regard otherwise the language will be excessively verbose and unusable.

The solution here is what it has always been: Zig's "style guide" or documentation can simply recommend how to write idiomatic Zig that matches what you see in the stdlib and other codebases. That's about the only way to rectify the many ways of doing something in a programming language, or at least to give people a path of least resistance, so that they implicitly prefer the idiomatic form for the sake of readability and succinctness. In this case, if there were a more verbose alternative way to declare functions (e.g. const f = fn () void {}; vs fn f() void {}), people probably just wouldn't use it unless necessary.

That isn't correct, writing something in a different way doesn't make it any more or less first class.

Sure, but making functions actual literals and allowing them to be treated just like any other value is first class, and that is part of the elegance of this proposal. There are other ways to accomplish what is desired, sure (e.g. just allowing function decls to be nested), but having first-class functions implicitly solves that problem while also adding elegance and flexibility to the language overall.

I don't agree with this, the use of a sort function pointer in the standard library has measurable negative impact on performance.

As others have explained, these are not function pointers; they are passed as comptime-known functions. If Zig cannot figure out how to inline something like that, then it's a compiler bug, plain and simple.

This is simply misinformation anyway; note the distinct lack of dynamic jumps or even call instructions (because it's all inlined):
https://godbolt.org/z/4G4Mj7vrG
Even ReleaseSmall won't split it into its own function, because with only one call site it's smaller code-wise to inline it in this case:
https://godbolt.org/z/4s6KofeeG

There may be cases where, yes, the compiler will keep it as its own function for code size (if instruction cache is deemed more valuable here) or if you request ReleaseSmall, but Zig's compiler not being able to make this optimization is a compiler issue, not an issue with the concept of control flow inversion.
In no case should Zig generate a dynamic jump here, though, like you'd find when using a fully runtime function pointer. Zig's sorting API doesn't even allow that, since it only accepts comptime-known functions.
In other languages like C, yes, you use function pointers for this sort of thing, and without LTO the compiler may indeed be forced to emit this sort of dynamic jump, especially across DLL boundaries (like a sort function implemented in the C runtime, e.g. qsort), but Zig is not C, so this is not a relevant point.

It won't for long.

This is not removing the first class nature of structs as far as I can tell. In Zig "anonymous structs" are actually referring to the weird .{} literal initialization syntax, I am talking about things like:

const S = struct {};

Here struct {} is an anonymous struct type literal (dunno what Zig formally calls this, but it creates a new struct type and it's anonymous), it has no name until it is assigned to S.
Similarly, for functions to be first class the same would be true:

const F = fn () void {};

The function here is anonymous as it has no name until it is assigned to F. In both cases this flexibility allows for various useful patterns, be it for generic programming or for lambdas:

fn Thing(comptime T: type) type {
  return struct {
    a: T,
  };
}

sort(data, fn (lhs: i32, rhs: i32) bool {
  return lhs < rhs;
});

If Zig forced you to name a struct like you have to do in C/C++ this elegant generic programming model would turn into something like this uglier approach:

fn Thing(comptime T: type) type {
  // C-Like Struct Syntax
  struct Impl {
    a: T,
  };

  return Impl;
}

The same can be said for the current state of inflexible function declarations. Even worse, function declarations cannot be nested like this hypothetical struct example, so such generic programming becomes impossible (though whether comptime parameters would be exposed to nested function scopes like that is, I guess, a matter of debate; they are for structs, but this might cause problems for functions).

This seems to boil down to a desire to write Zig in a way you've become accustomed to from other languages

In some ways yes, but this is because I have years of experience programming with other languages and as such I have recognized the strengths and pitfalls of these languages. Zig in general does a good job of recognizing many of these pitfalls in say C/C++ and rectifying them while also keeping good ideas around (or taking others from say Rust), and that is part of why I use Zig since it feels fairly well thought out.
As such, I have recognized over the years that function locality is important for these one-off helper functions: having them scattered far away from where they are actually used hurts readability, which is why many languages offer lambdas or some similar mechanism.

So yes, in this respect I want Zig to be like these other languages. Just because another language does something does not automatically make it bad, or something Zig must do the opposite of. Languages like C++, while they have problems, still have many good ideas in them; it's just a matter of picking them out and refining them in a way that makes sense for Zig.
Maybe in the end that means that first class functions and etc aren't fit for Zig, but I am just here to provide points as to why I think they may be useful.

No one has a problem writing a function and then calling std.mem.eql in the middle of it, even though that function is very "far away" from its definition.

Yes, because mem.eql has a very well-understood common meaning for anyone who is familiar with the standard library and is well documented in its behavior.
The point is that a random compareElements function being called from a sorting function in the middle of a large file carries no inherent intuition. OK, yes, I know it's a comparator, but beyond that I have no idea how it is implemented, especially if I am trying to debug something. I will of course have to scroll up, find that function, and analyze it to understand exactly what it is doing. Having to scroll around the file for bits of code that are used only once, in a single function, makes the codebase less readable.
It just depends really, maybe that comparator is common enough throughout the codebase that it should be moved to a helper function on the type itself, but there are definitely cases where totally unique code needs to be passed to something as a callback, or just some bit of code needs to be called twice in a function in a row with slightly different parameters, and Zig just has no good way to do this.

I hit this desire just a few days ago, in fact, when working on my physics engine. I have some math to perform various axis penetration tests, and I need to do it for 2 axes. 2 is small enough not to warrant a for loop (or bothering with an inline for or something), but 2 of anything is enough to violate the DRY principle if I just copy-pasted the code. As such, I ended up with this:

tryAddExistingVertex(
    &intersection,
    incident_edge_vertex_0,
    perpendicular_reference_axis,
    incident_edge_vertex_0_penetration_depth,
    incident_edge_vertex_0_perpendicular_projection,
);
tryAddExistingVertex(
    &intersection,
    incident_edge_vertex_1,
    perpendicular_reference_axis,
    incident_edge_vertex_1_penetration_depth,
    incident_edge_vertex_1_perpendicular_projection,
);

This is fine and it removes the code duplication, but it is a pain now as this function declaration is nearly 300 lines away in my implementation. It's not the end of the world really but in C++ this is something I'd use a lambda for just to act as a nested function declaration (C++ also lacking elegance with its function decl syntax forces usage of lambdas for this sort of thing).

TL;DR: there is a very good real-world reason for this to exist. I do not know what the best solution is for Zig, as there are clearly a lot of things to consider, but IMO it is evident that Zig is lacking something to handle these cases nicely. The current hack of using a struct to declare a function is too detrimental to readability for me to be happy with it, especially for something so commonly needed. Ironically, that pattern is only possible in the first place because of another bit of Zig syntax sugar, the member function declaration syntax (which, as I said before, could be seen as "redundant" itself).

@mnemnion

TL;DR: there is a very good real-world reason for this to exist. I do not know what the best solution is for Zig, as there are clearly a lot of things to consider, but IMO it is evident that Zig is lacking something to handle these cases nicely.

Sure. But again, nested function declarations aren't this issue, they're another issue. My main motive in responding in the first place was about expression syntax, and we've run out of things to say about that.

I maintain that code injection, not an anonymous function, is what is actually wanted here. You want to be able to define a small piece of behavior and have it appear inline in the function which receives it, that's broadly useful, but there's no reason why it should look like a function in the process. That is a solution other languages use, but not all of them, and I think Zig could do better.

The thing about not liking the way a certain syntax looks on the screen, is that you can just change your mind at any time, there's nothing preventing you from overcoming the frustration you feel when you decide to make a one-declaration namespace.

There's a reason the syntax exists. You'll discover that this doesn't compile:

fn innerFunction(T: type, val: anytype) bool {
    return struct {
        fn inner() bool {
            return (val < std.math.maxInt(T));
        }
    }.inner();
}

test innerFunction {
    var runtime: usize = 129;
    runtime += 12;
    try expect(innerFunction(u8, runtime));
}

This is something which would work in many languages, but not in Zig. Only T can 'cross' the struct, because it's comptime known.

If you replace runtime with a comptime_int like, say, 140, then this test will pass. The struct, being a type, is known to exist only at comptime; that signals what can and cannot be captured by the inner declaration.

Allowing bare nested function declarations would mean that functions so defined follow two rules: if they aren't nested, they can access anything in the outer context, including mutable state, but if they are nested, only comptime-known values. By reusing the generic struct creation mechanism, we have only one rule. That's better.
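A minimal sketch of the variant that does compile, assuming Zig 0.11 or later: `val` is marked `comptime`, so the nested struct's function may reference it, matching the comptime-known 140 case described above. Passing a runtime `usize` here would instead be rejected, since a runtime value cannot be made comptime.

```zig
const std = @import("std");
const expect = std.testing.expect;

// Both parameters are comptime, so the function inside the struct can
// "capture" them: they are fixed when this instantiation is created.
fn innerFunction(comptime T: type, comptime val: anytype) bool {
    return struct {
        fn inner() bool {
            return (val < std.math.maxInt(T));
        }
    }.inner();
}

test "comptime-known value crosses into the struct" {
    // 140 is a comptime_int, so it is allowed inside the nested declaration.
    try expect(innerFunction(u8, 140));
}
```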

@presentfactory

presentfactory commented Aug 25, 2024

I see no reason why that's relevant, the rules for how such things work can be figured out as needed. This is purely a request for more flexible syntax really in how functions are created to make them more in line with the "everything is an expression" model Zig already uses for everything else (be it types themselves, if statements, for loops, etc). Zig after all does let you do this sort of nested function decl as you showed, it just requires an ugly hack to do what people want.

This is how your example would look under the proposal, which to me is just much nicer to read. All the same rules can still apply, so I don't see what's so bad about it:

fn innerFunction(T: type, val: anytype) bool {
    return fn() bool {
        return (val < std.math.maxInt(T));
    }();
}

test innerFunction {
    var runtime: usize = 129;
    runtime += 12;
    try expect(innerFunction(u8, runtime));
}

@tmccombs

tmccombs commented Aug 25, 2024

But again, nested function declarations aren't this issue, they're another issue.

Nested function declarations could be used in some cases where this would be useful, but not all. In particular consider my example from std.sort.block above. Nested function definitions don't help there, because you want to use different definitions depending on comptime state.

Also, the issue you linked to is specifically about closures, which is not what any of us are asking for here.

And for that matter, using const = syntax would simplify conditional definitions at top levels as well.

@Maldus512

Maldus512 commented Aug 26, 2024

Functions have names in a meaningfully different way from objects

I find the argument that "functions are special symbols" very hard to understand. Sure, they enjoy a special status at lower levels, but as others have pointed out there is no reason their uniqueness can't be handled entirely by the compiler, especially with Zig's comptime capabilities.

It's a good thing when abusing a feature looks like an ugly hack.

While I do agree that ultimately referencing the blandly-named method of an anonymous struct is an acceptable solution, it's still harder to read than both anonymous function expressions and nested functions.

You want to be able to define a small piece of behavior and have it appear inline in the function which receives it, that's broadly useful, but there's no reason why it should look like a function in the process.

To me "a small piece of behavior" comes very close to the exact definition of function. Why should it not look like one? Making up an entirely new concept to pass down behavior when functions are right there seems wasteful to me.

There was never a reality where Zig had two different flavors of function declaration. That would split the language into two dialects, based on the sort of declaration authors prefer, and that sort of thing has always been anathema to the project.

I think this is the real crux of the problem. Having two equivalent syntaxes for function definition would be just confusing; at best, one of the two would be dominant and the other remain obscure.
One should note that there is a precedent, as Lua has this exact configuration, and the result is somewhat acceptable. You can, if you wish, define all functions as anonymous and assign them to a variable instead of using the syntactic sugar function foo().
In my experience the desugared version is only used to pass anonymous functions around, but I can't deny that I favor the local foo = function() syntax and that I would do the same in Zig, if allowed to.

I realize that making function definitions expressions goes against Zig's idiom, but I would also argue that anonymous functions are too simple and useful not to include them and that the anonymous struct's method hack is far too ugly (i.e. unreadable) to be acceptable. A middle ground would be nice, perhaps restricting anonymous functions to actually be anonymous (i.e. forbidding to assign an anonymous function to a variable).

@aretrace

I appreciate @Maldus512's middle ground conclusion.

@McSinyx
Contributor

McSinyx commented Aug 27, 2024

perhaps restricting anonymous functions to actually be anonymous (i.e. forbidding to assign an anonymous function to a variable)

Minor nitpick: this would be an arbitrary style restriction without any semantic benefit, IMHO, pushing programmers into full-on lambda calculus, using parameter names instead of stack variables. Names are essential for reducing mental overhead, and one source of such overhead (inversion of control flow) was cited for rejecting this RFC.

I understand the renewed push for this proposal after its rejection as primarily a reaction against the struct { pub fn foo... }.foo thing, which goes against multiple points in the language's zen, if the zen is ever to be taken seriously:

  • Communicate intent precisely (intention is to define the function, not the struct)
  • Edge cases matter (instead programmers are punished for having to pass a function to a library)
  • Only one obvious way to do things (the way is neither obvious nor unique, i.e. it doesn't have to be wrapped in a struct)
  • Reduce the amount one must remember (the idiom must be learnt by heart)
  • Focus on code rather than style (to prevent a certain style, the code is made cumbersome in other scenarios)

I'm not attempting to spark flames here, just want to point out that the issue can be a lot simpler than we make it to be. Since fn ... can be losslessly rewritten to struct { pub fn foo... }.foo, why not make it a syntactic sugar? I'd argue that accepted ones like the destructuring syntax are a lot bolder than this.

mlugg added a commit to mlugg/zig that referenced this issue Dec 23, 2024
The new representation is often more compact. It is also more
straightforward to understand: for instance, `extern` is represented on
the `declaration` instruction itself rather than using a special
instruction. The same applies to `var`, making both of these far more
compact.

This commit also separates the type and value bodies of a `declaration`
instruction. This is a prerequisite for ziglang#131.

In general, `declaration` now directly encodes details of the syntax
form used, and the embedded ZIR bodies are for actual expressions. The
only exception to this is functions, where ZIR is effectively designed
as if we had ziglang#1717. `extern fn` declarations are modeled as
`extern const` with a function type, and normal `fn` definitions are
modeled as `const` with a `func{,_fancy,_inferred}` instruction. This
may change in the future, but improving on this was out of scope for
this commit.