RFC: curry underscore arguments to create anonymous functions #24990

Open: wants to merge 3 commits into base: master

@stevengj
Member

stevengj commented Dec 8, 2017

This PR addresses #554, #5571, and #22710 by "currying" underscores in function calls like f(_,y) into anonymous function expressions x -> f(x,y). (Note that _.foo works and turns into x -> x.foo since it is equivalent to a getfield call, and _[i] works and turns into x -> x[i], since it is equivalent to a getindex(_,i) call.)

This will help us get rid of functions like equalto (#23812) or occursin (#24967), is useful for "destructuring" as discussed in #22710, and should generally be convenient in lots of cases to avoid having to explicitly do x -> f(x,y).

Some simplifying design decisions that I made:

  • The currying is "tight", i.e. it only converts the immediately surrounding function call into a lambda, as suggested by @JeffBezanson (and as in Scala). So, e.g. f(g(_,y)) is equivalent to f(x -> g(x,y)). (Note that something like find(!(_ in c), y) will work fine, because the ! operator works on functions; you can also use _ ∉ c.) Any other rule seems hard to make comprehensible and consistent.

  • Originally, only a single underscore was allowed and f(_,_) threw an error: that case seemed ambiguous to me (do you want x -> f(x,x) or x,y -> f(x,y)?), so it seemed better to punt on it. Update: similar to Scala, multiple underscores are converted into multiple arguments in the order they appear, e.g. f(_,y,_) is equivalent to (x,z) -> f(x,y,z). See rationale below.
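As a rough illustration of the rules above, the following sketch writes out by hand the anonymous functions that the proposed forms would stand for. The underscore syntax itself is not valid in released Julia, so it appears only in comments; the function f and the sample values are made up for the example.

```julia
# Hypothetical three-argument function, used only to make the results visible.
f(a, b, c) = a + 10b + 100c
y = 2

# Proposed `f(_, y, 3)` would stand for:
one_slot = x -> f(x, y, 3)

# Proposed `f(_, y, _)` would stand for (slots filled left to right):
two_slots = (x, z) -> f(x, y, z)

# Proposed `_.re` and `_[2]` (getfield / getindex calls in disguise):
get_re = x -> x.re
second = x -> x[2]

# Tight binding: in `findall(!(_ in c), v)` only the `in` call becomes a lambda,
# and `!` is then applied to that function.
c = [1, 3]
not_in_c = !(x -> x in c)

println(one_slot(1))                      # f(1, 2, 3)
println(two_slots(1, 3))                  # f(1, 2, 3)
println(get_re(3 + 4im), " ", second((5, 6)))
println(findall(not_in_c, [1, 3, 4, 5]))  # indices of elements not in c
```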

The implementation is pretty trivial. If people are in favor, I will add

  • Tests
  • Documentation
  • Fix interaction with broadcasting f.(x, _)
  • Fix interaction with keyword arguments f(x; y=_)
  • Support chained comparisons, e.g. 3 ≤ _ ≤ 10, since they parse as a single expression?

@stevengj added the "needs decision" label Dec 8, 2017
@ararslan added the "needs docs", "needs news", "needs tests", "parser", and "triage" labels Dec 8, 2017
@stevengj
Member Author

stevengj commented Dec 8, 2017

Note that if this is merged, using an underscore as an rvalue should probably become an error (currently it is deprecated). As an lvalue, it is fine — we can keep using it for "discarded value" (#9343).

@StefanKarpinski
Member

StefanKarpinski commented Dec 8, 2017

My gut feeling is that I'd rather not rush this and that a few ad hoc legacy currying functions in Base aren't going to kill us. Although I have to say that the simplicity of the rule has the right feel to me.

@stevengj
Member Author

stevengj commented Dec 8, 2017

I agree that we don't want to rush new features like this, but I feel like this idea has been bouncing around for a long time (since 2015), the reception has been increasingly positive, and we keep running into cases where it would help as we transition to a more functional style (thanks to fast higher-order functions).

@stevengj
Member Author

stevengj commented Dec 8, 2017

(I guess technically this is partial function application, not currying. Note that Scala does something very similar with underscores, and it allows multiple underscores. In some circumstances Scala apparently requires you to explicitly declare the type of the underscore argument, though, at least if you want its type inference to work.)

@yurivish
Contributor

yurivish commented Dec 8, 2017

What potential backwards incompatibilities does this rule expose us to?

I know Stefan spent a while trying to find a simple set of rules for determining the "tightness" of a partial application expression and found there were some difficulties. Merging this would close the door on changing the tightness rule in 1.0.

Is there anything else?

@stevengj
Member Author

stevengj commented Dec 8, 2017

@yurivish, using _ as an rvalue is already deprecated. So this should be backward-compatible with non-deprecated code.

@stevengj
Member Author

stevengj commented Dec 8, 2017

Scala's rule for "tightness" is (Scala Language Specification, version 2.11, section 6.23.1):

An expression e of syntactic category Expr binds an underscore section u, if the following two conditions hold: (1) e properly contains u, and (2) there is no other expression of syntactic category Expr which is properly contained in e and which itself properly contains u.

which seems essentially the same as the one I've used here (i.e. the innermost expression that is not itself an underscore "binds" it). Scala is more general in two ways:

  • Scala allows underscores to appear multiple times to denote multiple arguments, e.g. f(_,_,z) would be x,y -> f(x,y,z).
  • Scala allows underscores to appear in more than just function calls. e.g. you can do if _ foo; else bar; end

Both of these could easily be added later, after my PR, since they are a superset of my functionality.

Regarding Stefan's rules, I found them pretty complicated and confusing: why should 2_+3 produce x -> 2x+3, but sqrt(_)+3 produces (x->sqrt(x))+3? Anyway, as the expressions get more complicated than a single function call, it becomes less onerous to simply type x->.

@yurivish
Contributor

yurivish commented Dec 8, 2017

@stevengj Of course, you're right – I think I misused the phrase "backwards incompatibility". I meant to ask whether there were any "terse partial function application" syntaxes that we may want to introduce in the future that would conflict with the functionality implemented in this branch.

If it turns out that future enhancements would almost certainly be supersets of this one, then, well, fantastic. 😄 I'd personally love to see something like this in the language so long as it's not limiting our options too much down the line.

Edit, written before the last paragraph in the preceding post: Hmm. Expressions with operators (like 3 * _ + 2) would be turned into (x -> 3 * x) + 2 in most cases, unless they happen to be lowered to a single "call" (like 1 + _ + 3).
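To make that consequence concrete, here is a small sketch that inspects how Julia parses the two expressions. An ordinary name x stands in for the underscore (an assumption made so the snippet runs on current Julia), and the lambda at the end is a hand-written version of what the tight rule would produce for 3 * _ + 2.

```julia
# `x` stands in for `_` so the expressions can be parsed and inspected today.
nested = Meta.parse("3 * x + 2")   # +(*(3, x), 2): x sits inside the inner * call
flat   = Meta.parse("1 + x + 3")   # +(1, x, 3): chained + parses as one n-ary call

println(nested.args)   # the second element is itself a call expression
println(flat.args)     # x is a direct argument of the single + call

# Hand-written result of the tight rule for `3 * _ + 2`:
tight = () -> (x -> 3 * x) + 2

# Adding a function to a number has no method, so evaluating it fails.
err = try
    tight()
catch e
    e
end
println(typeof(err))
```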

@ararslan
Member

ararslan commented Dec 9, 2017

I don't have strong feelings one way or another about this change but I've played around with it a little and I noticed something kind of odd:

julia> _ + 1
#1 (generic function with 1 method)

julia> _

WARNING: deprecated syntax "underscores as an rvalue".
ERROR: UndefVarError: _ not defined

That is, the behavior at the top level is surprising. I kind of wonder whether it might be better to adjust the parsing behavior in that context, kind of like how generators require parentheses at the top level:

julia> i for i in 1:10
ERROR: syntax: extra token "for" after end of expression

julia> (i for i in 1:10)
Base.Generator{UnitRange{Int64},getfield(, Symbol("##3#4"))}(getfield(, Symbol("##3#4"))(), 1:10)

Also, I realize this is expected based on the stated rules, but it took me by surprise:

julia> map(_, [1,2,3])
#9 (generic function with 1 method)

I'm not entirely sure what I expected with that expression, but it wasn't that. 😛

Behavior aside, I love how tiny the implementation actually is! It's impressively minimal for a potentially powerful feature.

@andyferris
Member

Really cool :)

@ararslan The behavior you highlight as odd seems natural to me.

(An aside: what is an unbracketed Generator ambiguous with, or stated another way, why does it require brackets?)

@piever
Contributor

piever commented Dec 9, 2017

Really nice, also for the data ecosystem where the frequently used function i->i.a can now be written as _.a (see #22710, example application mean(_.a, df) where df is an iterable of NamedTuples).

@rfourquet
Member

As |> is being deprecated, with the idea to have it dovetail nicely with some syntax for currying functions in the future, it would be nice to be sure that this PR's change will be compatible with the overall design. I think I would prefer to have the whole thing fleshed out before introducing the feature, which can be introduced in 1.x, and as Stefan said, a few "ad hoc legacy currying functions in Base" are fine in the meantime.

@stevengj
Member Author

stevengj commented Dec 9, 2017

@ararslan, we could certainly implement (_) as a shorthand for the identity function, but I'm not sure it's worthwhile. It seems better to leave _ as an r-value (i.e. not as an argument to a function call) deprecated/disallowed. We can always add a meaning later.

I would rather not require parens around e.g. _ ≤ 1 and _.a, which are unambiguous and nicely terse already.

@stevengj
Member Author

stevengj commented Dec 9, 2017

@rfourquet, this is totally orthogonal to the piping syntax (#20331); I'm not sure why you think the latter would affect this.

We've been discussing a currying/partial-application shorthand for literally years now, and all of the discussions seem to have been converging towards underscore syntax, which has a long track record in Scala.

@piever
Contributor

piever commented Dec 9, 2017

The underscore syntax is actually already used, for example in the Query package, via macros (see docs). The only difference is that in Query there is no tight binding: for example, @filter(df, _.a > _.b) would actually correspond to filter(i->i.a > i.b, df) rather than filter((i->i.a) > (i->i.b), df). It'd be really cool to also make this "loose binding" possible without macros, but I'm really not sure how. There was one interesting related idea by @stevengj here to use a double underscore for loose binding.

@rfourquet
Member

this is totally orthogonal to the piping syntax (#20331); I'm not sure why you think the latter would affect this.

This link concerns a removal of the piping syntax without a plan to re-introduce it later, which was discussed recently at #5571. It's true that currying syntax can exist independently of piping syntax, but the design of piping syntax, which deals directly with curried functions, could influence how the currying syntax should be designed.

@stevengj
Member Author

stevengj commented Dec 9, 2017

I think it would be short-sighted to tie currying syntax to piping syntax. Currying (partial function application) is useful for lots of things beside piping.

@stevengj
Member Author

stevengj commented Dec 9, 2017

I could imagine a "loose-binding" syntax like _(...), e.g. _(_.a > _.b). This would still be pretty terse, would be unambiguous and not restricted to certain precedence levels, is easy to implement, and would be compatible with this PR.

@bramtayl
Contributor

bramtayl commented Dec 9, 2017

Is _(_.a > _.b) that much shorter than _ -> _.a > _.b ?

@stevengj
Member Author

stevengj commented Dec 9, 2017

@bramtayl, possibly not. In general, I'm skeptical of the need for this kind of terse syntax beyond single function calls. And all the attempts to come up with a "loose binding" DWIM syntax seem to lead to rules that are very confusing and context-dependent.

@stevengj
Member Author

stevengj commented Dec 9, 2017

@yurivish, in this PR, _ only works in the arguments of a function. _(arg1, arg2) is currently not allowed. It could be added later, of course.

@yurivish
Contributor

yurivish commented Dec 9, 2017

@stevengj I deleted my comment right after posting when I realized what I said didn't make sense (my example was map(_(arg1, arg2), list_of_functions)).

But I thought it didn't make sense for a different reason — because it would turn the entire expression into an anonymous function. It seems your approach is even more conservative than I realized. 😄

Would map(_[3], arrays) work, since it desugars to a getindex call where the _ is in argument position?

@stevengj
Member Author

stevengj commented Dec 9, 2017

@yurivish, yes map(_[3], arrays) works as-is in this PR.

@LilithHafner
Member

LilithHafner commented Apr 4, 2024

What if we lower with an "exactly one call" rule, lowering into a special AutoComposingFunction <: Function type, and explicitly compose AutoComposingFunctions with all the operator functions defined in Base: +, *, ^, in, ...?

Then if we define ^ on Function, that would be less specific than on AutoComposingFunction, so we could have (_ + 1)^2 => x -> (x + 1)^2 and dirname^2 => x -> dirname(dirname(x)).
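A minimal sketch of how such a wrapper type might behave, assuming the lowering produced it for exactly one call. The type name follows the comment above and is not an existing Base API; only two operator methods are defined here, and the ^ method on plain Function (needed for dirname^2) is deliberately left out because defining it outside Base would be type piracy.

```julia
# Hypothetical wrapper that lowering would emit for a single underscore call.
struct AutoComposingFunction{F} <: Function
    f::F
end

# Calling the wrapper forwards to the wrapped function.
(a::AutoComposingFunction)(args...) = a.f(args...)

# Operators applied to the wrapper compose instead of erroring.
Base.:+(a::AutoComposingFunction, y::Number) = AutoComposingFunction(x -> a(x) + y)
Base.:^(a::AutoComposingFunction, n::Integer) = AutoComposingFunction(x -> a(x)^n)

# Stand-in for what `_ + 1` would lower to under this idea.
plus_one = AutoComposingFunction(x -> x + 1)

squared = plus_one^2            # plays the role of (_ + 1)^2
println(squared(3))             # (3 + 1)^2
println(map(squared, [0, 1]))   # usable anywhere a Function is accepted
```

Because the wrapper is a subtype of Function, code that does not specialize on it simply calls it like any other function.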

@rapus95
Contributor

rapus95 commented Apr 4, 2024

What if we lower with an "exactly one call" rule, lowering into a special AutoComposingFunction <: Function type, and explicitly compose AutoComposingFunctions with all the operator functions defined in Base: +, *, ^, in, ...?

I love the idea of using language features to build flexibility while the syntax is strict!
I first thought we need it to be a subtype of Function in order to work at all. But it's the opposite. We WANT it to be subtype of Function so we can freely use it everywhere where Functions are allowed. Clever.
And everyone who wants can participate in auto composition by defining a new method for that new type. And methods that don't specialize on AutoCompose consume it as a function automatically.
Wondering if AutoCompose or AutoComposeFunction will be the better name.

@jariji
Contributor

jariji commented Apr 4, 2024

Imho the advantage of being statically understandable probably outweighs the flexibility gained from doing that.

@rapus95
Contributor

rapus95 commented Apr 4, 2024

Imho the advantage of being statically understandable probably outweighs the flexibility gained from doing that.

discoverability surely will be important! we could introduce a certain way of hinting which arguments take part in autocompose.
Like how we use the bang in do!(a) to hint mutation.
Here we could hint it as a certain character within the argument name as well as generally mentioning
"the + operator takes part in autocomposition for all argument places" within the docs of the operator.

@StefanKarpinski
Member

There's an increasing consensus on recent triage discussions to just implement the "argless lambda" version of this which starts the anonymous function with ->. This is currently a syntax error, so it can't break anything. It can be implemented entirely in lowering, and it leaves no ambiguity about where the closure expression ends. E.g. you'd write map(->_[_], arrays, inds) as shorthand for map((a, i) -> a[i], arrays, inds).

@adienes
Contributor

adienes commented May 1, 2024

hate to be cold water but imo the marginal value of code legibility with headless -> is definitely negative

@StefanKarpinski
Member

StefanKarpinski commented May 1, 2024

This is why this issue never gets anywhere. Any time there's something straightforward core devs can get on board with, everyone piles on with what they dislike about it and with random alternative proposals and considerations. The end result is no progress at all. I have a hard time understanding how map(_[_], arrays, inds) can be considered a nice, readable syntax but map(->_[_], arrays, inds) is so much worse.

@o314
Contributor

o314 commented May 1, 2024

jj-do-it

@adienes
Contributor

adienes commented May 1, 2024

I have a hard time understanding how map(_[_], arrays, inds) can be considered a nice, readable syntax but map(->_[_], arrays, inds) isn't.

I don't think either of those are particularly legible

what's wrong with map(getindex, arrays, inds)

The end result is no progress at all

I mean 😬 maybe that's the lesson to learn here is there isn't a great solution

@aplavin
Contributor

aplavin commented May 1, 2024

Any time there's something straightforward core devs can get on board with, everyone piles on with what they dislike about it and with random alternative proposals and considerations. The end result is no progress at all.

Tbh, I don't see why this is objectively a negative in this case.
Including something like this into Base requires careful thought due to stability commitment.

Both in terms of syntax (eg, I find it surprising that a single symbol _ is proposed to mean different things within a single expr like _.a + _.b or _[lastindex(_) - 1])
and in terms of actual implementation (eg, should _.a be an anonymous function or Base.Fix2(getproperty, :a)).
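For the implementation question, the two candidate representations can be compared directly with what exists in Base today. This is only an illustrative sketch: the named tuple is made up, and which form a lowered _.a would actually take is exactly the open question.

```julia
nt = (a = 1, b = 2)   # made-up sample value

as_lambda = x -> x.a                     # an opaque anonymous function
as_fix2   = Base.Fix2(getproperty, :a)   # a structured partial application

# Both behave the same when called...
println(as_lambda(nt) == as_fix2(nt))

# ...but only the Fix2 exposes what it does, so other code can inspect
# or dispatch on the wrapped function and the fixed argument.
println(as_fix2.f === getproperty)
println(as_fix2.x === :a)
```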

Meanwhile, there are actual existing solutions in packages, some of them quite popular (eg Accessors.jl and its macros). They may not cover the area completely, but that's typically because of the inherent complexity and unlikely to be helped by putting one of the solutions into Base.

@rapus95
Contributor

rapus95 commented May 2, 2024

Both in terms of syntax (eg, I find it surprising that a single symbol _ is proposed to mean different things within a single expr like _.a + _.b or _[lastindex(_) - 1])

Same thing meaning appropriately different things in different places is the whole core feature of multiple dispatch. It's just about getting used to how it's used. In the cases you showed, I see there's some profit to be made by adding a way to replicate the argument into all places (which is easier than the other way around). And currying is about applying multiple arguments, not replicating a single argument into multiple slots. (Thus, for your usecase, consider combining the proposal with a replication function that just replicates a single argument into as many slots as you want it). For further thoughts, take a look at my comment in the other issue (#38713 (comment)). Plus, on a related but different note, take a look at #53946.

So, to frame it that way, the underscore is the object that stands for "a slot to curry into" within the argument-less anonymous function syntax. To me, that's a plausible perspective.

and in terms of actual implementation (eg, should _.a be an anonymous function or Base.Fix2(getproperty, :a)).

Can't be a Fix2 as that would need loads of special casing since the syntax is generic. But for that situation you referred to, #53946 will be the better approach anyway.

Meanwhile, there are actual existing solutions in packages, some of them quite popular (eg Accessors.jl and its macros). They may not cover the area completely, but that's typically because of the inherent complexity and unlikely to be helped by putting one of the solutions into Base.

Take a look at #53946 for that particular use case. And you're totally right. The cases you gave in the previous paragraphs aren't particularly well-fit for this proposal. That's why I'd not use them as examples for this proposal. Plus, as said, currying is about applying multiple arguments, not about putting a single argument in multiple places.


what's wrong with map(getindex, arrays, inds)

How it scales for cases like map(->_[:,_], arrays, inds).

This example is a particularly good one due to the closeness of the arguments arrays, inds regarding readability compared to the ordinary lambdas (with proper argument names) in my opinion. In all these cases, using argument names just duplicates the information that already could be taken from the list of properly named arguments to map. Compare:
map((array, ind)->array[:,ind], arrays, inds)
map((a, i)->a[:,i], arrays, inds)
map(->_[:,_], arrays, inds)

Also goes particularly well with the ByRow-syntax from DataFrames.jl as you can freely reorder the columns that will be fed into the function while knowing for sure that everything will be applied in order. ([:time, :offset]=>ByRow(->sinpi(2*_)+_))


The core benefit is that we already have trained eyes to scan for -> and underscores stand out a lot visually. So to me, it's more about having a clean and clear message on "what and where" (-> indicating an anonymous function and _ indicating the slots to drop into) rather than forcefully saving individual characters. This clarification also comes from separating "what happens" and "what will be fed into". Thus, it certainly benefits situations in which the used functions and operators clearly convey what the arguments will be used for and the actual arguments that will be used are close in code (as in the previous section). If that's not the case, create a named function with named arguments!

Additionally, to me, the leading -> has an integral role in clearly showing and conveying where the anonymous function will materialize, without jumping through additional hoops to track reversely, where the underscore's scope will end. I want to look at underscores to give me information (where arguments end up) instead of creating anxiety whether it materializes in the way I expect it to materialize. (Though, that point could also be made about not knowing how many underscores to expect). We still follow the idea that inline creation of callable objects is hinted by ->.


Including something like this into Base requires careful thought due to stability commitment.

Which shouldn't be a lot of effort, as far as I understand it, since it's a simple lowering pass that just collects underscores from left to right and augments the already existing -> node with generated argument names.
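The pass described here can be approximated with a macro on current Julia. The sketch below is an assumption-laden stand-in, not the proposed implementation: the macro name @headless and the helper collect_slots! are hypothetical, and the macro plays the role of the leading -> because that syntax does not parse today. It does not treat nested lambdas or quoted code specially.

```julia
# Replace each `_` in `ex` with a fresh argument name, collecting the
# names in source order (left to right).
function collect_slots!(ex, names::Vector{Symbol})
    if ex === :_
        name = gensym("slot")
        push!(names, name)
        return name
    elseif ex isa Expr
        for i in eachindex(ex.args)
            ex.args[i] = collect_slots!(ex.args[i], names)
        end
    end
    return ex
end

# `@headless body` stands in for the proposed `-> body`.
macro headless(body)
    names = Symbol[]
    new_body = collect_slots!(body, names)
    return esc(Expr(:->, Expr(:tuple, names...), new_body))
end

arrays = [[1 2; 3 4], [5 6; 7 8]]
inds = [1, 2]

# Plays the role of map(->_[:,_], arrays, inds),
# i.e. map((a, i) -> a[:, i], arrays, inds).
cols = map(@headless(_[:, _]), arrays, inds)
println(cols)
```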


One of the biggest advantages I see with this approach, is (as @StefanKarpinski said it), there's nothing that could be broken by implementing it that way, since that syntax isn't used anywhere. So by introducing it, we technically cannot hurt anyone. It's just not everyone will want to use it. (Plus fence-less variants could still be added at a later point). So I'm totally on board for "Just do it", as @o314 presented it. For "how to use it when it's there" read the Tl;Dr:'s at the end.

And for all those arguing that ->_[_] is difficult to read, I still find it a lot easier than just having _[_] in the wild. So even for those not wanting to use the new syntax, it easily conveys the necessary information. Function. Arguments. And particularly for those new to the language, the former approach hints that it creates an anonymous function and if the concepts of "a function" and "indexing" are known to you, you can already guess that there's 2 arguments. That these need to come in order is just a simple learning from that point.

But to be fair, I am already sold on this perspective for its cleanliness and was one of the first appreciators of that idea (#24990 (comment)) and even created a standalone issue for it (#38713).

Tl;Dr: When to use the proposal?

Whenever the "what will happen to the arguments?" is clearly inferrable from the used functions and operators and the argument list is close in code so that using an ordinary lambda would feel like name duplication.
Examples:
map(->_[:,_], arrays, inds)
([:time, :offset]=>ByRow(->sinpi(2*_)+_))

Tl;Dr: When to not use the proposal?

Whenever the "what will happen to the arguments?" can't be easily inferred from the used functions and operators or the actually used arguments are far away. Use named functions with named arguments in that case.
Also, don't use it, if the proposal doesn't fit the problem at hand, as in these cases:

  1. Currying is specifically about applying multiple arguments one by one in order, not about applying a single argument in multiple places. (Instead, you could consider combining a replication function with currying, which would solve the problem)
  2. There's RFC: Curried getproperty syntax #53946 specifically for property access.

@tpapp
Contributor

tpapp commented May 2, 2024

The end result is no progress at all.

Which may be socially optimal. The marginal benefit of adding syntax for a rather special case is small, so not doing it can be a reasonable choice.

Unless a proposal has a large benefit, it is perfectly fine to reserve syntax that currently errors for future expansion of the language. Options have a value, and core devs could consider not filling up every nook and cranny of the syntax space with some gimmick as a perfectly fine choice.

@StefanKarpinski
Member

StefanKarpinski commented May 2, 2024

@adienes: I don't think either of those are particularly legible

what's wrong with map(getindex, arrays, inds)

That's a valid opinion and writing map(getindex, arrays, inds) is totally fine. But there are cases where a function name doesn't exist and that's what this issue is about. @adienes, Am I to take it that you are in the camp of "I don't like this underscore anonymous function business at all and would rather no such syntax were added to the language"?

@aplavin: Both in terms of syntax (eg, I find it surprising that a single symbol _ is proposed to mean different things within a single expr like _.a + _.b or _[lastindex(_) - 1])

If repeated underscores mean the same thing each time, then, as @rapus95 said, this is not a feature for currying anymore, it's just a feature for creating single-argument functions. That's just not nearly as useful or general. It's also not how similar underscore currying syntax works in other languages that have it, such as Scala.

@adienes
Contributor

adienes commented May 2, 2024

Am I to take it that you are in the camp of "I don't like this underscore anonymous function business at all and would rather no such syntax were added to the language"?

I think after reading through the first 100 comments or so for the first time I was in the camp of "underscore to replace a single argument in a single function sounds nice"

after the next 300 comments or so I think I am now more in the camp of "good lord it's impossible to find consensus, probably should just move on"

I definitely don't like (subjective of course!) multiple underscores in a function bc the ambiguity of whether it means the same argument or distinct arguments will always overwhelm the convenience for me

though realistically if _ worked on just infix operators and getproperty only I bet it would still cover like 80% of the proposed use cases

@stevengj
Member Author

stevengj commented May 2, 2024

I'm in the camp of -> f(_) is less readable than x -> f(x), so the savings of 1 character aren't worth it.

I think the original sin here is that trying to handle more than a single function call with _ makes the syntax inherently too complicated to be worth it when we have a perfectly good x -> ... syntax for the general case.

@oschulz
Contributor

oschulz commented May 2, 2024

I'm in the camp of -> f(_) is less readable than x -> f(x), so the savings of 1 character aren't worth it.

I hesitate to make another suggestion here, given the length of the discussion - but maybe just as a wild idea: If we would use \bullet, we could write f(•, b, •), which would look so nicely math-on-paper-like. \bullet is currently not a legal name in Julia, so it couldn't break anything. I do realize that there would be two major drawbacks: It's nice to read, but much longer to type than _. And there's the risk of confusion with \cdot. Like I said, just a wild idea.

@rapus95
Contributor

rapus95 commented May 2, 2024

Unless a proposal has a large benefit, it is perfectly fine to reserve syntax that currently errors for future expansion of the language. Options have a value, and core devs could consider not filling up every nook and cranny of the syntax space with some gimmick as a perfectly fine choice.

Though, to be fair, having some core developers getting to a close-to consensus after more than 6 years of being an open issue and debated multiple times without getting to a consensus doesn't fit the description "filling up every nook and cranny". It rather hints that they regularly stumble over situations where they would've liked to use the particular syntax but ended up with "ah, still not implemented".

I definitely don't like (subjective of course!) multiple underscores in a function bc the ambiguity of whether it means the same argument or distinct arguments will always overwhelm the convenience for me

That ambiguity shouldn't exist in the first place because underscore is meant to be the thing that will never stick to any value. If you use it as a left-hand side, it'll just pass the value straight through to the dump and you won't be able to access the value again. This proposal just reverses sides for this approach (as lambdas go left to right while assignments go right to left). You put in an argument and it will be handed right through to the first pit(=underscore) but aside of that it will be lost, you can't access it in a later place.

I'm in the camp of -> f(_) is less readable than x -> f(x), so the savings of 1 character aren't worth it.

You're totally right that we shouldn't use this proposal in that situation. Luckily there's f in that situation, which is equivalent. If we were restricted to the situation you outlined, the whole proposal would be a non-starter for me as well. But for the situation where a function has multiple arguments and you only want to pass through some of them, aka curry it, the leading -> clearly shows where the lambda will materialize. It shifts the focus to what happens to whatever is supplied next.
Example: ->3^_ or ->sinpi(2*_) or ->ifelse(_, _, nothing) or ->coalesce(_, 0). In all these cases the proposal provides better readability than supplying an arbitrary character/argument name because the argument name won't encode any relevant information.

Disclaimer: Arguing about the following points, and to some extent invalidating them, is by no means meant to invalidate personal preferences and opinions! It's intended as a non-emotional argument. If someone feels attacked by it, that's not my intent. And I'd very much like to help shift perspectives and supply missing key insights, to get to a similar conclusion/perspective as mine, in order to see the cleanliness, mathematical elegance and benefits of it!

To date, I feel like there are 3 types of people around (excluding those in favor of the proposal)

  1. Please don't introduce any syntax (= underscores are confusing)
  2. With leading ' ->' it's not short enough (= code length minimizers)
  3. But the proposal doesn't fit my own workflow (for example, I want a single argument in all slots)

Point 3) isn't a blocker IMO, because it mixes multiple situations. Everything that is needed to make it compatible is a replication function that replicates a single object into multiple places. The other way around (dropping multiple arguments into different places if the proposal would just replicate a single argument) is harder to construct and thus less general, plus, still not currying. But I'd love to help to create the missing piece (replication) in another place so that we can mix replication, property currying (#53946) and this proposal together to make best use of all those features. For example, we could go for
(df |> replicate3) .|> [.Amount, .Name, .Price] |> ->"$_ of $_ are available for $_ each"
The only missing piece would be a shorter syntax for the argument replication. I'd give that. But there are, in my opinion, shorter approaches than using the whole underscore syntax for it. And yes, this might haunt some for its style. But again, no one will be forced to use it.

Point 2) Conciseness and clarity over shortness is IMO a very strong argument in favor of the -> part. Since without that, it's less obvious how many functions will be created if there are multiple underscores. With the -> part, it's always as many functions as there are heads, and they are exactly where those arrows are placed. That's clean. So I wouldn't consider that to be a blocker either.

Point 1) Well, yes, that's to some extent a matter of taste, for which I would refer to "not using it" as being a good strategy. And regarding the "it's confusing" argument, I assume that will change once the proposal has settled and is used in many places. Then it will feel natural. The only thing that prevents getting the natural feel for it right now, most presumably, is forcefully sticking to a perspective that conflicts with that proposal. So regarding this point I'd also say it's a non-blocker, as it will resolve itself with time.

TL;DR: I'm in favor of adding the argless lambda to the language nonetheless, while educating people on intended use cases (it's not meant as a general replacement for functions and lambdas or just passing the named function itself) and thinking about a concise and short syntax for the replication (but independently of the underscore proposal) for those in 3).

@mbauman
Member

mbauman commented May 2, 2024

I think the original sin here is that trying to handle more than a single function call with _ makes the syntax inherently too complicated to be worth it

I'm totally on board with the idea of a single function call, but then the secondary sin is that Julia's syntax is fancy enough that it can be tricky to know what "counts" as a single function call. Is tuple construction (1, 2, _) a single function call? Or does ylims!(ax, (0, _)) work? What about [0, _]? Or T[0, _]? What about Expr(:curly)s? Or _()? Or _[]? Or 0 <= _ < 1? Or x + _ + y? Or x +̂ _ +̂ y? Or _'? Or x ? _ : y? Or _::T? Obviously, you can draw lines and reasons for each and every one of these things to act one way or the other, but my point is that those lines are squiggly and weave their way from one syntax to the next.

IMO, we need a clear precedence boundary. -> gives us that in a way that everyone already understands.
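To make the "what counts as a single function call" question concrete, here is a small sketch that only parses (never evaluates) a few of the forms listed above and records the head of each resulting expression. It assumes current Julia parsing behavior, where `_` in a quoted expression is just an identifier; the variable names `forms` and `heads` are illustrative.

```julia
# Quote several of the forms from the list above. Quoting parses the code
# without running it, so `_`, `f`, `T`, `x` and `y` need no definitions.
forms = [
    :(f(_, y)),      # ordinary call
    :((1, 2, _)),    # tuple construction
    :([0, _]),       # array literal
    :(T[0, _]),      # typed array literal / indexing
    :(0 <= _ < 1),   # chained comparison
    :(x + _ + y),    # n-ary operator call
    :(x ? _ : y),    # ternary
    :(_::T),         # type assertion
]

# Only two of the eight forms reach the parser as a plain :call expression.
heads = [ex.head for ex in forms]
```

Anything whose head is not `:call` needs its own rule before a fence-less `_` can decide where the lambda begins, which is the squiggly line described above.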

@stevengj
Member Author

stevengj commented May 2, 2024

IMO, we need a clear precedence boundary. -> gives us that in a way that everyone already understands.

We already have x -> for the case where the precedence boundary is unclear. What's the point of saving 1 character?

Someone on discourse posted a quote that "all new features start at -10 points" which is pertinent here.

@rapus95
Contributor

rapus95 commented May 2, 2024

Disclaimer again: I don't want to heat this up, so there's sincere curiosity behind my questions further down. I can't expect a sincere answer to them, but I'll know for myself that I at least tried.

We already have x -> for the case where the precedence boundary is unclear. What's the point of saving 1 character?

To quote myself

Currying is specifically about applying multiple arguments one by one in order, not about applying a single argument in multiple places.

twice

But for the situation where a function has multiple arguments and you only want to pass through some arguments, aka curry it, the leading -> clearly shows where the lambda will materialize.

thrice

Conciseness and clarity over shortness

four times

How it scales for cases like map(->_[:,_], arrays, inds).

So I wonder, is there any particular strategy behind reiterating single-argument and other cases which are already accepted as a bad fit for this proposal? What would you need in order to discuss the (from our perspective) well-fit cases instead of the bad-fit cases? Or to shift the focus away from cases which this proposal isn't designed for? It's like saying do-block notation doesn't have a big benefit because why would I write

identity() do x, y
  return x+y
end

instead of

function (x,y)
  return x+y
end

Well yes, that's a miserable example to show the benefits of the do-block syntax. I'm totally with you there. And on top, the given examples could be reduced to +. Well yes, that's also true. But that's still not what the do-block notation was designed for.

And likewise for the current currying proposal. All examples you gave are a particularly bad fit. No one wants to disagree there. But we measure by benefits in the designed-for case, instead of "how hard can you go against the design idea". Otherwise, we'd have a restrictive, totally static, non-composable language (plus many more less-elegant adjectives).
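The do-block comparison can be run as written. The sketch below first reproduces the deliberately poor fit, then shows a multi-line body passed to `map`, which is closer to what do-blocks are usually used for. The names `adder` and `doubled_lengths` are illustrative only.

```julia
# The "miserable" case: a do-block handed to identity just returns the
# anonymous function, so nothing is gained over writing the function directly.
adder = identity() do x, y
    return x + y
end

# A better fit: a multi-statement body passed as the first argument of map.
doubled_lengths = map(["a", "bb", "ccc"]) do word
    n = length(word)
    return 2n
end
```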

@aplavin
Contributor

aplavin commented May 2, 2024

If repeated underscores mean the same thing each time, then, as @rapus95 said, this is not a feature for currying anymore, it's just a feature for creating single-argument functions.

Ok, but is it a negative?

IMO, with multiple arguments, explicit single-letter names make the code easier to understand while not adding much overhead.
E.g., I definitely prefer map((a,i) -> a[i], arrs, ixs) to map(-> _[_], arrs, ixs). It's both cleaner and allows stuff like map((i,a) -> a[i], ixs, arrs).
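A runnable version of the named-argument form, assuming `arrs` holds the arrays and `ixs` the indices (the sample data is made up for illustration). With explicit names, the argument order of the lambda can follow whichever order the collections are passed in:

```julia
arrs = [[10, 20, 30], [40, 50, 60]]
ixs = [2, 3]

# Array first, index second.
array_first = map((a, i) -> a[i], arrs, ixs)

# Index first, array second: same result, different argument order.
index_first = map((i, a) -> a[i], ixs, arrs)
```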

That's just not nearly as useful or general.

[citation needed] :)
Single-argument anonymous functions are very common in Julia, both in Base, in packages, and in user code. If there is a way to make them significantly cleaner/shorter, that would be a win.
There are packages that address it in different contexts (e.g. piping), but I'm not sure whether a fully generic simplification is possible/desirable.

Note that this doesn't preclude multi-arg functions: they could use stuff like _2 (similar to Mathematica) or __ or unicode suffixes.

It's also not how similar underscore currying syntax works in other languages that have it, such as Scala.

It's how similar syntax works in other languages, such as anonymous functions in Mathematica: there, # means the same argument.

I'm sure there are languages leaning either way! In Julia, there is lots of prior art (in packages) with _ meaning the same argument, but is there anything at all reasonably-used with the opposite meaning?

@tpapp
Contributor

tpapp commented May 3, 2024

Single-argument anonymous functions are very common in Julia, both in Base, in packages, and in user code. If there is a way to make them significantly cleaner/shorter, that would be a win.

Of course "significant" is a subjective term, especially in this context, but arguably just dropping a single character from the beginning of an anonymous function is stretching the concept.

My take from this (and related discussions) so far is that given the syntactic complexity of Julia, "curry underscore" can do very little and the debate is about allocating this modicum of expressiveness. Coding styles differ so different people want to use it for various things, and there is no clear consensus. In which case, is it worth adding extra syntax for so little gain?

@rapus95:

To date, I feel like there are 3 types of people around (excluding those in favor of the proposal)

  1. Please don't introduce any syntax (= underscores are confusing)

I think that there is a fourth category you missed: people who may have been initially interested in this feature, but after having seen the ramifications and limitations they don't think it would improve the language.

It's not that underscores are confusing (the latest single-argument proposal with the -> boundary is particularly easy to understand), it is just that they buy very little.

@knuesel
Member

knuesel commented May 3, 2024

(Plus fence-less variants could still be added at a later point).

If we add the "argless lambda" as in ->sinpi(2*_), I don't think we should later add fence-less _ as in hypot(3, _). Having both in the same language would be confusing: you might want to rewrite x -> g(x, f(y, _)) with an argless lambda -> g(_, f(y, _)) and boom, you've changed the meaning of the underscore in the f call.
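The change of meaning can be spelled out with today's explicit lambdas. This is a sketch of the two readings only, not of any implementation: `f` and `g` are stand-in functions that simply return their arguments, and the argless reading assumes every `_` names the same argument.

```julia
# Stand-ins that return their arguments so the structure stays visible.
f(y, z) = (y, z)
g(x, inner) = (x, inner)

y = 1

# Fence-less reading of `x -> g(x, f(y, _))`:
# the underscore curries only the f call, so g receives a function.
fenceless = x -> g(x, z -> f(y, z))

# Argless-lambda reading of `-> g(_, f(y, _))`:
# both underscores refer to the lambda's own argument, so g receives a value.
argless = x -> g(x, f(y, x))
```

Calling `fenceless(5)` yields a tuple whose second element is still a function, while `argless(5)` yields plain values, so the same-looking rewrite is not behavior-preserving.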

Example: ->3^_ or ->sinpi(2*_) or ->ifelse(_, _, nothing) or ->coalesce(_, 0). In all these cases the proposal provides better readability than supplying an arbitrary character/argument name because the argument name won't encode any relevant information.

This is subjective, I find all the following more readable except the coalesce case: x->3^x or x->sinpi(2*x) or (x,y)->ifelse(x, y, nothing) or x->coalesce(x, 0).

To date, I feel like there are 3 types of people around (excluding those in favor of the proposal)

  1. Please don't introduce any syntax (= underscores are confusing)
  2. With leading ' ->' it's not short enough (= code length minimizers)
  3. But the proposal doesn't fit my own workflow (for example, I want a single argument in all slots)

Another category missing on top of @tpapp's 4th:

  1. fence-less _ (only for single function call) is better because it focuses on a simple but very common case with a meaningful concept (partial application of a function), and this allows removing the -> visual noise.

although @mbauman makes a very good point:

Julia's syntax is fancy enough that it can be tricky to know what "counts" as a single function call. Is tuple construction (1, 2, _) a single function call? Or does ylims!(ax, (0, _)) work? What about [0, _]? Or T[0, _]? What about Expr(:curly)s? Or _()? Or _[]? Or 0 <= _ < 1? Or x + _ + y? Or x +̂ _ +̂ y? Or _'? Or x ? _ : y? Or _::T? Obviously, you can draw lines and reasons for each and every one of these things to act one way or the other, but my point is that those lines are squiggly and weave their way from one syntax to the next.

This is probably the most powerful objection to the fence-less single call rule. Presumably the rule is not counted as "something straightforward" in @StefanKarpinski's comment because of this.

I think the answer is that we should expand the documentation on syntax sugar. Every piece of sugar that corresponds to a function call should be documented there. This would explicitly define what counts as a single function call: anything not in the list would not count.

Actually documenting those sugars is something we should do anyway I think, because knowing this aspect of lowering is essential to really understand Julia syntax, and cannot be swept under the rug when you consider function dispatch (e.g. how overloading Generator can affect the behavior of (x^2 for x in 1:3)).
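As a rough illustration of what such a list might contain, the sketch below checks a handful of sugars against the function calls they are commonly understood to correspond to in current Julia. The selection is illustrative, not an exhaustive or official list, and the names `A`, `v` and `sugar_matches_call` are made up for the example.

```julia
A = [1 2; 3 4]
v = [10, 20, 30]

# Each entry compares a piece of syntax sugar with an explicit call.
sugar_matches_call = [
    (1, 2, 3) === Core.tuple(1, 2, 3),       # tuple construction
    [0, 1] == Base.vect(0, 1),               # array literal
    Int[0, 1] == getindex(Int, 0, 1),        # typed array literal
    v[2] == getindex(v, 2),                  # indexing
    A' == adjoint(A),                        # postfix adjoint
    (x^2 for x in 1:3) isa Base.Generator,   # generator expression
]
```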

@MasonProtter
Contributor

IMO this issue is going nowhere, and nobody will agree on how to make a useful syntax for this.

However, there's a different way we can go about this which is likely to be much less controversial: JuliaLang/JuliaSyntax.jl#212

@c42f has shown with various analyses that it's really rare in the ecosystem for one to have f(@m x, y) actually be meant to mean f(@m((x,y))), and we could somewhat safely transition that syntax to mean f(@m(x), y). If we did that, then it opens the door to various macros that can make different choices about what underscores mean (and it would actually fix some bugs Claire found out there)
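The parsing behavior in question can be inspected without defining any macro, since quoting does not expand macro calls. In this sketch `@m`, `f`, `x` and `y` are placeholders, and the comments describe the current parse as stated above, not the proposed one.

```julia
# Unparenthesized macro call inside an argument list: the macro takes
# everything up to the closing parenthesis, so f is called with one argument.
greedy = :(f(@m x, y))
greedy_macro = greedy.args[2]

# Parenthesized macro call: the macro sees only x, and y goes to f.
explicit = :(f(@m(x), y))
```

Here `greedy.args` holds `f` plus a single macro-call expression, while `explicit.args` holds `f`, the macro call, and `y`.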

@mbauman
Member

mbauman commented May 3, 2024

Yeah, the crux of my point is that because intuitions vary on what this should do (see above 500 comments), we're in all the more need of an understandable and straightforward rule to clearly describe whatever we've chosen it to do. -> is just one way to do that. Macros are another. Giving up on it altogether and just using x->x is also ok in my book.

It's possible there exists a simple rule without a "fence", but goodness the edge cases are tricky.

For example, the current implementation here notes that we'd need to add special support for broadcasting and kwcalls and chained comparisons; those are broken because they are internally composed of multiple expressions. They also all feel like they have an obvious answer. But there are so many more cases. With the current implementation here:

  • (;_...) is x->(;x...) but (_...,) is an error (intentionally disallowed by the implementation)
  • _[] is x->x[] and _{1} is x->x{1} but _() is an error (_ as rvalue)
  • _[end] is x->x[y->lastindex(y)] (an error)
  • A[_] = 1 creates a closure that does the setindex!, but then throws it away and you are only left with the value 1 (the RHS); this should probably be an error
  • A[_] .= 1 is similar, but errors as the closure ends up in internal machinery
  • _ & a is x->x & a and _ .&& a would work with broadcasting fixed, but _ && a is an error (_ as rvalue) — the analogy here makes it seem like it should work, but that's adding control flow... which then...
  • b ? _ : nothing feels like it should work but it's really no different from if b; _; else; nothing; end which seems more obvious it shouldn't
  • "$_" is x->"$x" and "$_ $_"("hello","world") even works but `$_` is an error (puts the wrong closure through some internal machinery)
  • (i for i in _) is x->(i for i in x) (actually works!) but [i for i in _] is an error (another wrong closure in internals)
  • a += _ defines a closure that tries to add itself to its argument and then names it a. I guess maybe I actually asked for that? I don't know, by now I've completely confused myself.

What's the rule? Or is it just a whole pile of special cases?

My dreams of a fenceless _ syntax are well and truly dead.

@tpapp
Contributor

tpapp commented Jun 25, 2024

Note that #29875 (which introduces a pretty useful feature IMO, one that is consistent with current usage of _) is waiting on the resolution of this proposal. Could triage please weigh the benefits of keeping this one open vs moving on?

@oschulz
Contributor

oschulz commented Jun 25, 2024

If #54653 becomes reality, it would be really neat if we'd emit Base.Fix objects instead of anonymous functions where possible. Some packages like InverseFunctions can already take advantage of Base.Fix1/Base.Fix2, but could not dispatch on anonymous functions, obviously.
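The dispatch difference can be seen with the existing `Base.Fix2`. In this sketch `describe` is a hypothetical helper standing in for what a package might do; it is not an API of InverseFunctions or any other package.

```julia
# Two ways to write "add 2 to the argument".
add_two_fix = Base.Fix2(+, 2)
add_two_lambda = x -> x + 2

# Hypothetical helper: a method for Base.Fix2 can read the wrapped function
# and the fixed value, whereas an anonymous function is opaque.
describe(f::Base.Fix2) = (f.f, f.x)
describe(f) = nothing
```

Both callables compute the same result, but only the `Base.Fix2` version exposes its structure to dispatch.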

Labels
compiler:lowering Syntax lowering (compiler front end, 2nd stage) needs decision A decision on this change is needed triage This should be discussed on a triage call
Development

Successfully merging this pull request may close these issues.