Consistent argument order, Vectorization #11722
This issue seems too broad to be of much practical use.
In cases like this it's best just to give an exhaustive list of suggested changes. Everything is data, so I'm not sure what a data argument is. The argument order for …
To elaborate a bit, the argument order for …
I'm new to Julia, but I can list examples as I come across them. Anything in the match family, so …
Making the function the first argument to … The match functions are more debatable, but Python uses this argument order, as do other OO languages like Ruby, which uses … You are right that Julia's libraries were not designed with this chaining thing in mind. Fully admitting my Lisp bias, I've never wanted many different syntaxes for function calls.
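An aside not spelled out in the thread, but easy to verify: function-first argument order is also what makes Julia's do-block syntax work, since the trailing block is passed as the call's first argument. A minimal sketch:

```julia
# The do-block becomes an anonymous function passed as the FIRST
# argument of map — which is why map takes the function first.
squares = map([1, 2, 3]) do x
    x^2
end
```

Here `squares` is `[1, 4, 9]`, exactly as if `map(x -> x^2, [1, 2, 3])` had been written.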
R's apply functions and plyr functions all have data arguments first (with the notable exception of …). Edit: not to mention `as.numeric`, `as.character`, and family.
And the notable exception of R's …
Another one is …
Well then we appear to be at an impasse. It looks like vast numbers of key functions (write, convert, map, ...) are incompatible with this redesign. As you point out, changing all of these is not really practical. However, these functions are so fundamental that it wouldn't be practical to do things differently in the future either; certainly we can't have half of our I/O functions use …
I think there's an argument to be made for switching the argument order for …
...but I'm all for consistency, though honestly I would rather have changed all functions to search for their first argument in their second one. Maybe that's just me. Anyway, for chaining, it's not clear to me whether you'd more often pass one or another (both are "data").
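To make the "which slot does the data go into" question concrete, here is a Base-only sketch (the closures are illustrative, not from the thread): `|>` pipes into a one-argument function, and a closure makes explicit whether the piped value fills the first or the last slot.

```julia
# |> pipes the left-hand value into a one-argument function; the
# closure decides which argument slot the piped value occupies.
first_slot = 10 |> (x -> x - 1)   # data as first argument: 10 - 1
last_slot  = 10 |> (x -> 1 - x)   # data as last argument:  1 - 10
```

So `first_slot` is `9` and `last_slot` is `-9`; a chaining macro has to pick one of these conventions (or, like `@as`, let the user mark the slot).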
Ok, here's a makeshift solution. It allows users to either vectorize or switch around the arguments of functions, or both.

```julia
using Lazy

# convert singletons to a 1-entry vector
function to_array(x)
    if (@> x typeof) <: AbstractArray
        x
    else
        [x]
    end
end

# switch the first and second items in a tuple
function switch_tuple(tuple)
    if (@> tuple length) == 2
        index = [2, 1]
    else
        index = [2; 1; 3:(@> tuple length)]
    end
    tuple[index]
end

# return an expression which suffixes a function and reorders the arguments
function switch(function_symbol::Symbol)
    suffixed_function_string = string(function_symbol) * "_s"
    suffixed_function_symbol = @> suffixed_function_string parse
    quote
        function $suffixed_function_symbol(arguments...)
            $function_symbol((@> arguments switch_tuple)...)
        end
        @> $suffixed_function_string parse
    end
end

# return an expression which suffixes a function
# and maps/broadcasts it over an argument/arguments respectively
function vectorize(function_symbol::Symbol)
    suffixed_function_string = string(function_symbol) * "_v"
    suffixed_function_symbol = @> suffixed_function_string parse
    quote
        function $suffixed_function_symbol(arguments...)
            arguments = map(to_array, arguments)
            if length(arguments) == 1
                map($function_symbol, arguments...)
            else
                broadcast($function_symbol, arguments...)
            end
        end
        @> $suffixed_function_string parse
    end
end

# vectorize some functions
@> :vectorize vectorize eval
@> :switch vectorize eval
@> :eval vectorize eval

# lists of functions to reverse and vectorize, or just vectorize
reverse_and_vectorize = [:ismatch, :write, :convert]
just_vectorize = [:replace]

@> begin
    # first reverse functions in reverse_and_vectorize
    reverse_and_vectorize
    switch_v
    eval_v
    # add in just_vectorize functions
    vcat(just_vectorize)
    # vectorize
    vectorize_v
    eval_v
end

# test
@> ["a", "b"] ismatch_s_v(r"a")
```

See also #8450. The argument switch suffers (I think) from unnecessarily copying the arguments. Is there a way to unpack a tuple in a particular order?

Edit: fixed anonymous function issue
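For readers who want the core idea without the Lazy.jl macros, here is a minimal Base-only sketch of the vectorization wrapper above (the names `to_array_sketch` and `vectorized` are illustrative, not from the thread):

```julia
# Promote scalar arguments to 1-element arrays, then broadcast the
# wrapped function over all (now array-valued) arguments.
to_array_sketch(x) = isa(x, AbstractArray) ? x : [x]
vectorized(f) = (args...) -> broadcast(f, map(to_array_sketch, args)...)

mixed = vectorized(+)(1, [10, 20])   # scalar 1 broadcasts over the vector
```

This is the same scalar-promotion-plus-broadcast trick as `vectorize`, minus the code generation that defines a renamed `f_v` for each function.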
@bramtayl why an anonymous function (bound to a non-const global)?
Oops, I'm still getting used to Julia-style function definitions. See edit. It's there to be able to broadcast singletons.
That's kind of clever. But I don't see how this is a huge improvement over … I also think (#8450) that ideally iteration is something you do with an existing function, not something that requires a new definition for each function. Nobody should decide which functions get …
The advantage of `write_s` would be that you can write out a chain without naming anything. Of course, names make debugging easier, but reading code harder.

```julia
@> begin
    text
    # a whole bunch of string processing
    write_s(conn)
end
```

Without that, your code would look like this:

```julia
@as _ begin
    text
    # string processing with a bunch of unnecessary _'s
    write(conn, _)
end
```
For me, the greater regularity of reusing the same …
Given that chaining makes many function arguments implicit (and therefore makes line-local reasoning more difficult), I generally think it makes code harder to read. I also agree that having a single canonical …
If I had to map, broadcast, or for-loop (!) every time I use a function iteratively (pretty much always) AND had to write code that was riddled with underscores, I'd probably give up and go back to R. Consider the chain above without any help:

```julia
reverse_calls = map(switch, reverse_and_vectorize)
reverse_symbols = map(eval, reverse_calls)
both_symbols = vcat(reverse_symbols, just_vectorize)
vectorize_calls = map(vectorize, both_symbols)
vectorize_symbols = map(eval, vectorize_calls)
```

Useful for debugging, but it doesn't seem likely that any of these items cluttering up the environment will be used again. Or, with underscores:

```julia
@as _ begin
    reverse_and_vectorize
    map(switch, _)
    map(eval, _)
    vcat(_, just_vectorize)
    map(vectorize, _)
    map(eval, _)
end
```

Not even going to bother with for loops. Yes, obviously there is such a thing as too much chaining. You might argue that the argument switching and vectorization should be done in two separate chains. But chaining also organizes code and clarifies structure.
To me, a short vectorization syntax is a major need and that's what #8450 is supposed to deal with. In your example, the code would be much shorter already, and you might accept suffering a few underscores for chaining if vectorization allowed merging a few lines of the chain. Also note that R does not provide any native support for chaining, so it's not like this kind of thing couldn't be done in Julia as well.
As @JeffBezanson said before, we seem to be at an impasse. This issue seems primarily focused on code aesthetics, and it seems that several other Julia developers don't share your aesthetic sensibilities. It sounds like there are several specific functions, like …
For me the decisive issue when debating these kinds of DSL-specific concerns is this: given that you want alternative surface syntax for writing identical semantics, why not just write an actual DSL that gets translated to Julia code? Why does Julia syntax need to match the syntax of your ideal DSL? I think people tend to overuse shared-parser DSLs for this kind of use case. If you want truly independent syntax, a separate-parser DSL is the way to go. It has a higher start-up cost for the DSL developer, but completely frees you from having to reach consensus with others about your preferred syntax.
If you play your cards right, you might get armies of @hadley followers switching to Julia in the next few years, all of whom are pretty used to chaining (and the kind of things in DataFramesMeta). I'm certain I couldn't tackle writing a new language. But maybe a package?
I, for one, don't see that as a goal worth pursuing given that I work on Julia in my free time. I'd vastly prefer having a language that can be used for the things that R will never be good at than a language that tries to emulate what R can already do well enough. The problem with the idioms you're advocating for is that they don't come equipped with any fleshed-out solutions to the issues of semantics that have held back work on #8450. The surface syntax of a replacement for vectorization is the least difficult part of what needs to be done to remove vectorization from Julia. The important issue is designing a set of semantics that's amenable to compilation to efficient code. That depends on progress on integrating functions into Julia's type system in such a way that multiple dispatch can operate effectively when using higher-order functions. See Jeff's thesis for some ideas about how this might be done, and packages like FastAnonymous.jl for interim improvements.
For most applications, the bottleneck is how long it takes to write the code, not how long it takes to run the code.
That's completely false when you work at scale.
Conceded. I thought the point of Julia was to be the best of both worlds. Otherwise, why not just write in Fortran?
@bramtayl I think Julia does give a lot of flexibility (more than I've ever seen elsewhere) to have the best of both worlds (although there are still a lot of rough edges, but those are being worked out), and maybe you can accomplish what you want in a package, with all of the power of multiple dispatch and Julia macros behind you...
The problem is that we don't have any means for reaching an agreement about what "best" means. My take on this issue is that many of the people involved in this thread have very substantial disagreements about what good code looks like. I'm skeptical that we can resolve such large disagreements about aesthetics by talking them through.
Ok, I'll just keep the code for personal use only.
I agree that productivity is incredibly important, but I don't see how something like chaining syntax is drastically more productive than our normal syntax. As for vectorization, if I thought writing … I just realized that it's a bit odd for chaining to work on the first argument. In languages with function currying, delayed arguments are added at the end. For example you could write … because …
Maybe it's worth working for consistency in the other direction then? Is that piping to the last argument or to the second argument?
Yes, that's quite possible. I think it should pipe to the last argument.
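A hedged sketch of the currying idea under discussion: in later Julia versions, `Base.Fix1` and `Base.Fix2` fix one argument of a two-argument function, turning it into a one-argument pipe stage regardless of which slot the data occupies (`occursin` stands in here for the thread's `ismatch`):

```julia
# Fix1(f, a) builds the function b -> f(a, b); fixing the non-data
# argument leaves a one-argument stage that works with |>.
has_a = Base.Fix1(occursin, "a")                 # s -> occursin("a", s)
kept  = ["abc", "xyz"] |> Base.Fix1(filter, has_a)
```

With partial application available, the question of whether chaining pipes into the first or last slot matters less, since either slot can be pre-filled.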
Here's an extension to wholesale vectorize all the functions in a Module.

```julia
using Lazy
using DataFrames
using DataFramesMeta

@> :typeof vectorize eval
@> :eval vectorize eval
@> :string vectorize eval
@> :convert switch eval
@> :ismatch switch eval vectorize eval

function get_functions(m::Module)
    df = @> begin
        DataFrame(symbol = @> m names)
        @transform(
            is_function = ( @> begin
                :symbol
                eval_v
                typeof_v
                .==(Function) end ) ,
            compatible = ( @> begin
                :symbol
                string_v
                ismatch_s_v( r"^[A-Za-z]" )
                convert_s( Vector{Bool} ) end ) )
        @where(:is_function & :compatible) end
    df[:symbol] end

@> Base get_functions vectorize_v eval_v
```
Could somebody please explain the use of …
```julia
# @as lets you name the threaded argument
@as _ x f(_, y) g(z, _) == g(z, f(x, y))
```

The benefit of `_` is that you can specify exactly where you want the previous result to be piped into the next expression. It is needed in particular if there is no consistent method of figuring out where to pipe the previous result to (i.e. the first argument, the last argument, etc.). `_` is only a symbol, and Jeff's currying example is from some other language, but you might assume a somewhat equivalent function.
I'm really not a fan of this. It's clearly the wrong abstraction: instead of the function and the iteration being treated as orthogonal (which they really are), it doubles the number of definitions in a module without regard for which of the new definitions actually make sense. Concepts should be composed using general mechanisms, not by concatenating names with underscores. I continue to fail to understand the advantage of …
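The orthogonality point can be illustrated with a small higher-order function instead of generated `_s` names; this `flip` helper is a sketch for illustration, not an API from the thread:

```julia
# flip(f) calls f with its first two arguments exchanged, so argument
# order is handled generically rather than by a renamed definition per function.
flip(f) = (a, b, rest...) -> f(b, a, rest...)

diff  = flip(-)(1, 10)              # computes 10 - 1
glued = flip(string)("b", "a")      # computes string("a", "b")
```

One general mechanism covers every function, which is the composition-over-concatenation idea being argued for here.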
Yeah, I was trying to use chaining as often as possible for illustrative purposes. Might have gone a bit overboard. Agreed that doubling the number of functions in a module is a little ridiculous, but until #8450 gets sorted out it might be useful, especially if no one else starts using …
It's also worth noting that the code above can be rewritten with Lazy's …

```julia
using Lazy
using DataFrames
using DataFramesMeta

@> :vectorize vectorize eval
@> :eval vectorize eval
@> [:typeof, :string, :ismatch] vectorize_v eval_v

function get_functions(m::Module)
    df = @> begin
        DataFrame(symbol = @> m names)
        @transform(
            is_function = ( @>> begin
                :symbol
                eval_v
                typeof_v
                .==(Function) end ) ,
            compatible = @>> begin
                :symbol
                string_v
                ismatch_v( r"^[A-Za-z]" )
                convert( Vector{Bool} ) end )
        @where(:is_function & :compatible) end
    df[:symbol] end

@> Base get_functions vectorize_v eval_v
```

Edit: an extension for multiple packages:

```julia
function make_functions(m::Module)
    quote
        @> $m get_functions switch_v eval_v
        @> $m get_functions vectorize_v eval_v switch_v eval_v
    end
end

@> :make_functions vectorize eval
@> [Base, Lazy] make_functions_v eval_v
```
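For historical context, a hedged note: the vectorization half of this thread was eventually addressed in later Julia versions by dot-call syntax (the direction #8450 was pursuing), which broadcasts any function elementwise without generating `_v` variants:

```julia
# f.(args...) broadcasts f over array arguments; non-array arguments
# (including a Regex) are treated as constants across the broadcast.
strings = string.([1, 2])
matches = occursin.(r"^[A-Za-z]", ["abc", "1bc"])
```

This makes iteration orthogonal to the function, exactly as argued above: no one has to decide in advance which functions get vectorized names.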
I have to say that I find this style of coding pretty inscrutable — it doesn't seem like an improvement in terms of readability or writability. But I'm glad that the macro system lets you experiment like this.
I heard from some people that they like threading/piping because it lets them always read code "left to right, top to bottom", whereas with nesting/composition they have to find where the expression starts and where it continues. Some like to reason about code by describing it with phrases, and it's harder to come up with words to describe … Some other people have said they get lost in nesting easily, always reading expressions all at once.
I have no idea how to extend this to work with macro functions, seeing as you can't use the splat operator with them.
Which also exemplifies one additional usage pattern in which this style of appending operations at the end generally helps: building expressions step by step at the REPL while looking at the output, shell-style (if performance is not your primary concern, obviously).
Seems like this is a dup of #5571?
Yes, I think this discussion can be continued in #5571.
While 4 of those are OOP, the order of arguments kind of mimics the functional style, where the list goes first and the function goes second. Python and Clojure use another convention, where the function goes first.
It would be convenient to have data arguments consistently as the first argument. This is particularly useful for chaining. A few examples where the argument order is puzzling: `convert`, `ismatch`.