
Consistent argument order, Vectorization #11722

Closed
bramtayl opened this issue Jun 16, 2015 · 46 comments

@bramtayl
Contributor

It would be convenient to have data arguments consistently as the first argument. This is particularly useful for chaining. A few examples where the argument order is puzzling: convert, ismatch.

@StefanKarpinski
Sponsor Member

This issue seems too broad to be of much practical use.

@JeffBezanson
Sponsor Member

In cases like this it's best just to give an exhaustive list of suggested changes. Everything is data, so I'm not sure what a data argument is.

The argument order for convert is very firmly established, and this function is extremely important so we're not going to change it. But please feel free to list other examples.

@JeffBezanson
Sponsor Member

To elaborate a bit, the argument order for convert matches call; convert(T,x) and T(x) are related.
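A minimal sketch of that relationship (the `T(x)` constructor-call form is assumed here; the exact fallback mechanics vary by Julia version):

```julia
# convert(T, x) puts the target type first, mirroring the
# constructor-style call T(x):
x = 3.0
@assert convert(Int, x) == 3
@assert Int(x) == 3  # same result via call syntax
```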

@bramtayl
Contributor Author

I'm new to Julia, but I can list examples as I come across them. Anything in the match family, so ismatch, match, eachmatch, matchall. map and broadcast are also not very compatible with chaining, but I can see why the argument order makes sense. I guess by data I mean the argument that's most likely to be chained. This might be a keep-in-mind-for-the-future issue rather than a go-back-and-change-everything issue. It seems like in Julia, function arguments and type arguments tend to come first (perhaps in resonance with call), which has the side effect of filling my code with a bunch of underscores (via @as _ begin).

@JeffBezanson
Sponsor Member

Making the function the first argument to map is universal. Is there even one language that doesn't do that?

The match functions are more debatable, but Python uses this argument order, as do other OO languages like Ruby, which uses re.match(string).

You are right that julia's libraries were not designed with this chaining thing in mind. Fully admitting my lisp bias, I've never wanted many different syntaxes for function calls.

@bramtayl
Contributor Author

R's apply functions and plyr functions all have data arguments first (with the notable exception of mapply).

Edit: not to mention as.numeric, as.character, and family

@johnmyleswhite
Member

And the notable exception of R's Map function.

@bramtayl
Contributor Author

Another one is write

@JeffBezanson
Sponsor Member

Well then we appear to be at an impasse. It looks like vast numbers of key functions (write, convert, map, ...) are incompatible with this redesign. As you point out, changing all of these is not really practical. However these functions are so fundamental that it wouldn't be practical to do things differently in the future either; certainly we can't have half of our I/O functions use f(obj, io) and half go the other way.

@simonster
Member

I think there's an argument to be made for switching the argument order for match. At present it's inconsistent with search, replace, and the rest of the string functions.
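The inconsistency can be seen side by side (written in current syntax; `replace` took positional arguments rather than a `Pair` at the time of this thread):

```julia
s = "hello"
# match takes the pattern first, then the string...
@assert match(r"l+", s).match == "ll"
# ...while replace, like most string functions, takes the string first
@assert replace(s, "l" => "L") == "heLLo"
```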

@nalimilan
Member

...but findin uses the same order as match. The good thing with match and co. is that their arguments can easily be swapped and a deprecation added, there's no ambiguity thanks to the Regex argument type. That's harder for findin, though the function could also be renamed to work around this (cf. #10593).

I'm all for consistency, though honestly I would rather have changed all functions to search for their first argument in their second one. Maybe that's just me. Anyway, for chaining, it's not clear to me whether you'd more often pass one or another (both are "data").

@bramtayl
Contributor Author

Ok, here's a makeshift solution. It allows users to either vectorize or switch around the arguments of functions, or both.

using Lazy

# convert singletons to a 1 entry vector
function to_array(x)
    if (@> x typeof) <: AbstractArray
      x
    else
      [x]
    end
end

# switch the first and second items in a tuple
function switch_tuple(tuple)
  if (@> tuple length) == 2
    index = [2, 1]
  else
    index = vcat(2, 1, 3:(@> tuple length))
  end
  tuple[index]
end

# return an expression, which suffixes a function and reorders the arguments
function switch(function_symbol::Symbol)
  suffixed_function_string = string(function_symbol) * "_s"
  suffixed_function_symbol = @> suffixed_function_string parse

  quote
    function $suffixed_function_symbol(arguments...)
      $function_symbol((@> arguments switch_tuple)...)
    end
    @> $suffixed_function_string parse
  end
end

# return an expression, which suffixes a function
# and maps/broadcasts a function over an argument/arguments respectively
function vectorize(function_symbol::Symbol)
  suffixed_function_string = string(function_symbol) * "_v"
  suffixed_function_symbol = @> suffixed_function_string parse
  quote
    function $suffixed_function_symbol(arguments...)
      arguments = map(to_array, arguments)
      if length(arguments) == 1
        map($function_symbol, arguments...)
      else
        broadcast($function_symbol, arguments...)
      end
    end
    @> $suffixed_function_string parse
  end
end

# vectorize some functions
@> :vectorize vectorize eval
@> :switch vectorize eval
@> :eval vectorize eval

# lists of functions to reverse and vectorize or just vectorize
reverse_and_vectorize = [:ismatch, :write, :convert]
just_vectorize = [:replace]

@> begin
  # first reverse functions in reverse_and_vectorize
  reverse_and_vectorize
  switch_v
  eval_v
  # add in just_vectorize functions
  vcat(just_vectorize)
  # vectorize
  vectorize_v
  eval_v
end

# test
@> ["a", "b"] ismatch_s_v(r"a")

See also #8450

The argument switch suffers (I think) from unnecessarily copying the arguments. Is there a way to unpack a tuple in a particular order?

Edit: fixed anonymous function issue

@yuyichao
Contributor

@bramtayl why an anonymous function (bound to a non-const global) for to_array?

@bramtayl
Contributor Author

Oops I'm still getting used to Julia style function definitions. See edit. It's there to be able to broadcast singletons.

@JeffBezanson
Sponsor Member

That's kind of clever. But I don't see how this is a huge improvement over write(io, _), or just using normal syntax. If you know about write, it's easy to see what write(io,_) does, while write_s(io) and @> seem pretty obscure to me.

I also think (#8450) that ideally iteration is something you do with an existing function, not something that requires a new definition for each function. Nobody should decide which functions get _v versions; you can write map(f, x) when needed. Or maybe this could be part of the operator; for example @.> A write(io, _) could mean for x in A; write(io, x); end. But again I would argue the for loop version is intelligible even to people who don't know the language.
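A sketch of that point (hypothetical data, using only existing Base functions):

```julia
# iteration composed from existing pieces, with no per-function _v definitions:
A = [1, 2, 3]
io = IOBuffer()
for x in A                  # the plain-loop spelling of "write each element"
    write(io, string(x))
end
@assert String(take!(io)) == "123"
@assert map(x -> 2x, A) == [2, 4, 6]   # map when the results are wanted
```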

@bramtayl
Contributor Author

The advantage of write_s would be that you can write out a chain without naming anything. Of course, names make debugging easier, but reading code harder.

@> begin
  text
  # a whole bunch of string processing
  write_s(conn)
end

Without that, your code would look like this:

@as _ begin
  text
  # string processing with a bunch of unnecessary _'s
  write(conn, _)
end

@JeffBezanson
Sponsor Member

For me, the greater regularity of reusing the same write function, and not needing to set up a definition to make write_s exist, make the second version the winner. Maybe others will weigh in.

@johnmyleswhite
Member

Given that chaining makes many function arguments implicit (and therefore makes line-local reasoning more difficult), I generally think it makes code harder to read. I also agree that having a single canonical write function is more important than accommodating macro-based DSL's.

@bramtayl
Contributor Author

If I had to map, broadcast, or for loop (!) every time I used a function iteratively (pretty much always) AND had to write code riddled with underscores, I'd probably give up and go back to R. Consider the chain above without any help:

reverse_calls = map(switch, reverse_and_vectorize)
reverse_symbols = map(eval, reverse_calls)
both_symbols = vcat(reverse_symbols, just_vectorize)
vectorize_calls = map(vectorize, both_symbols)
vectorize_symbols = map(eval, vectorize_calls)

Useful for debugging, but it doesn't seem likely that any of these items cluttering up the environment will be used again.

or, with underscores:

@as _ begin
  reverse_and_vectorize
  map(switch, _)
  map(eval, _)
  vcat(_, just_vectorize)
  map(vectorize, _)
  map(eval, _)
end

Not even going to bother with for loops.

Yes, obviously there is such a thing as too much chaining. You might argue that the argument switching and vectorization should be done in two separate chains. But chaining also organizes code and clarifies structure.

@nalimilan
Member

If I had to map, broadcast, or for loop (!) every time I used a function iteratively (pretty much always) AND had to write code riddled with underscores, I'd probably give up and go back to R.

To me, a short vectorization syntax is a major need and that's what #8450 is supposed to deal with. In your example, the code would be much shorter already, and you might accept suffering a few underscores for chaining if vectorization allowed merging a few lines of the chain.

Also note that R does not provide any native support for chaining, so it's not like this kind of thing couldn't be done in Julia as well.

@johnmyleswhite
Member

As @JeffBezanson said before, we seem to be at an impasse. This issue seems primarily focused on code aesthetics and it seems that several other Julia developers don't share your aesthetic sensibilities.

It sounds like there are several specific functions, like match, that people would consider changing for consistency. But consistency for the sake of simplifying chaining doesn't seem sufficient to justify making so many breaking changes.

@johnmyleswhite
Member

For me the decisive issue when debating these kinds of DSL-specific concerns is this: given that you want alternative surface syntax for writing identical semantics, why not just write an actual DSL that gets translated to Julia code? Why does Julia syntax need to match the syntax of your ideal DSL?

I think people tend to overuse shared-parser DSL's for this kind of use case. If you want truly independent syntax, a separate-parser DSL is the way to go. It has higher start-up cost for the DSL-developer, but completely frees you from having to reach consensus with others about your preferred syntax.

@bramtayl
Contributor Author

If you play your cards right, you might get armies of @hadley followers switching to Julia in the next few years, all of whom are pretty used to chaining (and the kind of things in DataFramesMeta). I'm certain I couldn't tackle writing a new language. But maybe a package?

@johnmyleswhite
Member

I, for one, don't see that as a goal worth pursuing given that I work on Julia in my free time. I'd vastly prefer having a language that can be used for the things that R will never be good at than a language that tries to emulate what R can already do well enough.

The problem with the idioms you're advocating for is that they don't come equipped with any fleshed out solutions to the issues of semantics that have held back work on #8450. The surface syntax of a replacement for vectorization is the least difficult part of what needs to be done to remove vectorization from Julia. The important issue is designing a set of semantics that's amenable to compilation to efficient code. That depends on progress on integrating functions into Julia's type system in such a way that multiple dispatch can operate effectively when using higher-order functions. See Jeff's thesis for some ideas about how this might be done and packages like FastAnonymous.jl for interim improvements.

@bramtayl
Contributor Author

For most applications, the bottleneck is how long it takes to write the code, not how long it takes to run the code.

@johnmyleswhite
Member

That's completely false when you work at scale.

@bramtayl
Contributor Author

Conceded. I thought the point of Julia was to be the best of both worlds. Otherwise, why not just write in Fortran?

@ScottPJones
Contributor

@bramtayl I think Julia does give a lot of flexibility (more than I've ever seen elsewhere) to have the best of both worlds. There are still a lot of rough edges, but those are being worked out, and maybe you can accomplish what you want in a package, with all the power of multiple dispatch and Julia macros behind you...

@johnmyleswhite
Member

The problem is that we don't have any means for reaching an agreement about what "best" means. My take on this issue is that many of the people involved in this thread have very substantial disagreements about what good code looks like. I'm skeptical that we can resolve such large disagreements about aesthetics by talking them through.

@bramtayl
Contributor Author

Ok, I'll just keep the code for personal use only.

@bramtayl bramtayl changed the title Consistent argument order Consistent argument order/Vectorization Jun 18, 2015
@bramtayl bramtayl changed the title Consistent argument order/Vectorization Consistent argument order, Vectorization Jun 18, 2015
@JeffBezanson
Sponsor Member

I agree that productivity is incredibly important, but I don't see how something like chaining syntax is drastically more productive than our normal syntax. As for vectorization, if I thought writing map every time was a good solution, then #8450 would not be an open issue.

I just realized that it's a bit odd for chaining to work on the first argument. In languages with function currying, delayed arguments are added at the end. For example you could write

x |> map(switch) |> map(eval) |> vcat(_, just_vectorize) |> map(vectorize) |> map(eval)

because map(f) means x->map(f,x). Maybe our functions are designed more for this style.
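Base Julia's |> pipes a value into a one-argument function, so this curried style can be approximated today with anonymous functions (Base doesn't curry map itself; the wrappers below are the assumption):

```julia
# piping into one-argument closures stands in for curried map(f):
A = [1, 2, 3]
doubled_sum = A |> (a -> map(x -> 2x, a)) |> sum
@assert doubled_sum == 12
```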

@bramtayl
Contributor Author

Maybe it's worth working for consistency in the other direction then? Is that piping to the last argument or to the second argument?

@JeffBezanson
Sponsor Member

Yes, that's quite possible. I think it should pipe to the last argument.

@bramtayl
Contributor Author

Here's an extension to whole-sale vectorize all the functions in a Module.

using Lazy
using DataFrames
using DataFramesMeta

@> :typeof vectorize eval
@> :eval vectorize eval
@> :string vectorize eval
@> :convert switch eval
@> :ismatch switch eval vectorize eval

function get_functions(m::Module)
  df = @> begin
    DataFrame(symbol = @> m names)
    @transform(
      is_function = ( @> begin
                       :symbol
                       eval_v
                       typeof_v
                      .==(Function) end ) ,
      compatible = ( @> begin
                      :symbol
                      string_v
                      ismatch_s_v( r"^[A-Za-z]" )
                      convert_s( Vector{Bool} ) end ) )
    @where(:is_function & :compatible) end

  df[:symbol] end

@> Base get_functions vectorize_v eval_v

@ScottPJones
Contributor

Could somebody please explain the use of _ in the above example? (Again, sorry for the newbie question; it's just that the only thing I can find with Google is about IJulia history variables, and the JuliaLang docs can't seem to find anything that isn't an alphanumeric string...)

@bramtayl
Contributor Author

@as is a macro from Lazy.jl. The example given in the Readme (worth checking out for context) is:

# @as lets you name the threaded argument
@as _ x f(_, y) g(z, _) == g(z, f(x, y))

The benefit of _ is that you can specify exactly where you want the previous result to be piped into the next expression. It is needed in particular when there is no consistent way of figuring out where to pipe the previous result (i.e. the first argument, the last argument, etc.). _ is only a symbol, and @as ~ would work equally well were it not for interfering with formulas.

Jeff's currying example is from some other language, but you might assume a somewhat equivalent function.
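For reference, a de-sugared sketch of what the Readme example threads to, with hypothetical functions f and g (not part of Lazy.jl):

```julia
f(a, b) = a + b
g(a, b) = a * b
x, y, z = 2, 3, 4
# @as _ x f(_, y) g(z, _) threads x through each _ hole in turn:
threaded = g(z, f(x, y))
@assert threaded == 20
```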

@JeffBezanson
Sponsor Member

whole-sale vectorize all the functions in a Module

I'm really not a fan of this. It's clearly the wrong abstraction: instead of the function and the iteration being treated as orthogonal (which they really are), it doubles the number of definitions in a module without regard for which of the new definitions actually make sense. Concepts should be composed using general mechanisms, not by concatenating names with underscores.

I continue to fail to understand the advantage of @> m names over names(m). Isn't this just deliberately obscure?

@bramtayl
Contributor Author

Yeah, I was trying to use chaining as often as possible for illustrative purposes. Might have gone a bit overboard. Agreed that doubling the number of functions in a module is a little ridiculous, but until #8450 gets sorted out it might be useful, especially if no one else starts using _v for something else.

@bramtayl
Contributor Author

It's also worth noting that the code above can be rewritten with Lazy's @>> which pipes to the last argument. This wouldn't have worked for other string processing functions like replace, though.

using Lazy
using DataFrames
using DataFramesMeta

@> :vectorize vectorize eval
@> :eval vectorize eval
@> [:typeof, :string, :ismatch] vectorize_v eval_v

function get_functions(m::Module)
  df = @> begin
    DataFrame(symbol = @> m names)
    @transform(
      is_function = ( @>> begin
                       :symbol
                       eval_v
                       typeof_v
                      .==(Function) end ) ,
      compatible = @>> begin
                      :symbol
                      string_v
                      ismatch_v( r"^[A-Za-z]" )
                      convert( Vector{Bool} ) end )
    @where(:is_function & :compatible) end

  df[:symbol] end

@> Base get_functions vectorize_v eval_v

Edit: an extension for multiple packages:

function make_functions(m::Module)
  quote
    @> $m get_functions switch_v eval_v
    @> $m get_functions vectorize_v eval_v switch_v eval_v
  end
end

@> :make_functions vectorize eval

@> [Base, Lazy] make_functions_v eval_v

@StefanKarpinski
Sponsor Member

I have to say that I find this style of coding pretty inscrutable – it doesn't seem like an improvement in terms of readability or writability. But I'm glad that the macro system lets you experiment like this.

@fcard
Contributor

fcard commented Jun 19, 2015

I heard from some people that they like threading/piping because it lets them always read code "left to right, top to bottom", whereas with nesting/composition they have to find where the expression starts and how it continues.

Some like to reason about code by describing it with phrases, and it's harder to come up with words to describe print(sum(map(x->x-10, map(x->2x, A)))) than it is to describe @>> A map(x->2x) map(x->x-10) sum print; the latter is pretty straightforward: "I have A, I multiply every element by 2, then I subtract 10 from every element, then I sum it, then I print it."
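Both spellings compute the same thing, as a quick sketch shows (the `|>` version with explicit closures stands in for the @>> macro here, since Lazy.jl may not be loaded):

```julia
A = [1, 2, 3]
nested = sum(map(x -> x - 10, map(x -> 2x, A)))                  # inside-out
piped  = A |> (a -> map(x -> 2x, a)) |> (a -> map(x -> x - 10, a)) |> sum
@assert nested == piped == -18
```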

Some other people have said they get lost in nesting easily, always reading expressions all at once.
And some other people just said threading is cooler looking :P

@bramtayl
Contributor Author

I have no idea how to extend this to work with macro functions, seeing as you can't use the splat operator with them.

@carlobaldassi
Member

the latter is pretty straightforward: "I have A, I multiply every element by 2, then I subtract 10 from every element, then I sum it, then I print it."

Which also exemplifies one additional usage pattern in which this style of appending operations at the end generally helps, that is building expressions step by step at the REPL while looking at the output, shell-style (if performance is not your primary concern, obviously).

@kmsquire
Member

Seems like this is a dup of #5571?

@JeffBezanson
Sponsor Member

Yes, I think this discussion can be continued in #5571.

@apetrushin

Making the function the first argument to map is universal. Is there even one language that doesn't do that?

  • Ruby list.map! {|x| x + 1 }.
  • Elixir Enum.map list, fn(x) -> x + 1 end.
  • JS list.map(function(x) { return x + 1 }).
  • Ugly Java list.stream().map(x -> x + 1).toArray().

While 4 of those are OOP, the order of arguments kinda mimics the functional style, with the list going first and the function second.

Python and Clojure use the other convention, where the function goes first.
