EEP 78: Multi-valued comprehensions #75

Merged on Mar 12, 2025 (1 commit)

michalmuskala
Contributor

TL;DR: This EEP proposes enhancing the comprehension syntax to allow emitting multiple
elements in a single iteration of the comprehension loop - effectively enhancing
comprehensions to implement flatmap with a fixed number of elements, for example:

[X + 1, X + 2, ... || X <- Xs]
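
For comparison, here is a minimal sketch of what the two-element case `[X + 1, X + 2 || X <- Xs]` is meant to compute, written in Erlang that compiles without the EEP. The module and function names are hypothetical, chosen only for illustration.

```erlang
-module(mv_flatmap).
-export([via_flatmap/1, via_recursion/1]).

%% Proposed syntax (only valid once EEP 78 is implemented):
%%   [X + 1, X + 2 || X <- Xs]

%% The same result with lists:flatmap/2: each X yields a
%% fixed two-element list, and the sublists are concatenated.
via_flatmap(Xs) ->
    lists:flatmap(fun(X) -> [X + 1, X + 2] end, Xs).

%% The same result with explicit recursion: both values are
%% consed directly onto the rest, with no intermediate sublist.
via_recursion([X | Xs]) ->
    [X + 1, X + 2 | via_recursion(Xs)];
via_recursion([]) ->
    [].
```

Both functions return `[2,3,3,4,4,5]` for the input `[1,2,3]`.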

@elbrujohalcon

Not as a criticism, just out of curiosity… What will these expressions evaluate to?

[ Z = binary_to_integer(X), Z || X <- [<<"1">>, <<"2">>, <<"3">>] ].
%% My guess: [1, 1, 2, 2, 3, 3].
[ hello, goodbye || true ].
%% My guess: [hello, goodbye].
#{hello => Hello = hello, Hello => goodbye || true }.
%% My guess: #{hello => goodbye}.

Maybe you can even add some of them as tests in erlang/otp#9374

@michalmuskala
Contributor Author

The first expression will be an error:

test.erl:6:31: variable 'Z' is unbound
%    6|   [ Z = binary_to_integer(X), Z || X <- [<<"1">>, <<"2">>, <<"3">>] ].
%     |                               ^

This is similar to how it would behave in a plain list: [X = binary_to_integer(~"1"), X].

The second will indeed yield as you expected:

3> [ hello, goodbye || true ].
[hello,goodbye]

The third will again error with Hello variable undefined:

test.erl:6:29: variable 'Hello' is unbound
%    6|   #{hello => Hello = hello, Hello => goodbye || true }.
%     |                             ^

The same would happen if you tried this with a plain map: #{hello => Hello = hello, Hello => goodbye} won't compile.

In other words, as in all "container" constructs, the expressions are evaluated "in parallel", not "in sequence", as far as variable scoping is concerned.
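
To get the result guessed for the first expression, the value has to be bound in a qualifier rather than in one of the element expressions. A minimal sketch in Erlang that compiles without the EEP (the module and function names are hypothetical):

```erlang
-module(mv_scoping).
-export([twice/1]).

%% Z is bound by a single-element generator, so it is in scope
%% for every later qualifier. The second generator then emits
%% it twice per input element.
%%
%% Assumption: with the proposed syntax this could presumably
%% be written as
%%   [Z, Z || X <- Bins, Z <- [binary_to_integer(X)]]
twice(Bins) ->
    [Y || X <- Bins, Z <- [binary_to_integer(X)], Y <- [Z, Z]].
```

Calling `mv_scoping:twice([<<"1">>, <<"2">>, <<"3">>])` returns `[1,1,2,2,3,3]`.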

@Maria-12648430
Contributor

Hm, there is an inconvenient limitation, namely that it is not possible to decide whether something should be inserted once, twice, thrice etc., or not at all (yes, that can and still has to be done with filters). That is, with the proposed syntax, it is only possible to add elements a fixed number of times, which IMO puts a severe limitation on the ergonomics the extension would otherwise provide.

That said, I also find it somewhat confusing. It looks like a set of expressions where, everywhere else, the last would be the result, but here the result of each expression is inserted.

@elbrujohalcon

> That said, I also find it somewhat confusing. It looks like a set of expressions where, everywhere else, the last would be the result, but here the result of each expression is inserted.

I guess you should think of it as the list of expressions at the "head" of a cons operator…

1> [1, 2, 3 | lists:seq(4, 10)].
[1,2,3,4,5,6,7,8,9,10]

I presume that's what @michalmuskala was using as inspiration.

@Maria-12648430
Contributor

> I guess you should think of it as the list of expressions at the "head" of a cons operator…

Ok, makes sense :)

Comment on lines +47 to +50
Binary comprehensions already support this, and thus there's no enhancement to their syntax
suggested in this EEP, for example:

<< <<(X + 1), (X + 2)>> || <<X>> <= Bin>>.
Contributor

This is true, but also the multi-value syntax should be applicable to it, if only for the sake of consistency and completeness.

I think a binary comprehension like...

<< <<(X + 1)>>, <<(X + 2)>> || <<X>> <= Bin >>.

... should be accepted and result in a concatenation of the generated binaries, that is, the same as if the element had been written as the single binary <<(X + 1), (X + 2)>>.
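
As a point of reference, here is a small sketch of the forms that already work for binary comprehensions without the EEP (the module and function names are hypothetical). The comma-separated form suggested above is not included because it is not part of the EEP as written.

```erlang
-module(mv_binary).
-export([expand/1, expand_nested/1]).

%% One multi-segment binary per input byte, as in the EEP text.
%% Each segment is a default 8-bit integer, so values above 255
%% are truncated.
expand(Bin) ->
    << <<(X + 1), (X + 2)>> || <<X>> <= Bin >>.

%% The same output using a nested list generator that emits one
%% single-segment binary per value.
expand_nested(Bin) ->
    << <<Y>> || <<X>> <= Bin, Y <- [X + 1, X + 2] >>.
```

Both functions return `<<2,3,3,4,4,5>>` for the input `<<1,2,3>>`.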

Contributor Author

Yes, that's a good point. TBH the syntax for binary comprehensions is quite noisy already and I didn't want to muddle it more

@kikofernandez
Contributor

Hi @michalmuskala , thanks for your contribution.
Did you create a topic on Erlang Forums? I did not see one, but maybe I did not look closely enough.

@michalmuskala
Contributor Author

@kikofernandez I've created one now: https://erlangforums.com/t/eep-78-multi-valued-comprehensions/4537

@TD5
Contributor

TD5 commented Feb 25, 2025

The proposal as stated by @michalmuskala would be incredibly useful for me, since I have a lot of hot code that ends up in precisely the style/performance trap described, with:

lists:append([[X + 1, X + 2] || X <- Xs])

or

[Tmp || X <- Xs, Tmp <- [X + 1, X + 2]]

> Hm, there is an inconvenient limitation, namely that it is not possible to decide whether something should be inserted once, twice, thrice etc., or not at all (yes, that can and still has to be done with filters). That is, with the proposed syntax, it is only possible to add elements a fixed number of times, which IMO puts a severe limitation on the ergonomics the extension would otherwise provide.

I am not sure about an arbitrary number of elements, but for a closed-set of elements to be prefixed, I wonder whether something akin to this is possible:

[
  case X of
    one -> X + 1;
    two -> X + 1, X + 2;
    three -> X + 1, X + 2, X + 3
  end
|| X <- List]

Of course, this precise syntax is not a good idea at all, since there's now a strange new kind of expression in the tail position of a case-expression, but presumably we could devise some syntax inside comprehensions to conditionally determine the elements to emit.
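
For reference, a varying number of elements per iteration can already be expressed without new syntax by letting a nested generator iterate over a list chosen by a case-expression. This is a sketch only: the module and function names are hypothetical, and integer keys replace the atoms `one`, `two`, `three` from the example above so that the arithmetic is valid.

```erlang
-module(mv_conditional).
-export([emit/1]).

%% The case-expression returns a list whose length depends on X.
%% The generator Y <- ... then emits each of its elements, so an
%% empty list emits nothing for that X.
emit(List) ->
    [Y || X <- List,
          Y <- case X of
                   1 -> [X + 1];
                   2 -> [X + 1, X + 2];
                   3 -> [X + 1, X + 2, X + 3];
                   _ -> []
               end].
```

Calling `mv_conditional:emit([1, 2, 3, 4])` returns `[2,3,4,4,5,6]`. This still builds a short intermediate list per element, which is the cost the EEP's fixed-arity syntax is meant to avoid.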

@kikofernandez
Contributor

> @kikofernandez I've created one now: https://erlangforums.com/t/eep-78-multi-valued-comprehensions/4537

I will merge the proposal into the EEP repo Monday next week. Just to leave some time in case there is more feedback from the community towards it :)

Thanks for your contribution!

@Maria-12648430
Contributor

> Of course, this precise syntax is not a good idea at all, since there's now a strange new kind of expression in the tail position of a case-expression, but presumably we could devise some syntax inside comprehensions to conditionally determine the elements to emit.

It's not a new kind of expression, this is valid today, with the last expression of a case branch being the one that gets inserted in the list. Which is definitely worse in more than one regard than strange new expressions.

That said, I am also not a huge fan of introducing brand new syntaxes that only can be used in this particular context of comprehensions. I would argue that with things getting as complex as this, it is out of the scope of what is at the end of the day a shorthand syntax, and explicit recursion or a fold should be used.

@kikofernandez merged commit 1de4e51 into erlang:master Mar 12, 2025
1 check passed
@michalmuskala deleted the multi-comp branch March 12, 2025 12:13
@richcarl
Contributor

As @TD5 noted, this is limited to static repetitions, and the more general and common pattern that it would be nice to address is when you generate varying sublists to be joined, as in lists:append([f(X) || X <- ... ]). If there existed an appending comprehension (using e.g. <|>), it could be written [f(X) <|> X <- ... ]. The question is however whether we can gain any efficiency. Unless we can hope for deforestation/stream fusion optimization to kick in (unlikely), there will still need to be intermediate sublists under the hood as return values from f(X). At some point these sublists must be traversed to build the final result. Making a temporary list of lists and appending them at the end is in fact among the cheapest possible ways to implement this - like IOlists and scatter/gather in general. (The fastest is to build the sublists in reverse and join them up with lists:reverse/2 - also known as "reverse onto", if you really must have all the speed.)

Suppose you write a "deforested" version: if it's recursive it will need to allocate a stack frame per element to hold intermediate data, and if it's tail recursive it will need to build an accumulator list and reverse it in the end. The amount of work is much the same as the appending version - the only real difference would be a shorter syntax (with yet another symbol <|>). So to summarize: the EEP's suggestion is simple and can be efficient, though limited, but maybe it's the best we can do. For the general case, don't fear the append.
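
The two strategies compared above can be sketched as follows. This is one possible reading of the "reverse onto" idea, not a benchmarked implementation. The module and function names are hypothetical, and `F` stands for any function that returns a list per element.

```erlang
-module(mv_append).
-export([via_append/2, via_reverse_onto/2]).

%% Build a temporary list of sublists, then append them all at
%% the end.
via_append(F, Xs) ->
    lists:append([F(X) || X <- Xs]).

%% Accumulate in reverse: lists:reverse/2 reverses each sublist
%% onto the accumulator, and one final lists:reverse/1 restores
%% the original order.
via_reverse_onto(F, Xs) ->
    Rev = lists:foldl(fun(X, Acc) -> lists:reverse(F(X), Acc) end, [], Xs),
    lists:reverse(Rev).
```

With `F = fun(X) -> lists:seq(1, X) end` and `Xs = [1, 2, 3]`, both functions return `[1,1,2,1,2,3]`.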

6 participants