Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: forall with filtering appears not to be parallel #26384

Open
mppf opened this issue Dec 10, 2024 · 2 comments
Open

[Bug]: forall with filtering appears not to be parallel #26384

mppf opened this issue Dec 10, 2024 · 2 comments

Comments

@mppf
Copy link
Member

mppf commented Dec 10, 2024

Summary of Problem

Description:
I'm working on an algorithm that would benefit from a parallel filter. However with a quick check, it looks like this is not currently parallel at all. Should it be? Should it be warned about? What would it take to get it working, including with a distributed-memory forall ?

I'm talking about syntax like the spec describes

[i in 1..s.size] if i % 2 == 1 then s(i)

Is this issue currently blocking your progress?
no

Steps to Reproduce

Source Code:

use Random;
use Time;

config const n = 1000000000;

proc main() {
  var A:[0..<n] int;
  fillRandom(A, min=0, max=9, seed=1);

  // count the zeros
  writeln("counting");
  var numZeros = 0;
  forall x in A with (+ reduce numZeros) {
    if x == 0 then numZeros += 1;
  }

  // store the positions of the zeros in parallel
  writeln("forall loop with filter stored in untyped");
  var t: stopwatch;
  t.start();
  var ZerosP1 = [i in A.domain] if A[i] == 0 then sqrt(sqrt(i:real));
  t.stop();
  writeln("elapsed ", t.elapsed());
}

I see best performance with --dataParTasksPerLocale=1.

Compile command:
chpl --fast aa.chpl

Execution command:

./aa --n=1000000000 --dataParTasksPerLocale=8
./aa --n=1000000000 --dataParTasksPerLocale=1

Associated Future Test(s):
TODO

Configuration Information

2.3 pre-release on an Ubuntu PC.

@bradcray
Copy link
Member

bradcray commented Dec 10, 2024

I'm curious: is it not parallel only when capturing the result into a new variable, or even when simply doing something standalone like [i in -10..10] if i <= 10 then writeln(i);? Checking, it looks like a standalone case is parallel, as expected/hoped.

I often wonder about just this case because (I believe) our current approach to capturing a filtered iterator expression into a new variable (*) is to do recursive doubling of the array as we compute it, and it's hard to imagine that proceeding in parallel. I'm guessing that's why this isn't parallelized. [Edit: Actually, as noted in #26385, even when the array is typed and we know its size, it's serialized because we can't anticipate how the filtered items will be numbered if we chunk of the iteration space in our normal way].

If the expression were written as var ZerosP1 = forall i in A.domain do if A[i] == 0 then sqrt(sqrt(i:real));, was serialized, and we didn't generate a warning, I think we definitely should in that case because forall is meant to assert parallelism. For the square bracket version, I think we probably should warn as well because we typically say that the interpretation is like forall if there's a parallel iterator (and ranges have one), a for otherwise. Though I could imagine an argument that we could say "square bracket expressions are parallel if possible, serial if not" and that maybe this is a case where we quietly fall back to the serial implementation. Still, I'm not wild about that.

Should it be parallelized in cases that can be? I think that sounds attractive, but it's not obvious to me what implementation we should use for efficiency and to maintain ordering. Do you have thoughts there?

(* = or really, maybe, any iterable expression in which we can't compute the size a priori?)

@bradcray
Copy link
Member

@jabraham17 and I had a conversation that was thematically similar to this issue today in which they wanted to capture the results of a parallel iterator into an array and didn't care about the order in which the results were captured into the array (motivated by today's day11 AoC problem). In the conversation, we discussed that a challenge in such cases is not only not knowing how many things will be yielded by the iterator, but also the array nature tending to require that ordering is preserved, by some definition. That led us to wonder whether initializing a data structure that did not have ordering as part of its key properties—such as an associative domain, set, or potentially associative array—should support init= that took an iterable expression and was able to execute it in parallel, adding the results into the set/whatever without worrying about ordering (just thread safety).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants