Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented a variety of segmented algorithms #2859

Closed
wants to merge 11 commits into from
Closed

Implemented a variety of segmented algorithms #2859

wants to merge 11 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Aug 24, 2017

Implemented following segmented algorithms -

  1. all_of, any_of, none_of
  2. binary transform_reduce
  3. adjacent_find
  4. adjacent_difference

Unit tests provided for each.

See #1338

@ghost
Copy link
Author

ghost commented Aug 24, 2017

@hkaiser, @mcopik Please have a look.

@mcopik
Copy link
Contributor

mcopik commented Aug 24, 2017

This implements several missing algorithms from #1338

Copy link
Contributor

@mcopik mcopik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work Ajai! I have left few comments suggesting improvements. Most of them are minor nitpicks, except the question of invoking the binary operator in algorithms from adjacent family. We should discuss this issue further.

dest, std::forward<Op>(op), is_seq());
}

// forward declare the non-segmented version of this algorithm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there a forward declaration if you are including the definition with algorithms/adjacent_difference.hpp?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is being done in all the segmented algorithms I have implemented. I started with the existing for_each as base, which contained this forward declaration. It is also there in the existing scans and the generate etc. I am not sure if it is needed, but I have just been following the standard. :P

@@ -0,0 +1,100 @@
// Copyright (c) 2017 Ajai V George
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests include only a case where the difference in adjacent pairs is constant. I think it should be extended with at least one case where it is either constantly increasing/decreasing or random.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A test case with an operator (x, y) -> y would produce non-constant results which can be easily checked on the vector with N elements (0, 1, ..., n-1).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the current test cases.

std::true_type(), beg, end, ldest, op);

beginning = traits1::compose(sit, beg);
if(beginning != last)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, here you apply the difference operator to the adjacent pair (the last value in the previous segment, the first value in the current segment)? I think it will cause an unnecessary communication between localities. I think this could be simplified by dispatching a modified function object which accepts an additional parameter and returns the last value in the segment. I think it's worth saving time by sending data together.

@hkaiser, what do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, So i will create another function object for both adjacent difference and adjacent find which accept a parameter for the previous value. Also just to confirm, the function object should return the last value right? Not an iterator to the last input value? Currently the function object returns an iterator to the last output.

What about the case of the parallel version? As you know parallel and sequential versions cannot call different function objects. So the parallel version will be an exact copy of the existing parallel version, right?

while(start != between_segments.end())
{
FwdIter2 curr = dest;
std::advance(curr, std::distance(first, *start));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're invoking synchronously op on the executing thread, the execution time of this code depends on the latency of communication between localities. Two reads and a one write are necessary for each call to op. There are different ways to do the same thing, though. For example, could we attach a continuation to each future which would immediately write the last value of adjacent_difference on a given segment to the first element of next segment? It would only require disabling the first copy in adjacent_difference function object to avoid a race condition.

Perhaps it would be good to make a quick benchmark comparing how much time could be spent in this section, compared to the parallel execution. I might be wrong about this.

@hkaiser, do you have an opinion on this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so the parallel function object will be exactly same except the first value is ignored. Instead I will attach a then clause to the future, which will call the function object on the first value of current segment and last value of previous segment and write the result to the correct position. Am I right? Also something similar in adjacent find right?

output = traits::compose(sit, out);
}
}
FwdIter ending = traits::compose(sit, std::prev(end));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that arguments of this logical conjunction should be reversed. The comparison should be performed only iff found = false.

std::vector<bool> res =
hpx::util::unwrap(std::move(r));
auto it = res.begin();
while (it != res.end())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a problem, but a small remark: this loop could be replaced by one call to std::all_of.

>::call(r, errors);
std::vector<bool> res =
hpx::util::unwrap(std::move(r));
auto it = res.begin();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, std::any_of.

local_iterator_type1 end1 = traits1::local(last1);
if (beg1 != end1)
{
overall_result = hpx::util::invoke(red_op, overall_result,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation and braces locations are slightly confusing here. The style used later in this function (line 292) is much easier to read.

> forced_seq;


std::vector<shared_future<T> > segments;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as any_of etc. Is a shared_future really necessary here?

template <typename T>
void initialize(hpx::partitioned_vector<T> & xvalues)
{
T init_array[SIZE] = {1,2,3,4, 5,1,2,3, 1,5,2,3, 4,2,3,2, 1,2,3,4, 5,6,5,6,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test using the binary predicate?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the current test cases.

@ghost
Copy link
Author

ghost commented Aug 28, 2017

@mcopik, I will try to fix the minor issues by today and then work on the new function objects.

@hkaiser
Copy link
Member

hkaiser commented Sep 5, 2017

Now that GSoC is formally over - what is the state of this PR?

@mcopik
Copy link
Contributor

mcopik commented Sep 6, 2017

@hkaiser @ajaivgeorge has fixed some issues which I have mentioned but I believe that two algorithms should be improved. Right now, there is an additional round of communication between segments after finishing the dispatched work. I think it should be possible to hide latencies introduced by this additional communication.

@ghost
Copy link
Author

ghost commented Sep 7, 2017

@hkaiser This PR is almost ready for merging.

@mcopik I am working on eliminating the additional round of communication in the sequential versions of adjacent_find and adjacent_difference with a custom function object which returns the first value (as you described). Will try to get that done by evening.

Regarding the parallel version, I do not see how appending a future will eliminate the extra round of communication.

@ghost
Copy link
Author

ghost commented Sep 11, 2017

@mcopik I have been trying to implement the custom function object for the sequential version to reduce communication overhead, as you suggested. For this, the new function objects return the last value of each segment, which is taken as a parameter by the function object working on the next segment. An issue is that, I also need to finally return an iterator to the last destination computed as per the specification of adjacent_find/adjacent_difference. This is easy enough to compute when all the segments are successfully computed. But what happens if one of the segments errors out and is unable to compute the entire result. So last computed dest != end of dest segment. How will this error be found out if the function object does not return this iterator. I suppose I could return a pair<last element in input, last computed iterator in dest> but this does not seem very elegant. What do you suggest?

@hkaiser
Copy link
Member

hkaiser commented Jan 26, 2018

@ajaivgeorge, @mcopik what will happen to this PR? Should we abandon it? How much work is left to get this into the main branch?

@mcopik
Copy link
Contributor

mcopik commented Jan 29, 2018

@hkaiser Ignoring merge conflicts, the PR is ready - all algorithms seem to be correctly implemented. I have few doubts about the most optimal way to achieve that, that's all.

@mcopik
Copy link
Contributor

mcopik commented Jan 29, 2018

@ajaivgeorge Can you resolve those conflicts?

@msimberg msimberg removed this from the 1.1.0 milestone Mar 22, 2018
@msimberg
Copy link
Contributor

@mcopik, @ajaivgeorge I see the source repo has been deleted now. Is there anything we could do to help getting this merged? Would it be possible to for example leave out the problematic parts so that we could have at least some of the work in master? Would be a shame to have all the work thrown away.

@mcopik
Copy link
Contributor

mcopik commented Sep 10, 2018

@ajaivgeorge Ajai, why did you delete the repository?

@msimberg I might have a fork on my old machine. I didn't fork his repo on GH.

@hkaiser
Copy link
Member

hkaiser commented Sep 11, 2018

@msimberg we should still be able to get the patch (we have the hashes of all commits).

@msimberg
Copy link
Contributor

@hkaiser, awesome, thanks!

@hkaiser hkaiser mentioned this pull request Nov 5, 2018
@msimberg
Copy link
Contributor

Closing this in favor of #3525.

@msimberg msimberg closed this Nov 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants