-
-
Notifications
You must be signed in to change notification settings - Fork 438
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented a variety of segmented algorithms #2859
Conversation
This implements several missing algorithms from #1338 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work Ajai! I have left few comments suggesting improvements. Most of them are minor nitpicks, except the question of invoking the binary operator in algorithms from adjacent family
. We should discuss this issue further.
dest, std::forward<Op>(op), is_seq()); | ||
} | ||
|
||
// forward declare the non-segmented version of this algorithm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there a forward declaration if you are including the definition with algorithms/adjacent_difference.hpp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is being done in all the segmented algorithms I have implemented. I started with the existing for_each as base, which contained this forward declaration. It is also there in the existing scans and the generate etc. I am not sure if it is needed, but I have just been following the standard. :P
@@ -0,0 +1,100 @@ | |||
// Copyright (c) 2017 Ajai V George |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests include only a case where the difference in adjacent pairs is constant. I think it should be extended with at least one case where it is either constantly increasing/decreasing or random.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A test case with an operator (x, y) -> y would produce non-constant results which can be easily checked on the vector with N elements (0, 1, ..., n-1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check the current test cases.
std::true_type(), beg, end, ldest, op); | ||
|
||
beginning = traits1::compose(sit, beg); | ||
if(beginning != last) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, here you apply the difference operator to the adjacent pair (the last value in the previous segment, the first value in the current segment)? I think it will cause an unnecessary communication between localities. I think this could be simplified by dispatching a modified function object which accepts an additional parameter and returns the last value in the segment. I think it's worth saving time by sending data together.
@hkaiser, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, So i will create another function object for both adjacent difference and adjacent find which accept a parameter for the previous value. Also just to confirm, the function object should return the last value right? Not an iterator to the last input value? Currently the function object returns an iterator to the last output.
What about the case of the parallel version? As you know parallel and sequential versions cannot call different function objects. So the parallel version will be an exact copy of the existing parallel version, right?
while(start != between_segments.end()) | ||
{ | ||
FwdIter2 curr = dest; | ||
std::advance(curr, std::distance(first, *start)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're invoking synchronously op
on the executing thread, the execution time of this code depends on the latency of communication between localities. Two reads and a one write are necessary for each call to op
. There are different ways to do the same thing, though. For example, could we attach a continuation to each future which would immediately write the last value of adjacent_difference
on a given segment to the first element of next segment? It would only require disabling the first copy in adjacent_difference
function object to avoid a race condition.
Perhaps it would be good to make a quick benchmark comparing how much time could be spent in this section, compared to the parallel execution. I might be wrong about this.
@hkaiser, do you have an opinion on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, so the parallel function object will be exactly same except the first value is ignored. Instead I will attach a then clause to the future, which will call the function object on the first value of current segment and last value of previous segment and write the result to the correct position. Am I right? Also something similar in adjacent find right?
output = traits::compose(sit, out); | ||
} | ||
} | ||
FwdIter ending = traits::compose(sit, std::prev(end)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that arguments of this logical conjunction should be reversed. The comparison should be performed only iff found = false
.
std::vector<bool> res = | ||
hpx::util::unwrap(std::move(r)); | ||
auto it = res.begin(); | ||
while (it != res.end()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a problem, but a small remark: this loop could be replaced by one call to std::all_of
.
>::call(r, errors); | ||
std::vector<bool> res = | ||
hpx::util::unwrap(std::move(r)); | ||
auto it = res.begin(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, std::any_of
.
local_iterator_type1 end1 = traits1::local(last1); | ||
if (beg1 != end1) | ||
{ | ||
overall_result = hpx::util::invoke(red_op, overall_result, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation and braces locations are slightly confusing here. The style used later in this function (line 292) is much easier to read.
> forced_seq; | ||
|
||
|
||
std::vector<shared_future<T> > segments; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as any_of
etc. Is a shared_future
really necessary here?
template <typename T> | ||
void initialize(hpx::partitioned_vector<T> & xvalues) | ||
{ | ||
T init_array[SIZE] = {1,2,3,4, 5,1,2,3, 1,5,2,3, 4,2,3,2, 1,2,3,4, 5,6,5,6, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test using the binary predicate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check the current test cases.
@mcopik, I will try to fix the minor issues by today and then work on the new function objects. |
Now that GSoC is formally over - what is the state of this PR? |
@hkaiser @ajaivgeorge has fixed some issues which I have mentioned but I believe that two algorithms should be improved. Right now, there is an additional round of communication between segments after finishing the dispatched work. I think it should be possible to hide latencies introduced by this additional communication. |
@hkaiser This PR is almost ready for merging. @mcopik I am working on eliminating the additional round of communication in the sequential versions of adjacent_find and adjacent_difference with a custom function object which returns the first value (as you described). Will try to get that done by evening. Regarding the parallel version, I do not see how appending a future will eliminate the extra round of communication. |
@mcopik I have been trying to implement the custom function object for the sequential version to reduce communication overhead, as you suggested. For this, the new function objects return the last value of each segment, which is taken as a parameter by the function object working on the next segment. An issue is that, I also need to finally return an iterator to the last destination computed as per the specification of adjacent_find/adjacent_difference. This is easy enough to compute when all the segments are successfully computed. But what happens if one of the segments errors out and is unable to compute the entire result. So last computed dest != end of dest segment. How will this error be found out if the function object does not return this iterator. I suppose I could return a pair<last element in input, last computed iterator in dest> but this does not seem very elegant. What do you suggest? |
@ajaivgeorge, @mcopik what will happen to this PR? Should we abandon it? How much work is left to get this into the main branch? |
@hkaiser Ignoring merge conflicts, the PR is ready - all algorithms seem to be correctly implemented. I have few doubts about the most optimal way to achieve that, that's all. |
@ajaivgeorge Can you resolve those conflicts? |
@mcopik, @ajaivgeorge I see the source repo has been deleted now. Is there anything we could do to help getting this merged? Would it be possible to for example leave out the problematic parts so that we could have at least some of the work in master? Would be a shame to have all the work thrown away. |
@ajaivgeorge Ajai, why did you delete the repository? @msimberg I might have a fork on my old machine. I didn't fork his repo on GH. |
@msimberg we should still be able to get the patch (we have the hashes of all commits). |
@hkaiser, awesome, thanks! |
Closing this in favor of #3525. |
Implemented following segmented algorithms -
Unit tests provided for each.
See #1338