-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-5036][Graphx]Better support sending partial messages in Pregel API #3866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
test this please |
|
Test build #25029 has finished for PR 3866 at commit
|
|
ok to test |
|
Test build #25034 has finished for PR 3866 at commit
|
|
Test build #25042 has finished for PR 3866 at commit
|
|
Test build #25045 has finished for PR 3866 at commit
|
1cb8770 to
6da1939
Compare
|
Test build #25098 has finished for PR 3866 at commit
|
|
Test build #25099 has finished for PR 3866 at commit
|
|
I'm not sure I understand the problems you're trying to solve in this PR. Is this an accurate summary:
|
|
@ankurdave thanks for reviewing :)
|
|
@ankurdave
2 |
|
@rxin @ankurdave have any other problem? |
|
cc @andrewor14 |
|
Can one of the admins verify this patch? |
|
I'm going to close this pull request. If this is still relevant and you are interested in pushing it forward, please open a new pull request. Thanks! |
Better support sending partial messages in Pregel API
1. the reqirement
In many iterative graph algorithms, only a part of the vertexes (we call them ActiveVertexes) need to send messages to their neighbours in each iteration. In many cases, ActiveVertexes are the vertexes that their attributes do not change between the previous and current iteration. To implement this requirement, we can use Pregel API + a flag (e.g.,
bool isAttrChanged) in each vertex's attribute.However, after
aggregateMessageormapReduceTripletsof each iteration, we need to reset this flag to the init value in every vertex, which needs a heavyjoinVertices.We find a more efficient way to meet this requirement and want to discuss it here.
Look at a simple example as follows:
In i-th iteartion, the previous attribute of each vertex is
Attrand the newly computed attribute isNewAttr:Our requirement is that:
Attrto beNewAttrin i-th iterationAttr!=NewAttr, send message to its neighbours in the next iteration'saggregateMessage.We found it is hard to implement this requirment using current Pregel API efficiently. The reason is that we not only need to perform
pregel()to compute theNewAttr(2) but also need to performoutJoin()to satisfy (1).A simple idea is to keep a
isAttrChanged:Boolean(solution 1) orflag:Int(solution 2) in each vertex's attribute.2. two solution
2.1 solution 1: label and reset
isAttrChanged:Booleanof Vertex Attrinit message by
aggregateMessageit return a messageRDD
innerJoincompute the messages on the received vertex, return a new VertexRDD which have the computed value by customed logic function
vprog, setisAttrChanged = trueouterJoinVerticesupdate the changed vertex to the whole graph. now the graph is new.
aggregateMessage. it return a messageRDDjoinVerticesreset ereryisAttrChangedof Vertex attr to falsehere need to reset the vertex attribute object's variable as false
if don't reset the
isAttrChanged, it will send message next iteration directly.result:
solution 2. color vertex. (Choose this)
iterate process:
vprogusing as a partial function, looks likevprog(curIter, _: VertexId, _: VD, _: A)i = i + 1; val curIter = i.in
vprog, user can fetchcurIterand assign tofalg.graph = graph.outerJoinVertices(changedVerts) { (vid, old, newOpt) => newOpt.getOrElse(old)}.cache()sendMsg is partial function, looks like
sendMsg(curIter, _: EdgeContext[VD, ED, A]in
sendMsg, comparecurIterwithflag, determine whether sending messageresult
raw data from
the end
i think the second solution(Pregel + a flag) is better.
this can really support the iterative graph algorithms which only part of the vertexes send messages to their neighbours in each iteration.