[SPARK-20994] Remove redundant characters in OpenBlocks to save memory for shuffle service. #18231
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -44,7 +44,6 @@ | |
| import static org.apache.spark.network.util.NettyUtils.getRemoteAddress; | ||
| import org.apache.spark.network.util.TransportConf; | ||
|
|
||
|
|
||
| /** | ||
| * RPC Handler for a server which can serve shuffle blocks from outside of an Executor process. | ||
| * | ||
|
|
@@ -91,26 +90,8 @@ protected void handleMessage( | |
| try { | ||
| OpenBlocks msg = (OpenBlocks) msgObj; | ||
| checkAuth(client, msg.appId); | ||
|
|
||
| Iterator<ManagedBuffer> iter = new Iterator<ManagedBuffer>() { | ||
| private int index = 0; | ||
|
|
||
| @Override | ||
| public boolean hasNext() { | ||
| return index < msg.blockIds.length; | ||
| } | ||
|
|
||
| @Override | ||
| public ManagedBuffer next() { | ||
| final ManagedBuffer block = blockManager.getBlockData(msg.appId, msg.execId, | ||
| msg.blockIds[index]); | ||
| index++; | ||
| metrics.blockTransferRateBytes.mark(block != null ? block.size() : 0); | ||
| return block; | ||
| } | ||
| }; | ||
|
|
||
| long streamId = streamManager.registerStream(client.getClientId(), iter); | ||
| long streamId = streamManager.registerStream(client.getClientId(), | ||
| new ManagedBufferIterator(msg.appId, msg.execId, msg.blockIds)); | ||
| if (logger.isTraceEnabled()) { | ||
| logger.trace("Registered streamId {} with {} buffers for client {} from host {}", | ||
| streamId, | ||
|
|
@@ -209,4 +190,52 @@ public Map<String, Metric> getMetrics() { | |
| } | ||
| } | ||
|
|
||
| private class ManagedBufferIterator implements Iterator<ManagedBuffer> { | ||
|
|
||
| private int index = 0; | ||
| private String appId; | ||
|
Contributor
Author
Yes, I will refine.
||
| private String execId; | ||
| private String shuffleId; | ||
| // An array containing mapId and reduceId pairs. | ||
| private int[][] mapIdAndReduceIds; | ||
|
Contributor
Actually, I mean a single array. e.g. Reason being that if you really have millions of these, each "child" array in your two-dimensional array wastes 16 (or 20?) bytes (16 bytes of object overhead + 4 bytes for the array length). Looking in jvisualvm, an empty array actually consumes 24 bytes, so it seems the JVM is aligning things and wasting an extra 4 bytes per array...
||
|
|
||
| ManagedBufferIterator(String appId, String execId, String[] blockIds) { | ||
| this.appId = appId; | ||
|
Contributor
Wonder if you see a lot of these in your heap dump too? You could potentially intern
Author
@vanzin
||
| this.execId = execId; | ||
| String[] blockId0Parts = blockIds[0].split("_"); | ||
| if (blockId0Parts.length < 4) { | ||
|
Contributor
How about using require(blockId0Parts.length < 4, "Unexpected block id format: " + blockIds[0]) instead?
Author
I was thinking to throw the
Contributor
nvm, didn't notice this is Java code.
Contributor
Shall we be more strict and use
||
| throw new IllegalArgumentException("Unexpected block id format: " + blockIds[0]); | ||
| } else if (!blockId0Parts[0].equals("shuffle")) { | ||
|
Member
You don't need the 'else' here
Author
We have some kinds of
Contributor
I think Sean means that since you're throwing in the previous block,
||
| throw new IllegalArgumentException("Expected shuffle block id, got: " + blockIds[0]); | ||
| } | ||
| this.shuffleId = blockId0Parts[1]; | ||
| mapIdAndReduceIds = new int[blockIds.length][2]; | ||
| if (blockIds.length > 0) { | ||
|
Member
This is superfluous
||
| for (int i = 0; i< blockIds.length; i++) { | ||
| String[] blockIdParts = blockIds[i].split("_"); | ||
| if (!blockIdParts[1].equals(shuffleId)) { | ||
| throw new IllegalArgumentException("Expected shuffleId=" + shuffleId + | ||
| ", got:" + blockIds[i]); | ||
| } | ||
| mapIdAndReduceIds[i][0] = Integer.parseInt(blockIdParts[2]); | ||
| mapIdAndReduceIds[i][1] = Integer.parseInt(blockIdParts[3]); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| @Override | ||
| public boolean hasNext() { | ||
| return index < mapIdAndReduceIds.length; | ||
| } | ||
|
|
||
| @Override | ||
| public ManagedBuffer next() { | ||
| String blockId = "shuffle_" + shuffleId + "_" + mapIdAndReduceIds[index][0] + "_" + mapIdAndReduceIds[index][1]; | ||
| final ManagedBuffer block = blockManager.getBlockData(appId, execId, blockId); | ||
|
Contributor
Is it too big a change to make
Author
I was thinking about this. I added a new
||
| index++; | ||
| metrics.blockTransferRateBytes.mark(block != null ? block.size() : 0); | ||
| return block; | ||
| } | ||
| } | ||
|
|
||
| } | ||
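The review comments on `mapIdAndReduceIds` suggest a single flat array rather than an `int[][]`, because each child array carries its own object header and length field (roughly 16 to 24 bytes per pair, by the reviewer's estimate). The following is a minimal sketch of that idea, not the code that was merged. The class name `PackedShuffleBlockIds` is hypothetical, the stricter `!= 4` length check and the explicit empty-array check are assumptions drawn from the review discussion, and the sketch yields block id strings instead of calling `blockManager.getBlockData`.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Illustrative only: PackedShuffleBlockIds is a hypothetical name, not a Spark class.
 * It keeps (mapId, reduceId) pairs in one flat int[] instead of an int[][].
 */
final class PackedShuffleBlockIds implements Iterator<String> {
  private final String shuffleId;
  // Layout: [mapId0, reduceId0, mapId1, reduceId1, ...]
  private final int[] mapIdAndReduceIds;
  private int index = 0;

  PackedShuffleBlockIds(String[] blockIds) {
    // Assumption: reject an empty request up front instead of failing on blockIds[0].
    if (blockIds.length == 0) {
      throw new IllegalArgumentException("Expected at least one block id");
    }
    String[] first = blockIds[0].split("_");
    // Strict check: a shuffle block id has exactly four "_"-separated parts.
    if (first.length != 4) {
      throw new IllegalArgumentException("Unexpected block id format: " + blockIds[0]);
    }
    // No 'else' needed: the previous branch always throws.
    if (!first[0].equals("shuffle")) {
      throw new IllegalArgumentException("Expected shuffle block id, got: " + blockIds[0]);
    }
    this.shuffleId = first[1];
    this.mapIdAndReduceIds = new int[2 * blockIds.length];
    for (int i = 0; i < blockIds.length; i++) {
      String[] parts = blockIds[i].split("_");
      if (parts.length != 4 || !parts[1].equals(shuffleId)) {
        throw new IllegalArgumentException(
          "Expected shuffleId=" + shuffleId + ", got:" + blockIds[i]);
      }
      mapIdAndReduceIds[2 * i] = Integer.parseInt(parts[2]);
      mapIdAndReduceIds[2 * i + 1] = Integer.parseInt(parts[3]);
    }
  }

  @Override
  public boolean hasNext() {
    return index < mapIdAndReduceIds.length;
  }

  @Override
  public String next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    // Rebuild the block id on demand; only two ints per block are retained.
    String blockId = "shuffle_" + shuffleId + "_" + mapIdAndReduceIds[index]
      + "_" + mapIdAndReduceIds[index + 1];
    index += 2;
    return blockId;
  }
}
```

With this layout the iterator advances by two slots per block, so `hasNext()` compares `index` against the flat array length rather than the number of blocks.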
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -202,7 +202,7 @@ public void onBlockFetchFailure(String blockId, Throwable t) { | |
| } | ||
| }; | ||
|
|
||
| String[] blockIds = { "shuffle_2_3_4", "shuffle_6_7_8" }; | ||
| String[] blockIds = { "shuffle_0_1_2", "shuffle_0_3_4" }; | ||
|
Contributor
What's the purpose of this change?
Author
With this change, we cannot shuffle blocks with multiple
||
| OneForOneBlockFetcher fetcher = | ||
| new OneForOneBlockFetcher(client1, "app-2", "0", blockIds, listener, conf, null); | ||
| fetcher.start(); | ||
|
|
||
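The test block ids change because the new `ManagedBufferIterator` constructor throws when a block id's shuffle id differs from that of the first block. The sketch below isolates that check to show why the old test ids would now be rejected. `SingleShuffleCheck` and `requireSingleShuffleId` are hypothetical names used only for illustration, and the four-part format check is an assumption taken from the review discussion.

```java
/** Illustrative only: SingleShuffleCheck is a hypothetical helper, not part of Spark. */
final class SingleShuffleCheck {
  private SingleShuffleCheck() {}

  /**
   * Returns the shuffle id shared by every block id, or throws if the ids mix shuffles.
   * Returns null for an empty array (an assumption made for this sketch).
   */
  static String requireSingleShuffleId(String[] blockIds) {
    String shuffleId = null;
    for (String blockId : blockIds) {
      String[] parts = blockId.split("_");
      if (parts.length != 4 || !parts[0].equals("shuffle")) {
        throw new IllegalArgumentException("Unexpected block id format: " + blockId);
      }
      if (shuffleId == null) {
        // The first block id fixes the expected shuffle id.
        shuffleId = parts[1];
      } else if (!parts[1].equals(shuffleId)) {
        throw new IllegalArgumentException(
          "Expected shuffleId=" + shuffleId + ", got:" + blockId);
      }
    }
    return shuffleId;
  }
}
```

Under this check, `{ "shuffle_0_1_2", "shuffle_0_3_4" }` passes because both ids belong to shuffle 0, while `{ "shuffle_2_3_4", "shuffle_6_7_8" }` fails on the second id.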
Why break this out? It's not necessary for the change, right? Just for clarity?
I think the iterator is becoming a little complicated, so I broke it out and gave it a constructor.