[SPARK-19991]FileSegmentManagedBuffer performance improvement#17329
witgo wants to merge 3 commits into apache:master
Out of curiosity, why does each call generate a NoSuchElementException?
public class HadoopConfigProvider extends ConfigProvider {
  private final Configuration conf;

  public HadoopConfigProvider(Configuration conf) {
    this.conf = conf;
  }

  @Override
  public String get(String name) {
    String value = conf.get(name);
    // When spark.storage.memoryMapThreshold or spark.shuffle.io.lazyFD is not set,
    // `value` is null.
    if (value == null) {
      throw new NoSuchElementException(name);
    }
    return value;
  }

  @Override
  public Iterable<Map.Entry<String, String>> getAll() {
    return conf;
  }
}
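To make the cost concrete, here is a minimal sketch of how a defaulted lookup can end up constructing an exception on every call. The class `ThrowingFallbackSketch`, its counter, and the default values passed in `main` are hypothetical stand-ins for illustration, and the try/catch fallback is an assumption about how the defaulting `get(name, defaultValue)` is layered on top of the `get(name)` shown above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a config provider; not Spark's actual class.
class ThrowingFallbackSketch {
  // Counts constructed exceptions, for illustration only.
  static int exceptionsCreated = 0;

  // Empty on purpose: neither key is set, as in the scenario discussed here.
  static final Map<String, String> settings = new HashMap<>();

  // Mirrors the lookup quoted above: a missing key raises an exception.
  static String get(String name) {
    String value = settings.get(name);
    if (value == null) {
      exceptionsCreated++;
      throw new NoSuchElementException(name);
    }
    return value;
  }

  // Assumed shape of the defaulting lookup: fall back by catching the exception.
  static String get(String name, String defaultValue) {
    try {
      return get(name);
    } catch (NoSuchElementException e) {
      return defaultValue;
    }
  }

  public static void main(String[] args) {
    // Two defaulted lookups per simulated buffer access (illustrative defaults).
    for (int i = 0; i < 1000; i++) {
      get("spark.shuffle.io.lazyFD", "true");
      get("spark.storage.memoryMapThreshold", "2m");
    }
    System.out.println("exceptions created: " + exceptionsCreated);
  }
}
```

Each exception also has to capture a stack trace when it is constructed, which is typically where most of the overhead comes from.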
    this(conf.lazyFileDescriptor(), conf.memoryMapBytes(), file, offset, length);
  }

  public FileSegmentManagedBuffer(
Yeah that makes sense then but I don't think you need a new public constructor for this
Oh, do you have a better idea?
Just don't add a new constructor. The existing one can set the new fields.
That will change a lot of code, right?
No, why? I mean keep exactly the same constructors, no more no less. No code would change. You just set your two new fields in the current constructor. It actually means you don't need some of the changes you made here.
I'm sorry, I probably did not make it clear.
Suppose there are E executors in the cluster, and a shuffle has M map tasks and R reduce tasks. On the master branch, the following will be created:
- Up to M * R FileSegmentManagedBuffer instances
- Up to 2 * M * R NoSuchElementException instances
With this PR, the following will be created:
- Up to M * R FileSegmentManagedBuffer instances
- Up to 2 * NoSuchElementException instances (ExternalShuffleBlockResolver and IndexShuffleBlockResolver are created once when the executor starts, and they call the new constructor to create FileSegmentManagedBuffer instances)
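The counting argument above can be sketched as follows. All names here (`CachedConfigSketch`, `Conf`, `Resolver`, `Buffer`, `createBuffers`) are hypothetical and only model the idea of resolving the two settings once in a long-lived object and handing the resolved values to each buffer; this is not the PR's actual code.

```java
import java.util.ArrayList;
import java.util.List;

class CachedConfigSketch {
  // Counts defaulted config lookups, each of which could cost one exception.
  static int configLookups = 0;

  // Hypothetical config object; every call stands for one defaulted lookup.
  static class Conf {
    boolean lazyFileDescriptor() { configLookups++; return true; }
    int memoryMapBytes() { configLookups++; return 2 * 1024 * 1024; }
  }

  // Hypothetical buffer that receives already-resolved values.
  static class Buffer {
    final boolean lazyFd;
    final int memoryMapBytes;

    Buffer(boolean lazyFd, int memoryMapBytes) {
      this.lazyFd = lazyFd;
      this.memoryMapBytes = memoryMapBytes;
    }
  }

  // Long-lived resolver: reads the two settings once, at construction time.
  static class Resolver {
    private final boolean lazyFd;
    private final int memoryMapBytes;

    Resolver(Conf conf) {
      this.lazyFd = conf.lazyFileDescriptor();
      this.memoryMapBytes = conf.memoryMapBytes();
    }

    Buffer newBuffer() {
      return new Buffer(lazyFd, memoryMapBytes);
    }
  }

  // Creates one buffer per (map task, reduce task) pair through one resolver.
  static List<Buffer> createBuffers(int mapTasks, int reduceTasks) {
    Resolver resolver = new Resolver(new Conf());
    List<Buffer> buffers = new ArrayList<>();
    for (int m = 0; m < mapTasks; m++) {
      for (int r = 0; r < reduceTasks; r++) {
        buffers.add(resolver.newBuffer());
      }
    }
    return buffers;
  }
}
```

In this sketch the number of buffers still grows as M * R, while the number of lookups stays at two per resolver regardless of how many buffers are created.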
This still doesn't address the point. What you say is true even if you make the change I suggest, which is to remove the superfluous constructor. The performance is exactly the same.
Sorry, I didn't get your idea. Can you write some code?
I'm simply describing what you proposed above at #17329 (comment)
OK. I think anyone's welcome to make this change without the new constructor. It sounded fine otherwise.
…ement

## What changes were proposed in this pull request?

Avoid `NoSuchElementException` every time `ConfigProvider.get(val, default)` falls back to default. This apparently causes non-trivial overhead in at least one path, and can easily be avoided.

See #17329

## How was this patch tested?

Existing tests

Author: Sean Owen <sowen@cloudera.com>

Closes #17567 from srowen/SPARK-19991.
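One way to read the approach in the commit message above is that the defaulting lookup checks for a missing value directly instead of going through the throwing `get(name)`. The sketch below illustrates that idea under stated assumptions: `ProviderSketch` and `MapProviderSketch` are hypothetical classes backed by a plain map, and this is not necessarily the code that was merged in #17567.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical base class, loosely modeled on the provider quoted earlier.
abstract class ProviderSketch {
  abstract String get(String name);

  // Assumed default behavior: fall back by catching the exception.
  String get(String name, String defaultValue) {
    try {
      return get(name);
    } catch (NoSuchElementException e) {
      return defaultValue;
    }
  }
}

// Hypothetical map-backed provider that avoids the exception on fallback.
class MapProviderSketch extends ProviderSketch {
  // Counts constructed exceptions, for illustration only.
  int exceptionsCreated = 0;
  private final Map<String, String> settings = new HashMap<>();

  void set(String name, String value) {
    settings.put(name, value);
  }

  @Override
  String get(String name) {
    String value = settings.get(name);
    if (value == null) {
      exceptionsCreated++;
      throw new NoSuchElementException(name);
    }
    return value;
  }

  // Overrides the defaulting lookup so a missing key never throws.
  @Override
  String get(String name, String defaultValue) {
    String value = settings.get(name);
    return value == null ? defaultValue : value;
  }
}
```

With an override like this, callers keep the same two-argument lookup and no constructor changes are needed, which is in line with the reviewer's objection earlier in the thread.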
FileSegmentManagedBuffer performance improvement.
What changes were proposed in this pull request?
When the configuration items spark.storage.memoryMapThreshold and spark.shuffle.io.lazyFD are not set, each call to FileSegmentManagedBuffer.nioByteBuffer or FileSegmentManagedBuffer.createInputStream creates a NoSuchElementException instance, which is a relatively time-consuming operation.
In the tested use case, this PR improves performance by about 3.5%.
The test code:
and the spark-defaults.conf file:
The test results are as follows:
How was this patch tested?
Existing tests.