HADOOP-17125. Using snappy-java in SnappyCodec #2201
Conversation
Thanks for the proposal and thanks for the PR.
@hemanthboyina The performance should be similar, since snappy-java also uses a native lib that is bundled in the jar file. I haven't benchmarked yet, and I will try to find time to do it. If possible, feel free to do the testing and post the results back.
Ideally there shouldn't be any difference in performance; it would be better if we had benchmark results comparing native snappy and snappy-java.
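For anyone who wants to try, a rough standalone timing sketch against the snappy-java byte-array API could look like the following (the data size and shape are illustrative assumptions, not numbers from this PR; a proper comparison would use JMH and also exercise the Hadoop codec path):

```java
import java.util.Arrays;
import org.xerial.snappy.Snappy;

public class SnappyJavaBench {
  public static void main(String[] args) throws Exception {
    // Illustrative input: 64 MB of repetitive (compressible) data.
    byte[] input = new byte[64 * 1024 * 1024];
    for (int i = 0; i < input.length; i++) {
      input[i] = (byte) (i % 64);
    }
    // Warm up the JIT and the bundled native library before timing.
    for (int i = 0; i < 5; i++) {
      Snappy.uncompress(Snappy.compress(input));
    }
    long t0 = System.nanoTime();
    byte[] compressed = Snappy.compress(input);
    long t1 = System.nanoTime();
    byte[] restored = Snappy.uncompress(compressed);
    long t2 = System.nanoTime();
    System.out.printf("compress: %.1f ms, uncompress: %.1f ms, roundtrip ok: %b%n",
        (t1 - t0) / 1e6, (t2 - t1) / 1e6, Arrays.equals(input, restored));
  }
}
```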
@sunchao thanks! That helps a lot. I rebased on trunk and modified the script to remove the snappy native build. Let's see how it works.
Thanks for working on this, @dbtsai. The patch breaks the build if snappy(-devel) is installed in the environment. hadoop-common-project/hadoop-common/src/CMakeLists.txt must be updated to address this.
Doesn't removing the snappy-related code from hadoop-mapreduce-client-nativetask break existing deployments using the feature? Since an incompatible change is hard to get consensus on and takes time to come in, how about fixing only hadoop-common first and keeping hadoop-mapreduce-client-nativetask as is here? In that case, BUILDING.txt should refer to snappy-dev(el) as an optional dependency used only for hadoop-mapreduce-client-nativetask.
@iwasakims Good point. I'll revert the hadoop-mapreduce-client-nativetask change.
```java
compressedDirectBuf.position(0).limit(compressedDirectBufLen);
// There is compressed input, decompress it now.
int size = Snappy.uncompressedLength((ByteBuffer) compressedDirectBuf);
if (size > uncompressedDirectBuf.capacity()) {
```
Should we check with uncompressedDirectBuf.remaining instead of capacity?
Before calling decompressBytesDirect, we always reset uncompressedDirectBuf, so both methods return the same result. But you are right, I think .remaining is better.
decompressDirect also calls decompressBytesDirect, but there uncompressedDirectBuf is passed in as an argument. I think it is dangerous to assume uncompressedDirectBuf is always reset.
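As a side note on the remaining() vs capacity() point, here is a tiny standalone sketch (not code from this PR) showing where the two diverge:

```java
import java.nio.ByteBuffer;

public class RemainingVsCapacity {
  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocateDirect(8);
    // Freshly allocated (or clear()ed): position = 0 and limit = capacity,
    // so remaining() == capacity().
    System.out.println(buf.remaining() + " / " + buf.capacity()); // 8 / 8

    buf.position(3).limit(6);
    // Once position/limit have moved, remaining() = limit - position = 3
    // while capacity() is still 8; a size check against capacity() would
    // accept output that no longer fits in the writable window.
    System.out.println(buf.remaining() + " / " + buf.capacity()); // 3 / 8
  }
}
```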
```java
  return 0;
} else {
  // Set the position and limit of `compressedDirectBuf` for reading
  compressedDirectBuf.position(0).limit(compressedDirectBufLen);
```
I'm not sure we need to set the position and limit here. If compressedDirectBuf has already been set up with a position and limit before calling decompressBytesDirect, won't we read the wrong data from this buffer?
decompressBytesDirect is the only call that reads the data from compressedDirectBuf, so I think we should read it from the beginning. In decompressBytesDirect, we should fully decompress compressedDirectBuf.
For decompressDirect, compressedDirectBuf is set up correctly before calling decompressBytesDirect, so we don't need to do it here again. For decompress, I'm not sure, but it looks like setInputFromSavedData also takes care of resetting compressedDirectBuf. I guess we don't need to do it in decompressBytesDirect.
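For reference, a minimal standalone sketch of the direct-buffer flow being discussed, using snappy-java's ByteBuffer API (the buffer names mirror the PR, but this is an illustrative example, not the PR's code):

```java
import java.nio.ByteBuffer;
import org.xerial.snappy.Snappy;

public class DirectBufRoundTrip {
  public static void main(String[] args) throws Exception {
    byte[] compressed = Snappy.compress("hello hello hello hello".getBytes("UTF-8"));

    // snappy-java's ByteBuffer API requires direct buffers and reads the
    // compressed input from position to limit.
    ByteBuffer compressedDirectBuf = ByteBuffer.allocateDirect(compressed.length);
    compressedDirectBuf.put(compressed);
    // After put(), position == limit, so nothing would be readable; this is
    // why position/limit must be set before handing the buffer to Snappy.
    compressedDirectBuf.position(0).limit(compressed.length);

    int size = Snappy.uncompressedLength(compressedDirectBuf);
    ByteBuffer uncompressedDirectBuf = ByteBuffer.allocateDirect(size);
    int n = Snappy.uncompress(compressedDirectBuf, uncompressedDirectBuf);
    System.out.println("decompressed " + n + " bytes");
  }
}
```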
Closing this PR in favor of #2297
See https://issues.apache.org/jira/browse/HADOOP-17125 for details