HADOOP-17657: implement StreamCapabilities in SequenceFile.Writer and fall back to flush, if hflush is not supported #2949
Conversation
…ther they want to use hflush or flush
LGTM. Is there an easy way to test this?

checkstyle
steveloughran
left a comment
thanks for the test. I actually think you've done it too well and that mockito can be avoided. This matters, as mockito tests are painfully brittle and a large source of false failures.
```java
}

@Test
public void testSequenceFileWriter() throws Exception {
```
I see what you are trying to do here and think it's actually overkill. And as mockito backporting is an utter nightmare (believe me), I don't want mockito code unless it's really needed.
how about just creating a writer on the localfs (no mocking), verify that the stream supports "HSYNC", then call hflush and hsync on it.
Now, a full end-to-end test would be to write data, flush it and verify that the file length is now > 0.
```java
writer.write()
writer.hflush();
writer.hsync();
Assertions.assertThat(fs.getFileStatus(p).getLen()).isGreaterThan(0);
```
that should be enough: it probes the passthrough and verifies the write goes through.
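The flush-then-check-length idea can be illustrated with plain java.io on the local filesystem (a hypothetical stand-in for the real test, which would use SequenceFile.Writer and fs.getFileStatus; class and variable names here are invented for the demo):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class FlushLengthDemo {
    public static void main(String[] args) throws IOException {
        // Temp file stands in for the SequenceFile path on the local fs.
        Path p = Files.createTempFile("seqfile-demo", ".dat");
        try (FileOutputStream out = new FileOutputStream(p.toFile())) {
            out.write("some record".getBytes());
            out.flush();                  // analogous to writer.hflush()
            long len = Files.size(p);     // analogous to fs.getFileStatus(p).getLen()
            System.out.println(len > 0);  // the flushed bytes are visible: true
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```

On a local filesystem the length is visible immediately after the flush, which is exactly the observable behaviour the end-to-end test asserts.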
@steveloughran Thank you for the feedback. I have addressed all your concerns. Please take a look.
```java
writer.hflush();
Assert.assertFalse(writer.hasCapability(StreamCapabilities.HFLUSH));

writer.close();
```
better to use try-with-resources, so that even when an assertion is thrown the writer is closed.
Done
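A minimal, self-contained sketch of the try-with-resources pattern being suggested (the `Writer` class here is a toy stand-in, not Hadoop's SequenceFile.Writer):

```java
public class TryWithResourcesDemo {
    static boolean closed = false;

    // Toy stand-in for SequenceFile.Writer; anything AutoCloseable works.
    static class Writer implements AutoCloseable {
        void hflush() { /* no-op for the demo */ }
        @Override
        public void close() { closed = true; }
    }

    public static void main(String[] args) {
        try (Writer writer = new Writer()) {
            writer.hflush();
            // Even if an assertion threw here, close() would still run.
        }
        System.out.println("closed=" + closed);
    }
}
```

The try-with-resources block guarantees `close()` runs on every exit path, so a failing assertion inside the test can no longer leak the open writer.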
```java
  return new CompressionOption(value, codec);
}

@org.apache.hadoop.thirdparty.com.google.common.annotations.VisibleForTesting
```
I'm proposing elsewhere not to use mockito, which avoids exposing this operation. I'd be worried about exposing something in a broadly used API, as inevitably someone will use it in production.
Makes sense.
💔 -1 overall
This message was automatically generated.

I do like the new test; now there's just the little detail that it's not quite working yet.
steveloughran
left a comment
LGTM
+1 pending Yetus being happy.
… fall back to flush, if hflush is not supported (#2949) Co-authored-by: Kishen Das <[email protected]> Reviewed-by: Steve Loughran <[email protected]> (cherry picked from commit e571025)
… fall back to flush, if hflush is not supported (apache#2949) Co-authored-by: Kishen Das <[email protected]> Reviewed-by: Steve Loughran <[email protected]>
…e.Writer and fall back to flush, if hflush is not supported (apache#2949) Contributed by: Kishen Das <[email protected]> (cherry picked from commit 14a9fa0d48de210c0d40e40e35ac912a2619e038) Signed-off-by: Arpit Agarwal <[email protected]> Change-Id: I310ff5f4be9673b8970a89eddf99765d33ec752c
The following exception is thrown whenever we invoke ProtoMessageWriter.hflush on S3 from Tez, which internally calls org.apache.hadoop.io.SequenceFile$Writer.hflush -> org.apache.hadoop.fs.FSDataOutputStream.hflush -> S3ABlockOutputStream.hflush, which is not implemented and throws java.lang.UnsupportedOperationException.
```
bdffe22d96ae [mdc@18060 class="yarn.YarnUncaughtExceptionHandler" level="ERROR" thread="HistoryEventHandlingThread"] Thread Thread[HistoryEventHandlingThread,5,main] threw an Exception.
java.lang.UnsupportedOperationException: S3A streams are not Syncable
	at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.hflush(S3ABlockOutputStream.java:657)
	at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:136)
	at org.apache.hadoop.io.SequenceFile$Writer.hflush(SequenceFile.java:1367)
	at org.apache.tez.dag.history.logging.proto.ProtoMessageWriter.hflush(ProtoMessageWriter.java:64)
	at org.apache.tez.dag.history.logging.proto.ProtoHistoryLoggingService.finishCurrentDag(ProtoHistoryLoggingService.java:239)
	at org.apache.tez.dag.history.logging.proto.ProtoHistoryLoggingService.handleEvent(ProtoHistoryLoggingService.java:198)
	at org.apache.tez.dag.history.logging.proto.ProtoHistoryLoggingService.loop(ProtoHistoryLoggingService.java:153)
	at java.lang.Thread.run(Thread.java:748)
```
To fix this, SequenceFile.Writer should implement StreamCapabilities, and should fall back to flush() when hflush() is not supported.
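The fallback described above can be sketched in a self-contained way; `CapableStream`, `HFLUSH`, and `sync` below are invented stand-ins for Hadoop's `StreamCapabilities` probe and the writer's flush path, not the actual Hadoop classes:

```java
import java.io.IOException;

public class HflushFallbackDemo {
    // Stand-in for StreamCapabilities.HFLUSH (Hadoop uses the string "hflush").
    static final String HFLUSH = "hflush";

    // Minimal stand-in for a stream that advertises capabilities.
    interface CapableStream {
        boolean hasCapability(String capability);
        void hflush() throws IOException;
        void flush() throws IOException;
    }

    // Probe the capability first; hflush() only when supported, flush() otherwise.
    static String sync(CapableStream out) throws IOException {
        if (out.hasCapability(HFLUSH)) {
            out.hflush();
            return "hflush";
        } else {
            out.flush();  // e.g. an S3A-style stream that is not Syncable
            return "flush";
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulates S3ABlockOutputStream: no hflush capability, hflush() throws.
        CapableStream s3Like = new CapableStream() {
            public boolean hasCapability(String c) { return false; }
            public void hflush() { throw new UnsupportedOperationException("not Syncable"); }
            public void flush() { /* buffered write accepted */ }
        };
        System.out.println(sync(s3Like));
    }
}
```

Because the writer probes the capability instead of calling hflush() unconditionally, the UnsupportedOperationException from the S3A stream is never triggered.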