-
Notifications
You must be signed in to change notification settings - Fork 3k
Dell: Add Dell EMC EcsFileIO. #3376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
2. Change name of some methods .
|
The test mock used a client-side mock that is easier for this repo. |
jackye1995
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for separating the PR out! Here are some general comments I have:
- for
EcsFile, I think it's better to have separatedEcsInputFileandEcsOutputFilewith a baseBaseEcsFile, which follows the pattern of S3 and Hadoop and is more clear to code readers. - some classes can be made package private instead of public, such as the stream classes, please make changes accordingly.
- For tests, please use more expressive test names, and provide messages for each
Assert. For testing errors, we prefer to useAssertHelper.assertThrowsinstead of@Test(expected=xxx)or try-catch block to verify error messages. - please add a newline after each control statement like
if,for,try, etc.
dell/src/main/java/org/apache/iceberg/dell/EcsClientProperties.java
Outdated
Show resolved
Hide resolved
2. Move EcsAppendOutputStream and EcsSeekableInputStream to package private. 3. Fix tests with @test(expected) 4. Add new line for all control block statements. 5. Add ecs prefix and properties key in EcsClientProperties. 6. Inline LocationUtils.ECS_SCHEMA.
|
According to @jackye1995 's suggestion. |
dell/src/main/java/org/apache/iceberg/dell/EcsClientProperties.java
Outdated
Show resolved
Hide resolved
dell/src/main/java/org/apache/iceberg/dell/EcsSeekableInputStream.java
Outdated
Show resolved
Hide resolved
dell/src/main/java/org/apache/iceberg/dell/EcsSeekableInputStream.java
Outdated
Show resolved
Hide resolved
dell/src/test/java/org/apache/iceberg/dell/EcsAppendOutputStreamTest.java
Outdated
Show resolved
Hide resolved
| /** | ||
| * A {@link java.io.Externalizable} FileIO of ECS S3 object client. | ||
| */ | ||
| public class EcsFileIO implements FileIO, Externalizable, AutoCloseable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we follow the pattern of (1) directly serialize an object like client as a function, example: https://github.com/apache/iceberg/blob/master/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java#L42-L45. (2) if absolutely necessary, directly overwrite serialization private methods instead of using Externalize, example: https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/hadoop/SerializableConfiguration.java#L39-L48. Is it possible for you to use the same pattern here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took a look at the javadoc for Externalizable, it seems to be a valid usage here, so I will leave it as is for now and let other reviewers decide.
2. Merge LocationUtils to EcsURI. 3. Remove redundant dependencies in Gradle. 4. Some typo and style changes.
dell/src/test/java/org/apache/iceberg/dell/TestEcsSeekableInputStream.java
Outdated
Show resolved
Hide resolved
dell/src/test/java/org/apache/iceberg/dell/mock/TestExceptionCode.java
Outdated
Show resolved
Hide resolved
2. Move BaseEcsFile, EcsInputFile, EcsOutputFile to package private. 3. Separate test from single method to multiple method. Move related test rule to class-level. 4. Remove all static-imports.
|
@rdblue I think this looks good to me overall after a few rounds of review, could you also take a look? |
|
@rdblue Ryan, after several rounds's review with @jackye1995 , I think we are good to merge this PR. However this is a new feature for iceberg to support on-perm product, would you please help to take a look and give the green light ? I think the 0.13 window still open and we would like to get ready soon for our customers. |
jackye1995
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few new comments based on the new addition of the GcsFileIO and also some discussions around aligning different vendor properties. Please let me know if you have any concern.
dell/src/main/java/org/apache/iceberg/dell/EcsClientProperties.java
Outdated
Show resolved
Hide resolved
dell/src/main/java/org/apache/iceberg/dell/EcsClientProperties.java
Outdated
Show resolved
Hide resolved
dell/src/main/java/org/apache/iceberg/dell/EcsClientProperties.java
Outdated
Show resolved
Hide resolved
dell/src/main/java/org/apache/iceberg/dell/EcsClientProperties.java
Outdated
Show resolved
Hide resolved
| /** | ||
| * A {@link java.io.Externalizable} FileIO of ECS S3 object client. | ||
| */ | ||
| public class EcsFileIO implements FileIO, Externalizable, AutoCloseable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to directly serialize an object like client as a function instead of using Externalizable. I understand this method is likely correct, but given the fact that all the other implementations are using that approach, it's better to be consistent so that it's easier to maintain in the future.
examples:
# Conflicts: # settings.gradle # versions.props
| /** | ||
| * S3 Access key id of Dell EMC ECS | ||
| */ | ||
| public static final String ACCESS_KEY_ID = "ecs.s3.access-key-id"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be ECS_S3_ACCESS_KEY_ID?
and similar comments to the other variable names
|
@rdblue any comments on this PR? |
openinx
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost looks good to me, just left few comments.
dell/src/main/java/org/apache/iceberg/dell/ecs/EcsAppendOutputStream.java
Outdated
Show resolved
Hide resolved
dell/src/main/java/org/apache/iceberg/dell/ecs/EcsAppendOutputStream.java
Outdated
Show resolved
Hide resolved
dell/src/main/java/org/apache/iceberg/dell/DellClientFactory.java
Outdated
Show resolved
Hide resolved
2. Remove hashcode and equals of EcsURI.
openinx
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR looks good to me now. I will get this merged. Thanks @wang-x-xia for contribution, and thanks @jackye1995 for reviewing !
|
@wang-x-xia Would you also like to add a new doc page to show people how to access apache iceberg tables backed with Dell EMC ECS storage in this https://github.com/apache/iceberg-docs project ? ( Then it will display the rendered page in the apache official page). |
Sure, we will work on that soon. |
|
@openinx Fine! I'll finish the doc and also the catalog implementation! |
Use Dell EMC ECS object SDK to implement File IO APIs:
EcsAppendOutputStreamwhich uses Put and Append API to concat data bytes without local file cache.EcsSeekableInputStreamwhich uses Get-by-range API to fetch data bytes from object.EcsURI,EcsFileandEcsFileIOfor that basic functions.To use the location like
ecs://bucket/object_nameto consistent with other vendors.Due to Dell EMC ECS is an S3 compatible storage. The location is also compatible with schema "s3", "s3a", and "s3n".