Contributor

@xingbowu xingbowu commented Sep 3, 2021

Aliyun OSS is a popular public cloud storage service, especially among users in China. Many Iceberg use cases rely on Aliyun OSS as the backend storage, so integrating Iceberg with Aliyun OSS would benefit the community. I would like to contribute several PRs to complete this work. Here is the first step: mock Aliyun OSS in unit tests (UT).

The Aliyun OSS SDK doesn't support mocking a local environment, and there is no plan to develop this feature in the near term.
To make unit testing of the Iceberg integration with OSS efficient, this PR adds a lightweight local mock of Aliyun OSS behavior for UT, similar to s3mock.

@github-actions github-actions bot added the build label Sep 3, 2021
Contributor Author

xingbowu commented Sep 6, 2021

@rdblue @openinx Could you help take a look at this PR? Thanks in advance.

Member

@openinx openinx left a comment

Thanks @xingbowu for splitting out a smaller PR for the Aliyun OSS integration work. I skimmed the whole PR: it introduces an Aliyun OSS mock app server so that we can build OSS test cases on top of it (we won't need to mock every OSS API call in relatively complex test cases, such as multipart-upload tests). It's a good idea to introduce a simple simulator that aligns the local mock app with the online Aliyun OSS server.

The most important question for me is: how do we guarantee that the local mock app behaves the same as the online Aliyun OSS server? In the parent PR #2230, we introduced a class-level test rule named OSSTestRule, which has two implementations:

  • OSSMockRule: starts a local mini Aliyun OSS server that serves the OSS client's HTTP requests.
  • OSSIntegrationTestRule: prepares test buckets on the remote OSS server so that the test cases can write real data to them.

For the local OSS application, we provide test cases in TestLocalOSS, which was designed to run with a different TestRule depending on environment variables (it runs the local mock app by default). If the tests pass against both the local environment and the remote OSS environment, we can be confident that the local OSS app is implemented correctly for the covered cases.
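
A minimal sketch of that selection logic, in plain Java without JUnit so that it stays self-contained. The names OssTestEnvironment, OssTestEnvironments, and the OSS_TEST_MODE variable with its "local"/"remote" values are hypothetical, not taken from PR #2230; the two implementations only stand in for what OSSMockRule and OSSIntegrationTestRule would do.

```java
import java.util.Map;

/** Hypothetical stand-in for a test rule such as OSSTestRule. */
interface OssTestEnvironment {
  String describe();
}

/** Stand-in for the mock rule (the real one would start a local mini OSS server). */
final class LocalMockEnvironment implements OssTestEnvironment {
  @Override
  public String describe() {
    return "local-mock";
  }
}

/** Stand-in for the integration rule (the real one would prepare remote test buckets). */
final class RemoteIntegrationEnvironment implements OssTestEnvironment {
  @Override
  public String describe() {
    return "remote-integration";
  }
}

final class OssTestEnvironments {
  // Assumed variable name, for illustration only.
  static final String MODE_VAR = "OSS_TEST_MODE";

  private OssTestEnvironments() {
  }

  /** Picks the environment from the given variables; defaults to the local mock. */
  static OssTestEnvironment fromEnv(Map<String, String> env) {
    String mode = env.getOrDefault(MODE_VAR, "local");
    if ("remote".equalsIgnoreCase(mode)) {
      return new RemoteIntegrationEnvironment();
    }
    return new LocalMockEnvironment();
  }
}
```

A test class could then call OssTestEnvironments.fromEnv(System.getenv()) once and run the same test methods against whichever environment comes back.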

@openinx openinx changed the title Mock aliyun OSS(Object Storage Service) Aliyun: Mock aliyun OSS(Object Storage Service) Sep 6, 2021
@xingbowu xingbowu force-pushed the aliyunoss branch 2 times, most recently from 6192cd6 to d832cde Compare September 6, 2021 13:30
Contributor Author

xingbowu commented Sep 6, 2021

Thanks a lot for pointing out the missing part. I have reworked the PR and added local tests here to guarantee quality. Compared with #2230, this PR implements more test cases, such as range get.

Additionally, I followed up on your review comments to simulate basic Aliyun OSS behavior, including create/put/delete, and excluded multipart uploads. Feel free to let me know if you have further comments.

public void tearDownBucket(String bucket) {
  try {
    Files.walk(rootDir().toPath())
        .filter(p -> p.toFile().isFile())
Member

If we have a bucket named test-bucket and an object named path/to/a.dat, then the full local path will be <root-dir>/test-bucket/path/to/a.dat. Will we also remove the <root-dir>/test-bucket/path/to directory?

Member

In Aliyun OSS, <root-dir>/test-bucket/path/to is not an actual directory in real production, but in our local OSS storage app it will be a real directory (though it's not a real object that people can see through the Aliyun OSS SDK).

Contributor Author

Regarding the scenario you listed above, the directories are deleted in deleteBucket. However, we do have a problem in another scenario: if we create an object named "aa/bb/cc.txt" and then remove it, and after that create a new object named "aa/bb", the create fails because the current logic deletes only cc.txt and leaves the aa/bb directory behind. I will fix it in a separate PR.
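
One possible shape for that follow-up fix, as a sketch only: delete the file that backs the object, then remove any parent directories that became empty, stopping at the bucket directory. LocalObjectStore and deleteObject are hypothetical names, not the mock's actual classes, and the sketch assumes each object is stored as a file under its bucket directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class LocalObjectStore {
  private LocalObjectStore() {
  }

  /**
   * Deletes the file backing an object, then removes parent directories that
   * became empty, never removing the bucket directory itself.
   */
  static void deleteObject(Path bucketDir, String objectKey) throws IOException {
    Path bucket = bucketDir.toAbsolutePath().normalize();
    Path file = bucket.resolve(objectKey).normalize();
    // Reject keys that would resolve to the bucket itself or to a path outside it.
    if (!file.startsWith(bucket) || file.equals(bucket)) {
      throw new IllegalArgumentException("Object key escapes the bucket: " + objectKey);
    }
    Files.deleteIfExists(file);

    // Walk upwards and prune directories that are now empty.
    Path dir = file.getParent();
    while (dir != null && !dir.equals(bucket) && Files.isDirectory(dir) && isEmpty(dir)) {
      Files.delete(dir);
      dir = dir.getParent();
    }
  }

  private static boolean isEmpty(Path dir) throws IOException {
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      return !entries.iterator().hasNext();
    }
  }
}
```

With this pruning, removing "aa/bb/cc.txt" also removes the empty aa/bb directory, so a later object named "aa/bb" can be written as a plain file. Directories that still hold other objects are left alone.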

public class TestLocalAliyunOSS {

  @ClassRule
  public static final AliyunOSSTestRule OSS_TEST_RULE = AliyunOSSMockRule.builder().silent().build();
Member

I think we need to make this TestRule configurable so that we can run these unit tests against both the local mock OSS service and the remote Aliyun OSS environment, to guarantee that the local OSS app has the same semantics as the remote Aliyun OSS environment.

Member

@xingbowu How do you feel about this comment?

Contributor Author

Yes, I will implement it in the next PR, which will include both the local and remote parts.


@Test
public void testGetObjectWithRange() throws IOException {

Member

Extra empty line ?

Member

Nit: I think this line should be removed because it doesn't seem to match the code style of the Iceberg project, even though we don't enforce this in the checkstyle rule set.


Bucket getBucket(String bucketName) {
  List<Bucket> buckets = findBucketsByFilter(file ->
      Files.isDirectory(file) && file.getFileName().endsWith(bucketName));
Member

file.getFileName().endsWith(bucketName)

It's incorrect to use file.getFileName().endsWith(bucketName) to check the bucket name, because for a bucket test-bucket it's possible to have an object named /path/to/test-bucket/a.txt. In that case, we will create a directory <root-dir>/path/to/test-bucket. It's not an object visible to the OSS SDK, but it will make the existence check for the given bucket go wrong.

Contributor Author

newDirectoryStream only traverses the directories and files directly under the root path. The object named /path/to/test-bucket/a.txt is stored under /test-bucket (bucket name)/path/to/test-bucket (object prefix)/a.txt, so there is no problem here.
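
A small sketch illustrating the two JDK behaviors this reply relies on; it is not the PR's code, and BucketLookup and findBucket are hypothetical names. Files.newDirectoryStream lists only the direct children of the given directory, and Path.endsWith(String) compares whole name elements rather than string suffixes, so applying it to a single-element file name amounts to an exact name match.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

final class BucketLookup {
  private BucketLookup() {
  }

  /** Returns the bucket directory if a direct child of rootDir has exactly this name. */
  static Optional<Path> findBucket(Path rootDir, String bucketName) throws IOException {
    // Not recursive: only entries directly under rootDir are visited, so a
    // nested directory such as <root>/other/path/to/test-bucket is never seen.
    try (DirectoryStream<Path> children = Files.newDirectoryStream(rootDir)) {
      for (Path child : children) {
        // Element-wise comparison: "my-test-bucket" does not end with "test-bucket".
        if (Files.isDirectory(child) && child.getFileName().endsWith(bucketName)) {
          return Optional.of(child);
        }
      }
    }
    return Optional.empty();
  }
}
```

Comparing with child.getFileName().toString().equals(bucketName) would behave the same way for simple bucket names and may read more clearly to someone who expects String.endsWith semantics.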

Member

@openinx openinx left a comment

@xingbowu Thanks for the update. I just took another round and left some comments!

@xingbowu xingbowu force-pushed the aliyunoss branch 3 times, most recently from 27f1168 to fcef7ac Compare September 14, 2021 02:13
Member

openinx commented Sep 14, 2021

The broken unit test is :

org.apache.iceberg.types.TestReadabilityChecks > testStructWriteReordering STANDARD_ERROR
    [nested.field_b is out of order, before field_a]

org.apache.iceberg.io.TestCloseableGroup > suppressExceptionIfSetSuppressIsTrue STANDARD_ERROR
    [Test worker] ERROR org.apache.iceberg.io.CloseableGroup - Exception suppressed when attempting to close resources
    java.io.IOException: exception1
    	at org.apache.iceberg.io.CloseableGroup.close(CloseableGroup.java:80)
    	at org.apache.iceberg.io.TestCloseableGroup.suppressExceptionIfSetSuppressIsTrue(TestCloseableGroup.java:75)
    	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
    	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
    	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
    	at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
    	at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
    	at jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
    	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
    	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
    	at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
    	at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
    	at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
    	at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
    	at jdk.internal.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
    	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
    	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
    	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
    	at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
    	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
    	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
    	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
    	at java.base/java.lang.Thread.run(Thread.java:829)

It looks unrelated to this PR, so let's close and reopen to trigger the Travis CI once again.

@openinx openinx closed this Sep 14, 2021
@openinx openinx reopened this Sep 14, 2021
@xingbowu xingbowu closed this Sep 14, 2021
@xingbowu xingbowu reopened this Sep 14, 2021
@xingbowu xingbowu closed this Sep 15, 2021
@xingbowu xingbowu reopened this Sep 15, 2021
@xingbowu
Contributor Author

@openinx Thanks for your effort and comments. I have addressed them in the latest code; feel free to review further.

Comment on lines +271 to +294
compileOnly 'javax.xml.bind:jaxb-api'
compileOnly 'javax.activation:activation'
compileOnly 'org.glassfish.jaxb:jaxb-runtime'
Member

Why do we need to introduce those three jars when adding a testing framework under the aliyun/src/test module?

Contributor Author

Per the Aliyun OSS SDK documentation (https://help.aliyun.com/document_detail/32009.html), the JAXB-related dependencies need to be added when running on Java 9 or later.
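
For background, the JDK stopped shipping the javax.xml.bind classes in Java 11, and on Java 9 and 10 the module that contains them is not resolved by default. A rough way to check a given environment is a small classpath probe like the sketch below; JaxbProbe is a hypothetical helper for illustration and is not part of this PR or of the OSS SDK.

```java
final class JaxbProbe {
  private JaxbProbe() {
  }

  /** Returns true if the named class can be located by this class's class loader. */
  static boolean isPresent(String className) {
    try {
      // initialize=false: only locate the class, do not run its static initializers.
      Class.forName(className, false, JaxbProbe.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("java.specification.version = "
        + System.getProperty("java.specification.version"));
    System.out.println("javax.xml.bind.JAXBContext present: "
        + isPresent("javax.xml.bind.JAXBContext"));
  }
}
```

Running it with and without the three jars on the classpath, on the JDK used for the build, shows whether the JAXB API is visible in that setup.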

Member

Okay, then I would recommend packaging all the dependency jars into a single bundled iceberg-aliyun jar, because otherwise in the Flink SQL client we need to add each jar on the command line, as follows:

./bin/sql-client.sh embedded \
    -j <iceberg-aliyun>.jar \
    -j <aliyun-oss>.jar \
    -j <javax.xml.bind:jaxb-api>.jar \
    -j <org.glassfish.jaxb:jaxb-runtime>.jar

It would be quite tedious for people to add the jars one by one to make an Iceberg job work (for AWS we don't need to build a bundled jar ourselves, because the AWS SDK already provides one; see document).

Comment on lines +20 to +23
javax.xml.bind:jaxb-api = 2.3.1
javax.activation:activation = 1.1.1
org.glassfish.jaxb:jaxb-runtime = 2.3.3
Member

Why do we need these three extra jars? The test cases passed for me after removing them.

Contributor Author

same as above comment

Member

openinx commented Sep 18, 2021

I left several comments; it almost looks good to me. @rdblue @jackye1995 would you like to double-check?

@xingbowu xingbowu force-pushed the aliyunoss branch 3 times, most recently from 26aa8a8 to c492ec0 Compare September 18, 2021 13:38

Banner.Mode bannerMode = Banner.Mode.CONSOLE;

if (Boolean.parseBoolean(String.valueOf(properties.remove("silent")))) {
Member

Nit: here we can replace the literal "silent" with PROP_SILENT.
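
A minimal sketch of what this nit amounts to, assuming PROP_SILENT is a String constant whose value is "silent". The class MockAppOptions and the method consumeSilent are hypothetical names used only for illustration; the PR's actual declaration may differ.

```java
import java.util.Map;

final class MockAppOptions {
  // Assumed declaration of the constant referred to in the review comment.
  static final String PROP_SILENT = "silent";

  private MockAppOptions() {
  }

  /** Removes the silent flag from the properties and reports whether it was true. */
  static boolean consumeSilent(Map<String, Object> properties) {
    // remove() returns null when the key is absent; String.valueOf on a null
    // Object reference yields "null", which parseBoolean treats as false.
    return Boolean.parseBoolean(String.valueOf(properties.remove(PROP_SILENT)));
  }
}
```

Using the constant at both the place that sets the property and the place that reads it keeps the two from drifting apart if the key is ever renamed.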

Member

openinx commented Sep 26, 2021

@xingbowu, could you please file new issues for the above comments ([1], [2], [3]) and add them to this project dashboard, so that we can easily track the overall progress?

I plan to get this PR merged once we've tracked those issues and fixed the other minor comments, thanks for the work !

[1] #3067 (comment)
[2] #3067 (comment)
[3] #3067 (comment)

@xingbowu
Contributor Author

Issue #3180 opened

Member

@openinx openinx left a comment

LGTM !

@openinx openinx merged commit 8b64d96 into apache:master Sep 27, 2021