Merged
@@ -145,8 +145,11 @@ public static Path convertPathWithScheme(Path oldPath, String newScheme) {
    URI oldURI = oldPath.toUri();
    URI newURI;
    try {
-     newURI = new URI(newScheme, oldURI.getUserInfo(), oldURI.getHost(), oldURI.getPort(), oldURI.getPath(),
-         oldURI.getQuery(), oldURI.getFragment());
+     newURI = new URI(newScheme,
+         oldURI.getAuthority(),
+         oldURI.getPath(),
+         oldURI.getQuery(),
+         oldURI.getFragment());
Comment on lines -148 to +152

Member:
Can you help clarify the impact of this change, please? It fixes S3 compatibility, but will it stay fully compatible with previous usage? I'm not very clear on the nuances of these two APIs and how the old one caused problems with S3. It would be good to have some detailed info in the PR to analyze the impact.

Contributor Author:

Yes, it will stay fully compatible. The issue was introduced by the use of the getHost() API. That works fine for HDFS-like URIs that have a host and a port, and for most bucket naming conventions that match the Java URI standard. But for certain patterns, as explained in the Jira, getHost() breaks because of the way it tries to parse the hostname. The new constructor works better because it uses getAuthority(), which correctly extracts the bucket name in the case of S3. I do not see a need to specifically use the getHost()/getPort() APIs when getAuthority() can essentially do the same.
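
A minimal standalone sketch of that nuance, for reference (not part of this diff; the class name is made up, and the printed values follow java.net.URI's hostname grammar, in which the last dot-separated label of a host must start with a letter):

import java.net.URI;
import java.net.URISyntaxException;

public class UriSchemeSketch {
  public static void main(String[] args) throws URISyntaxException {
    URI s3Uri = new URI("s3://test.1234/table1");
    // "test.1234" fails hostname validation (the top label "1234" starts with a digit),
    // so java.net.URI falls back to a registry-based authority and getHost() returns null
    System.out.println(s3Uri.getHost());      // null
    System.out.println(s3Uri.getAuthority()); // test.1234

    // Old constructor: a null host drops the whole authority, losing the bucket name
    URI broken = new URI("s3", s3Uri.getUserInfo(), s3Uri.getHost(), s3Uri.getPort(),
        s3Uri.getPath(), s3Uri.getQuery(), s3Uri.getFragment());
    System.out.println(broken); // s3:/table1

    // New constructor: getAuthority() carries the bucket name through intact
    URI fixed = new URI("s3", s3Uri.getAuthority(), s3Uri.getPath(),
        s3Uri.getQuery(), s3Uri.getFragment());
    System.out.println(fixed); // s3://test.1234/table1

    // HDFS-style URIs behave the same under either constructor: the authority is host:port
    URI hdfsUri = new URI("hdfs://sandbox.foo.com:8020/table1");
    System.out.println(hdfsUri.getHost());      // sandbox.foo.com
    System.out.println(hdfsUri.getAuthority()); // sandbox.foo.com:8020
  }
}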

Member:

Thanks for clarifying.

      return new CachingPath(newURI);
    } catch (URISyntaxException e) {
      // TODO - Better Exception handling
@@ -18,8 +18,10 @@

package org.apache.hudi.common.fs;

+import org.apache.hadoop.fs.Path;
import org.junit.jupiter.api.Test;

+import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertThrows;
import static org.junit.jupiter.api.Assertions.assertTrue;
@@ -54,4 +56,19 @@ public void testStorageSchemes() {
      StorageSchemes.isAppendSupported("s2");
    }, "Should throw exception for unsupported schemes");
  }

+  @Test
+  public void testConversionToNewSchema() {
+    Path s3TablePath1 = new Path("s3://test.1234/table1");
+    assertEquals(s3TablePath1, HoodieWrapperFileSystem.convertPathWithScheme(s3TablePath1, "s3"));
+
+    Path s3TablePath2 = new Path("s3://1234.test/table1");
+    assertEquals(s3TablePath2, HoodieWrapperFileSystem.convertPathWithScheme(s3TablePath2, "s3"));
+
+    Path s3TablePath3 = new Path("s3://test1234/table1");
+    assertEquals(s3TablePath3, HoodieWrapperFileSystem.convertPathWithScheme(s3TablePath3, "s3"));
+
+    Path hdfsTablePath = new Path("hdfs://sandbox.foo.com:8020/test.1234/table1");
+    System.out.println(HoodieWrapperFileSystem.convertPathWithScheme(hdfsTablePath, "hdfs"));
+  }
}
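
For the hdfs case in the test above, the println should emit hdfs://sandbox.foo.com:8020/test.1234/table1 under both the old and the new constructor: the host parses cleanly, so getAuthority() simply carries host:port through unchanged (assuming default java.net.URI parsing).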