Skip to content

Fix Hive sorted write failure when user identity contains @ sign#18129

Closed
findepi wants to merge 1 commit intotrinodb:masterfrom
findepi:findepi/fix-hive-sorted-write-failure-when-user-identity-contains-sign-b52351
Closed

Fix Hive sorted write failure when user identity contains @ sign#18129
findepi wants to merge 1 commit intotrinodb:masterfrom
findepi:findepi/fix-hive-sorted-write-failure-when-user-identity-contains-sign-b52351

Conversation

@findepi
Copy link
Copy Markdown
Member

@findepi findepi commented Jul 4, 2023

It's common for staging path to contain ${USER} placeholder and be on local file system. When username contains an AT sign (name@domain), the setSchemeToFileIfAbsent used to fail.

Fixes #18132

It's common for staging path to contain `${USER}` placeholder and be on
local file system. When username contains an AT sign (`name@domain`),
the `setSchemeToFileIfAbsent` used to fail.
@cla-bot cla-bot bot added the cla-signed label Jul 4, 2023
@findepi
Copy link
Copy Markdown
Member Author

findepi commented Jul 4, 2023

This is needed because Location first splits by @ and then find 3rd slash to determine where the host ends and path starts.

List<String> userInfoSplit = USER_INFO_SPLITTER.splitToList(afterScheme);
Optional<String> userInfo = userInfoSplit.size() == 2 ? Optional.of(userInfoSplit.get(0)) : Optional.empty();
List<String> authoritySplit = AUTHORITY_SPLITTER.splitToList(Iterables.getLast(userInfoSplit));

However, it seems the logic doesn't follow how URI class does this:

String input = "http:///foo/bar@baz/";

URI uri = new URI(input);
System.out.println("uri.getUserInfo() = " + uri.getUserInfo()); // null
System.out.println("uri.getHost() = " + uri.getHost()); // null
System.out.println("uri.getPath() = " + uri.getPath()); // "/foo/bar@baz/"

Location location = Location.of(input);
System.out.println("location.userInfo() = " + location.userInfo()); // "/foo/bar"
System.out.println("location.host() = " + location.host()); // "baz"
System.out.println("location.path() = " + location.path()); // ""

I think the "more correct" approach would be to change Location.of so that it matches URI with respect to finding user info in the input.
Will send a PR tomorrow.

cc @electrum @losipiuk @alexjo2144 @findinpath

@github-actions github-actions bot added tests:hive hive Hive connector labels Jul 4, 2023
@findepi
Copy link
Copy Markdown
Member Author

findepi commented Jul 4, 2023

I think the "more correct" approach would be to change Location.of so that it matches URI with respect to finding user info in the input.

-> #18131

@findepi findepi marked this pull request as draft July 4, 2023 21:18
return location;
}
return Location.of("file:///" + location.path());
return Location.of("file://@/" + location.path());
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works for Location, but we still use Hadoop Path and it chokes on such locations.
In particular, AbstractTestHive.testBucketSortedTables / TestHiveFileMetastore.testBucketSortedTables fails

java.lang.IllegalArgumentException: Wrong FS: file://@/var/folders/dr/jhnfvsrd2zl925dyfsctj0pr0000gn/T/trino-staging-11857319125698218229/temp_path_, expected: file:///

	at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:831)
	at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:119)
	at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:822)
	at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:808)
	at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:997)
	at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:722)
	at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:709)
	at io.trino.hdfs.TrinoFileSystemCache$FileSystemWrapper.create(TrinoFileSystemCache.java:361)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1240)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1217)
	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
	at io.trino.filesystem.hdfs.HdfsOutputFile.lambda$create$1(HdfsOutputFile.java:75)
	at io.trino.hdfs.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:25)
	at io.trino.hdfs.HdfsEnvironment.doAs(HdfsEnvironment.java:125)
	at io.trino.filesystem.hdfs.HdfsOutputFile.create(HdfsOutputFile.java:75)
	at io.trino.filesystem.hdfs.HdfsOutputFile.create(HdfsOutputFile.java:54)
	at io.trino.orc.OutputStreamOrcDataSink.create(OutputStreamOrcDataSink.java:41)
	at io.trino.plugin.hive.orc.OrcFileWriterFactory.createOrcDataSink(OrcFileWriterFactory.java:217)
	at io.trino.plugin.hive.SortingFileWriter.writeTempFile(SortingFileWriter.java:258)
	at io.trino.plugin.hive.SortingFileWriter.flushToTempFile(SortingFileWriter.java:201)
	at io.trino.plugin.hive.SortingFileWriter.appendRows(SortingFileWriter.java:125)
	at io.trino.plugin.hive.HiveWriter.append(HiveWriter.java:85)
	at io.trino.plugin.hive.HivePageSink.writePage(HivePageSink.java:346)
	at io.trino.plugin.hive.HivePageSink.doAppend(HivePageSink.java:297)
	at io.trino.plugin.hive.HivePageSink.lambda$appendPage$3(HivePageSink.java:285)
	at io.trino.hdfs.authentication.HdfsAuthentication.lambda$doAs$0(HdfsAuthentication.java:26)
	at io.trino.hdfs.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:25)
	at io.trino.hdfs.authentication.HdfsAuthentication.doAs(HdfsAuthentication.java:25)
	at io.trino.hdfs.HdfsEnvironment.doAs(HdfsEnvironment.java:130)
	at io.trino.plugin.hive.HivePageSink.appendPage(HivePageSink.java:285)
	at io.trino.plugin.hive.AbstractTestHive.doTestBucketSortedTables(AbstractTestHive.java:2868)
	at io.trino.plugin.hive.AbstractTestHive.testBucketSortedTables(AbstractTestHive.java:2813)

@findepi findepi closed this Jul 4, 2023
@findepi findepi deleted the findepi/fix-hive-sorted-write-failure-when-user-identity-contains-sign-b52351 branch July 4, 2023 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed hive Hive connector

Development

Successfully merging this pull request may close these issues.

Hive sorted writes fail when user identity contains the @ AT sign

1 participant