Skip to content

Convert Iceberg to TrinoFileSystem interface#13530

Merged
electrum merged 1 commit intotrinodb:masterfrom
electrum:iceberg-io
Aug 12, 2022
Merged

Convert Iceberg to TrinoFileSystem interface#13530
electrum merged 1 commit intotrinodb:masterfrom
electrum:iceberg-io

Conversation

@electrum
Copy link
Member

@electrum electrum commented Aug 6, 2022

Description

Is this change a fix, improvement, new feature, refactoring, or other?

refactoring

Related issues, pull requests, and links

First commit is extracted to #13361

Documentation

(x) No documentation is needed.

Release notes

(x) No release notes entries required.

@cla-bot cla-bot bot added the cla-signed label Aug 6, 2022
@electrum electrum changed the title Iceberg io Convert Iceberg to TrinoFileSystem interface Aug 6, 2022
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider marking these classes as thread safe or not. I believe only the provider is thread safe and the rest are considered single threaded

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think that everything should be thread safe, including TrinoInput since it only has positioned read methods. The only thing not safe would be the FileIterator returned from FileSystem.listFiles() but that seems expected.

Did you see something not thread safe?

@electrum electrum force-pushed the iceberg-io branch 7 times, most recently from beb9454 to 561b66f Compare August 11, 2022 18:06
@findepi
Copy link
Member

findepi commented Aug 11, 2022

@electrum electrum force-pushed the iceberg-io branch 2 times, most recently from 0beb2ca to 6960816 Compare August 11, 2022 23:40
@electrum electrum merged commit e95fdcc into trinodb:master Aug 12, 2022
@electrum electrum deleted the iceberg-io branch August 12, 2022 08:12
hdfsClient.delete(schemaDir);
}

@Test(groups = ICEBERG) // make sure empty directories are noticed as well
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this test removed?

public IcebergAvroPageSource(
FileIO fileIo,
String path,
InputFile file,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requireNotNull file

public IcebergAvroFileWriter(
FileIO fileIo,
Path path,
OutputFile file,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requireNotNull file

public IcebergFileWriter createDataFileWriter(
Path outputPath,
TrinoFileSystem fileSystem,
String outputPath,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why String for outputPath ?
I see that FileIO expects a String for newOutputFile(String) method, but I a wondering why working with a rather generic type over Path is better.

Path path = new Path(location);
FileSystem fileSystem = hdfsEnvironment.getFileSystem(hdfsContext, path);
if (fileSystem.exists(path) && fileSystem.listFiles(path, true).hasNext()) {
if (fileSystem.listFiles(location).hasNext()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add recursive boolean parameter to TrinoFileSystem#listFiles(String) for more transparency? Alternatively, modify the method name to listFilesRecursively

InputFile file;
OptionalLong fileModifiedTime = OptionalLong.empty();
try {
file = fileSystem.toFileIo().newInputFile(inputFile.location(), inputFile.length());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toFileIo() returns a Closeable.
Therefore let's use try-with-resources.

FileEntry next()
throws IOException;

static FileIterator empty()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since the returned instance has no state, we can make it static

@findinpath
Copy link
Contributor

Overall this PR brings a more elegant approach to dealing with the file system. The only thing I'm not sure of being an improvement is using String instead of Path for passing the file path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants