@Fokko
Contributor

@Fokko Fokko commented Mar 15, 2023

I noticed that we use Hadoop 3 as a library in https://github.com/tabular-io/iceberg-rest-image, so I think it makes sense to compile against this version as well at some point, along with the latest version of Hive.

@steveloughran
Contributor

So this sets the minimum Hadoop version Iceberg builds against to 3.3.4? Nice.

With that you can use openFile(path) and declare the read policy you want (sequential vs. random), any split start/end, and, if you know the file length, pass that down to save a HEAD request. The s3a and gcs connectors like this.
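A minimal sketch of such an openFile() call, under stated assumptions: the class and method names (`OpenFileSketch`, `openSplit`) are hypothetical, and the `fs.option.openfile.*` keys are the standard option names as described in the Hadoop filesystem documentation, which should be checked against the Hadoop release actually on the classpath. The options are set with opt() rather than must(), so a filesystem that does not recognize a key should simply ignore it. Values are passed as strings to avoid depending on which numeric opt() overloads a given release offers.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.functional.FutureIO;

/** Hypothetical helper; not part of Iceberg or Hadoop. */
public final class OpenFileSketch {

  private OpenFileSketch() {
  }

  /**
   * Open a file through the openFile() builder, declaring the read policy,
   * the split range and the known file length as optional hints.
   *
   * @param readPolicy for example "sequential" or "random" (assumed values)
   */
  public static FSDataInputStream openSplit(
      FileSystem fs,
      Path path,
      String readPolicy,
      long splitStart,
      long splitEnd,
      long knownLength) throws IOException {

    // opt() marks each option as optional: connectors that do not
    // understand a key are expected to ignore it rather than fail.
    CompletableFuture<FSDataInputStream> future = fs.openFile(path)
        .opt("fs.option.openfile.read.policy", readPolicy)
        .opt("fs.option.openfile.split.start", Long.toString(splitStart))
        .opt("fs.option.openfile.split.end", Long.toString(splitEnd))
        // Passing the length lets an object store connector skip the
        // existence/length probe (the HEAD request) when opening.
        .opt("fs.option.openfile.length", Long.toString(knownLength))
        .build();

    // build() is asynchronous; awaitFuture() blocks and rethrows any
    // IOException raised while opening the file.
    return FutureIO.awaitFuture(future);
  }
}
```

A caller that already holds the file length from table metadata could pass it as `knownLength` and read the split with the usual seek()/read() calls on the returned stream.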

The other bit of fun in 3.3.4 is the IOStatistics API, which collects statistics on filesystem input and output streams, filesystem instances and other objects, with the ability to take a snapshot (IOStatisticsSnapshot) that can be serialized as Java or JSON.

You can just call .toString() on a stream to see what is collected, and/or set the statistics to be logged when the filesystem is closed, which is an easy way to see what is happening in existing code:

  <property>
    <name>fs.iostatistics.logging.level</name>
    <value>info</value>
  </property>
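For reading the statistics programmatically instead of through logs, the following is a rough sketch of taking an IOStatisticsSnapshot and serializing it as JSON. The class and method names (`IOStatsSketch`, `snapshotAsJson`) are hypothetical; the Hadoop calls used are `IOStatisticsSupport.retrieveIOStatistics`, `IOStatisticsSupport.snapshotIOStatistics` and `IOStatisticsSnapshot.serializer()`, and it assumes Jackson is on the classpath, as it normally is with hadoop-common.

```java
import java.io.IOException;

import org.apache.hadoop.fs.statistics.IOStatistics;
import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
import org.apache.hadoop.fs.statistics.IOStatisticsSupport;

/** Hypothetical helper; not part of Iceberg or Hadoop. */
public final class IOStatsSketch {

  private IOStatsSketch() {
  }

  /**
   * Snapshot the statistics of a stream or filesystem and return them as
   * JSON, or null if the object does not publish any IOStatistics.
   */
  public static String snapshotAsJson(Object source) throws IOException {
    // Returns null when the object is not an IOStatisticsSource
    // or has no statistics to offer.
    IOStatistics stats = IOStatisticsSupport.retrieveIOStatistics(source);
    if (stats == null) {
      return null;
    }
    // The snapshot is a point-in-time copy, safe to keep after the
    // stream is closed and serializable as Java or JSON.
    IOStatisticsSnapshot snapshot = IOStatisticsSupport.snapshotIOStatistics(stats);
    return IOStatisticsSnapshot.serializer().toJson(snapshot);
  }
}
```

Calling `snapshotAsJson(inputStream)` just before closing a stream would capture whatever counters the underlying connector collected for it; which counters appear depends on the connector.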

@Fokko
Contributor Author

Fokko commented Apr 19, 2023

Closing as it is a duplicate of #5024

@Fokko Fokko closed this Apr 19, 2023
