Expose partition, file format and compression info of input tables as part of QueryCompletedEvent #12551
sopel39 merged 5 commits into trinodb:master from gaurav8297:enhancements_for_telemetry
Conversation
raunaqmorarka
left a comment
Please add some tests as well
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeMetadata.java
This shouldn't be optional; a table will always have a file format.
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveInputInfo.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetadata.java
Please confirm that the call to metastore.getTable is always satisfied by the transaction-level metadata cache. Ideally, this shouldn't result in another metadata call to the underlying metastore.
lib/trino-parquet/src/main/java/io/trino/parquet/reader/ParquetReader.java
lib/trino-parquet/src/main/java/io/trino/parquet/reader/ParquetReader.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/orc/OrcPageSource.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/parquet/ParquetPageSource.java
@alexjo2144 for iceberg and delta changes
lib/trino-parquet/src/main/java/io/trino/parquet/reader/ParquetReader.java
A partitioned table may have multiple file formats (each partition can be different).
Right, however I'm worried that that information may not be cheap to obtain. For our use case we're okay with just knowing the current storage format configured for the table.
Would it be okay if we renamed fileFormat to tableFileFormat to be more explicit and avoid the effort of getting this info for all partitions?
Would it be okay if we renamed fileFormat to tableFileFormat to be more explicit and avoid the effort of getting this info for all partitions?

Seems like an OK workaround. Add a code comment that it's only of limited usefulness for partitioned tables.
or call it tableDefaultFileFormat
If this is the format of the table and not necessarily the format encountered by the query, it’s an odd thing to include in the query event. You can just obtain that data by looking at table metadata with SHOW CREATE TABLE, etc.
@martint we want to be able to do an offline analysis of the relative occurrence of each file format for reads. We're recording the default table format instead of extracting the storage format of every partition to avoid adding overhead for collecting this info, and because knowing the table-level format is a good enough approximation for telemetry use cases.
We have metrics names, but we don't have metrics values.
What is the value supposed to mean?
I've updated the value to be the data length as per this comment: #12551 (comment)
In Iceberg, can different files be of different format?
Yes, it is possible; that's why I updated it to tableDefaultFileFormat.
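The naming discussion above can be sketched as a small value object. This is a hypothetical illustration, not the actual Trino classes (the real connectors use types like HiveInputInfo): the field records only the format configured at the table level, since individual partitions of a partitioned table may use different formats.

```java
import java.util.List;

// Hypothetical sketch -- names are assumptions, not the real Trino SPI.
// tableDefaultFileFormat deliberately carries the table-level default only;
// per-partition formats are not collected, to keep this info cheap to obtain.
public record TableInputInfo(List<String> partitionIds, String tableDefaultFileFormat)
{
    public TableInputInfo
    {
        // defensive copy so the record stays immutable
        partitionIds = List.copyOf(partitionIds);
    }
}
```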
Please move the change to add connector metrics support to the query event to a separate commit.
plugin/trino-delta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeInputInfo.java
If we're doing this only for testing with the mock connector, can we keep the changes in MockConnectorPageSource? You can merge the mock and delegate page source metrics in getMetrics.
Instead of sending dummy metrics via MockConnectorFactory, you can consider defining some actual metric like rowCount in MockConnectorPageSource if we get non-empty pages there during tests.
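The suggested merge of mock and delegate metrics could look roughly like this. A simplified sketch: metric names and the plain `Map<String, Long>` representation are assumptions (the real Trino SPI uses the `Metrics` class), but the merging idea is the same.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of combining a mock page source's own metrics with the
// metrics of the page source it delegates to, as a getMetrics()-style helper.
public final class MergedMetrics
{
    private MergedMetrics() {}

    public static Map<String, Long> merge(Map<String, Long> mockMetrics, Map<String, Long> delegateMetrics)
    {
        Map<String, Long> merged = new HashMap<>(mockMetrics);
        // sum counters that appear in both maps, e.g. a rowCount metric
        delegateMetrics.forEach((name, value) -> merged.merge(name, value, Long::sum));
        return merged;
    }
}
```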
core/trino-main/src/test/java/io/trino/connector/MockConnectorFactory.java
lib/trino-parquet/src/main/java/io/trino/parquet/reader/ParquetReader.java
lib/trino-parquet/src/main/java/io/trino/parquet/reader/ParquetReader.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveInputInfo.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/orc/OrcPageSource.java
lib/trino-parquet/src/main/java/io/trino/parquet/reader/ParquetReader.java
lib/trino-parquet/src/main/java/io/trino/parquet/reader/ParquetReader.java
plugin/trino-hive/src/test/java/io/trino/plugin/hive/AbstractTestHive.java
Added partition and file format as part of input info in Hive, Iceberg and Delta Lake connectors.
I don't think this needs release notes
Description
improvement
Expose the following information as part of QueryCompletedEvent, which can later be tracked through telemetry.
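As a sketch of the offline analysis this change enables: once the table default file format is exposed per query event, a consumer can count how often each format is read. Everything here is illustrative; extracting the format string from a real QueryCompletedEvent's connector input info is assumed, not shown.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical telemetry aggregation: each string stands in for the
// tableDefaultFileFormat recorded in one completed query's input info.
public final class FormatTelemetry
{
    private FormatTelemetry() {}

    public static Map<String, Long> countFormats(List<String> formatsSeen)
    {
        Map<String, Long> counts = new TreeMap<>();
        for (String format : formatsSeen) {
            counts.merge(format, 1L, Long::sum);
        }
        return counts;
    }
}
```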
Related issues, pull requests, and links
Documentation
( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
( ) Release notes entries required with the following suggested text: