Make metrics collection configurable via table properties #263
parquet/src/main/java/org/apache/iceberg/parquet/ParquetUtil.java
site/docs/configuration.md
| write.avro.compression-codec | gzip | Avro compression codec |
| write.metadata.metrics.default | full | Default metrics mode for all columns in the table; none, counts or full |
| write.metadata.metrics.column.col1 | (not set) | Metrics mode for column 'col1' to allow per-column tuning; none, counts or full |
Nit: I like using a serial comma.
spark/src/main/java/org/apache/iceberg/spark/source/Writer.java
@aokolnychyi, this looks great! I think we need to merge it with the changes from #254, but this is an awesome start!
import static org.apache.iceberg.TableProperties.DEFAULT_WRITE_METRICS_MODE;
import static org.apache.iceberg.TableProperties.DEFAULT_WRITE_METRICS_MODE_DEFAULT;

public class MetricsSpec {
What about naming this MetricsConfig instead? In a lot of places, we refer to just "spec" to mean the partition spec and I don't want to make the code confusing.
0f68b9d to d23687f
@rdblue this one is ready for another review round.
site/docs/configuration.md
| write.parquet.compression-codec | gzip | Parquet compression codec |
| write.avro.compression-codec | gzip | Avro compression codec |
| write.metadata.metrics.default | truncate(16) | Default metrics mode for all columns in the table; none, counts, truncate(lengthInBytes), or full |
Truncate length isn't in bytes. It is bytes for binary and characters (unicode codepoints) for UTF-8.
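To make the distinction in this comment concrete, here is a minimal sketch using only the JDK. The class and method names (`TruncateSketch`, `truncateString`, `truncateBinary`) are hypothetical and are not taken from the PR. It assumes string values are cut by Unicode code points and binary values by bytes, as described above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; not the code from this PR.
final class TruncateSketch {
  // Keep at most `length` Unicode code points of a string value.
  static String truncateString(String value, int length) {
    if (value.codePointCount(0, value.length()) <= length) {
      return value;
    }
    // offsetByCodePoints never splits a surrogate pair.
    return value.substring(0, value.offsetByCodePoints(0, length));
  }

  // Keep at most `length` bytes of a binary value.
  static byte[] truncateBinary(byte[] value, int length) {
    return value.length <= length ? value : Arrays.copyOf(value, length);
  }

  public static void main(String[] args) {
    String text = "h\u00e9llo"; // 'e' with an acute accent takes two bytes in UTF-8
    String prefix = truncateString(text, 2);
    // Two code points, but three bytes once encoded as UTF-8.
    System.out.println(prefix.codePointCount(0, prefix.length()));
    System.out.println(prefix.getBytes(StandardCharsets.UTF_8).length);
    System.out.println(truncateBinary(new byte[] {1, 2, 3, 4}, 2).length);
  }
}
```

A plain prefix like this is only safe as a lower bound. A truncated upper bound needs extra handling so that it still compares greater than or equal to every value in the column; the sketch does not cover that.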
I guess I misinterpreted the property name WRITE_METADATA_TRUNCATE_BYTES.
We probably should have changed that name, good catch.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetricsModes {
It would be good to have some docs here for what the modes are.
I've added short descriptions to MetricsModes and each MetricsMode.
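For readers following along, here is a rough sketch of how the mode strings from the configuration table (none, counts, truncate(N), full) could be validated with the `Pattern`/`Matcher` imports shown in the diff. The class `MetricsModeSketch` and its `parse` method are hypothetical and are not the actual `MetricsModes` implementation. Case-insensitive matching and rejecting a non-positive length are assumptions of this sketch.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the MetricsModes class from this PR.
final class MetricsModeSketch {
  private static final Pattern TRUNCATE = Pattern.compile("truncate\\((\\d+)\\)");

  // Returns the normalized mode string, or throws if the mode is not recognized.
  static String parse(String mode) {
    String normalized = mode.trim().toLowerCase(Locale.ROOT);
    switch (normalized) {
      case "none":   // assumed meaning: collect no metrics
      case "counts": // assumed meaning: collect counts only, no bounds
      case "full":   // assumed meaning: collect counts and untruncated bounds
        return normalized;
      default:
        break;
    }
    Matcher matcher = TRUNCATE.matcher(normalized);
    if (matcher.matches()) {
      int length = Integer.parseInt(matcher.group(1));
      if (length <= 0) {
        throw new IllegalArgumentException("Truncate length must be positive: " + mode);
      }
      return "truncate(" + length + ")";
    }
    throw new IllegalArgumentException("Invalid metrics mode: " + mode);
  }

  public static void main(String[] args) {
    System.out.println(parse("truncate(16)"));
    System.out.println(parse("counts"));
  }
}
```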
@aokolnychyi, just a couple minor comments and this is ready to go. It also needs a rebase. Thanks!
57f8d21 to 8c6a9f9
Merged. Thanks for fixing this, @aokolnychyi!
This PR introduces MetricsConfig, which allows us to control metrics collection, and resolves #173. Currently, this logic is integrated only into the Parquet write path.
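As an illustration of the per-column tuning described in the configuration table, the lookup might work roughly as follows. This is a sketch over a plain `Map` of table properties. The class `MetricsLookupSketch` and the method `modeFor` are hypothetical. The property keys and the `truncate(16)` default come from the documentation rows quoted in this conversation. The precedence (column override, then table default, then built-in default) is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the MetricsConfig class from this PR.
final class MetricsLookupSketch {
  static final String DEFAULT_KEY = "write.metadata.metrics.default";
  static final String COLUMN_PREFIX = "write.metadata.metrics.column.";
  static final String BUILT_IN_DEFAULT = "truncate(16)";

  // Resolve the mode for one column: column override, then table default, then built-in default.
  static String modeFor(Map<String, String> properties, String column) {
    String tableDefault = properties.getOrDefault(DEFAULT_KEY, BUILT_IN_DEFAULT);
    return properties.getOrDefault(COLUMN_PREFIX + column, tableDefault);
  }

  public static void main(String[] args) {
    Map<String, String> properties = new HashMap<>();
    properties.put(DEFAULT_KEY, "counts");
    properties.put(COLUMN_PREFIX + "col1", "full");
    System.out.println(modeFor(properties, "col1")); // column override applies
    System.out.println(modeFor(properties, "col2")); // falls back to the table default
  }
}
```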