Disable metrics for very wide tables to reduce metadata size #3959
Changes from all commits
7631ee8
be2fff3
bcd10ed
5c88e0e
68dde57
1c0bce5
181cddc
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -44,6 +44,8 @@ public final class MetricsConfig implements Serializable { | |
| private static final Logger LOG = LoggerFactory.getLogger(MetricsConfig.class); | ||
| private static final Joiner DOT = Joiner.on('.'); | ||
|
|
||
| // Disable metrics by default for wide tables to prevent excessive metadata | ||
| private static final int MAX_COLUMNS = 32; | ||
| private static final MetricsConfig DEFAULT = new MetricsConfig(ImmutableMap.of(), | ||
| MetricsModes.fromString(DEFAULT_WRITE_METRICS_MODE_DEFAULT)); | ||
|
|
||
|
|
@@ -94,15 +96,21 @@ static Map<String, String> updateProperties(Map<String, String> props, List<Stri | |
| **/ | ||
| @Deprecated | ||
| public static MetricsConfig fromProperties(Map<String, String> props) { | ||
| return from(props, null); | ||
| return from(props, null, DEFAULT_WRITE_METRICS_MODE_DEFAULT); | ||
| } | ||
|
|
||
| /** | ||
| * Creates a metrics config from a table. | ||
| * @param table iceberg table | ||
| */ | ||
| public static MetricsConfig forTable(Table table) { | ||
| return from(table.properties(), table.sortOrder()); | ||
| String defaultMode; | ||
| if (table.schema().columns().size() <= MAX_COLUMNS) { | ||
| defaultMode = DEFAULT_WRITE_METRICS_MODE_DEFAULT; | ||
| } else { | ||
| defaultMode = MetricsModes.None.get().toString(); | ||
| } | ||
| return from(table.properties(), table.sortOrder(), defaultMode); | ||
|
Contributor
Nit: missing newline after control flow block and before |
||
| } | ||
|
|
||
| /** | ||
|
|
@@ -127,24 +135,35 @@ public static MetricsConfig forPositionDelete(Table table) { | |
| return new MetricsConfig(columnModes.build(), defaultMode); | ||
| } | ||
|
|
||
| private static MetricsConfig from(Map<String, String> props, SortOrder order) { | ||
| /** | ||
| * Generate a MetricsConfig for all columns based on overrides, sortOrder, and defaultMode. | ||
| * @param props will be read for metrics overrides (write.metadata.metrics.column.*) and default | ||
| * (write.metadata.metrics.default) | ||
| * @param order sort order columns, will be promoted to truncate(16) | ||
| * @param defaultMode default, if not set by user property | ||
| * @return metrics configuration | ||
| */ | ||
| private static MetricsConfig from(Map<String, String> props, SortOrder order, String defaultMode) { | ||
|
Contributor
Instead of renaming the variable that is used everywhere, I'd just rename the incoming argument. There would be fewer changes if this used |
||
| Map<String, MetricsMode> columnModes = Maps.newHashMap(); | ||
| MetricsMode defaultMode; | ||
| String defaultModeAsString = props.getOrDefault(DEFAULT_WRITE_METRICS_MODE, DEFAULT_WRITE_METRICS_MODE_DEFAULT); | ||
|
|
||
| // Handle user override of default mode | ||
| MetricsMode finalDefaultMode; | ||
| String defaultModeAsString = props.getOrDefault(DEFAULT_WRITE_METRICS_MODE, defaultMode); | ||
| try { | ||
| defaultMode = MetricsModes.fromString(defaultModeAsString); | ||
| finalDefaultMode = MetricsModes.fromString(defaultModeAsString); | ||
| } catch (IllegalArgumentException err) { | ||
| // Mode was invalid, log the error and use the default | ||
| // User override was invalid, log the error and use the default | ||
| LOG.warn("Ignoring invalid default metrics mode: {}", defaultModeAsString, err); | ||
| defaultMode = MetricsModes.fromString(DEFAULT_WRITE_METRICS_MODE_DEFAULT); | ||
| finalDefaultMode = MetricsModes.fromString(defaultMode); | ||
| } | ||
|
|
||
| // First set sorted column with sorted column default (can be overridden by user) | ||
| MetricsMode sortedColDefaultMode = sortedColumnDefaultMode(defaultMode); | ||
| MetricsMode sortedColDefaultMode = sortedColumnDefaultMode(finalDefaultMode); | ||
| Set<String> sortedCols = SortOrderUtil.orderPreservingSortedColumns(order); | ||
| sortedCols.forEach(sc -> columnModes.put(sc, sortedColDefaultMode)); | ||
|
|
||
| MetricsMode defaultModeFinal = defaultMode; | ||
| // Handle user overrides of defaults | ||
| MetricsMode finalDefaultModeVal = finalDefaultMode; | ||
| props.keySet().stream() | ||
| .filter(key -> key.startsWith(METRICS_MODE_COLUMN_CONF_PREFIX)) | ||
| .forEach(key -> { | ||
|
|
@@ -155,12 +174,12 @@ private static MetricsConfig from(Map<String, String> props, SortOrder order) { | |
| } catch (IllegalArgumentException err) { | ||
| // Mode was invalid, log the error and use the default | ||
| LOG.warn("Ignoring invalid metrics mode for column {}: {}", columnAlias, props.get(key), err); | ||
| mode = defaultModeFinal; | ||
| mode = finalDefaultModeVal; | ||
| } | ||
| columnModes.put(columnAlias, mode); | ||
| }); | ||
|
|
||
| return new MetricsConfig(columnModes, defaultMode); | ||
| return new MetricsConfig(columnModes, finalDefaultMode); | ||
| } | ||
|
|
||
| /** | ||
|
|
||
Is there a way to override this behavior?
Good point, I can make this a table config. Does that sound reasonable?
"defaultMetricCollection : all" or something? Don't we already have that?
Also this feels like it should probably be a catalog level prop 🤷
I was initially thinking of something like 'write.metadata.metrics.max.columns'='100'. Yeah, good point, maybe a catalog-level prop is cleaner.
Took another look, and it's actually a bit messy to make this a catalog property in the current code (the Table is serialized and sent to writers, but the Catalog is not).
I was thinking it makes sense as a table property as well (i.e., if I know this table is worth optimizing, I can set the max columns higher). It may be a bit awkward to have a different catalog just for this. And if a user really wants one global setting, this change is coming: #4011
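To make the table-property idea concrete, here is a minimal sketch of how such a limit could be read. The key write.metadata.metrics.max.columns is only the name floated in this thread and is not part of the diff; the class and method names are hypothetical, and the fallback of 32 mirrors MAX_COLUMNS from the change under review.

```java
import java.util.HashMap;
import java.util.Map;

final class MaxColumnsSketch {
  // Hypothetical key, quoted from the proposal in this thread; it is NOT part of the diff.
  static final String MAX_COLUMNS_KEY = "write.metadata.metrics.max.columns";
  // Mirrors the hard-coded MAX_COLUMNS constant in the diff.
  static final int MAX_COLUMNS_DEFAULT = 32;

  // Read the limit from table properties, falling back to the default when
  // the property is missing or is not a valid integer.
  static int maxColumns(Map<String, String> props) {
    String raw = props.get(MAX_COLUMNS_KEY);
    if (raw == null) {
      return MAX_COLUMNS_DEFAULT;
    }

    try {
      return Integer.parseInt(raw.trim());
    } catch (NumberFormatException e) {
      return MAX_COLUMNS_DEFAULT;
    }
  }

  // Metrics stay on by default only while the schema is within the limit.
  static boolean metricsEnabledByDefault(Map<String, String> props, int columnCount) {
    return columnCount <= maxColumns(props);
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    System.out.println(metricsEnabledByDefault(props, 50));
    props.put(MAX_COLUMNS_KEY, "100");
    System.out.println(metricsEnabledByDefault(props, 50));
  }
}
```

Keeping the limit on the table rather than the catalog matches the point above: the properties map travels with the serialized Table, so writers can evaluate it without a Catalog handle.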
I think this is okay for now. Tables will still respect whatever is set as write.metadata.metrics.default, so this really just changes Iceberg's default in a reasonable way. It is also good to note that metrics for sort columns are automatically promoted to at least truncate[16], so it isn't as though we're losing all stats.
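The precedence described here can be sketched without any Iceberg dependency. This is an illustration under stated assumptions, not the code from the PR: the class and method names are hypothetical, the mode strings "truncate(16)" and "none" are assumed to be what DEFAULT_WRITE_METRICS_MODE_DEFAULT and MetricsModes.None resolve to, and the promotion rule for sort columns is assumed to apply when the default is "none" or "counts".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MetricsDefaultSketch {
  static final String DEFAULT_KEY = "write.metadata.metrics.default";
  static final int MAX_COLUMNS = 32;

  // Choose a fallback from the table width, then let the user property win.
  static String resolveDefaultMode(Map<String, String> props, int columnCount) {
    String widthDefault = columnCount <= MAX_COLUMNS ? "truncate(16)" : "none";
    return props.getOrDefault(DEFAULT_KEY, widthDefault);
  }

  // Assumption: sort columns are promoted to truncate(16) when the default
  // would otherwise keep no bounds ("none" or "counts").
  static Map<String, String> sortColumnModes(List<String> sortColumns, String defaultMode) {
    boolean promote = defaultMode.equals("none") || defaultMode.equals("counts");
    String sortMode = promote ? "truncate(16)" : defaultMode;

    Map<String, String> modes = new LinkedHashMap<>();
    for (String column : sortColumns) {
      modes.put(column, sortMode);
    }
    return modes;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    String wideDefault = resolveDefaultMode(props, 200);
    System.out.println(wideDefault);
    System.out.println(sortColumnModes(Arrays.asList("event_ts"), wideDefault));

    // An explicit table property overrides the width-based default.
    props.put(DEFAULT_KEY, "full");
    System.out.println(resolveDefaultMode(props, 200));
  }
}
```

So a wide table with no explicit setting ends up with no metrics on ordinary columns while its sort columns still keep truncated bounds, and any table that sets write.metadata.metrics.default behaves exactly as before.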