@jkolash jkolash commented Oct 4, 2024

Show that we do not bound the number of column statistics: if a nested struct has 200 columns, we end up with over 200 statistics.

@github-actions github-actions bot added the data label Oct 4, 2024

@singhpk234 singhpk234 left a comment

@jkolash this is known behaviour: the limit is enforced on top-level columns only.
Please refer to this discussion: #5215 (comment)
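
To make the behaviour discussed here concrete, the following is a toy model, not Iceberg's actual implementation. It assumes the limit keeps the first N top-level columns and then tracks statistics for every leaf field beneath them. The names `Field`, `leaf_count`, and `columns_with_metrics` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Field:
    """Hypothetical schema node: a leaf column, or a struct with children."""
    name: str
    children: list["Field"] = field(default_factory=list)


def leaf_count(f: Field) -> int:
    """Count the leaf fields under f (a leaf counts as one)."""
    if not f.children:
        return 1
    return sum(leaf_count(c) for c in f.children)


def columns_with_metrics(schema: list[Field], max_top_level: int) -> int:
    """Assumed behaviour: keep the first max_top_level top-level columns,
    then collect statistics for every leaf nested under them."""
    kept = schema[:max_top_level]
    return sum(leaf_count(c) for c in kept)


# One plain column plus one struct holding 200 nested fields.
wide_struct = Field("payload", [Field(f"f{i}") for i in range(200)])
schema = [Field("id"), wide_struct]

# Only 2 top-level columns, so a limit of 2 truncates nothing,
# yet 201 leaf fields end up with statistics.
print(columns_with_metrics(schema, 2))
```

Under these assumptions the top-level count stays within the limit while the number of tracked leaf fields does not, which is the gap this PR's test is meant to show.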

jkolash commented Oct 4, 2024

Hmm, the purpose of this property is to bound the number of metrics that get created for tables with wide schemas. The fact that it doesn't work with all types of wide schemas seems to be a fundamental flaw. I see the comment:

Yes, but this is the current behavior. We use the top-level columns for the current check.
#5215 (comment)

but I don't see an explanation, beyond that, of why it should be limited to top-level columns.

jkolash commented Oct 4, 2024

I can close this and open a feature request, or keep the discussion on the issue (#11253) instead.

Let me know.

singhpk234 commented

I think both work. I would also recommend raising this in the Iceberg Slack #dev channel to open it up to a wider forum!

jkolash commented Oct 7, 2024

OK, I started a Slack thread and a mailing list discussion. I will close this issue for now.

@jkolash jkolash closed this Oct 7, 2024