[1.0.x] Core: Increase inferred column metrics limit to 100. #5933

rdblue merged 1 commit into apache:1.0.x from
Conversation
Co-authored-by: Ryan Blue <blue@apache.org>
If I have a table with more than 100 columns, what are the downsides of being above this parameter's value? I don't see it documented here: https://iceberg.apache.org/docs/latest/configuration/

I only ask because I have a table that is basically a collection of events. Upstream, each event carries some metadata in a dict. Using a column per key in that metadata dict felt like it would compress better than giving each row a `{"key1": 123}` blob, since the key names are relatively static and the values would benefit from columnar compression. The majority of these columns are empty for any particular partition, which I assume is near-zero storage/runtime overhead. For example, file 1's rows will have the metadata dict `{"abc": 1234}` repeated across virtually the whole GB of data, while file 2 may have `{"def": "foo"}` in most rows instead.
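For reference, the limit discussed in this PR corresponds to the `write.metadata.metrics.max-inferred-column-defaults` table property. Columns beyond that count simply don't get per-column metrics (min/max/null counts) inferred by default, so scan planning can't prune files on those columns; the data itself is unaffected. A sketch of raising the limit for a wide table, assuming Spark SQL DDL and a hypothetical table name `events` (verify the property name against the Iceberg configuration docs for your version):

```sql
-- Hypothetical example: raise the inferred-metrics cap above the
-- default of 100 so metrics are collected for all columns of a
-- wide, sparse events table.
ALTER TABLE events SET TBLPROPERTIES (
  'write.metadata.metrics.max-inferred-column-defaults' = '200'
);
```

Alternatively, `write.metadata.metrics.column.<name>` can be set per column to keep metrics only for the keys you actually filter on, which avoids bloating manifest files for hundreds of mostly-empty columns.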