-
-
Couldn't load subscription status.
- Fork 19.2k
ENH: make Styler compatible with non-unique indexes
#41269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Styler.format compatible with non-unique indexesStyler compatible with non-unique indexes
pandas/io/formats/style.py
Outdated
| if not c: | ||
| continue | ||
| css_list = maybe_convert_css_to_tuples(c) | ||
| i, j = self.index.get_loc(rn), self.columns.get_loc(cn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can instead we just not use the indices to look up locations? and instead just use indexers (e.g. iterate over the number of columns and use iloc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, because attrs may be a subset of the main self.data, so to maintain performance and ensure the mapped css is placed in the right integer locations it needs to lookup - and lookup doesn't work (without ambiguity) for non-unique.
pandas/io/formats/style.py
Outdated
| try: | ||
| for rn, c in attrs[[cn]].itertuples(): | ||
| if not c: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you limit the try/except not to the entire loop here, e.g. just put it around the .get_loc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be even better to simply refuse to render non-uniques entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i removed the separate cases, as suggested.
pandas/io/formats/style.py
Outdated
| css_list = maybe_convert_css_to_tuples(c) | ||
| i, j = self.index.get_loc(rn), self.columns.get_loc(cn) | ||
| self.ctx[(i, j)].extend(css_list) | ||
| except ValueError as ve: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
want to be much more fine grained
pandas/io/formats/style_render.py
Outdated
| self._display_funcs[(i, j)] = format_func | ||
| for row in data[[col]].itertuples(): | ||
| i_ = self.index.get_indexer_for([row[0]]) # handle duplicate keys in | ||
| j_ = self.columns.get_indexer_for([col]) # non-unique indexes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can do this outside of the loop right? (as col doesn't change), for j_
does this change perf at all? (I don't think so, but checking).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.. the multiple loops really killed it for the unique case (which is benchmarked)..
before after ratio
[4af3eed5] [3a8f11e5]
<styler_non_unique~1^2> <styler_non_unique>
+ 57.8±2ms 163±4ms 2.82 io.style.Render.time_format_render(24, 120)
+ 87.1±5ms 242±4ms 2.78 io.style.Render.time_format_render(36, 120)
+ 30.5±0.9ms 82.2±0.4ms 2.69 io.style.Render.time_format_render(12, 120)
+ 8.53±0.08ms 14.7±0.2ms 1.72 io.style.Render.time_format_render(12, 12)
+ 16.2±0.3ms 27.7±0.7ms 1.71 io.style.Render.time_format_render(24, 12)
+ 25.3±1ms 40.9±0.7ms 1.62 io.style.Render.time_format_render(36, 12)
+ 16.6±0.2ms 18.5±0.2ms 1.11 io.style.Render.time_classes_render(36, 12)
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.
So I had to separate out the non-unique and unique cases with a conditional, then performance was the same...
BENCHMARKS NOT SIGNIFICANTLY CHANGED.
pandas/io/formats/style_render.py
Outdated
|
|
||
| j = self.columns.get_loc(col) # single value | ||
| for row, value in data[[col]].itertuples(): | ||
| i = self.index.get_loc(row) # single value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you pull the col indexer out does this still have a perf hit? e.g. get_indexer_for calls get_loc if its unique (which is cached) anyhow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good reviewing! Third time's a charm, the code is simpler, works for non-unique and unique and performance improvement:
before after ratio
[4af3eed5] [c3b7af82]
<text_gradient^2> <styler_non_unique>
- 33.1±0.4ms 25.9±0.5ms 0.78 io.style.Render.time_format_render(12, 120)
- 86.4±0.4ms 66.8±0.5ms 0.77 io.style.Render.time_format_render(36, 120)
- 62.3±2ms 46.4±0.4ms 0.75 io.style.Render.time_format_render(24, 120)
- 10.1±0.3ms 4.16±0.2ms 0.41 io.style.Render.time_format_render(12, 12)
- 24.7±0.2ms 9.53±0.2ms 0.39 io.style.Render.time_format_render(36, 12)
- 17.7±0.2ms 6.76±0.1ms 0.38 io.style.Render.time_format_render(24, 12)
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
|
ok! |
The PR aims to make Styler compatible with non-unique indexes/columns, for the purpose of rendering all DataFrame types (even if no styling is applied)
Styler.format: made FULLY compatible with some modifications to the loops, inc TESTS.Styler.applyandStyler.applymap: made PARTIALLY compatible:subsets are unique then compatible, inc TESTS.ifsubsets are non-unique slices will raise a not compatibleKeyError, inc. TESTSStyler.applyandapplymapare NOT compatible. Raises KeyError.Styler.set_table_styles: made FULLY compatible and will style multiple rows/columns from a non-unique key, inc TESTS.Styler.set_td_classesusesreindexso is PARTIALLY compatible whereclasseshas unique index/columns: now returns aKeyErrorin non-unique case, inc TESTS.Styler.set_tooltipsusesreindexso is PARTIALLY compatible wherettipshas unique index/columns: now returns aKeyErrorin non-unique case, inc TESTS.Styler.hide_indexand.hide_columnsare already FULLY compatible through existing code (inc TESTS)built-instyling functions use some version ofapplyorapplymapso are captured by the above cases.I believe this is all relevant functionality reviewed.