refactor: Migrate DataFrame display to use IPython's _repr_mimebundle_() protocol for anywidget mode #2271

shuoweil · 2025-11-14T04:39:54Z

This PR refactors the DataFrame display system to properly implement IPython's repr_mimebundle() protocol for anywidget mode, moving widget handling logic out of repr_html() and into the appropriate display method.

See design doc here: screen/8XWbJDa2d9mb6r5

Fixes #<458796812> 🦕

review-notebook-app · 2025-11-14T04:40:00Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

tswast · 2025-11-19T22:34:04Z

bigframes/streaming/dataframe.py


-    def _repr_html_(self, *args, **kwargs):
-        return _return_type_wrapper(self._df._repr_html_, StreamingDataFrame)(
+    def _repr_html_fallback_(self, *args, **kwargs):


We don't need the trailing _ because we aren't trying to get ipython to automatically call this. In fact, we do not want it to call this directly.

tswast · 2025-11-19T22:34:51Z

bigframes/core/indexes/base.py

        # metadata, like we do with DataFrame.
        opts = bigframes.options.display
        max_results = opts.max_rows
        # anywdiget mode uses the same display logic as the "deferred" mode


Comment is out-of-date.

Removed outdated comments

tswast · 2025-11-19T22:35:41Z

bigframes/dataframe.py

        return "\n".join(lines)

-    def _repr_html_(self) -> str:
+    def _repr_html_fallback_(self) -> str:


Same here, remove the trailing _. Just, _repr_html_fallback.

tswast · 2025-11-19T22:39:07Z

bigframes/dataframe.py

+        widget_repr["text/plain"] = formatter.repr_query_job(df._compute_dry_run())
+        widget_repr["text/html"] = self._repr_html_fallback_()


At this point, we have already executed the query as part of the widget construction. Let's use the information available to render the HTML and plain text versions.

Optimized to use the data already fetched by the widget. Now it generates the HTML and plain text representations form the widget's cached data, avoiding redundant queries

tswast · 2025-11-19T22:39:52Z

bigframes/dataframe.py

+            except (AttributeError, ValueError, ImportError):
+                # Fallback: let IPython use _repr_html_fallback_() instead
+                warnings.warn(
+                    "Anywidget mode is not available. "
+                    "Please `pip install anywidget traitlets` or `pip install 'bigframes[anywidget]'` to use interactive tables. "
+                    f"Falling back to static HTML. Error: {traceback.format_exc()}"
+                )
+                # Don't return anything - let IPython fall back to _repr_html_fallback_()
+                pass


Please remove this except block. I would prefer we explicitly check for anywidget in particular to be installed. Right now, this could be hiding all sorts of errors.

If we must have it, then please remove all exception types except for ImportError.

tswast · 2025-11-19T22:43:47Z

bigframes/dataframe.py

+                # Don't return anything - let IPython fall back to _repr_html_fallback_()
+                pass
+
+        return {"text/html": self._repr_html_fallback_(), "text/plain": repr(self)}


This would be calling more BQ API calls than necessary. Let's call to_pandas_batches() and fetch the first page, just like we do in the anywidget version.

This reverts commit 61a9903.

tswast · 2025-11-20T22:33:02Z

bigframes/dataframe.py

+        # Process blob columns if needed
+        self._cached()
+        df = self.copy()
+        if bigframes.options.display.blob_display:
+            blob_cols = [
+                series_name
+                for series_name, series in df.items()
+                if series.dtype == bigframes.dtypes.OBJ_REF_DTYPE
+            ]
+            for col in blob_cols:
+                df[col] = df[col].blob._get_runtime(mode="R", with_metadata=True)


This looks duplicated with _repr_mimebundle_. Please pull it out into a separate function.

tswast · 2025-11-20T22:34:09Z

bigframes/dataframe.py

+        # Re-create the text representation from what we know.
+        opts = bigframes.options.display
+        with display_options.pandas_repr(opts):
+            import pandas.io.formats
+
+            # safe to mutate this, this dict is owned by this code, and does not affect global config
+            to_string_kwargs = (
+                pandas.io.formats.format.get_dataframe_repr_params()  # type: ignore
+            )
+            if not self._has_index:
+                to_string_kwargs.update({"index": False})
+            repr_string = widget._cached_data.to_string(**to_string_kwargs)
+
+        lines = repr_string.split("\n")
+        row_count = widget.row_count
+        if row_count is not None and row_count > len(widget._cached_data):
+            lines.append("...")
+
+        lines.append("")
+        column_count = len(self.columns)
+        lines.append(f"[{row_count or '?'} rows x {column_count} columns]")
+        widget_repr["text/plain"] = "\n".join(lines)


Likewise, this looks duplicated as well. Please get the common parameters out and refactor to a shared function.

tswast · 2025-11-20T22:38:11Z

bigframes/dataframe.py

+        # Generate HTML representation
+        with display_options.pandas_repr(opts):
+            if bigframes.options.display.blob_display and blob_cols:
+
+                def obj_ref_rt_to_html(obj_ref_rt) -> str:
+                    obj_ref_rt_json = json.loads(obj_ref_rt)
+                    obj_ref_details = obj_ref_rt_json["objectref"]["details"]
+                    if "gcs_metadata" in obj_ref_details:
+                        gcs_metadata = obj_ref_details["gcs_metadata"]
+                        content_type = typing.cast(
+                            str, gcs_metadata.get("content_type", "")
+                        )
+                        if content_type.startswith("image"):
+                            size_str = ""
+                            if bigframes.options.display.blob_display_width:
+                                size_str = f' width="{bigframes.options.display.blob_display_width}"'
+                            if bigframes.options.display.blob_display_height:
+                                size_str = (
+                                    size_str
+                                    + f' height="{bigframes.options.display.blob_display_height}"'
+                                )
+                            url = obj_ref_rt_json["access_urls"]["read_url"]
+                            return f'<img src="{url}"{size_str}>'
+
+                    return f'uri: {obj_ref_rt_json["objectref"]["uri"]}, authorizer: {obj_ref_rt_json["objectref"]["authorizer"]}'
+
+                formatters = {blob_col: obj_ref_rt_to_html for blob_col in blob_cols}
+                with pandas.option_context("display.max_colwidth", None):
+                    html_string = pandas_df.to_html(
+                        escape=False,
+                        notebook=True,
+                        max_rows=pandas.get_option("display.max_rows"),
+                        max_cols=pandas.get_option("display.max_columns"),
+                        show_dimensions=pandas.get_option("display.show_dimensions"),
+                        formatters=formatters,  # type: ignore
+                    )


I'm confused. Shouldn't we be using the _repr_html_fallback method you created here for this purpose?

Thanks for pointing it out. It was confusing and the logic was duplicated. I have refactored the display logic. A new private function _create_html_represenetation for the shared logic. _repr_html_fallback and repr_mimebundle now call this new method. The code duplication is removed.

tswast · 2025-11-21T22:20:39Z

bigframes/dataframe.py

+    def _repr_html_fallback(self) -> str:
        """
        Returns an html string primarily for use by notebooks for displaying
        a representation of the DataFrame. Displays 20 rows by default since
        many notebooks are not configured for large tables.
        """
        opts = bigframes.options.display
        max_results = opts.max_rows
        if opts.repr_mode == "deferred":
            return formatter.repr_query_job(self._compute_dry_run())

-        # Process blob columns first, regardless of display mode
+        # Process blob columns first for non-deferred modes
+        df, blob_cols = self._process_blob_columns()
+
+        pandas_df, row_count, query_job = df._block.retrieve_repr_request_results(
+            max_results
+        )
+
+        self._set_internal_query_job(query_job)
+        column_count = len(pandas_df.columns)
+
+        return self._create_html_representation(
+            pandas_df, row_count, column_count, blob_cols
+        )


I'm not seeing references to this _repr_html_fallback function. I think it is dead code and can be deleted.

_repr_html_fallback is referenced in bigframes/streaming/dataframe.py. This chunk of code is needed.

tswast · 2025-11-21T22:23:11Z

bigframes/dataframe.py

+        df = self
+        blob_cols = []
        if bigframes.options.display.blob_display:
            blob_cols = [
                series_name
-                for series_name, series in df.items()
+                for series_name, series in self.items()
                if series.dtype == bigframes.dtypes.OBJ_REF_DTYPE
            ]
-            for col in blob_cols:
-                # TODO(garrettwu): Not necessary to get access urls for all the rows. Update when having a to get URLs from local data.
-                df[col] = df[col].blob._get_runtime(mode="R", with_metadata=True)
+            if blob_cols:
+                df = self.copy()


Why do we define self twice? copy() is a pretty cheap operation (just makes a new block referencing the same expression), so I think we can just define it once.

Suggested change

df = self

blob_cols = []

if bigframes.options.display.blob_display:

blob_cols = [

series_name

for series_name, series in df.items()

for series_name, series in self.items()

if series.dtype == bigframes.dtypes.OBJ_REF_DTYPE

]

for col in blob_cols:

# TODO(garrettwu): Not necessary to get access urls for all the rows. Update when having a to get URLs from local data.

df[col] = df[col].blob._get_runtime(mode="R", with_metadata=True)

if blob_cols:

df = self.copy()

df = self.copy()

blob_cols = []

if bigframes.options.display.blob_display:

blob_cols = [

series_name

for series_name, series in df.items()

if series.dtype == bigframes.dtypes.OBJ_REF_DTYPE

]

if blob_cols:

tswast · 2025-11-21T22:26:37Z

bigframes/dataframe.py

+
+    def _process_blob_columns(self) -> tuple[DataFrame, list[str]]:
+        """Process blob columns for display."""
        self._cached()


This seems like the wrong function in which to call _cached(). _cached() can be very expensive (causes a query). Seems better to just call it once if needed in repr_mimebundle`.

The call to _cached() was removed from _process_blob_columns (now _get_display_df_and_blob_cols) to avoid unnecessary queries, as suggested.

tswast · 2025-11-21T22:28:16Z

bigframes/dataframe.py

+        """
+        from bigframes import display
+
+        df, _ = self._process_blob_columns()


What are we doing when we are "processing"? Seems like we should have a more descriptive name here that describes the purpose.

I've renamed _process_blob_columns to _get_display_df_and_blob_cols to better reflect its purpose of preparing the DataFrame for display, including handling blob columns.

tswast · 2025-11-21T22:34:00Z

bigframes/dataframe.py

+                # Fallback: let IPython use _repr_html_fallback() instead
                warnings.warn(
                    "Anywidget mode is not available. "
                    "Please `pip install anywidget traitlets` or `pip install 'bigframes[anywidget]'` to use interactive tables. "
-                    f"Falling back to deferred mode. Error: {traceback.format_exc()}"
+                    f"Falling back to static HTML. Error: {traceback.format_exc()}"
                )
-                return formatter.repr_query_job(self._compute_dry_run())
+                # Don't return anything - let IPython fall back to _repr_html_fallback()
+                pass


Nit: pass is only necessary if there is no body, but we do have a body with the warning above.

Also, we are manually falling back.

Suggested change

# Fallback: let IPython use _repr_html_fallback() instead

warnings.warn(

"Anywidget mode is not available. "

"Please `pip install anywidget traitlets` or `pip install 'bigframes[anywidget]'` to use interactive tables. "

f"Falling back to deferred mode. Error: {traceback.format_exc()}"

f"Falling back to static HTML. Error: {traceback.format_exc()}"

)

return formatter.repr_query_job(self._compute_dry_run())

# Don't return anything - let IPython fall back to _repr_html_fallback()

pass

# Anywidget is an optional dependency, so warn rather than fail.

# TODO(shuowei): When Anywidget becomes the default for all repr modes,

# remove this warning.

warnings.warn(

"Anywidget mode is not available. "

"Please `pip install anywidget traitlets` or `pip install 'bigframes[anywidget]'` to use interactive tables. "

f"Falling back to static HTML. Error: {traceback.format_exc()}"

)

tswast · 2025-11-21T22:35:49Z

bigframes/dataframe.py

+        """
+        from bigframes import display
+
+        df, _ = self._process_blob_columns()


Suggested change

df, _ = self._process_blob_columns()

# TODO(shuowei): Keep blob_cols and pass them to TableWidget so that they can render properly.

df, _ = self._process_blob_columns()

shuoweil requested a review from tswast November 14, 2025 04:39

shuoweil self-assigned this Nov 14, 2025

shuoweil requested review from a team as code owners November 14, 2025 04:39

product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Nov 14, 2025

tswast reviewed Nov 19, 2025

View reviewed changes

product-auto-label bot added size: xl Pull request size is extra large. and removed size: l Pull request size is large. labels Nov 20, 2025

shuoweil added 3 commits November 20, 2025 20:33

feat: Refactor anywidget display to use _ipython_display_

0dc287e

Revert accidental changes to README.rst

94c05d7

add testcase

1b15ef0

shuoweil requested a review from tswast November 20, 2025 20:35

shuoweil added 10 commits November 20, 2025 22:31

ipython_display_ to set fallback mimetypes

53953bb

remove _repr_html_()

5397a59

Revert "remove _repr_html_()"

689fa74

This reverts commit 61a9903.

change the fallback function name to better reflect the change

6b407d4

code refactor

3cd960e

fix mypy

b0f1525

Revert unncessary change

7e47895

use test/plain for print(df)

736929a

fix failed testcase

eb61fc0

reuse available data and optimize error handling

3079a3c

tswast reviewed Nov 20, 2025

View reviewed changes

update the testcase due to my optimization reduce the query calls number

86e0b6a

shuoweil force-pushed the shuowei-anywidget-html-repr branch from c79425c to 86e0b6a Compare November 20, 2025 22:41

code refactor

d53eea3

shuoweil requested a review from tswast November 20, 2025 23:44

tswast reviewed Nov 21, 2025

View reviewed changes

shuoweil added 2 commits November 21, 2025 23:02

function polish

8b9a2cf

Merge branch 'main' into shuowei-anywidget-html-repr

1042a9e

shuoweil requested a review from tswast November 21, 2025 23:03

Merge branch 'main' into shuowei-anywidget-html-repr

83513b5

product-auto-label bot added size: l Pull request size is large. and removed size: xl Pull request size is extra large. labels Nov 21, 2025

notebook change

05a9245

		widget_repr["text/plain"] = formatter.repr_query_job(df._compute_dry_run())
		widget_repr["text/html"] = self._repr_html_fallback_()

	df, _ = self._process_blob_columns()
	# TODO(shuowei): Keep blob_cols and pass them to TableWidget so that they can render properly.
	df, _ = self._process_blob_columns()

refactor: Migrate DataFrame display to use IPython's _repr_mimebundle_() protocol for anywidget mode #2271

Are you sure you want to change the base?

refactor: Migrate DataFrame display to use IPython's _repr_mimebundle_() protocol for anywidget mode #2271

Uh oh!

Conversation

shuoweil commented Nov 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

review-notebook-app bot commented Nov 14, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

shuoweil commented Nov 14, 2025 •

edited

Loading