Skip to content

Conversation

@shuoweil
Copy link
Contributor

@shuoweil shuoweil commented Nov 14, 2025

This PR refactors the DataFrame display system to properly implement IPython's repr_mimebundle() protocol for anywidget mode, moving widget handling logic out of repr_html() and into the appropriate display method.

See design doc here: screen/8XWbJDa2d9mb6r5

Fixes #<458796812> 🦕

@shuoweil shuoweil requested a review from tswast November 14, 2025 04:39
@shuoweil shuoweil self-assigned this Nov 14, 2025
@shuoweil shuoweil requested review from a team as code owners November 14, 2025 04:39
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Nov 14, 2025

def _repr_html_(self, *args, **kwargs):
return _return_type_wrapper(self._df._repr_html_, StreamingDataFrame)(
def _repr_html_fallback_(self, *args, **kwargs):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need the trailing _ because we aren't trying to get ipython to automatically call this. In fact, we do not want it to call this directly.

# metadata, like we do with DataFrame.
opts = bigframes.options.display
max_results = opts.max_rows
# anywdiget mode uses the same display logic as the "deferred" mode
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is out-of-date.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed outdated comments

return "\n".join(lines)

def _repr_html_(self) -> str:
def _repr_html_fallback_(self) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, remove the trailing _. Just, _repr_html_fallback.

Comment on lines 947 to 948
widget_repr["text/plain"] = formatter.repr_query_job(df._compute_dry_run())
widget_repr["text/html"] = self._repr_html_fallback_()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, we have already executed the query as part of the widget construction. Let's use the information available to render the HTML and plain text versions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optimized to use the data already fetched by the widget. Now it generates the HTML and plain text representations form the widget's cached data, avoiding redundant queries

Comment on lines 962 to 970
except (AttributeError, ValueError, ImportError):
# Fallback: let IPython use _repr_html_fallback_() instead
warnings.warn(
"Anywidget mode is not available. "
"Please `pip install anywidget traitlets` or `pip install 'bigframes[anywidget]'` to use interactive tables. "
f"Falling back to static HTML. Error: {traceback.format_exc()}"
)
# Don't return anything - let IPython fall back to _repr_html_fallback_()
pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this except block. I would prefer we explicitly check for anywidget in particular to be installed. Right now, this could be hiding all sorts of errors.

If we must have it, then please remove all exception types except for ImportError.

# Don't return anything - let IPython fall back to _repr_html_fallback_()
pass

return {"text/html": self._repr_html_fallback_(), "text/plain": repr(self)}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be calling more BQ API calls than necessary. Let's call to_pandas_batches() and fetch the first page, just like we do in the anywidget version.

@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. and removed size: l Pull request size is large. labels Nov 20, 2025
@shuoweil shuoweil requested a review from tswast November 20, 2025 20:35
Comment on lines 922 to 932
# Process blob columns if needed
self._cached()
df = self.copy()
if bigframes.options.display.blob_display:
blob_cols = [
series_name
for series_name, series in df.items()
if series.dtype == bigframes.dtypes.OBJ_REF_DTYPE
]
for col in blob_cols:
df[col] = df[col].blob._get_runtime(mode="R", with_metadata=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks duplicated with _repr_mimebundle_. Please pull it out into a separate function.

Comment on lines 949 to 970
# Re-create the text representation from what we know.
opts = bigframes.options.display
with display_options.pandas_repr(opts):
import pandas.io.formats

# safe to mutate this, this dict is owned by this code, and does not affect global config
to_string_kwargs = (
pandas.io.formats.format.get_dataframe_repr_params() # type: ignore
)
if not self._has_index:
to_string_kwargs.update({"index": False})
repr_string = widget._cached_data.to_string(**to_string_kwargs)

lines = repr_string.split("\n")
row_count = widget.row_count
if row_count is not None and row_count > len(widget._cached_data):
lines.append("...")

lines.append("")
column_count = len(self.columns)
lines.append(f"[{row_count or '?'} rows x {column_count} columns]")
widget_repr["text/plain"] = "\n".join(lines)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, this looks duplicated as well. Please get the common parameters out and refactor to a shared function.

Comment on lines 1020 to 1055
# Generate HTML representation
with display_options.pandas_repr(opts):
if bigframes.options.display.blob_display and blob_cols:

def obj_ref_rt_to_html(obj_ref_rt) -> str:
obj_ref_rt_json = json.loads(obj_ref_rt)
obj_ref_details = obj_ref_rt_json["objectref"]["details"]
if "gcs_metadata" in obj_ref_details:
gcs_metadata = obj_ref_details["gcs_metadata"]
content_type = typing.cast(
str, gcs_metadata.get("content_type", "")
)
if content_type.startswith("image"):
size_str = ""
if bigframes.options.display.blob_display_width:
size_str = f' width="{bigframes.options.display.blob_display_width}"'
if bigframes.options.display.blob_display_height:
size_str = (
size_str
+ f' height="{bigframes.options.display.blob_display_height}"'
)
url = obj_ref_rt_json["access_urls"]["read_url"]
return f'<img src="{url}"{size_str}>'

return f'uri: {obj_ref_rt_json["objectref"]["uri"]}, authorizer: {obj_ref_rt_json["objectref"]["authorizer"]}'

formatters = {blob_col: obj_ref_rt_to_html for blob_col in blob_cols}
with pandas.option_context("display.max_colwidth", None):
html_string = pandas_df.to_html(
escape=False,
notebook=True,
max_rows=pandas.get_option("display.max_rows"),
max_cols=pandas.get_option("display.max_columns"),
show_dimensions=pandas.get_option("display.show_dimensions"),
formatters=formatters, # type: ignore
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused. Shouldn't we be using the _repr_html_fallback method you created here for this purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing it out. It was confusing and the logic was duplicated. I have refactored the display logic. A new private function _create_html_represenetation for the shared logic. _repr_html_fallback and repr_mimebundle now call this new method. The code duplication is removed.

@shuoweil shuoweil force-pushed the shuowei-anywidget-html-repr branch from c79425c to 86e0b6a Compare November 20, 2025 22:41
@shuoweil shuoweil requested a review from tswast November 20, 2025 23:44
Comment on lines 830 to 853
def _repr_html_fallback(self) -> str:
"""
Returns an html string primarily for use by notebooks for displaying
a representation of the DataFrame. Displays 20 rows by default since
many notebooks are not configured for large tables.
"""
opts = bigframes.options.display
max_results = opts.max_rows
if opts.repr_mode == "deferred":
return formatter.repr_query_job(self._compute_dry_run())

# Process blob columns first, regardless of display mode
# Process blob columns first for non-deferred modes
df, blob_cols = self._process_blob_columns()

pandas_df, row_count, query_job = df._block.retrieve_repr_request_results(
max_results
)

self._set_internal_query_job(query_job)
column_count = len(pandas_df.columns)

return self._create_html_representation(
pandas_df, row_count, column_count, blob_cols
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing references to this _repr_html_fallback function. I think it is dead code and can be deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_repr_html_fallback is referenced in bigframes/streaming/dataframe.py. This chunk of code is needed.

Comment on lines +858 to +867
df = self
blob_cols = []
if bigframes.options.display.blob_display:
blob_cols = [
series_name
for series_name, series in df.items()
for series_name, series in self.items()
if series.dtype == bigframes.dtypes.OBJ_REF_DTYPE
]
for col in blob_cols:
# TODO(garrettwu): Not necessary to get access urls for all the rows. Update when having a to get URLs from local data.
df[col] = df[col].blob._get_runtime(mode="R", with_metadata=True)
if blob_cols:
df = self.copy()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we define self twice? copy() is a pretty cheap operation (just makes a new block referencing the same expression), so I think we can just define it once.

Suggested change
df = self
blob_cols = []
if bigframes.options.display.blob_display:
blob_cols = [
series_name
for series_name, series in df.items()
for series_name, series in self.items()
if series.dtype == bigframes.dtypes.OBJ_REF_DTYPE
]
for col in blob_cols:
# TODO(garrettwu): Not necessary to get access urls for all the rows. Update when having a to get URLs from local data.
df[col] = df[col].blob._get_runtime(mode="R", with_metadata=True)
if blob_cols:
df = self.copy()
df = self.copy()
blob_cols = []
if bigframes.options.display.blob_display:
blob_cols = [
series_name
for series_name, series in df.items()
if series.dtype == bigframes.dtypes.OBJ_REF_DTYPE
]
if blob_cols:


def _process_blob_columns(self) -> tuple[DataFrame, list[str]]:
"""Process blob columns for display."""
self._cached()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the wrong function in which to call _cached(). _cached() can be very expensive (causes a query). Seems better to just call it once if needed in repr_mimebundle`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call to _cached() was removed from _process_blob_columns (now _get_display_df_and_blob_cols) to avoid unnecessary queries, as suggested.

"""
from bigframes import display

df, _ = self._process_blob_columns()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we doing when we are "processing"? Seems like we should have a more descriptive name here that describes the purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed _process_blob_columns to _get_display_df_and_blob_cols to better reflect its purpose of preparing the DataFrame for display, including handling blob columns.

Comment on lines 943 to 950
# Fallback: let IPython use _repr_html_fallback() instead
warnings.warn(
"Anywidget mode is not available. "
"Please `pip install anywidget traitlets` or `pip install 'bigframes[anywidget]'` to use interactive tables. "
f"Falling back to deferred mode. Error: {traceback.format_exc()}"
f"Falling back to static HTML. Error: {traceback.format_exc()}"
)
return formatter.repr_query_job(self._compute_dry_run())
# Don't return anything - let IPython fall back to _repr_html_fallback()
pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: pass is only necessary if there is no body, but we do have a body with the warning above.

Also, we are manually falling back.

Suggested change
# Fallback: let IPython use _repr_html_fallback() instead
warnings.warn(
"Anywidget mode is not available. "
"Please `pip install anywidget traitlets` or `pip install 'bigframes[anywidget]'` to use interactive tables. "
f"Falling back to deferred mode. Error: {traceback.format_exc()}"
f"Falling back to static HTML. Error: {traceback.format_exc()}"
)
return formatter.repr_query_job(self._compute_dry_run())
# Don't return anything - let IPython fall back to _repr_html_fallback()
pass
# Anywidget is an optional dependency, so warn rather than fail.
# TODO(shuowei): When Anywidget becomes the default for all repr modes,
# remove this warning.
warnings.warn(
"Anywidget mode is not available. "
"Please `pip install anywidget traitlets` or `pip install 'bigframes[anywidget]'` to use interactive tables. "
f"Falling back to static HTML. Error: {traceback.format_exc()}"
)

"""
from bigframes import display

df, _ = self._process_blob_columns()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
df, _ = self._process_blob_columns()
# TODO(shuowei): Keep blob_cols and pass them to TableWidget so that they can render properly.
df, _ = self._process_blob_columns()

@shuoweil shuoweil requested a review from tswast November 21, 2025 23:03
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: xl Pull request size is extra large. labels Nov 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants