Enable Perspective to support large tables #7359

Open
MarcSkovMadsen opened this issue Oct 5, 2024 · 3 comments
Labels
type: enhancement Minor feature or improvement to an existing feature

Comments

@MarcSkovMadsen
Collaborator

MarcSkovMadsen commented Oct 5, 2024

I work with lots of tabular datasets in the 50-500 MB range. For exploratory data analysis the Perspective Viewer is really powerful and unique. Unfortunately, sending the full dataset to the client is slow and often fails because it exceeds a maximum WebSocket message size imposed by Panel, JupyterHub or Kubernetes. You can increase these limits, but only to some extent, and doing so can be outside the control or capability of a data scientist.

I'm increasingly seeing this problem, and I'm not the only one (Discourse #6804). It's actually very common in Finance and Trading, where I work. Currently Excel supports larger tables than we do with Perspective. I believe we should enable users to work with larger files in Perspective than Excel can handle.

Actually Perspective was built to support large tabular data via virtualization. See regular-table and Perspective. But our implementation only uses the perspective-viewer web component, not the more advanced client-server virtualization architecture that Perspective supports.
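To make the idea of client-server virtualization concrete, here is a minimal sketch in plain Python. It is not Perspective's or Panel's API: `ServerTable` and `ViewportRequest` are hypothetical names invented for illustration. The assumption is simply that the full table stays on the server and the client only asks for the rows it currently displays.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ViewportRequest:
    """Hypothetical message a client sends when the user scrolls."""
    start_row: int
    end_row: int  # exclusive


class ServerTable:
    """Hypothetical server-side holder of the full dataset (not a Perspective class)."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows

    def __len__(self) -> int:
        return len(self._rows)

    def window(self, request: ViewportRequest) -> List[Dict[str, Any]]:
        """Return only the rows inside the requested viewport, clamped to the table."""
        start = max(0, request.start_row)
        end = min(len(self._rows), request.end_row)
        return self._rows[start:end]


# The full table (100,000 rows here) lives on the server...
table = ServerTable([{"id": i, "price": i * 0.5} for i in range(100_000)])

# ...but a client showing 50 rows at a time only receives 50 rows per message.
visible = table.window(ViewportRequest(start_row=100, end_row=150))
```

With this shape, the size of each message depends on the viewport rather than on the dataset, which is why the approach would sidestep the message size limits described above.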

A Panel user actually showcased how to use the client-server virtualization in Discourse #6430, but it's only a proof of concept and complicated to use.

Please note that the client-server virtualization architecture seems similar to Mosaic - Mosaic is just built on DuckDB. There is a request to add Mosaic in FR #7358.

Discussion

The Tabulator pane provides a kind of virtualization via the pagination parameter ("local" or "remote"). We could support a similar parameter for Perspective, making it really easy for users. On the other hand, there is power in exposing more of the underlying Perspective API, such as the PerspectiveManager, and in hosting tables once but using them across sessions and users. I also think the Panel Perspective pane would be more useful if it implemented the Jupyter Perspective Widget API and capabilities. See PyCon Italy 2024 and PerspectiveWidget Implementation for inspiration.
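As a rough illustration of "hosting tables once but using them across sessions and users", the sketch below uses only the standard library. `TableRegistry`, `get_or_load` and `load_trades` are hypothetical names, not part of Panel or Perspective; the assumption is that all sessions run in one server process and can share a single in-memory copy of each table.

```python
import threading
from typing import Callable, Dict, List


class TableRegistry:
    """Hypothetical process-wide registry: each named table is loaded once and shared."""

    def __init__(self) -> None:
        self._tables: Dict[str, List[dict]] = {}
        self._lock = threading.Lock()

    def get_or_load(self, name: str, loader: Callable[[], List[dict]]) -> List[dict]:
        """Return the shared table, calling the loader only on first request."""
        with self._lock:
            if name not in self._tables:
                self._tables[name] = loader()
            return self._tables[name]


registry = TableRegistry()
load_calls: List[str] = []  # records how often the expensive load actually runs


def load_trades() -> List[dict]:
    """Stand-in for an expensive load, e.g. reading a large file."""
    load_calls.append("trades")
    return [{"id": i} for i in range(1000)]


# Two sessions ask for the same table; only the first request triggers a load.
session_a_table = registry.get_or_load("trades", load_trades)
session_b_table = registry.get_or_load("trades", load_trades)
```

The lock keeps concurrent sessions from loading the same table twice. Whether a real implementation would share tables per process, per server, or through a separate service is an open design question in this discussion.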

Today Panel can be running on both Tornado and FastAPI servers. The solution should work in both environments. Personally I want to migrate to FastAPI deployments if that is possible.

Also it should just work in Pyodide because that is where lots of the showcasing of the functionality will take place.

Cannot use JupyterWidget

Unfortunately, it's not a workaround to use the Jupyter Widget:

import pandas as pd
import panel as pn
from perspective.widget import PerspectiveWidget

pn.extension("ipywidgets")

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])

p = PerspectiveWidget(df)

pn.pane.IPyWidget(p).servable()

[Screenshot: image not available]

@MarcSkovMadsen MarcSkovMadsen added the TRIAGE Default label for untriaged issues label Oct 5, 2024
@MarcSkovMadsen MarcSkovMadsen added type: enhancement Minor feature or improvement to an existing feature and removed TRIAGE Default label for untriaged issues labels Oct 5, 2024
@MarcSkovMadsen MarcSkovMadsen added this to the Wishlist milestone Oct 5, 2024
@texodus

texodus commented Oct 6, 2024

> Actually Perspective was built to support large tabular data via virtualization.

As you note, Perspective already supports this mode, and PerspectiveWidget already defaults to "server" mode and has a kwarg (binding_mode) for switching to server-client replicated mode - not sure what you are requesting exactly?

> Today Panel can be running on both Tornado and FastAPI servers. The solution should work in both environments. Personally I want to migrate to FastAPI deployments if that is possible.
> Also it should just work in Pyodide because that is where lots of the showcasing of the functionality will take place.

Perspective already supports all of these environments - we even publish pyodide wheels.

> Unfortunately, it's not a workaround to use the Jupyter Widget

Please use the Issue template we provide and provide a repro. This screenshot does not tell me anything about the nature of the error (except that it is obviously running in some context that wraps exceptions).

EDIT: I misread the repo I was commenting on :) - the point remains, I can't make heads or tails of this screenshot without a Perspective repro.

As I said above, PerspectiveWidget already defaults to "server" mode.

@MarcSkovMadsen
Collaborator Author

MarcSkovMadsen commented Oct 7, 2024

Hi @texodus

Thx.

As I read your reply you are thinking about the Perspective Jupyter Widget. This does not work with Panel.

We have our own Panel Perspective widget, which only uses the perspective-viewer web component and nothing else (implementation). Thus all data is transferred to the client.

This request is to create a Panel Perspective widget which is as efficient as the Jupyter Perspective widget and can scale to large tables. The user code should be the same and just work across all environments where Panel can run (Tornado, FastAPI, pyodide, PY.CAFE).

@MarcSkovMadsen
Collaborator Author

I've started making the architectural problems more specific via code examples in #7368 @texodus.
