Skip to content

Fix state serialization bug with pandas.DataFrame. #126

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 22, 2024

Conversation

richard-to
Copy link
Collaborator

To fix this issue we add custom JSON encoding and decoding logic.

For most accurate serialization/deserialization, we use Pandas's built in JSON serialization (via to_json) with the "table" strategy. This is more verbose, but provides enough metadata to deserialize as close as possible to the original DataFrame.

To deserialize we use pandas.read_json with orient="table".

One difference I've noticed so far is that pandas.NA becomes np.nan.

Ref: #117

@richard-to richard-to force-pushed the fix-pandas-serialization branch from 0db6126 to 67a28e5 Compare April 20, 2024 22:39
Copy link
Collaborator

@wwwillchen wwwillchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change makes sense, my only hesitation is that this adds a hard dependency to pandas.

As discussed in #103, there's a fine balance in Mesop between working well with the broader Python ecosystem without necessarily dragging in a lot of Python dependencies.

If you think there's a lot more work in table which will require direct pandas manipulation (e.g. table sorting or table editing), then it may make sense to simply have a dependency on pandas, which is very popular in the Python ecosystem.

OTOH, if this is the only usage of pandas, you could do something like:

try:
  import pandas as pd
   # do pandas-speciifc dataclass logic
except ImportError:
  # non-pandas dataclass logic

To fix this issue we add custom JSON encoding and decoding logic.

For most accurate serialization/deserialization, we use Pandas's built
in JSON serialization (via to_json) with the "table" strategy. This is
more verbose, but provides enough metadata to deserialize as close as
possible to the original DataFrame.

To deserialize we use `pandas.read_json` with `orient="table"`.

One difference I've noticed so far is that pandas.NA becomes np.nan.
@richard-to richard-to force-pushed the fix-pandas-serialization branch from 67a28e5 to 9a57c5e Compare April 21, 2024 20:41
@richard-to
Copy link
Collaborator Author

I think this change makes sense, my only hesitation is that this adds a hard dependency to pandas.

As discussed in #103, there's a fine balance in Mesop between working well with the broader Python ecosystem without necessarily dragging in a lot of Python dependencies.

If you think there's a lot more work in table which will require direct pandas manipulation (e.g. table sorting or table editing), then it may make sense to simply have a dependency on pandas, which is very popular in the Python ecosystem.

OTOH, if this is the only usage of pandas, you could do something like:

try:
  import pandas as pd
   # do pandas-speciifc dataclass logic
except ImportError:
  # non-pandas dataclass logic

That's a great point. About not wanting to include Pandas with the Mesop package. I definitely agree. Updated.

@richard-to richard-to merged commit 7dfe473 into mesop-dev:main Apr 22, 2024
3 checks passed
richard-to added a commit to richard-to/mesop that referenced this pull request Apr 28, 2024
To fix this issue we add custom JSON encoding and decoding logic.

For most accurate serialization/deserialization, we use Pandas's built
in JSON serialization (via to_json) with the "table" strategy. This is
more verbose, but provides enough metadata to deserialize as close as
possible to the original DataFrame.

To deserialize we use `pandas.read_json` with `orient="table"`.

One difference I've noticed so far is that pandas.NA becomes np.nan.
@richard-to richard-to deleted the fix-pandas-serialization branch May 4, 2024 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants