Fix state serialization bug with pandas.DataFrame. #126

richard-to · 2024-04-20T22:23:23Z

To fix this issue, we add custom JSON encoding and decoding logic.

For the most accurate serialization/deserialization, we use pandas' built-in JSON serialization (via `to_json`) with the `"table"` orient. This is more verbose, but it provides enough metadata to deserialize to something as close as possible to the original DataFrame.

To deserialize, we use `pandas.read_json` with `orient="table"`.

One difference I've noticed so far is that `pandas.NA` becomes `np.nan`.

Ref: #117
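Below is a minimal sketch of the encode/decode round trip described above. It assumes a `json.JSONEncoder` subclass for encoding and a `json.loads` `object_hook` for decoding. `StateEncoder`, `decode_hook`, and the `__pandas.DataFrame__` marker key are hypothetical names, not Mesop's actual implementation. pandas is imported optionally, so the sketch still runs without it.

```python
import json
from io import StringIO

try:
  import pandas as pd
except ImportError:  # pandas stays optional; DataFrame support is skipped
  pd = None

# Hypothetical marker key that tags an encoded DataFrame inside the state JSON.
_DATAFRAME_KEY = "__pandas.DataFrame__"


class StateEncoder(json.JSONEncoder):
  """Hypothetical encoder: serializes DataFrames with orient="table"."""

  def default(self, obj):
    if pd is not None and isinstance(obj, pd.DataFrame):
      # orient="table" embeds a Table Schema (field types, index) with the data.
      return {_DATAFRAME_KEY: obj.to_json(orient="table")}
    # Anything else falls back to the base class, which raises TypeError.
    return super().default(obj)


def decode_hook(obj):
  """Hypothetical json.loads object_hook that rebuilds tagged DataFrames."""
  if _DATAFRAME_KEY in obj:
    if pd is None:
      raise RuntimeError("pandas is required to decode a DataFrame")
    # Wrap in StringIO: newer pandas deprecates passing literal JSON strings.
    return pd.read_json(StringIO(obj[_DATAFRAME_KEY]), orient="table")
  return obj


# Usage: plain values round-trip unchanged; DataFrames are tagged and rebuilt.
if pd is not None:
  state = {"title": "sales", "table": pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})}
  restored = json.loads(json.dumps(state, cls=StateEncoder), object_hook=decode_hook)
  print(restored["table"])
```

The `to_json` output is stored as a nested string, so the Table Schema passes through the outer JSON untouched. The cost is the extra verbosity mentioned above.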

wwwillchen

I think this change makes sense; my only hesitation is that this adds a hard dependency on pandas.

As discussed in #103, there's a fine balance in Mesop between working well with the broader Python ecosystem and not dragging in a lot of Python dependencies.

If you think there will be a lot more work on the table component that requires direct pandas manipulation (e.g. table sorting or table editing), then it may make sense to simply take a dependency on pandas, which is very popular in the Python ecosystem.

OTOH, if this is the only usage of pandas, you could do something like:

```python
try:
  import pandas as pd
  # do pandas-specific dataclass logic
except ImportError:
  # non-pandas dataclass logic
  pass
```
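A variant of the suggestion above avoids importing pandas at all. It is sketched under the assumption that the serializer only needs to *recognize* DataFrames. A DataFrame instance cannot exist unless pandas has already been imported, so looking it up in `sys.modules` is enough. `is_dataframe` is a hypothetical helper name, not part of Mesop.

```python
import sys


def is_dataframe(obj) -> bool:
  """Hypothetical helper: True if obj is a pandas DataFrame.

  Never imports pandas itself. If pandas is absent from sys.modules, no
  DataFrame can exist yet, so pandas stays an optional dependency.
  """
  pd = sys.modules.get("pandas")
  if pd is None:
    return False
  return isinstance(obj, pd.DataFrame)
```

With this approach, apps that never use pandas pay neither the install cost nor the import cost.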

richard-to · 2024-04-21T20:43:58Z

That's a great point about not wanting to include pandas with the Mesop package. I definitely agree. Updated.

richard-to force-pushed the fix-pandas-serialization branch from 0db6126 to 67a28e5 on April 20, 2024 22:39

wwwillchen reviewed Apr 21, 2024

mesop/dataclass_utils/BUILD (outdated, resolved)

richard-to force-pushed the fix-pandas-serialization branch from 67a28e5 to 9a57c5e on April 21, 2024 20:41

wwwillchen approved these changes Apr 22, 2024

richard-to merged commit 7dfe473 into mesop-dev:main Apr 22, 2024
3 checks passed

richard-to deleted the fix-pandas-serialization branch May 4, 2024 17:10