Skip to content

Conversation

@bruAristimunha
Copy link
Contributor

Reference issue (if any)

Fixes #13487.

What does this implement/fix?

Introduce methods to convert Info objects to and from JSON-serializable dictionaries, enhancing data handling and storage capabilities. Adjust tests and pre-commit hooks accordingly.

Additional information

@bruAristimunha bruAristimunha mentioned this pull request Nov 14, 2025
@larsoner larsoner added this to the 1.11 milestone Nov 17, 2025
Copy link
Member

@drammock drammock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff looks more or less as I was imagining. My only question is why the separation of __getstate__ and to_dict (and similar for __setstate__ and from_dict). What's the advantage of splitting like that? Why not call _make_serializable within __getstate__ and omit the to_dict method? (and similar question for _restore_objects within __setstate__)

Copy link
Member

@agramfort agramfort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I'll let @drammock @larsoner fully stamp this !

@bruAristimunha
Copy link
Contributor Author

hey @drammock,

Good and hard question, I think my logic was to try to disturb the current mne.info API as little as possible.
I think the rationale is that the "pickle" logic with setstate and _getstate _, and the JSON handler, handles different kinds of data.

  • __getstate__() can just return the data as-is (no conversion needed), to the best of my knowledge.

and the JSON only supports primitives (dict, list, str, int, float, bool, None), so we need conversions:

  • to_dict() converts: numpy arrays → lists, datetime → ISO strings
  • from_dict() converts back: lists → numpy arrays, ISO strings → datetime

@drammock
Copy link
Member

__getstate__() can just return the data as-is (no conversion needed), to the best of my knowledge.

I don't think that's right, pickling sometimes also requires some conversions; see https://docs.python.org/3/library/pickle.html#what-can-be-pickled-and-unpickled. Maybe another way to put my question would be this: what is the disadvantage of doing everything inside __getstate__ and using that for both pickling and JSON-compatible serialization? Make it do whichever one is stricter, IIUC they're not incommensurable (pickle allows including functions/classes/class instances under some circumstances, but it won't hurt to reduce those elements to something JSON-compatible as long as they're properly restored afterwards).

@bruAristimunha
Copy link
Contributor Author

hmmm, i think it is possible @drammock.

I find it particularly more interesting to have a dedicated function to make the behavior more explicit.
Also, I am not super sure about the from_dict function, i think this function need a dedicate function to assemble back the mne info object given the JSON. But if you know of any way, I can implement the behavior without the function.

But I understand that mne info is a dictionary and should behave as such, without needing an extra function.
Now, about the to_dict function, just double checking before doing, you would suggest something like this, right?

def _state_dict(self, *, json_safe=False):
    state = dict(self)
    if json_safe:
        state = _make_serializable(state)
    return state

def __getstate__(self):
    return _state_dict(self, json_safe=False)

def to_dict(self):
    return _state_dict(self, json_safe=True)
    

or even assume that we always gonna have json_safe=True, and just serialize everything as json, I think that would potentially break the API with the .fiff save function, but I'm really not sure, but it would be not super complicated to fix.

By indicating your preference, I can implement what is most convenient here; I don't have a strong preference.

@drammock
Copy link
Member

I find it particularly more interesting to have a dedicated function to make the behavior more explicit.

I don't understand. Are you saying "I want something in the public API (not a dunder)"? Or are you saying "I want to support pickling, and I want pickling to behave differently than serialization intended for JSON"? Or something else?

What I've been trying to express all this time is that if we need JSON-compatible serialization (which you seem to want) then why not implement it entirely using the built-in dunders? Why do we need separate to/from_dict methods, and why do those need to behave differently than the dunders? A JSON-compatible serialization should ALSO be pickle-compatible. So why not put the code all in one place?

I am not super sure about the from_dict function, i think this function need a dedicate function to assemble back the mne info object given the JSON

Yes, of course we need to properly reassemble the object from its serialized form. What I'm saying is that all the code to do that can live inside __setstate__

assume that we always gonna have json_safe=True, and just serialize everything as json, I think that would potentially break the API with the .fiff save function

saving to FIFF uses a completely different code path (write_info -> write_meas_info) which AFAICT you're not touching at all here. Why would you expect that to break simply by moving the code in to_dict inside of __getstate__?

@bruAristimunha
Copy link
Contributor Author

I finally understood what you were saying from the beginning! Sorry for my mental delay, now everything is serialization in the setstate and getstate

@bruAristimunha
Copy link
Contributor Author

Now, the function may not seem necessary, but I'm keeping it to maintain version control of the MNE, especially considering the significant governance agenda within MNE Info. I'm afraid that without this version marker, we might not be able to read a JSON file in 2 or 4 years, but I could be wrong.

@bruAristimunha
Copy link
Contributor Author

So, answering the question, I think we should have a public function that makes things more straightforward.

@larsoner
Copy link
Member

Now, the function may not seem necessary, but I'm keeping it to maintain version control of the MNE, especially considering the significant governance agenda within MNE Info. I'm afraid that without this version marker, we might not be able to read a JSON file in 2 or 4 years, but I could be wrong.

FWIW we have been writing to/from h5io i.e. builtin Python types since at least 5 years ago (e.g., 508cfbc#diff-71d579476b8eb5f4b806934f41d7061b8c6116ff2efc4da0ba10c9f97aaa85faR525-R526) with very few issues. The ones that come up have to do with stuff like us changing a type from tuple to datetime or so, and from what I recall haven't been too bad to fix

@bruAristimunha
Copy link
Contributor Author

Ah, okay, I can drop them the version reference, no problem for me. Do you think I should remove the function call to_dict as well and only leave from_dict?

Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to send a couple of inline comments

Regarding making functions public, the fact that this doesn't work:

>>> json.dumps(mne.create_info(1, 1000., "eeg"))
...
TypeError: Object of type ndarray is not JSON serializable

suggests that it is worth having something public so people can do .to_dict

>>> json.dumps(mne.create_info(1, 1000., "eeg").to_dict())
'{"acq_pars": null, "acq_stim": null, "ctf_head_t": null, "description": null, "dev_head_t": null, "dev_ctf_t": null, "dig": null, "experimenter": null, "utc_offset": null, "device_info": null, "file_id": null, "highpass": 0.0, "hpi_subsystem": null, "kit_system_id": null, "helium_info": null, "line_freq": null, "lowpass": 500.0, "meas_date": null, "meas_id": null, "proj_id": null,
...

and having a to_dict/from_dict pair makes sense

@bruAristimunha
Copy link
Contributor Author

@larsoner, @drammock, many thanks for the iterations. I'm happier with the code; it's much cleaner and uses all the good and excellent design. I feel like I'm always learning from you, thank you so much for your time reviewing!

If you see any additional points, please let me know so I can address them 🙏🏽 :)

@bruAristimunha
Copy link
Contributor Author

I removed the mne_version as previously requested.

Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay one minor tweak: I don't know if to_dict is specific enough. I think to_json_dict is probably better, paired with from_json_dict or similar.

And one major one that I only now just realized: I think the _mne_type gynmastics should actually only be done in from_json_dict, and to_json_dict. __setstate__ and __getstate__ didn't need these before, so having them now seems unnecessary as well. On main I can pickle.dumps(info) and it works, and the result is shorter than on this PR. This makes me worry that we will incur unnecessary conversion overhead in things like parallel processing (which uses pickling) by routing all __setstate__ and __getstate__ mechanics through the _mne_type conversions.

@larsoner larsoner changed the title Implement serialization and deserialization for Info objects Implement JSON serialization and deserialization for Info objects Nov 19, 2025
@drammock
Copy link
Member

And one major one that I only now just realized: I think the _mne_type gynmastics should actually only be done in from_json_dict, and to_json_dict. __setstate__ and __getstate__ didn't need these before, so having them now seems unnecessary as well. On main I can pickle.dumps(info) and it works, and the result is shorter than on this PR. This makes me worry that we will incur unnecessary conversion overhead in things like parallel processing (which uses pickling) by routing all __setstate__ and __getstate__ mechanics through the _mne_type conversions.

argh. I think @larsoner has a good point, sorry that this is the opposite of what I pushed for you to do @bruAristimunha

@agramfort
Copy link
Member

agramfort commented Nov 20, 2025 via email

Copy link
Member

@larsoner larsoner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay this looks nicely isolated now -- thanks @bruAristimunha !

@larsoner larsoner merged commit ff3e0d5 into mne-tools:main Nov 20, 2025
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Allow easy serialization of the object mne info

4 participants