Skip to content

Conversation

@bolkedebruin
Copy link
Contributor

@bolkedebruin bolkedebruin commented Dec 2, 2022

This refactors serialization so it is now modular and allows for extending. It implements versioning and allows namespace packages to be added to allow custom serializers for third-party objects.

This should replace all other serializers (eventually, only DAG remains).


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

Comment on lines +49 to +55
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is needed for things like below:

In [1]: import numpy

In [2]: numpy.int64(2)
Out[2]: 2

In [3]: type(_2)
Out[3]: numpy.int64

In [4]: type(_2).__qualname__
Out[4]: 'int64'

In [5]: type(_2).__module__
Out[5]: 'numpy'

In [6]: _2.__qualname__
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [6], in <cell line: 1>()
----> 1 _2.__qualname__

AttributeError: 'numpy.int64' object has no attribute '__qualname__'

In [7]: _2.__module__
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 _2.__module__

AttributeError: 'numpy.int64' object has no attribute '__module__'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we try to get it from version too as a fallback? as that was shipped with 2.5?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VERSION: getattr(cls, "__version__", DEFAULT_VERSION),
VERSION: getattr(cls, "__version__", getattr(cls, "version", DEFAULT_VERSION)),

Copy link
Contributor Author

@bolkedebruin bolkedebruin Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it was shipped, but nowhere implemented, so only custom classes after that could use this. Some of the providers collide with version (e.g. ggoogle) so we are obtaining the wrong version then. So I'd prefer to take the small hit and this is PEP 8/480 compatible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is only really used by SerializedDAG, I think it’s more productive to simply have deserialize(...) -> Any and add a separate deserialize_dag(...) -> SerializedDAG that simply calls deserialize internally and provide a better return type hint.

Copy link
Contributor Author

@bolkedebruin bolkedebruin Dec 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not only used by SerializedDAG (why do you think so?) it will be used by XCom too (which is even the first implementation). deserialize_dag will be removed as it is non-standard.

@bolkedebruin bolkedebruin force-pushed the refactor_serialize branch 2 times, most recently from 6b92ce0 to 97b0f8c Compare December 6, 2022 14:36
@bolkedebruin bolkedebruin changed the title WIP: Refactor serialize Refactor serialization Dec 7, 2022
Seralization and deserialization (derser) of objects that
were not under the control of Airflow of the dag author
was not possible. This implementation allows for extension
of serialization of arbitrary objects by putting them in
the namespace of airflow. Selection of serializer happens in
order of registered serializer, custom serializer, attr/dataclass
annottated object.
@bolkedebruin bolkedebruin requested a review from uranusjr December 7, 2022 17:31
@bolkedebruin bolkedebruin requested a review from mik-laj as a code owner December 8, 2022 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:plugins area:providers area:serialization area:webserver Webserver related Issues provider:amazon AWS/Amazon - related issues type:improvement Changelog: Improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants