Contributor

zeruibao commented on Dec 3, 2025

What changes were proposed in this pull request?

Fix type handling of NamedTuple values in transformWithState.

Why are the changes needed?

We hit this issue when using a NamedTuple as the value of a StructType, for example:

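from typing import Any, Iterator, NamedTuple

from pyspark.sql import Row
from pyspark.sql.streaming.stateful_processor import TimerValues


class Person(NamedTuple):
    age: int
    name: str


# Excerpt from a StatefulProcessor subclass. self.person_list is assumed to be
# a ValueState whose schema has a single array-of-struct field.
def handleInputRows(
    self,
    key: Any,
    rows: Iterator[Row],
    timerValues: TimerValues,
) -> Iterator[Row]:
    person: Person = Person(age=1, name="peter")
    person_list = [person]
    # ValueState.update takes a tuple matching the state schema.
    self.person_list.update((person_list,))
    # No output rows are needed for this minimal repro.
    return iter([])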

_serialize_to_bytes cannot reconstruct the named tuple correctly and fails with:

File "/databricks/spark/python/pyspark/sql/streaming/stateful_processor_api_client.py", line 575, in normalize_value
    return type(v)(normalize_value(e) for e in v)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Person.__new__() missing 2 required positional arguments: 'age' and 'name'

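This happens because a named tuple's constructor expects one argument per field, so, unlike a plain tuple, it cannot be built from a single generator expression.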
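A minimal sketch of the failure without Spark, assuming a Person named tuple shaped like the one in the repro: a plain tuple's constructor accepts one iterable, while a named tuple needs its fields unpacked (or passed through the namedtuple _make classmethod).

```python
from typing import NamedTuple


class Person(NamedTuple):
    age: int
    name: str


p = Person(age=1, name="peter")

# A plain tuple's constructor takes ONE iterable argument, so a generator works.
plain = tuple(e for e in p)

# A named tuple's constructor takes one argument PER field. Passing a single
# generator fills only the first field, so construction raises TypeError.
error_message = None
try:
    type(p)(e for e in p)
except TypeError as exc:
    error_message = str(exc)

# Unpacking the generator, or using the _make classmethod, supplies every field.
rebuilt = type(p)(*(e for e in p))
rebuilt_via_make = type(p)._make(e for e in p)

print(plain)    # (1, 'peter')
print(rebuilt)  # Person(age=1, name='peter')
```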

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test (UT).

Was this patch authored or co-authored using generative AI tooling?

No

zeruibao changed the title from "[SPARK-5192] Fix type handling of namedTuple for transfromWithStateInPandas" to "[SPARK-5192] Fix type handling of namedTuple for transfromWithState" on Dec 3, 2025
# Named tuples (collections.namedtuple or typing.NamedTuple) have a
# _fields attribute. Spark Row has __fields__. Both require positional
# arguments and cannot be instantiated with a generator expression.
if hasattr(v, '_fields') or hasattr(v, '__fields__'):
Contributor

I think checking the type is much safer than checking an attribute, especially considering that _fields is not a rare attribute name. If we know which type we are targeting, we should just check the type.
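A small sketch of the false positive the reviewer is worried about. ParsedConfig is a hypothetical class invented for illustration; it defines _fields without being a tuple, so an attribute-only check accepts it, whereas guarding with isinstance(v, tuple) rejects it.

```python
from collections import namedtuple


class ParsedConfig:
    """Hypothetical non-tuple class that happens to define a _fields attribute."""

    _fields = ("host", "port")

    def __init__(self, host: str, port: int) -> None:
        self.host = host
        self.port = port


Point = namedtuple("Point", ["x", "y"])


def is_named_tuple_by_attribute(v: object) -> bool:
    # Attribute-only check: any object with _fields passes.
    return hasattr(v, "_fields")


def is_named_tuple_by_type(v: object) -> bool:
    # Type check first: only tuple instances can qualify.
    return isinstance(v, tuple) and hasattr(v, "_fields")


cfg = ParsedConfig("localhost", 8080)
pt = Point(1, 2)

print(is_named_tuple_by_attribute(cfg))  # True (false positive)
print(is_named_tuple_by_type(cfg))       # False
print(is_named_tuple_by_type(pt))        # True
print(is_named_tuple_by_type((1, 2)))    # False (plain tuple)
```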

Contributor Author

Sounds good! You are so fast. I didn't get a chance to add a UT yet 😛, so I just converted the PR to a draft.

Contributor Author

Hey @gaogaotiantian, I now use

if (
    isinstance(v, Row) or
    (isinstance(v, tuple) and hasattr(v, "_fields"))
):
    ...

instead. isinstance(v, NamedTuple) won't work because typing.NamedTuple is a class factory, not a runtime base class of named-tuple instances. Checking isinstance(v, tuple) together with hasattr(v, "_fields") is the usual way to detect a named tuple at runtime, and the tuple check rules out unrelated objects that merely define _fields. Please take another look. Thanks!
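A simplified, Spark-free sketch of how this check can drive a recursive normalizer. normalize_value here is a hypothetical stand-in, not the real helper in stateful_processor_api_client.py; pyspark.sql.Row handling is omitted, and the convert_leaf parameter and strip_strings function are invented for illustration. Named tuples get their converted fields unpacked into the constructor, while plain tuples and lists keep the single-iterable form.

```python
from typing import Any, Callable, NamedTuple


class Person(NamedTuple):
    age: int
    name: str


def is_named_tuple(v: Any) -> bool:
    # Named tuples are tuple subclasses that expose a _fields attribute.
    return isinstance(v, tuple) and hasattr(v, "_fields")


def identity(x: Any) -> Any:
    return x


def normalize_value(v: Any, convert_leaf: Callable[[Any], Any] = identity) -> Any:
    """Hypothetical recursive normalizer; not Spark's actual implementation."""
    if is_named_tuple(v):
        # One positional argument per field: unpack the converted elements.
        return type(v)(*(normalize_value(e, convert_leaf) for e in v))
    if isinstance(v, (tuple, list)):
        # Plain tuple and list constructors accept a single iterable.
        return type(v)(normalize_value(e, convert_leaf) for e in v)
    if isinstance(v, dict):
        # Only dict values are normalized; keys are kept as-is.
        return {k: normalize_value(e, convert_leaf) for k, e in v.items()}
    return convert_leaf(v)


def strip_strings(x: Any) -> Any:
    # Hypothetical leaf conversion used only for this demo.
    return x.strip() if isinstance(x, str) else x


# Same shape as the repro: a one-element tuple holding a list of Person.
state_value = ([Person(age=1, name=" peter ")],)
normalized = normalize_value(state_value, strip_strings)
print(normalized)  # ([Person(age=1, name='peter')],)
```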

zeruibao marked this pull request as draft on December 3, 2025 22:44
github-actions bot added the SQL label on Dec 3, 2025
zeruibao marked this pull request as ready for review on December 4, 2025 00:19
Contributor

bogao007 left a comment

LGTM overall; one minor comment on the test.

HyukjinKwon changed the title from "[SPARK-5192] Fix type handling of namedTuple for transfromWithState" to "[SPARK-5192][SS] Fix type handling of namedTuple for transfromWithState" on Dec 4, 2025
HyukjinKwon changed the title from "[SPARK-5192][SS] Fix type handling of namedTuple for transfromWithState" to "[SPARK-5192][SS][PYTHON] Fix type handling of namedTuple for transfromWithState" on Dec 4, 2025

# A stateful processor that contains composite python type inside Value, List and Map state variable
class PandasStatefulProcessorCompositeType(StatefulProcessor):
    from typing import NamedTuple
Contributor

Can we move this to the top of the test module? Is there a specific reason that you want it in the class definition?

Contributor Author

You are right. I just moved it to the top!

zeruibao changed the title from "[SPARK-5192][SS][PYTHON] Fix type handling of namedTuple for transfromWithState" to "[SPARK-51920][SS][PYTHON] Fix type handling of namedTuple for transfromWithState" on Dec 8, 2025