Commit d21aab4

viirya authored and dongjoon-hyun committed
[SPARK-30941][PYSPARK] Add a note to asDict to document its behavior when there are duplicate fields
### What changes were proposed in this pull request?

Adding a note to document `Row.asDict` behavior when there are duplicate fields.

### Why are the changes needed?

When a row contains duplicate fields, `asDict` and `__getitem__` behave differently. We should document this so that users know the difference explicitly.

### Does this PR introduce any user-facing change?

No. Documentation change only.

### How was this patch tested?

Existing tests.

Closes #27853 from viirya/SPARK-30941.

Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent b6b0343 commit d21aab4

1 file changed: +6 -0 lines changed

python/pyspark/sql/types.py

Lines changed: 6 additions & 0 deletions
@@ -1528,6 +1528,12 @@ def asDict(self, recursive=False):
 
         :param recursive: turns the nested Rows to dict (default: False).
 
+        .. note:: If a row contains duplicate field names, e.g., the rows of a join
+            between two :class:`DataFrame` that both have the fields of same names,
+            one of the duplicate fields will be selected by ``asDict``. ``__getitem__``
+            will also return one of the duplicate fields, however returned value might
+            be different to ``asDict``.
+
         >>> Row(name="Alice", age=11).asDict() == {'name': 'Alice', 'age': 11}
         True
         >>> row = Row(key=1, value=Row(name='a', age=2))
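
The note added by this commit says only that the two lookups may disagree. The sketch below shows one way that can happen. `MiniRow` is a hypothetical, simplified stand-in written for this example, not PySpark's `Row`, and it assumes that the dict conversion is built with `dict(zip(...))` while name lookup uses the first matching field. Real `Row` behavior may differ between Spark versions.

```python
class MiniRow(tuple):
    """Hypothetical stand-in for a row with named fields (not pyspark.sql.Row)."""

    def __new__(cls, fields, values):
        row = super().__new__(cls, values)
        row._fields = list(fields)  # field names, duplicates allowed
        return row

    def as_dict(self):
        # Assumption: dict(zip(...)) keeps the LAST value seen for a repeated key.
        return dict(zip(self._fields, self))

    def __getitem__(self, item):
        if isinstance(item, (int, slice)):
            return super().__getitem__(item)
        # Assumption: list.index returns the position of the FIRST match.
        return super().__getitem__(self._fields.index(item))


# A row shaped like the result of joining two tables that both have an "id" column.
row = MiniRow(["id", "name", "id"], [1, "Alice", 2])

by_name = row["id"]            # first "id" field
by_dict = row.as_dict()["id"]  # last "id" field
print(by_name, by_dict)
```

Under these assumptions the two lookups return different values for the same field name. Renaming or dropping the duplicated columns before collecting rows avoids the ambiguity altogether.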
