Fix pd.merge to preserve ExtensionArrays dtypes #20745
Changes from 1 commit
73d64eb
716e928
8824a47
884510c
9cf8cfe
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -95,3 +95,24 @@ def test_set_frame_overwrite_object(self, data): | |
| df = pd.DataFrame({"A": [1] * len(data)}, dtype=object) | ||
| df['A'] = data | ||
| assert df.dtypes['A'] == data.dtype | ||
| def test_merge(self, data, na_value): | ||
Contributor
you prob should test with with the
Member
Author
They both need a different expected result, so I don't think it is really worth it in this case?
Contributor
OK, it's worth doing way more tests than a single use case, but OK here I guess.
| df1 = pd.DataFrame({'int1': [1, 2, 3], 'key': [0, 1, 2], | ||
| 'ext': data[:3]}) | ||
| df2 = pd.DataFrame({'int2': [1, 2, 3, 4], 'key': [0, 0, 1, 3]}) | ||
| res = pd.merge(df1, df2) | ||
| exp = pd.DataFrame( | ||
| {'int1': [1, 1, 2], 'int2': [1, 2, 3], 'key': [0, 0, 1], | ||
| 'ext': data._constructor_from_sequence( | ||
| [data[0], data[0], data[1]])}) | ||
| self.assert_frame_equal(res, exp[['ext', 'int1', 'key', 'int2']]) | ||
| res = pd.merge(df1, df2, how='outer') | ||
| exp = pd.DataFrame( | ||
| {'int1': [1, 1, 2, 3, np.nan], 'int2': [1, 2, 3, np.nan, 4], | ||
| 'key': [0, 0, 1, 2, 3], | ||
| 'ext': data._constructor_from_sequence( | ||
| [data[0], data[0], data[1], data[2], na_value])}) | ||
| self.assert_frame_equal(res, exp[['ext', 'int1', 'key', 'int2']]) | ||
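For readers who want to try the behaviour outside the test suite, here is a minimal sketch of what the test above checks. It assumes a pandas version that includes this fix, and it uses `pd.Categorical` as a stand-in for the `data` fixture, which is an assumption and not part of the PR. It only inspects dtypes and missing-value counts, not row order.

```python
import pandas as pd

# Assumption: Categorical stands in for the extension array that the
# `data` fixture would supply in the real test.
ext = pd.Categorical(["a", "b", "c"])

df1 = pd.DataFrame({"int1": [1, 2, 3], "key": [0, 1, 2], "ext": ext})
df2 = pd.DataFrame({"int2": [1, 2, 3, 4], "key": [0, 0, 1, 3]})

# Default (inner) merge on the shared "key" column.
inner = pd.merge(df1, df2)

# Outer merge: key 3 exists only in df2, so "ext" gets one missing value.
outer = pd.merge(df1, df2, how="outer")

print(inner["ext"].dtype)
print(outer["ext"].dtype)
print(outer["ext"].isna().sum())
```

If the extension dtype is preserved, both dtype lines print the categorical dtype instead of `object`.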
This is kind of a hack; I would like to have a better solution.
(Related to the discussion earlier today in #20721 (comment) about deprecating Index.base.)
you should only check base if concat_values is an ndarray
@jreback updated
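The guard suggested in this thread could look roughly like the sketch below. The helper name `copy_unless_owned` is hypothetical and the code is not the change made in the PR. It only illustrates checking `.base` when the value is an `ndarray`, and it assumes that any other array type exposes a `copy()` method.

```python
import numpy as np


def copy_unless_owned(values):
    """Hypothetical helper: return data that does not alias another array."""
    if isinstance(values, np.ndarray):
        # For an ndarray, `.base` is None when the array owns its memory,
        # so only a view needs to be copied.
        if values.base is not None:
            return values.copy()
        return values
    # Assumption: non-ndarray values (e.g. extension arrays) have no
    # reliable `.base`, so copy them unconditionally.
    return values.copy()


owned = np.arange(4)
view = owned[:2]

same = copy_unless_owned(owned)   # returned unchanged
fresh = copy_unless_owned(view)   # new array, detached from `owned`
```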