Improvements to the Merge routine#5778
Conversation
| continue; | ||
| DataFrameColumn headColumn = head.Columns[originalColumn.Name]; | ||
| Assert.Equal(originalColumn[5], headColumn[verify[5]]); | ||
| Assert.Equal(originalColumn[7], headColumn[verify[5]]); |
There was a problem hiding this comment.
5 -> 7 because null is not equivalent to default in the GroupBy method anymore
| Assert.Equal(merge.Columns.Count, left.Columns.Count + right.Columns.Count); | ||
| Assert.Null(merge.Columns["Int_left"][12]); | ||
| Assert.Null(merge.Columns["Int_left"][5]); | ||
| Assert.Null(merge.Columns["Int_left"][15]); |
There was a problem hiding this comment.
5 -> 15 because we don't preserve the ordering of rows in OuterJoin, so rows with null values in the Column we are merging on will always go to the end. I don't think this needs to be a requirement. If somebody requests it at some point, we can think about preserving the ordering then.
Codecov Report
@@ Coverage Diff @@
## main #5778 +/- ##
==========================================
+ Coverage 68.33% 68.40% +0.07%
==========================================
Files 1131 1131
Lines 241008 241156 +148
Branches 25025 25033 +8
==========================================
+ Hits 164686 164957 +271
+ Misses 69824 69716 -108
+ Partials 6498 6483 -15
Flags with carried forward coverage won't be shown. Click here to find out more.
|
| for (long i = 0; i < matchedFullRows.Length; i++) | ||
| { | ||
| int rowIndex = matchedFullRows[i]; | ||
| MatchRowsOnMergedDataFrame(merge, left, right, rowIndex, rowIndex, rowIndex); |
There was a problem hiding this comment.
It's actually a part of TestMergeEdgeCases_Left, but I added one now anyway
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
Currently, we don't differentiate between
nullvalues anddefaultvalues in columns. As a result, these values are considered equivalent in theMergeroutine. This PR fixes that by handling them separately. Also added unit tests so we don't regress in the future.Fixes #5765 and contributes to #5691