Skip to content

Conversation

@c21
Copy link
Contributor

@c21 c21 commented Jun 16, 2021

What changes were proposed in this pull request?

NULL-aware ANTI join (https://issues.apache.org/jira/browse/SPARK-32290) detects NULL join keys during building the map for HashedRelation, and will immediately return HashedRelationWithAllNullKeys without taking care of the map built already. Before returning HashedRelationWithAllNullKeys, the map needs to be freed properly to save memory and keep memory accounting correctly.

Why are the changes needed?

Save memory and keep memory accounting correctly for the join query.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests introduced in #29104 .

@github-actions github-actions bot added the SQL label Jun 16, 2021
@c21
Copy link
Contributor Author

c21 commented Jun 17, 2021

cc @cloud-fan could you help take a look when you have time? Thanks.

@SparkQA
Copy link

SparkQA commented Jun 17, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44422/

@SparkQA
Copy link

SparkQA commented Jun 17, 2021

Test build #139892 has finished for PR 32939 at commit d944e92.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}
} else if (isNullAware) {
binaryMap.free()
return HashedRelationWithAllNullKeys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we return it before creating the binary map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan - the NULL-aware ANTI join will be done if there's row with NULL join keys. Note the if (!key.anyNull || allowsNullKey) at line 472. So it will be taking effect when we reading the row with NULL join keys, and this information is not known before creating the binary map

map.append(key, unsafeRow)
} else if (isNullAware) {
map.free()
return HashedRelationWithAllNullKeys
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.1!

@cloud-fan cloud-fan closed this in e0d81d9 Jun 17, 2021
cloud-fan pushed a commit that referenced this pull request Jun 17, 2021
…join

### What changes were proposed in this pull request?

NULL-aware ANTI join (https://issues.apache.org/jira/browse/SPARK-32290) detects NULL join keys during building the map for `HashedRelation`, and will immediately return `HashedRelationWithAllNullKeys` without taking care of the map built already. Before returning `HashedRelationWithAllNullKeys`, the map needs to be freed properly to save memory and keep memory accounting correctly.

### Why are the changes needed?

Save memory and keep memory accounting correctly for the join query.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests introduced in #29104 .

Closes #32939 from c21/free-null-aware.

Authored-by: Cheng Su <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit e0d81d9)
Signed-off-by: Wenchen Fan <[email protected]>
@c21
Copy link
Contributor Author

c21 commented Jun 17, 2021

Thank you @cloud-fan for quick review!

@c21 c21 deleted the free-null-aware branch June 17, 2021 05:58
flyrain pushed a commit to flyrain/spark that referenced this pull request Sep 21, 2021
…join

### What changes were proposed in this pull request?

NULL-aware ANTI join (https://issues.apache.org/jira/browse/SPARK-32290) detects NULL join keys during building the map for `HashedRelation`, and will immediately return `HashedRelationWithAllNullKeys` without taking care of the map built already. Before returning `HashedRelationWithAllNullKeys`, the map needs to be freed properly to save memory and keep memory accounting correctly.

### Why are the changes needed?

Save memory and keep memory accounting correctly for the join query.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests introduced in apache#29104 .

Closes apache#32939 from c21/free-null-aware.

Authored-by: Cheng Su <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit e0d81d9)
Signed-off-by: Wenchen Fan <[email protected]>
fishcus pushed a commit to fishcus/spark that referenced this pull request Jan 12, 2022
…join

### What changes were proposed in this pull request?

NULL-aware ANTI join (https://issues.apache.org/jira/browse/SPARK-32290) detects NULL join keys during building the map for `HashedRelation`, and will immediately return `HashedRelationWithAllNullKeys` without taking care of the map built already. Before returning `HashedRelationWithAllNullKeys`, the map needs to be freed properly to save memory and keep memory accounting correctly.

### Why are the changes needed?

Save memory and keep memory accounting correctly for the join query.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests introduced in apache#29104 .

Closes apache#32939 from c21/free-null-aware.

Authored-by: Cheng Su <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit e0d81d9)
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants