@maropu
Member

@maropu maropu commented Mar 17, 2020

What changes were proposed in this pull request?

This PR fixes join strategy hint resolution so that it respects multi-part identifiers (e.g., dbname.tablename). For example, master currently ignores the database name in a hint parameter:

scala> sql("CREATE DATABASE testDb")
scala> spark.range(10).write.saveAsTable("testDb.t")

// without this patch
scala> spark.range(10).join(spark.table("testDb.t"), "id").hint("broadcast", "testDb.t").explain
== Physical Plan ==
*(2) Project [id#24L]
+- *(2) BroadcastHashJoin [id#24L], [id#26L], Inner, BuildLeft
   :- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
   :  +- *(1) Range (0, 10, step=1, splits=4)
   +- *(2) Project [id#26L]
      +- *(2) Filter isnotnull(id#26L)
         +- *(2) FileScan parquet testdb.t[id#26L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-2.3.1-bin-hadoop2.7/spark-warehouse..., PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>

// with this patch
scala> spark.range(10).join(spark.table("testDb.t"), "id").hint("broadcast", "testDb.t").explain
== Physical Plan ==
*(2) Project [id#3L]
+- *(2) BroadcastHashJoin [id#3L], [id#5L], Inner, BuildRight
   :- *(2) Range (0, 10, step=1, splits=4)
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]))
      +- *(1) Project [id#5L]
         +- *(1) Filter isnotnull(id#5L)
            +- *(1) FileScan parquet testdb.t[id#5L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-master/spark-warehouse/testdb.db/t], PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>

This PR comes from #22198

Why are the changes needed?

For better usability.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added unit tests.

@maropu
Member Author

maropu commented Mar 17, 2020

Not ready for review.

@SparkQA

SparkQA commented Mar 17, 2020

Test build #119924 has finished for PR 27935 at commit 57ced2b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait BaseIdentifier
  • sealed trait IdentifierWithDatabase extends BaseIdentifier
  • case class AliasIdentifier(name: String, qualifier: Seq[String]) extends BaseIdentifier

@maropu maropu force-pushed the SPARK-25121-2 branch 2 times, most recently from b276c1c to 9a0c68c Compare March 17, 2020 12:18
@SparkQA

SparkQA commented Mar 17, 2020

Test build #119929 has finished for PR 27935 at commit b276c1c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait BaseIdentifier
  • sealed trait IdentifierWithDatabase extends BaseIdentifier
  • case class AliasIdentifier(name: String, qualifier: Seq[String]) extends BaseIdentifier

@SparkQA

SparkQA commented Mar 17, 2020

Test build #119931 has finished for PR 27935 at commit 9a0c68c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait BaseIdentifier
  • sealed trait IdentifierWithDatabase extends BaseIdentifier
  • case class AliasIdentifier(name: String, qualifier: Seq[String]) extends BaseIdentifier

@maropu maropu changed the title [WIP][SPARK-25121][SQL] Supports multi-part table names for broadcast hint resolution [SPARK-25121][SQL] Supports multi-part table names for broadcast hint resolution Mar 18, 2020
// For example, in a query `SELECT /*+ BROADCAST(default.t) */ * FROM default.t JOIN t`,
// the broadcast hint will match the left-side table only, `default.t`.
//
// 3. otherwise, no match happens.
Member Author

@maropu maropu Mar 18, 2020

I re-read your comments in #22198 and summarized them above. If I have misunderstood anything, please let me know. @dongjoon-hyun @cloud-fan

@SparkQA

SparkQA commented Mar 18, 2020

Test build #119963 has finished for PR 27935 at commit 52089fc.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait BaseIdentifier
  • sealed trait IdentifierWithDatabase extends BaseIdentifier
  • case class AliasIdentifier(name: String, qualifier: Seq[String]) extends BaseIdentifier

//
// 1. they match if an identifier in a hint only has one part and it is the same as
// a relation name in a query. If a relation has a namespace (`db1.t`), we just ignore it.
// For example, in a query `SELECT /*+ BROADCAST(t) */ * FROM db1.t JOIN t`,
Contributor

is this the existing behavior?

Contributor

cc @maryannxue as well

Member Author

Yea, I checked the queries below in v2.4.5:


// v2.4.5
scala> sql("CREATE DATABASE db1")
scala> sql("CREATE TABLE db1.t(key int)")
scala> sql("CREATE TABLE t(key int)")
scala> sql("""SELECT /*+ MAPJOIN(t) */ * FROM db1.t JOIN t""").explain(true)
== Parsed Logical Plan ==
'UnresolvedHint MAPJOIN, ['t]
+- 'Project [*]
   +- 'Join Inner
      :- 'UnresolvedRelation `db1`.`t`
      +- 'UnresolvedRelation `t`

== Analyzed Logical Plan ==
key: int, key: int
Project [key#20, key#21]
+- Join Inner
   :- ResolvedHint (broadcast)
   :  +- SubqueryAlias `db1`.`t`
   :     +- HiveTableRelation `db1`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#20]
   +- ResolvedHint (broadcast)
      +- SubqueryAlias `default`.`t`
         +- HiveTableRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [key#21]

// For example, in a query `SELECT /*+ BROADCAST(t) */ * FROM db1.t JOIN t`,
// the broadcast hint will match both tables, `db1.t` and `t`.
//
// 2. they match if an identifier in a hint has two parts and it is the same as
Contributor

What if the identifier has more than two parts, like cata.ns1.ns2.tbl? How about we define a simple rule: they match if identInHint is a tail of identInQuery?
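
A minimal sketch of the tail rule proposed here, written outside Spark so that it runs on its own. The object name `HintMatchSketch`, the `Resolver` alias, and the default case-insensitive comparison are assumptions made for illustration. Only the names `identInHint`, `identInQuery`, and `matchedIdentifier` come from this thread, and the body is not copied from the patch.

```scala
// Hypothetical, self-contained sketch; not Spark's actual implementation.
object HintMatchSketch {
  // Decides whether two name parts are considered equal.
  type Resolver = (String, String) => Boolean

  // Assumed default for this sketch: compare name parts ignoring case.
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // True when `identInHint` is a tail (suffix) of `identInQuery`,
  // compared part by part with the given resolver.
  def matchedIdentifier(
      identInHint: Seq[String],
      identInQuery: Seq[String],
      resolver: Resolver = caseInsensitive): Boolean = {
    identInHint.nonEmpty &&
      identInHint.length <= identInQuery.length &&
      identInHint.zip(identInQuery.takeRight(identInHint.length)).forall {
        case (hintPart, queryPart) => resolver(hintPart, queryPart)
      }
  }
}
```

Under this rule a hint on `t` matches the relation `db1.t`, and a hint on `ns2.tbl` matches `cata.ns1.ns2.tbl`, while a hint on `db1.t` does not match a relation written as a bare `t`, because the hint has more parts than the relation.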

Member Author

@maropu maropu Mar 18, 2020

Ah, that looks nice. I'll update.

Member Author

How about the latest update?

@SparkQA

SparkQA commented Mar 18, 2020

Test build #119987 has finished for PR 27935 at commit ac19ea1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 19, 2020

Test build #120012 has finished for PR 27935 at commit 5a0b4ed.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 19, 2020

Test build #120011 has finished for PR 27935 at commit 4628aa0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 19, 2020

Test build #120013 has finished for PR 27935 at commit 70c994a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


package org.apache.spark.sql.catalyst

trait BaseIdentifier {
Contributor

why do we add a base trait?

Member Author

@maropu maropu Mar 19, 2020

Ah, we can remove it now. I'll update.

plan: LogicalPlan,
relations: mutable.HashSet[String],
relationsInHint: Seq[Seq[String]],
appliedRelations: mutable.ArrayBuffer[Seq[String]],
Contributor

I think we don't care which relations are matched, only which relation names specified in the hint do not have a match.

Contributor

How about relationsInHintWithMatch: mutable.HashSet[Seq[String]]?

And in the code:

relationsInHint.find(matchedIdentifier(_, ident)).map { relation =>
  relationsInHintWithMatch += relation
  plan with hint applied
}.getOrElse {
  originalPlan
}
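
The bookkeeping suggested above can be sketched in isolation. This is a rough illustration under stated assumptions, not the code from the patch: `HintTrackingSketch` and `applyHint` are hypothetical names, relations are modelled as plain `Seq[String]` name parts instead of logical plans, and the sketch records every hint name that matches a relation rather than only the first one.

```scala
import scala.collection.mutable

// Hypothetical, self-contained sketch; relations are plain name parts, not plans.
object HintTrackingSketch {
  type Resolver = (String, String) => Boolean

  // Tail rule: the hint name must be a suffix of the relation name in the query.
  private def matchedIdentifier(
      identInHint: Seq[String],
      identInQuery: Seq[String],
      resolver: Resolver): Boolean = {
    identInHint.nonEmpty &&
      identInHint.length <= identInQuery.length &&
      identInHint.zip(identInQuery.takeRight(identInHint.length)).forall {
        case (h, q) => resolver(h, q)
      }
  }

  // Returns (relations in the query that would receive the hint,
  //          names written in the hint that matched no relation).
  def applyHint(
      relationsInHint: Seq[Seq[String]],
      relationsInQuery: Seq[Seq[String]],
      resolver: Resolver): (Seq[Seq[String]], Seq[Seq[String]]) = {
    val relationsInHintWithMatch = mutable.HashSet.empty[Seq[String]]
    val hinted = relationsInQuery.filter { ident =>
      val matches = relationsInHint.filter(matchedIdentifier(_, ident, resolver))
      // Record the names exactly as written in the hint, not the relation names.
      relationsInHintWithMatch ++= matches
      matches.nonEmpty
    }
    val unmatched = relationsInHint.filterNot(relationsInHintWithMatch.contains)
    (hinted, unmatched)
  }
}
```

Because the set stores names as written in the hint, the unmatched names fall out of a plain set lookup, with no second resolver-based comparison. Those names are what an error handler would report as not found.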

Member Author

@maropu maropu Mar 19, 2020

But without this variable, how do we track unused hints for the error report in hintErrorHandler.hintRelationsNotFound?

Member Author

@maropu maropu Mar 19, 2020

Updated in the latest commit.

if relations.exists(resolver(_, ident.last)) =>
relations.remove(ident.last)
if relationsInHint.exists(matchedIdentifier(_, ident)) =>
relationsInHintWithMatch += ident
Contributor

The problem here is that ident is the actual relation name, not the relation name in the hint. This forces us to do an extra case-insensitive match in https://github.com/apache/spark/pull/27935/files#diff-746a6d090224c7cfbe15daa27fa27408R163

Member Author

Urr, I see. I'll fix that.

Member Author

Your suggested code above broke some existing tests, so I modified it a little. Updated in the latest commit.

}
}

test("broadcast hint on temp view") {
Contributor

@cloud-fan cloud-fan Mar 19, 2020

Seems this test suite is only for permanent views. Maybe we can put everything in one test in https://github.com/apache/spark/pull/27935/files#diff-fa1d044f9cfe587e27866393fe18fd46R329

Member Author

ok

Member Author

Moved this test into DataFrameJoinSuite.

Contributor

@cloud-fan cloud-fan left a comment

LGTM except one comment

@SparkQA

SparkQA commented Mar 19, 2020

Test build #120043 has finished for PR 27935 at commit f07613d.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 19, 2020

Test build #120052 has finished for PR 27935 at commit 6528dad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 19, 2020

Test build #120056 has finished for PR 27935 at commit c517c1f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val plan = sql(s"SELECT * FROM $dbName.$table1Name, $dbName.$table2Name " +
s"WHERE $table1Name.id = $table2Name.id")
.queryExecution.executedPlan
assert(plan.collect { case p: BroadcastHashJoinExec => p }.isEmpty)
Member

checkIfHintNotApplied?

-   .queryExecution.executedPlan
- assert(plan.collect { case p: BroadcastHashJoinExec => p }.isEmpty)
+ checkIfHintNotApplied(plan)

Member Author

Ah... I forgot to remove the old tests. I'll remove them. Thanks!

checkIfHintApplied(sqlTemplate(s"$dbName.$table1Name", s"$dbName.$table1Name"))
checkIfHintApplied(sqlTemplate(s"$dbName.$table1Name", table1Name))
checkIfHintNotApplied(sqlTemplate(table1Name, s"$dbName.$table1Name"))
checkIfHintNotApplied(sqlTemplate(s"$dbName.$table1Name", s"$dbName.$table1Name.id"))
Member

@dongjoon-hyun dongjoon-hyun Mar 19, 2020

Is $dbName.$table1Name.id used as a negative value for hintTableName? It's a little confusing because there is a catalog concept. For three fields, can we use catalog instead of id?

Member Author

OK, I will change id to spark_catalog.

withTempView("tv") {
sql(s"CREATE TEMPORARY VIEW tv AS SELECT * FROM $dbName.$table1Name")
checkIfHintApplied(sqlTemplate("tv", "tv"))
checkIfHintNotApplied(sqlTemplate("tv", "default.tv"))
Member

This might be misleading. Technically, this is the same as sqlTemplate("tv", "non_exist") because we cannot use a database qualifier for a temporary view.

scala> sql("select * from default.tv").show
org.apache.spark.sql.AnalysisException: Table or view not found: default.tv; line 1 pos 14;

Member Author

Yea, but I think a query with a hint that names a non-existent relation identifier should still work:


scala> sql("create table t1 (id int)")
scala> sql("create table t2 (id int)")
scala> sql("create temporary view tv as select * from t1")
scala> sql("SELECT /*+ BROADCASTJOIN(default.non_exist) */ * FROM tv, t2 WHERE tv.id = t2.id").explain(true)
20/03/20 07:34:02 WARN HintErrorLogger: Count not find relation 'default.non_exist' specified in hint 'BROADCASTJOIN(default.non_exist)'.
== Parsed Logical Plan ==
'UnresolvedHint BROADCASTJOIN, ['default.non_exist]
+- 'Project [*]
   +- 'Filter ('tv.id = 't2.id)
      +- 'Join Inner
         :- 'UnresolvedRelation [tv]
         +- 'UnresolvedRelation [t2]

== Analyzed Logical Plan ==
id: int, id: int
Project [id#0, id#1]
+- Filter (id#0 = id#1)
   +- Join Inner
      :- SubqueryAlias tv
      :  +- Project [id#0]
      :     +- SubqueryAlias spark_catalog.default.t1
      :        +- Relation[id#0] parquet
      +- SubqueryAlias spark_catalog.default.t2
         +- Relation[id#1] parquet

Member Author

I will modify the tests a little.

//
// For example,
// * in a query `SELECT /*+ BROADCAST(t) */ * FROM db1.t JOIN t`,
// the broadcast hint will match both tables, `db1.t` and `t`.
Member

- the broadcast hint will match both tables, `db1.t` and `t`.
+ the broadcast hint will match both tables, `db1.t` and `t`, even when the current db is `db2`.

// local temp table (single-part identifier case)
checkAnalysis(
UnresolvedHint("MAPJOIN", Seq("table", "table2"),
table("table").join(table("table2"))),
Member

The following will be better because this is a caseSensitive = false test case.

- table("table").join(table("table2"))),
+ table("TaBlE").join(table("TaBlE2"))),

Member Author

Yea, yes!
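
The `caseSensitive = false` setting discussed here can be pictured as choosing a name-comparison function. A minimal sketch, assuming a `Resolver` function type local to this example; `ResolverSketch` and `resolverFor` are hypothetical names for illustration, not Spark APIs:

```scala
// Hypothetical, self-contained sketch of case-sensitive vs. case-insensitive name matching.
object ResolverSketch {
  type Resolver = (String, String) => Boolean

  // Exact comparison: "table" and "TaBlE" are different names.
  val caseSensitiveResolution: Resolver = (a, b) => a == b

  // Case-insensitive comparison: "table" and "TaBlE" are the same name.
  val caseInsensitiveResolution: Resolver = (a, b) => a.equalsIgnoreCase(b)

  // Pick the comparison that corresponds to the caseSensitive flag.
  def resolverFor(caseSensitive: Boolean): Resolver =
    if (caseSensitive) caseSensitiveResolution else caseInsensitiveResolution
}
```

With the case-insensitive resolver, a hint on `table` would apply to a relation written as `TaBlE`, which is why mixed-case relation names make the test exercise the setting rather than pass trivially.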

@maropu
Member Author

maropu commented Mar 19, 2020

Thanks for the reviews, @dongjoon-hyun! I've updated the PR, so could you check the latest commit?

val plan = sql(s"SELECT * FROM $dbName.$table1Name, $dbName.$table2Name " +
s"WHERE $table1Name.id = $table2Name.id")
.queryExecution.executedPlan
assert(plan.collect { case p: BroadcastHashJoinExec => p }.isEmpty)
Member

Oh, is this pre-testing removed completely?

Member Author

Yea, I think the current test coverage is good enough.

@SparkQA

SparkQA commented Mar 20, 2020

Test build #120074 has finished for PR 27935 at commit 61e4a95.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM.

@dongjoon-hyun
Member

Merged to master/3.0. Thank you, @maropu and @cloud-fan.

dongjoon-hyun pushed a commit that referenced this pull request Mar 20, 2020
… resolution

### What changes were proposed in this pull request?

This PR fixes the code to respect the database name in broadcast table hint resolution.
Currently, Spark ignores the database name in multi-part names:
```
scala> sql("CREATE DATABASE testDb")
scala> spark.range(10).write.saveAsTable("testDb.t")

// without this patch
scala> spark.range(10).join(spark.table("testDb.t"), "id").hint("broadcast", "testDb.t").explain
== Physical Plan ==
*(2) Project [id#24L]
+- *(2) BroadcastHashJoin [id#24L], [id#26L], Inner, BuildLeft
   :- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]))
   :  +- *(1) Range (0, 10, step=1, splits=4)
   +- *(2) Project [id#26L]
      +- *(2) Filter isnotnull(id#26L)
         +- *(2) FileScan parquet testdb.t[id#26L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-2.3.1-bin-hadoop2.7/spark-warehouse..., PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>

// with this patch
scala> spark.range(10).join(spark.table("testDb.t"), "id").hint("broadcast", "testDb.t").explain
== Physical Plan ==
*(2) Project [id#3L]
+- *(2) BroadcastHashJoin [id#3L], [id#5L], Inner, BuildRight
   :- *(2) Range (0, 10, step=1, splits=4)
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]))
      +- *(1) Project [id#5L]
         +- *(1) Filter isnotnull(id#5L)
            +- *(1) FileScan parquet testdb.t[id#5L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/maropu/Repositories/spark/spark-master/spark-warehouse/testdb.db/t], PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct<id:bigint>
```

This PR comes from #22198

### Why are the changes needed?

For better usability.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added unit tests.

Closes #27935 from maropu/SPARK-25121-2.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit ca499e9)
Signed-off-by: Dongjoon Hyun <[email protected]>
@maropu
Member Author

maropu commented Mar 20, 2020

Thanks for the reviews, @dongjoon-hyun @cloud-fan !

@gatorsmile
Member

@maropu Could you fix the title and PR description, and add test cases for the other join hints? Thanks!

@maropu
Member Author

maropu commented Mar 25, 2020

Sure, I'll do that. Thanks!

@maropu maropu changed the title [SPARK-25121][SQL] Supports multi-part table names for broadcast hint resolution [SPARK-25121][SQL] Supports multi-part relation names for join strategy hint resolution Mar 25, 2020
dongjoon-hyun pushed a commit that referenced this pull request Mar 25, 2020
…ifiers in join strategy hints

### What changes were proposed in this pull request?

This PR adds unit tests for the other join hints (`MERGEJOIN`, `SHUFFLE_HASH`, and `SHUFFLE_REPLICATE_NL`). This is a follow-up to #27935.

### Why are the changes needed?

For better test coverage.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Added unit tests.

Closes #28013 from maropu/SPARK-25121-FOLLOWUP.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Mar 25, 2020
(cherry picked from commit da49f50)
Signed-off-by: Dongjoon Hyun <[email protected]>
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020