[SPARK-18119][SPARK-CORE] Namenode safemode check is only performed on one namenode which can stuck the startup of SparkHistory server by ashangit · Pull Request #8 · criteo-forks/spark

ashangit · 2016-10-26T21:21:49Z

What changes were proposed in this pull request?

Instead of using the `setSafeMode` method that checks only the first namenode, use the one that permits checking only the active NNs.
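To make the difference concrete, here is a minimal sketch in plain Scala that models the two checks. It does not call Hadoop: `NameNode`, `SafeModeCheck`, `Example` and the host names are hypothetical names introduced only for illustration. In Hadoop's `DistributedFileSystem`, the change appears to correspond to moving from `setSafeMode(action)` to the `setSafeMode(action, isChecked)` overload, although the description above does not name the overload.

```scala
// Hypothetical model of an HA HDFS nameservice. These types are illustrative
// only and are not Hadoop classes.
final case class NameNode(host: String, isActive: Boolean, inSafeMode: Boolean)

object SafeModeCheck {
  // Behaviour the PR describes as problematic: only the first configured
  // namenode is asked, whether or not it is the active one.
  def firstNamenodeInSafeMode(namenodes: Seq[NameNode]): Boolean =
    namenodes.headOption.exists(_.inSafeMode)

  // Behaviour the PR switches to: only active namenodes are considered.
  def activeNamenodeInSafeMode(namenodes: Seq[NameNode]): Boolean =
    namenodes.exists(nn => nn.isActive && nn.inSafeMode)
}

object Example {
  // Assumed scenario: the first configured namenode is a standby that is
  // in safemode, while the active namenode is serving normally.
  val namenodes: Seq[NameNode] = Seq(
    NameNode("nn1", isActive = false, inSafeMode = true),
    NameNode("nn2", isActive = true, inSafeMode = false)
  )
}
```

With the `Example.namenodes` scenario, the first-namenode check reports safemode and would keep the history server waiting, while the active-only check does not.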

How was this patch tested?

manual tests


This commit is contributed by Criteo SA under the Apache v2 licence.


ashangit · 2016-10-26T21:23:52Z

Here is the pull request for the Apache master branch: apache#15648.
I will then see how to do the same for the 1.6 branch.

## What changes were proposed in this pull request?

This PR aims to optimize GroupExpressions by removing repeating expressions. `RemoveRepetitionFromGroupExpressions` is added.

**Before**

```scala
scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[(a#0 + 1)#6,(1 + a#0)#7,(A#0 + 1)#8,(1 + A#0)#9], functions=[], output=[(a + 1)#5])
:     +- INPUT
+- Exchange hashpartitioning((a#0 + 1)#6, (1 + a#0)#7, (A#0 + 1)#8, (1 + A#0)#9, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)#6,(1 + a#0) AS (1 + a#0)#7,(A#0 + 1) AS (A#0 + 1)#8,(1 + A#0) AS (1 + A#0)#9], functions=[], output=[(a#0 + 1)#6,(1 + a#0)#7,(A#0 + 1)#8,(1 + A#0)#9])
      :     +- INPUT
      +- LocalTableScan [a#0], [[1],[2]]
```

**After**

```scala
scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[(a#0 + 1)#6], functions=[], output=[(a + 1)#5])
:     +- INPUT
+- Exchange hashpartitioning((a#0 + 1)#6, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)#6], functions=[], output=[(a#0 + 1)#6])
      :     +- INPUT
      +- LocalTableScan [a#0], [[1],[2]]
```

## How was this patch tested?

Pass the Jenkins tests (with a new testcase)

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes apache#12590 from dongjoon-hyun/SPARK-14830.

(cherry picked from commit 6e63201)
Signed-off-by: Michael Armbrust <michael@databricks.com>

…plan properly

### What changes were proposed in this pull request?

Make `ResolveRelations` handle plan id properly.

Cherry-pick of bugfix apache#45214 to 3.5.

### Why are the changes needed?

Bug fix for Spark Connect; it won't affect classic Spark SQL.

Before this PR:

```
from pyspark.sql import functions as sf

spark.range(10).withColumn("value_1", sf.lit(1)).write.saveAsTable("test_table_1")
spark.range(10).withColumnRenamed("id", "index").withColumn("value_2", sf.lit(2)).write.saveAsTable("test_table_2")

df1 = spark.read.table("test_table_1")
df2 = spark.read.table("test_table_2")
df3 = spark.read.table("test_table_1")

join1 = df1.join(df2, on=df1.id==df2.index).select(df2.index, df2.value_2)
join2 = df3.join(join1, how="left", on=join1.index==df3.id)

join2.schema
```

fails with

```
AnalysisException: [CANNOT_RESOLVE_DATAFRAME_COLUMN] Cannot resolve dataframe column "id". It's probably because of illegal references like `df1.select(df2.col("a"))`. SQLSTATE: 42704
```

That is because the existing plan caching in `ResolveRelations` doesn't work with Spark Connect:

```
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations ===
 '[#12]Join LeftOuter, '`==`('index, 'id)                     '[#12]Join LeftOuter, '`==`('index, 'id)
!:- '[#9]UnresolvedRelation [test_table_1], [], false         :- '[#9]SubqueryAlias spark_catalog.default.test_table_1
!+- '[#11]Project ['index, 'value_2]                          :  +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!   +- '[#10]Join Inner, '`==`('id, 'index)                   +- '[#11]Project ['index, 'value_2]
!      :- '[#7]UnresolvedRelation [test_table_1], [], false      +- '[#10]Join Inner, '`==`('id, 'index)
!      +- '[#8]UnresolvedRelation [test_table_2], [], false         :- '[#9]SubqueryAlias spark_catalog.default.test_table_1
!                                                                   :  +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
!                                                                   +- '[#8]SubqueryAlias spark_catalog.default.test_table_2
!                                                                      +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_2`, [], false

Can not resolve 'id with plan 7
```

`[#7]UnresolvedRelation [test_table_1], [], false` was wrongly resolved to the cached one:

```
:- '[#9]SubqueryAlias spark_catalog.default.test_table_1
   +- 'UnresolvedCatalogRelation `spark_catalog`.`default`.`test_table_1`, [], false
```

### Does this PR introduce _any_ user-facing change?

yes, bug fix

### How was this patch tested?

added ut

### Was this patch authored or co-authored using generative AI tooling?

ci

Closes apache#46291 from zhengruifeng/connect_fix_read_join_35.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>

[SPARK-18119][SPARK-CORE] Namenode safemode check is only performed o…

e81c862

…n one namenode which can stuck the startup of SparkHistory server This commit is contributed by Criteo SA under the Apache v2 licence.

AnthonyTruchet approved these changes Nov 21, 2016


ashangit merged commit bed230b into criteo-forks:criteo-1.6 Nov 24, 2016

ashangit merged 1 commit into criteo-forks:criteo-1.6 from ashangit:criteo-1.6
