Destroying a broadcasted variable in LBFGS CostFun#5
Merged
AnthonyTruchet merged 1 commit intocriteo-forks:criteo-1.6from Oct 12, 2016
Merged
Conversation
…ctor of model weights is broadcasted in LBFGS/CostFun but is neither re-used nor destroyed afterwards.
AnthonyTruchet
approved these changes
Oct 12, 2016
ashangit
pushed a commit
that referenced
this pull request
Oct 26, 2016
## What changes were proposed in this pull request?
Implements `eval()` method for expression `AssertNotNull` so that we can convert local projection on LocalRelation to another LocalRelation.
### Before change:
```
scala> import org.apache.spark.sql.catalyst.dsl.expressions._
scala> import org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull
scala> import org.apache.spark.sql.Column
scala> case class A(a: Int)
scala> Seq((A(1),2)).toDS().select(new Column(AssertNotNull("_1".attr, Nil))).explain
java.lang.UnsupportedOperationException: Only code-generated evaluation is supported.
at org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull.eval(objects.scala:850)
...
```
### After the change:
```
scala> Seq((A(1),2)).toDS().select(new Column(AssertNotNull("_1".attr, Nil))).explain(true)
== Parsed Logical Plan ==
'Project [assertnotnull('_1) AS assertnotnull(_1)#5]
+- LocalRelation [_1#2, _2#3]
== Analyzed Logical Plan ==
assertnotnull(_1): struct<a:int>
Project [assertnotnull(_1#2) AS assertnotnull(_1)#5]
+- LocalRelation [_1#2, _2#3]
== Optimized Logical Plan ==
LocalRelation [assertnotnull(_1)#5]
== Physical Plan ==
LocalTableScan [assertnotnull(_1)#5]
```
## How was this patch tested?
Unit test.
Author: Sean Zhong <seanzhong@databricks.com>
Closes apache#14486 from clockfly/assertnotnull_eval.
Willymontaz
pushed a commit
to Willymontaz/spark
that referenced
this pull request
Nov 8, 2018
## What changes were proposed in this pull request?
Implements Every, Some, Any aggregates in SQL. These new aggregate expressions are analyzed in normal way and rewritten to equivalent existing aggregate expressions in the optimizer.
Every(x) => Min(x) where x is boolean.
Some(x) => Max(x) where x is boolean.
Any is a synonym for Some.
SQL
```
explain extended select every(v) from test_agg group by k;
```
Plan :
```
== Parsed Logical Plan ==
'Aggregate ['k], [unresolvedalias('every('v), None)]
+- 'UnresolvedRelation `test_agg`
== Analyzed Logical Plan ==
every(v): boolean
Aggregate [k#0], [every(v#1) AS every(v)criteo-forks#5]
+- SubqueryAlias `test_agg`
+- Project [k#0, v#1]
+- SubqueryAlias `test_agg`
+- LocalRelation [k#0, v#1]
== Optimized Logical Plan ==
Aggregate [k#0], [min(v#1) AS every(v)criteo-forks#5]
+- LocalRelation [k#0, v#1]
== Physical Plan ==
*(2) HashAggregate(keys=[k#0], functions=[min(v#1)], output=[every(v)criteo-forks#5])
+- Exchange hashpartitioning(k#0, 200)
+- *(1) HashAggregate(keys=[k#0], functions=[partial_min(v#1)], output=[k#0, min#7])
+- LocalTableScan [k#0, v#1]
Time taken: 0.512 seconds, Fetched 1 row(s)
```
## How was this patch tested?
Added tests in SQLQueryTestSuite, DataframeAggregateSuite
Closes apache#22809 from dilipbiswal/SPARK-19851-specific-rewrite.
Authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Willymontaz
pushed a commit
to Willymontaz/spark
that referenced
this pull request
Apr 2, 2019
## What changes were proposed in this pull request?
This PR aims to optimize GroupExpressions by removing repeating expressions. `RemoveRepetitionFromGroupExpressions` is added.
**Before**
```scala
scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
== Physical Plan ==
WholeStageCodegen
: +- TungstenAggregate(key=[(a#0 + 1)criteo-forks#6,(1 + a#0)criteo-forks#7,(A#0 + 1)criteo-forks#8,(1 + A#0)criteo-forks#9], functions=[], output=[(a + 1)criteo-forks#5])
: +- INPUT
+- Exchange hashpartitioning((a#0 + 1)criteo-forks#6, (1 + a#0)criteo-forks#7, (A#0 + 1)criteo-forks#8, (1 + A#0)criteo-forks#9, 200), None
+- WholeStageCodegen
: +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)criteo-forks#6,(1 + a#0) AS (1 + a#0)criteo-forks#7,(A#0 + 1) AS (A#0 + 1)criteo-forks#8,(1 + A#0) AS (1 + A#0)criteo-forks#9], functions=[], output=[(a#0 + 1)criteo-forks#6,(1 + a#0)criteo-forks#7,(A#0 + 1)criteo-forks#8,(1 + A#0)criteo-forks#9])
: +- INPUT
+- LocalTableScan [a#0], [[1],[2]]
```
**After**
```scala
scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
== Physical Plan ==
WholeStageCodegen
: +- TungstenAggregate(key=[(a#0 + 1)criteo-forks#6], functions=[], output=[(a + 1)criteo-forks#5])
: +- INPUT
+- Exchange hashpartitioning((a#0 + 1)criteo-forks#6, 200), None
+- WholeStageCodegen
: +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)criteo-forks#6], functions=[], output=[(a#0 + 1)criteo-forks#6])
: +- INPUT
+- LocalTableScan [a#0], [[1],[2]]
```
## How was this patch tested?
Pass the Jenkins tests (with a new testcase)
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes apache#12590 from dongjoon-hyun/SPARK-14830.
(cherry picked from commit 6e63201)
Signed-off-by: Michael Armbrust <michael@databricks.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
Adding a .destroy() call for a broadcasted variable that is neither re-used nor destroyed. This variable contains the vector of learned weights before the current iteration and can be relatively large; moreover these weights are broadcasted on each LBFGS iteration.
How was this patch tested?
Manual tests & unit tests.