Skip to content

Destroying a broadcasted variable in LBFGS CostFun#5

Merged
AnthonyTruchet merged 1 commit intocriteo-forks:criteo-1.6from
eugene-kharitonov:criteo-1.6
Oct 12, 2016
Merged

Destroying a broadcasted variable in LBFGS CostFun#5
AnthonyTruchet merged 1 commit intocriteo-forks:criteo-1.6from
eugene-kharitonov:criteo-1.6

Conversation

@eugene-kharitonov
Copy link
Copy Markdown

What changes were proposed in this pull request?

Adding a .destroy() call for a broadcasted variable that is neither re-used nor destroyed. This variable contains the vector of learned weights before the current iteration and can be relatively large; moreover these weights are broadcasted on each LBFGS iteration.

How was this patch tested?

Manual tests & unit tests.

…ctor of model weights is broadcasted in LBFGS/CostFun but is neither re-used nor destroyed afterwards.
@AnthonyTruchet AnthonyTruchet merged commit efe6ea3 into criteo-forks:criteo-1.6 Oct 12, 2016
ashangit pushed a commit that referenced this pull request Oct 26, 2016
## What changes were proposed in this pull request?

Implements `eval()` method for expression `AssertNotNull` so that we can convert local projection on LocalRelation to another LocalRelation.

### Before change:
```
scala> import org.apache.spark.sql.catalyst.dsl.expressions._
scala> import org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull
scala> import org.apache.spark.sql.Column
scala> case class A(a: Int)
scala> Seq((A(1),2)).toDS().select(new Column(AssertNotNull("_1".attr, Nil))).explain

java.lang.UnsupportedOperationException: Only code-generated evaluation is supported.
  at org.apache.spark.sql.catalyst.expressions.objects.AssertNotNull.eval(objects.scala:850)
  ...
```

### After the change:
```
scala> Seq((A(1),2)).toDS().select(new Column(AssertNotNull("_1".attr, Nil))).explain(true)

== Parsed Logical Plan ==
'Project [assertnotnull('_1) AS assertnotnull(_1)#5]
+- LocalRelation [_1#2, _2#3]

== Analyzed Logical Plan ==
assertnotnull(_1): struct<a:int>
Project [assertnotnull(_1#2) AS assertnotnull(_1)#5]
+- LocalRelation [_1#2, _2#3]

== Optimized Logical Plan ==
LocalRelation [assertnotnull(_1)#5]

== Physical Plan ==
LocalTableScan [assertnotnull(_1)#5]
```

## How was this patch tested?

Unit test.

Author: Sean Zhong <seanzhong@databricks.com>

Closes apache#14486 from clockfly/assertnotnull_eval.
Willymontaz pushed a commit to Willymontaz/spark that referenced this pull request Nov 8, 2018
## What changes were proposed in this pull request?

Implements Every, Some, Any aggregates in SQL. These new aggregate expressions are analyzed in normal way and rewritten to equivalent existing aggregate expressions in the optimizer.

Every(x) => Min(x)  where x is boolean.
Some(x) => Max(x) where x is boolean.

Any is a synonym for Some.
SQL
```
explain extended select every(v) from test_agg group by k;
```
Plan :
```
== Parsed Logical Plan ==
'Aggregate ['k], [unresolvedalias('every('v), None)]
+- 'UnresolvedRelation `test_agg`

== Analyzed Logical Plan ==
every(v): boolean
Aggregate [k#0], [every(v#1) AS every(v)criteo-forks#5]
+- SubqueryAlias `test_agg`
   +- Project [k#0, v#1]
      +- SubqueryAlias `test_agg`
         +- LocalRelation [k#0, v#1]

== Optimized Logical Plan ==
Aggregate [k#0], [min(v#1) AS every(v)criteo-forks#5]
+- LocalRelation [k#0, v#1]

== Physical Plan ==
*(2) HashAggregate(keys=[k#0], functions=[min(v#1)], output=[every(v)criteo-forks#5])
+- Exchange hashpartitioning(k#0, 200)
   +- *(1) HashAggregate(keys=[k#0], functions=[partial_min(v#1)], output=[k#0, min#7])
      +- LocalTableScan [k#0, v#1]
Time taken: 0.512 seconds, Fetched 1 row(s)
```

## How was this patch tested?
Added tests in SQLQueryTestSuite, DataframeAggregateSuite

Closes apache#22809 from dilipbiswal/SPARK-19851-specific-rewrite.

Authored-by: Dilip Biswal <dbiswal@us.ibm.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Willymontaz pushed a commit to Willymontaz/spark that referenced this pull request Apr 2, 2019
## What changes were proposed in this pull request?

This PR aims to optimize GroupExpressions by removing repeating expressions. `RemoveRepetitionFromGroupExpressions` is added.

**Before**
```scala
scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[(a#0 + 1)criteo-forks#6,(1 + a#0)criteo-forks#7,(A#0 + 1)criteo-forks#8,(1 + A#0)criteo-forks#9], functions=[], output=[(a + 1)criteo-forks#5])
:     +- INPUT
+- Exchange hashpartitioning((a#0 + 1)criteo-forks#6, (1 + a#0)criteo-forks#7, (A#0 + 1)criteo-forks#8, (1 + A#0)criteo-forks#9, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)criteo-forks#6,(1 + a#0) AS (1 + a#0)criteo-forks#7,(A#0 + 1) AS (A#0 + 1)criteo-forks#8,(1 + A#0) AS (1 + A#0)criteo-forks#9], functions=[], output=[(a#0 + 1)criteo-forks#6,(1 + a#0)criteo-forks#7,(A#0 + 1)criteo-forks#8,(1 + A#0)criteo-forks#9])
      :     +- INPUT
      +- LocalTableScan [a#0], [[1],[2]]
```

**After**
```scala
scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[(a#0 + 1)criteo-forks#6], functions=[], output=[(a + 1)criteo-forks#5])
:     +- INPUT
+- Exchange hashpartitioning((a#0 + 1)criteo-forks#6, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)criteo-forks#6], functions=[], output=[(a#0 + 1)criteo-forks#6])
      :     +- INPUT
      +- LocalTableScan [a#0], [[1],[2]]
```

## How was this patch tested?

Pass the Jenkins tests (with a new testcase)

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes apache#12590 from dongjoon-hyun/SPARK-14830.

(cherry picked from commit 6e63201)
Signed-off-by: Michael Armbrust <michael@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants