@mengxr commented Jul 17, 2015

@GeorgeDittmar I updated the implementation with more comments. If it looks good to you, please merge this PR, and the changes will show up in your PR.

GeorgeDittmar added a commit that referenced this pull request Jul 18, 2015
GeorgeDittmar merged commit 3e0a939 into GeorgeDittmar:SPARK-7422 Jul 18, 2015
GeorgeDittmar pushed a commit that referenced this pull request Jul 26, 2015
…into a single batch.

SQL
```
select * from tableA join tableB on (a > 3 and b = d) or (a > 3 and b = e)
```
Plan before the change
```
== Optimized Logical Plan ==
Project [a#293,b#294,c#295,d#296,e#297]
 Join Inner, Some(((a#293 > 3) && ((b#294 = d#296) || (b#294 = e#297))))
  MetastoreRelation default, tablea, None
  MetastoreRelation default, tableb, None
```
Plan after the change
```
== Optimized Logical Plan ==
Project [a#293,b#294,c#295,d#296,e#297]
 Join Inner, Some(((b#294 = d#296) || (b#294 = e#297)))
  Filter (a#293 > 3)
   MetastoreRelation default, tablea, None
  MetastoreRelation default, tableb, None
```
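Going from the first plan to the second depends on factoring out the conjunct that both OR branches share, rewriting `(c && x) || (c && y)` as `c && (x || y)`. After that, `a > 3` stands alone and references only `tableA`. The following is a minimal sketch of that factoring step over a toy predicate tree. `Pred`, `Atom`, `And`, `Or`, and `CommonFactor` are hypothetical names, not Catalyst's `Expression` classes.

```scala
// Toy predicate tree; hypothetical stand-ins for Catalyst expressions.
sealed trait Pred
final case class Atom(sql: String) extends Pred
final case class And(left: Pred, right: Pred) extends Pred
final case class Or(left: Pred, right: Pred) extends Pred

object CommonFactor {
  // Flatten nested ANDs into a list of conjuncts.
  def conjuncts(p: Pred): List[Pred] = p match {
    case And(l, r) => conjuncts(l) ++ conjuncts(r)
    case other     => List(other)
  }

  // Rebuild a conjunction from a non-empty list of conjuncts.
  def andAll(ps: List[Pred]): Pred = ps.reduceLeft((a, b) => And(a, b))

  // (c && x) || (c && y)  ==>  c && (x || y)
  // Leaves the predicate unchanged when the branches share no conjunct.
  def factorOr(p: Pred): Pred = p match {
    case Or(l, r) =>
      val lc = conjuncts(l)
      val rc = conjuncts(r)
      val common = lc.filter(c => rc.contains(c))
      if (common.isEmpty) p
      else {
        val lRest = lc.filterNot(c => common.contains(c))
        val rRest = rc.filterNot(c => common.contains(c))
        // Absorption: c || (c && y) == c, so nothing else survives.
        if (lRest.isEmpty || rRest.isEmpty) andAll(common)
        else And(andAll(common), Or(andAll(lRest), andAll(rRest)))
      }
    case other => other
  }
}
```

Once the common conjunct is at the top level, a pushdown rule only has to check which side of the join each top-level conjunct references.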

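CombineLimits rewrites nested limits into `Limit(If(LessThan(ne, le), ne, le), grandChild)`, and the resulting `LessThan` is simplified by BooleanSimplification, so CombineLimits must run before BooleanSimplification. BooleanSimplification in turn must run before PushPredicateThroughJoin, because it is the rule that factors the common `a > 3` out of the OR and leaves a conjunct that can be pushed below the join.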
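The `If(LessThan(ne, le), ne, le)` expression built by CombineLimits evaluates to the smaller of the outer and inner limits. The following is a minimal sketch over a hypothetical two-node plan type. `Plan`, `Relation`, `Limit`, and `CombineLimitsSketch` are illustrative names. Literal `Int` limits stand in for expressions, so the minimum is computed directly instead of being left for a later rule to simplify.

```scala
// Hypothetical toy plan nodes; not Catalyst's LogicalPlan classes.
sealed trait Plan
final case class Relation(name: String) extends Plan
final case class Limit(n: Int, child: Plan) extends Plan

object CombineLimitsSketch {
  // Limit(ne, Limit(le, grandChild)) ==> Limit(min(ne, le), grandChild),
  // applied repeatedly so that chains of limits collapse into one.
  def combine(plan: Plan): Plan = plan match {
    case Limit(ne, Limit(le, grandChild)) =>
      combine(Limit(if (ne < le) ne else le, grandChild))
    case Limit(n, child) => Limit(n, combine(child))
    case other           => other
  }
}
```

For example, `combine(Limit(10, Limit(5, Limit(7, Relation("t")))))` collapses to `Limit(5, Relation("t"))`.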

Author: Zhongshuai Pei <[email protected]>
Author: DoingDone9 <[email protected]>

Closes apache#6351 from DoingDone9/master and squashes the following commits:

20de7be [Zhongshuai Pei] Update Optimizer.scala
7bc7d28 [Zhongshuai Pei] Merge pull request apache#17 from apache/master
0ba5f42 [Zhongshuai Pei] Update Optimizer.scala
f8b9314 [Zhongshuai Pei] Update FilterPushdownSuite.scala
c529d9f [Zhongshuai Pei] Update FilterPushdownSuite.scala
ae3af6d [Zhongshuai Pei] Update FilterPushdownSuite.scala
a04ffae [Zhongshuai Pei] Update Optimizer.scala
11beb61 [Zhongshuai Pei] Update FilterPushdownSuite.scala
f2ee5fe [Zhongshuai Pei] Update Optimizer.scala
be6b1d5 [Zhongshuai Pei] Update Optimizer.scala
b01e622 [Zhongshuai Pei] Merge pull request apache#15 from apache/master
8df716a [Zhongshuai Pei] Update FilterPushdownSuite.scala
d98bc35 [Zhongshuai Pei] Update FilterPushdownSuite.scala
fa65718 [Zhongshuai Pei] Update Optimizer.scala
ab8e9a6 [Zhongshuai Pei] Merge pull request apache#14 from apache/master
14952e2 [Zhongshuai Pei] Merge pull request apache#13 from apache/master
f03fe7f [Zhongshuai Pei] Merge pull request apache#12 from apache/master
f12fa50 [Zhongshuai Pei] Merge pull request apache#10 from apache/master
f61210c [Zhongshuai Pei] Merge pull request apache#9 from apache/master
34b1a9a [Zhongshuai Pei] Merge pull request apache#8 from apache/master
802261c [DoingDone9] Merge pull request apache#7 from apache/master
d00303b [DoingDone9] Merge pull request apache#6 from apache/master
98b134f [DoingDone9] Merge pull request apache#5 from apache/master
161cae3 [DoingDone9] Merge pull request apache#4 from apache/master
c87e8b6 [DoingDone9] Merge pull request apache#3 from apache/master
cb1852d [DoingDone9] Merge pull request apache#2 from apache/master
c3f046f [DoingDone9] Merge pull request #1 from apache/master
GeorgeDittmar pushed a commit that referenced this pull request Jul 26, 2015
…columns are found

This PR improves the error message shown when conflicting partition column names are detected. Conflicts are especially confusing when a table has a large number of partitions and only a handful of them happen to contain unexpected temporary files. Now all suspicious directories are listed, as below:

```
java.lang.AssertionError: assertion failed: Conflicting partition column names detected:

        Partition column name list #0: b, c, d
        Partition column name list #1: b, c
        Partition column name list #2: b

For partitioned table directories, data files should only live in leaf directories. Please check the following directories for unexpected files:

        file:/tmp/foo/b=0
        file:/tmp/foo/b=1
        file:/tmp/foo/b=1/c=1
        file:/tmp/foo/b=0/c=0
```
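The following is a minimal sketch of the check. It assumes each partition directory is given as a path string made of `key=value` segments. `PartitionColumnCheck`, `columnNames`, `suspiciousDirs`, and `check` are hypothetical helpers, not Spark's partition discovery code. Treating every directory whose column list is shorter than the longest one as suspicious is one simple heuristic consistent with the listing above, and the message format is simplified.

```scala
object PartitionColumnCheck {
  // "file:/tmp/foo/b=1/c=1" -> List("b", "c"); segments without '=' are skipped.
  def columnNames(path: String): List[String] =
    path.split("/").toList.filter(_.contains("=")).map(_.takeWhile(_ != '='))

  // Directories with fewer partition columns than the deepest ones are
  // non-leaf directories that probably contain stray data files.
  def suspiciousDirs(paths: Seq[String]): Seq[String] =
    if (paths.isEmpty) Nil
    else {
      val maxLen = paths.map(p => columnNames(p).length).max
      paths.filter(p => columnNames(p).length < maxLen)
    }

  // None when all directories agree; otherwise a message listing every
  // distinct column-name list plus the suspicious directories.
  def check(paths: Seq[String]): Option[String] = {
    val lists = paths.map(p => columnNames(p)).distinct.sortBy(cols => -cols.length)
    if (lists.size <= 1) None
    else {
      val listing = lists.zipWithIndex.map { case (cols, i) =>
        s"  Partition column name list #$i: ${cols.mkString(", ")}"
      }
      val dirs = suspiciousDirs(paths).map(d => s"  $d")
      Some((Seq("Conflicting partition column names detected:") ++ listing ++
        Seq("Please check the following directories for unexpected files:") ++ dirs).mkString("\n"))
    }
  }
}
```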

Author: Cheng Lian <[email protected]>

Closes apache#6610 from liancheng/part-errmsg and squashes the following commits:

7d05f2c [Cheng Lian] Fixes Scala style issue
a149250 [Cheng Lian] Adds test case for the error message
6b74dd8 [Cheng Lian] Also lists suspicious non-leaf partition directories
a935eb8 [Cheng Lian] Improves error message when conflicting partition columns are found
GeorgeDittmar pushed a commit that referenced this pull request Jul 26, 2015
… to ShuffleReader

This commit updates the shuffle read path to give ShuffleReader implementations more control over the deserialization process.

The BlockStoreShuffleFetcher.fetch() method has been renamed to BlockStoreShuffleFetcher.fetchBlockStreams(). Previously, this method returned a record iterator; now, it returns an iterator of (BlockId, InputStream). Deserialization of records is now handled in the ShuffleReader.read() method.

This change creates a cleaner separation of concerns and allows implementations of ShuffleReader more flexibility in how records are retrieved.
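The following toy sketch illustrates the split. The fetcher only returns `(BlockId, InputStream)` pairs, and the reader owns deserialization. `ShuffleReadSketch` is hypothetical, `BlockId` is aliased to `String`, and the "serializer" just reads newline-separated UTF-8 records. None of this is Spark's actual `ShuffleReader` or `BlockStoreShuffleFetcher` code.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import scala.io.Source

object ShuffleReadSketch {
  // Hypothetical stand-in; Spark's BlockId is a class hierarchy.
  type BlockId = String

  // Fetcher side: locates blocks and hands back raw streams, nothing more.
  // Here the "remote" blocks are in-memory byte arrays keyed by id.
  def fetchBlockStreams(blocks: Map[BlockId, Array[Byte]]): Iterator[(BlockId, InputStream)] =
    blocks.iterator.map { case (id, bytes) => id -> (new ByteArrayInputStream(bytes): InputStream) }

  // Reader side: owns deserialization. Each block is read fully as
  // newline-separated UTF-8 records, then its stream is closed.
  def read(streams: Iterator[(BlockId, InputStream)]): Iterator[String] =
    streams.flatMap { case (_, in) =>
      val src = Source.fromInputStream(in, "UTF-8")
      try src.getLines().toList.iterator
      finally src.close()
    }
}
```

Because the reader controls deserialization, switching to a different record format only touches `read`. The fetcher stays unchanged.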

Author: Matt Massie <[email protected]>
Author: Kay Ousterhout <[email protected]>

Closes apache#6423 from massie/shuffle-api-cleanup and squashes the following commits:

8b0632c [Matt Massie] Minor Scala style fixes
d0a1b39 [Matt Massie] Merge pull request #1 from kayousterhout/massie_shuffle-api-cleanup
290f1eb [Kay Ousterhout] Added test for HashShuffleReader.read()
5186da0 [Kay Ousterhout] Revert "Add test to ensure HashShuffleReader is freeing resources"
f98a1b9 [Matt Massie] Add test to ensure HashShuffleReader is freeing resources
a011bfa [Matt Massie] Use PrivateMethodTester on check that delegate stream is closed
4ea1712 [Matt Massie] Small code cleanup for readability
7429a98 [Matt Massie] Update tests to check that BufferReleasingStream is closing delegate InputStream
f458489 [Matt Massie] Remove unnecessary map() on return Iterator
4abb855 [Matt Massie] Consolidate metric code. Make it clear why InterruptibleIterator is needed.
5c30405 [Matt Massie] Return visibility of BlockStoreShuffleFetcher to private[hash]
7eedd1d [Matt Massie] Small Scala import cleanup
28f8085 [Matt Massie] Small import nit
f93841e [Matt Massie] Update shuffle read metrics in ShuffleReader instead of BlockStoreShuffleFetcher.
7e8e0fe [Matt Massie] Minor Scala style fixes
01e8721 [Matt Massie] Explicitly cast iterator in branches for type clarity
7c8f73e [Matt Massie] Close Block InputStream immediately after all records are read
208b7a5 [Matt Massie] Small code style changes
b70c945 [Matt Massie] Make BlockStoreShuffleFetcher visible to shuffle package
19135f2 [Matt Massie] [SPARK-7884] Allow Spark shuffle APIs to be more customizable
GeorgeDittmar pushed a commit that referenced this pull request Jul 26, 2015
Fix for incorrect memory in Spark UI as per SPARK-5768

Author: Joshi <[email protected]>
Author: Rekha Joshi <[email protected]>

Closes apache#6972 from rekhajoshm/SPARK-5768 and squashes the following commits:

b678a91 [Joshi] Fix for incorrect memory in Spark UI
2fe53d9 [Joshi] Fix for incorrect memory in Spark UI
eb823b8 [Joshi] SPARK-5768: Fix for incorrect memory in Spark UI
0be142d [Rekha Joshi] Merge pull request apache#3 from apache/master
106fd8e [Rekha Joshi] Merge pull request apache#2 from apache/master
e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
GeorgeDittmar pushed a commit that referenced this pull request Jul 26, 2015
… without side effects.

Fix for SparkContext stop behavior - Allow sc.stop() to be called multiple times without side effects.
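One common way to make a stop method idempotent is to guard it with an `AtomicBoolean`, so only the first caller runs the shutdown logic. The sketch below uses a hypothetical `StoppableService` class rather than `SparkContext` itself. The `cleanupRuns` counter exists only to make the behavior observable.

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicInteger}

// Hypothetical service illustrating an idempotent stop(); not SparkContext.
class StoppableService {
  private val stopped = new AtomicBoolean(false)
  private val shutdownCount = new AtomicInteger(0)

  def stop(): Unit = {
    // compareAndSet flips false -> true exactly once, even under concurrent
    // calls; every later caller sees `true` and returns with no side effects.
    if (!stopped.compareAndSet(false, true)) {
      return
    }
    // Real shutdown work (closing resources, stopping threads) would go here.
    shutdownCount.incrementAndGet()
  }

  def isStopped: Boolean = stopped.get()
  def cleanupRuns: Int = shutdownCount.get()
}
```

The flag is set before shutdown finishes, so a concurrent second caller can return while the first is still cleaning up. Whether that is acceptable depends on what callers expect after `stop()` returns.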

Author: Joshi <[email protected]>
Author: Rekha Joshi <[email protected]>

Closes apache#6973 from rekhajoshm/SPARK-2645 and squashes the following commits:

277043e [Joshi] Fix for SparkContext stop behavior
446b0a4 [Joshi] Fix for SparkContext stop behavior
2ce5760 [Joshi] Fix for SparkContext stop behavior
c97839a [Joshi] Fix for SparkContext stop behavior
1aff39c [Joshi] Fix for SparkContext stop behavior
12f66b5 [Joshi] Fix for SparkContext stop behavior
72bb484 [Joshi] Fix for SparkContext stop behavior
a5a7d7f [Joshi] Fix for SparkContext stop behavior
9193a0c [Joshi] Fix for SparkContext stop behavior
58dba70 [Joshi] SPARK-2645: Fix for SparkContext stop behavior
380c5b0 [Joshi] SPARK-2645: Fix for SparkContext stop behavior
b566b66 [Joshi] SPARK-2645: Fix for SparkContext stop behavior
0be142d [Rekha Joshi] Merge pull request apache#3 from apache/master
106fd8e [Rekha Joshi] Merge pull request apache#2 from apache/master
e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
GeorgeDittmar pushed a commit that referenced this pull request Jul 26, 2015
…nfo if needed

Author: Joshi <[email protected]>
Author: Rekha Joshi <[email protected]>

Closes apache#5992 from rekhajoshm/fix/SPARK-7137 and squashes the following commits:

8c42b57 [Joshi] update checkInputColumn to print more info if needed
33ddd2e [Joshi] update checkInputColumn to print more info if needed
acf3e17 [Joshi] update checkInputColumn to print more info if needed
8993c0e [Joshi] SPARK-7137: Add checkInputColumn back to Params and print more info
e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
GeorgeDittmar pushed a commit that referenced this pull request Jul 26, 2015
…mands

This allows failures of piped commands (a non-zero exit status) to be detected.
It also allows tasks to be retried when such errors are rare or transient (for example, network problems in piped commands).
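The commits above modify PySpark's `rdd.pipe`. The following sketch shows the same idea in Scala using `java.lang.ProcessBuilder`. `PipeSketch.pipe` and its `checkCode` flag are illustrative names, and the command is assumed to run under `sh -c` on a POSIX system.

```scala
import java.io.{BufferedReader, InputStreamReader, PrintWriter}
import scala.collection.mutable.ArrayBuffer

object PipeSketch {
  // Runs `command` via `sh -c`, feeds `input` on stdin, returns stdout lines.
  // With checkCode = true a non-zero exit status throws, so the failure
  // surfaces (and the task can be retried) instead of yielding partial output.
  def pipe(command: String, input: Seq[String], checkCode: Boolean): Seq[String] = {
    val proc = new ProcessBuilder("sh", "-c", command)
      .redirectError(ProcessBuilder.Redirect.INHERIT) // don't let stderr fill up and block
      .start()
    // Feed stdin from a separate thread so a command that emits output before
    // consuming all of its input cannot deadlock against our reader.
    val writer = new Thread(new Runnable {
      def run(): Unit = {
        val out = new PrintWriter(proc.getOutputStream)
        try input.foreach(l => out.println(l)) finally out.close()
      }
    })
    writer.start()
    val reader = new BufferedReader(new InputStreamReader(proc.getInputStream))
    val lines = ArrayBuffer.empty[String]
    var line = reader.readLine()
    while (line != null) { lines += line; line = reader.readLine() }
    reader.close()
    writer.join()
    val code = proc.waitFor()
    if (checkCode && code != 0)
      throw new RuntimeException(s"Pipe command '$command' exited with status $code")
    lines.toList
  }
}
```

`PrintWriter` swallows I/O errors, so writing to a command that exits early does not crash the writer thread. The exit status check is the signal that matters.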

Author: Scott Taylor <[email protected]>

Closes apache#6262 from megatron-me-uk/patch-2 and squashes the following commits:

04ae1d5 [Scott Taylor] Remove spurious empty line
98fa101 [Scott Taylor] fix blank line style error
574b564 [Scott Taylor] Merge pull request apache#2 from megatron-me-uk/patch-4
0c1e762 [Scott Taylor] Update rdd pipe method for checkCode
ab9a2e1 [Scott Taylor] Update rdd pipe tests for checkCode
eb4801c [Scott Taylor] fix fail_condition
b0ac3a4 [Scott Taylor] Merge pull request #1 from megatron-me-uk/megatron-me-uk-patch-1
a307d13 [Scott Taylor] update rdd tests to test pipe modes
34fcdc3 [Scott Taylor] add optional argument 'mode' for rdd.pipe
a0c0161 [Scott Taylor] fix generator issue
8a9ef9c [Scott Taylor] make check_return_code an iterator
0486ae3 [Scott Taylor] style fixes
8ed89a6 [Scott Taylor] Chain generators to prevent potential deadlock
4153b02 [Scott Taylor] fix list.sort returns None
491d3fc [Scott Taylor] Pass a function handle to assertRaises
3344a21 [Scott Taylor] wrap assertRaises with QuietTest
3ab8c7a [Scott Taylor] remove whitespace for style
cc1a73d [Scott Taylor] fix style issues in pipe test
8db4073 [Scott Taylor] Add a test for rdd pipe functions
1b3dc4e [Scott Taylor] fix missing space around operator style
0974f98 [Scott Taylor] add space between words in multiline string
45f4977 [Scott Taylor] fix line too long style error
5745d85 [Scott Taylor] Remove space to fix style
f552d49 [Scott Taylor] Catch non-zero exit from pipe commands
GeorgeDittmar pushed a commit that referenced this pull request Jul 26, 2015
Improves the performance of LocalPrefixSpan by implementing optimizations proposed in [SPARK-8997](https://issues.apache.org/jira/browse/SPARK-8997)
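For context, PrefixSpan mines frequent sequential patterns by growing a prefix one item at a time and recursing, depth first, on the database projected onto that prefix. The sketch below is the textbook algorithm restricted to sequences of single items (no itemsets). `LocalPrefixSpanSketch.run`, `minCount`, and `maxLen` are illustrative names, and the sketch does not reproduce the specific optimizations from SPARK-8997.

```scala
object LocalPrefixSpanSketch {
  // Returns every frequent pattern (as a list of items) with its support count.
  def run(db: Seq[Seq[Int]], minCount: Int, maxLen: Int): Seq[(List[Int], Int)] = {
    def grow(prefix: List[Int], projected: Seq[Seq[Int]]): Seq[(List[Int], Int)] =
      if (prefix.length >= maxLen) Nil
      else {
        // Support = number of projected sequences containing the item.
        val counts = projected.flatMap(_.distinct).groupBy(identity).map { case (i, xs) => i -> xs.size }
        counts.toSeq.filter(_._2 >= minCount).sortBy(_._1).flatMap { case (item, count) =>
          val newPrefix = prefix :+ item
          // Project: keep the suffix after the first occurrence of `item`.
          val next = projected.flatMap { s =>
            val idx = s.indexOf(item)
            if (idx >= 0) Some(s.drop(idx + 1)) else None
          }.filter(_.nonEmpty)
          // Depth first: emit this pattern, then everything that extends it.
          (newPrefix, count) +: grow(newPrefix, next)
        }
      }
    grow(Nil, db)
  }
}
```

For example, on `Seq(Seq(1, 2, 3), Seq(1, 3), Seq(2, 3))` with `minCount = 2` and `maxLen = 3`, it finds `[1]`, `[1, 3]`, `[2]`, `[2, 3]` (support 2 each) and `[3]` (support 3).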

Author: Feynman Liang <[email protected]>
Author: Feynman Liang <[email protected]>
Author: Xiangrui Meng <[email protected]>

Closes apache#7360 from feynmanliang/SPARK-8997-improve-prefixspan and squashes the following commits:

59db2f5 [Feynman Liang] Merge pull request #1 from mengxr/SPARK-8997
91e4357 [Xiangrui Meng] update LocalPrefixSpan impl
9212256 [Feynman Liang] MengXR code review comments
f055d82 [Feynman Liang] Fix failing scalatest
2e00cba [Feynman Liang] Depth first projections
70b93e3 [Feynman Liang] Performance improvements in LocalPrefixSpan, fix tests
GeorgeDittmar pushed a commit that referenced this pull request Jul 26, 2015
This makes sure attempts are listed in the order they were executed and that the app's state matches the state of the most recent attempt.
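The following is a minimal sketch of the ordering rule. It assumes a hypothetical `Attempt` record with a start time and a completion flag; the field names are illustrative, not the History Server's actual types. Attempts are sorted by descending start time, and the application's state is taken from the first (most recent) attempt.

```scala
// Hypothetical attempt record; not the History Server's real classes.
final case class Attempt(id: String, startTime: Long, completed: Boolean)

object AttemptOrdering {
  // Most recent attempt first. sortBy is stable, so equal start times keep input order.
  def ordered(attempts: Seq[Attempt]): Seq[Attempt] = attempts.sortBy(-_.startTime)

  // The app counts as completed only if its latest attempt completed.
  def appCompleted(attempts: Seq[Attempt]): Boolean =
    ordered(attempts).headOption.exists(_.completed)
}
```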

Author: Joshi <[email protected]>
Author: Rekha Joshi <[email protected]>

Closes apache#7253 from rekhajoshm/SPARK-8593 and squashes the following commits:

874dd80 [Joshi] History Server: updated order for multiple attempts(logcleaner)
716e0b1 [Joshi] History Server: updated order for multiple attempts(descending start time works every time)
548c753 [Joshi] History Server: updated order for multiple attempts(descending start time works every time)
83306a8 [Joshi] History Server: updated order for multiple attempts(descending start time)
b0fc922 [Joshi] History Server: updated order for multiple attempts(updated comment)
cc0fda7 [Joshi] History Server: updated order for multiple attempts(updated test)
304cb0b [Joshi] History Server: updated order for multiple attempts(reverted HistoryPage)
85024e8 [Joshi] History Server: updated order for multiple attempts
a41ac4b [Joshi] History Server: updated order for multiple attempts
ab65fa1 [Joshi] History Server: some attempt completed to work with showIncomplete
0be142d [Rekha Joshi] Merge pull request apache#3 from apache/master
106fd8e [Rekha Joshi] Merge pull request apache#2 from apache/master
e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
GeorgeDittmar pushed a commit that referenced this pull request Jul 26, 2015
Implement IntArrayParam in mllib

Author: Rekha Joshi <[email protected]>
Author: Joshi <[email protected]>

Closes apache#7481 from rekhajoshm/SPARK-9118 and squashes the following commits:

d3b1766 [Joshi] Implement IntArrayParam
0be142d [Rekha Joshi] Merge pull request apache#3 from apache/master
106fd8e [Rekha Joshi] Merge pull request apache#2 from apache/master
e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
GeorgeDittmar added a commit that referenced this pull request Jul 26, 2015
Modifies Vector, DenseVector, and SparseVector to implement argmax functionality. This work sets the stage for the changes planned in SPARK-7423.
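The subtle part of a sparse argmax is the implicit zeros. If every stored value is negative, any unstored position wins. If the best stored value is exactly zero, the earliest zero (stored or implicit) should be returned. The following is a minimal sketch, assuming ties resolve to the first index and an empty vector returns -1. `ArgmaxSketch`, `denseArgmax`, and `sparseArgmax` are illustrative standalone functions, not the `Vector` methods added in this PR, and NaN handling is not considered.

```scala
object ArgmaxSketch {
  // Index of the first maximal entry; -1 for an empty vector.
  def denseArgmax(values: Array[Double]): Int =
    if (values.isEmpty) -1
    else {
      var maxIdx = 0
      var i = 1
      while (i < values.length) {
        if (values(i) > values(maxIdx)) maxIdx = i
        i += 1
      }
      maxIdx
    }

  // `indices` must be strictly increasing; unlisted positions are implicit zeros.
  def sparseArgmax(size: Int, indices: Array[Int], values: Array[Double]): Int =
    if (size == 0) -1
    else if (indices.isEmpty) 0 // every entry is an implicit zero
    else {
      // Best stored entry (the first one wins on ties).
      var bestJ = 0
      var j = 1
      while (j < values.length) { if (values(j) > values(bestJ)) bestJ = j; j += 1 }
      val maxValue = values(bestJ)
      val maxIdx = indices(bestJ)
      if (maxValue > 0.0 || indices.length == size) maxIdx
      else {
        // First implicit zero: the smallest k with indices(k) != k.
        var k = 0
        while (k < indices.length && indices(k) == k) k += 1
        // A negative max loses to any implicit zero; a zero max ties with it,
        // so the earlier position is returned.
        if (maxValue < 0.0) k else math.min(maxIdx, k)
      }
    }
}
```

For example, a size-5 sparse vector storing only `-1.0` at index 1 and `-2.0` at index 3 has argmax 0, an implicit zero.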

Author: George Dittmar <[email protected]>
Author: George <[email protected]>
Author: dittmarg <[email protected]>
Author: Xiangrui Meng <[email protected]>

Closes apache#6112 from GeorgeDittmar/SPARK-7422 and squashes the following commits:

3e0a939 [George Dittmar] Merge pull request #1 from mengxr/SPARK-7422
127dec5 [Xiangrui Meng] update argmax impl
2ea6a55 [George Dittmar] Added MimaExcludes for Vectors.argmax
98058f4 [George Dittmar] Merge branch 'master' of github.com:apache/spark into SPARK-7422
5fd9380 [George Dittmar] fixing style check error
42341fb [George Dittmar] refactoring arg max check to better handle zero values
b22af46 [George Dittmar] Fixing spaces between commas in unit test
f2eba2f [George Dittmar] Cleaning up unit tests to be fewer lines
aa330e3 [George Dittmar] Fixing some last if else spacing issues
ac53c55 [George Dittmar] changing dense vector argmax unit test to be one line call vs 2
d5b5423 [George Dittmar] Fixing code style and updating if logic on when to check for zero values
ee1a85a [George Dittmar] Cleaning up unit tests a bit and modifying a few cases
3ee8711 [George Dittmar] Fixing corner case issue with zeros in the active values of the sparse vector. Updated unit tests
b1f059f [George Dittmar] Added comment before we start arg max calculation. Updated unit tests to cover corner cases
f21dcce [George Dittmar] commit
af17981 [dittmarg] Initial work fixing bug that was made clear in pr
eeda560 [George] Fixing SparseVector argmax function to ignore zero values while doing the calculation.
4526acc [George] Merge branch 'master' of github.com:apache/spark into SPARK-7422
df9538a [George] Added argmax to sparse vector and added unit test
3cffed4 [George] Adding unit tests for argmax functions for Dense and Sparse vectors
04677af [George] initial work on adding argmax to Vector and SparseVector