
Commit 8f366f8

Merge branch 'master' of git://git.apache.org/spark into dsl-missing-operator

2 parents 1b88e2e + b0a46d8

File tree

18 files changed: +93, -38 lines

.gitignore

Lines changed: 1 addition & 1 deletion

@@ -49,7 +49,7 @@ dependency-reduced-pom.xml
 checkpoint
 derby.log
 dist/
-spark-*-bin.tar.gz
+spark-*-bin-*.tgz
 unit-tests.log
 /lib/
 rat-results.txt

core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala

Lines changed: 1 addition & 1 deletion

@@ -63,7 +63,7 @@ private[spark] class PipedRDD[T: ClassTag](

 /**
  * A FilenameFilter that accepts anything that isn't equal to the name passed in.
- * @param name of file or directory to leave out
+ * @param filterName of file or directory to leave out
  */
 class NotEqualsFileNameFilter(filterName: String) extends FilenameFilter {
   def accept(dir: File, name: String): Boolean = {
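
For context, a minimal hedged sketch (not part of the commit) of how a filter with this shape is typically applied: listing a directory while leaving out one entry by name. The accept body and the "_temporary" name are illustrative assumptions, not taken from the diff.

import java.io.{File, FilenameFilter}

// Same shape as the class in the hunk above: accept every name that differs from filterName.
class NotEqualsFileNameFilter(filterName: String) extends FilenameFilter {
  def accept(dir: File, name: String): Boolean = !name.equals(filterName)
}

// List a directory's contents, excluding a single file or subdirectory by name.
val entries = new File("/tmp").listFiles(new NotEqualsFileNameFilter("_temporary"))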

core/src/main/scala/org/apache/spark/util/AkkaUtils.scala

Lines changed: 8 additions & 1 deletion

@@ -134,9 +134,16 @@ private[spark] object AkkaUtils extends Logging {
     Duration.create(conf.getLong("spark.akka.lookupTimeout", 30), "seconds")
   }

+  private val AKKA_MAX_FRAME_SIZE_IN_MB = Int.MaxValue / 1024 / 1024
+
   /** Returns the configured max frame size for Akka messages in bytes. */
   def maxFrameSizeBytes(conf: SparkConf): Int = {
-    conf.getInt("spark.akka.frameSize", 10) * 1024 * 1024
+    val frameSizeInMB = conf.getInt("spark.akka.frameSize", 10)
+    if (frameSizeInMB > AKKA_MAX_FRAME_SIZE_IN_MB) {
+      throw new IllegalArgumentException("spark.akka.frameSize should not be greater than "
+        + AKKA_MAX_FRAME_SIZE_IN_MB + "MB")
+    }
+    frameSizeInMB * 1024 * 1024
   }

   /** Space reserved for extra data in an Akka message besides serialized task or task result. */
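
A standalone sketch of why the new guard caps spark.akka.frameSize at Int.MaxValue / 1024 / 1024 (2047 MB): converting a larger megabyte value to bytes overflows Int before the value is ever used. The object and method names here are hypothetical; only the arithmetic mirrors the hunk above.

object FrameSizeSketch {
  // Largest whole number of megabytes whose byte count still fits in an Int.
  private val MaxFrameSizeInMB = Int.MaxValue / 1024 / 1024 // 2047

  def frameSizeBytes(frameSizeInMB: Int): Int = {
    // Reject values that would overflow once converted to bytes, mirroring the new check.
    require(frameSizeInMB <= MaxFrameSizeInMB,
      s"frame size should not be greater than ${MaxFrameSizeInMB}MB")
    frameSizeInMB * 1024 * 1024
  }

  def main(args: Array[String]): Unit = {
    println(frameSizeBytes(10))   // 10485760 bytes, the 10 MB default
    println(2048 * 1024 * 1024)   // without the guard: wraps around to -2147483648
  }
}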

core/src/test/scala/org/apache/spark/ShuffleSuite.scala

Lines changed: 0 additions & 1 deletion

@@ -270,7 +270,6 @@ object ShuffleSuite {

   def mergeCombineException(x: Int, y: Int): Int = {
     throw new SparkException("Exception for map-side combine.")
-    x + y
   }

   class NonJavaSerializableClass(val value: Int) extends Comparable[NonJavaSerializableClass] {

docs/programming-guide.md

Lines changed: 7 additions & 1 deletion

@@ -934,6 +934,12 @@ for details.
   <td> Reshuffle the data in the RDD randomly to create either more or fewer partitions and balance it across them.
     This always shuffles all data over the network. </td>
 </tr>
+<tr>
+  <td> <b>repartitionAndSortWithinPartitions</b>(<i>partitioner</i>) </td>
+  <td> Repartition the RDD according to the given partitioner and, within each resulting partition,
+  sort records by their keys. This is more efficient than calling <code>repartition</code> and then sorting within
+  each partition because it can push the sorting down into the shuffle machinery. </td>
+</tr>
 </table>

 ### Actions

@@ -1177,7 +1183,7 @@ Accumulators are variables that are only "added" to through an associative opera
 therefore be efficiently supported in parallel. They can be used to implement counters (as in
 MapReduce) or sums. Spark natively supports accumulators of numeric types, and programmers
 can add support for new types. If accumulators are created with a name, they will be
-displayed in Spark's UI. This can can be useful for understanding the progress of
+displayed in Spark's UI. This can be useful for understanding the progress of
 running stages (NOTE: this is not yet supported in Python).

 An accumulator is created from an initial value `v` by calling `SparkContext.accumulator(v)`. Tasks
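
A brief hedged usage sketch of the repartitionAndSortWithinPartitions entry added above, in spark-shell style with an existing SparkContext `sc`; the sample data is illustrative.

import org.apache.spark.HashPartitioner
import org.apache.spark.SparkContext._  // pair-RDD implicits, needed on older Spark versions

val pairs = sc.parallelize(Seq((3, "c"), (1, "a"), (4, "d"), (2, "b")))

// One shuffle that both repartitions and sorts by key within each partition,
// rather than repartition(...) followed by a separate per-partition sort.
val sorted = pairs.repartitionAndSortWithinPartitions(new HashPartitioner(2))

// Inspect the per-partition contents: each partition comes back key-sorted.
sorted.glom().collect().foreach(part => println(part.mkString(", ")))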

docs/sql-programming-guide.md

Lines changed: 41 additions & 9 deletions

@@ -146,7 +146,7 @@ describes the various methods for loading data into a SchemaRDD.

 Spark SQL supports two different methods for converting existing RDDs into SchemaRDDs. The first
 method uses reflection to infer the schema of an RDD that contains specific types of objects. This
-reflection based approach leads to more concise code and works well when you already know the schema 
+reflection based approach leads to more concise code and works well when you already know the schema
 while writing your Spark application.

 The second method for creating SchemaRDDs is through a programmatic interface that allows you to

@@ -566,7 +566,7 @@ for teenName in teenNames.collect():

 ### Configuration

-Configuration of Parquet can be done using the `setConf` method on SQLContext or by running 
+Configuration of Parquet can be done using the `setConf` method on SQLContext or by running
 `SET key=value` commands using SQL.

 <table class="table">

@@ -575,8 +575,8 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
   <td><code>spark.sql.parquet.binaryAsString</code></td>
   <td>false</td>
   <td>
-    Some other Parquet-producing systems, in particular Impala and older versions of Spark SQL, do 
-    not differentiate between binary data and strings when writing out the Parquet schema. This 
+    Some other Parquet-producing systems, in particular Impala and older versions of Spark SQL, do
+    not differentiate between binary data and strings when writing out the Parquet schema. This
     flag tells Spark SQL to interpret binary data as a string to provide compatibility with these systems.
   </td>
 </tr>

@@ -591,10 +591,20 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
   <td><code>spark.sql.parquet.compression.codec</code></td>
   <td>gzip</td>
   <td>
-    Sets the compression codec use when writing Parquet files. Acceptable values include: 
+    Sets the compression codec use when writing Parquet files. Acceptable values include:
     uncompressed, snappy, gzip, lzo.
   </td>
 </tr>
+<tr>
+  <td><code>spark.sql.parquet.filterPushdown</code></td>
+  <td>false</td>
+  <td>
+    Turn on Parquet filter pushdown optimization. This feature is turned off by default because of a known
+    bug in Parquet 1.6.0rc3 (<a href="https://issues.apache.org/jira/browse/PARQUET-136">PARQUET-136</a>).
+    However, if your table doesn't contain any nullable string or binary columns, it's still safe to turn
+    this feature on.
+  </td>
+</tr>
 <tr>
   <td><code>spark.sql.hive.convertMetastoreParquet</code></td>
   <td>true</td>

@@ -900,7 +910,6 @@ export HIVE_SERVER2_THRIFT_BIND_HOST=<listening-host>
 ./sbin/start-thriftserver.sh \
   --master <master-uri> \
   ...
-```
 {% endhighlight %}

 or system properties:

@@ -911,7 +920,6 @@ or system properties:
   --hiveconf hive.server2.thrift.bind.host=<listening-host> \
   --master <master-uri>
   ...
-```
 {% endhighlight %}

 Now you can use beeline to test the Thrift JDBC/ODBC server:

@@ -947,7 +955,7 @@ options.

 ## Migration Guide for Shark User

-### Scheduling 
+### Scheduling
 To set a [Fair Scheduler](job-scheduling.html#fair-scheduler-pools) pool for a JDBC client session,
 users can set the `spark.sql.thriftserver.scheduler.pool` variable:

@@ -994,7 +1002,7 @@ Several caching related features are not supported yet:
 ## Compatibility with Apache Hive

 Spark SQL is designed to be compatible with the Hive Metastore, SerDes and UDFs. Currently Spark
-SQL is based on Hive 0.12.0.
+SQL is based on Hive 0.12.0 and 0.13.1.

 #### Deploying in Existing Hive Warehouses

@@ -1033,6 +1041,7 @@ Spark SQL supports the vast majority of Hive features, such as:
 * Sampling
 * Explain
 * Partitioned tables
+* View
 * All Hive DDL Functions, including:
   * `CREATE TABLE`
   * `CREATE TABLE AS SELECT`

@@ -1048,6 +1057,7 @@ Spark SQL supports the vast majority of Hive features, such as:
   * `STRING`
   * `BINARY`
   * `TIMESTAMP`
+  * `DATE`
   * `ARRAY<>`
   * `MAP<>`
   * `STRUCT<>`

@@ -1148,6 +1158,7 @@ evaluated by the SQL execution engine. A full list of the functions supported c
 * Datetime type
     - `TimestampType`: Represents values comprising values of fields year, month, day,
       hour, minute, and second.
+    - `DateType`: Represents values comprising values of fields year, month, day.
 * Complex types
     - `ArrayType(elementType, containsNull)`: Represents values comprising a sequence of
       elements with the type of `elementType`. `containsNull` is used to indicate if

@@ -1255,6 +1266,13 @@ import org.apache.spark.sql._
   TimestampType
   </td>
 </tr>
+<tr>
+  <td> <b>DateType</b> </td>
+  <td> java.sql.Date </td>
+  <td>
+  DateType
+  </td>
+</tr>
 <tr>
   <td> <b>ArrayType</b> </td>
   <td> scala.collection.Seq </td>

@@ -1381,6 +1399,13 @@ please use factory methods provided in
   DataType.TimestampType
   </td>
 </tr>
+<tr>
+  <td> <b>DateType</b> </td>
+  <td> java.sql.Date </td>
+  <td>
+  DataType.DateType
+  </td>
+</tr>
 <tr>
   <td> <b>ArrayType</b> </td>
   <td> java.util.List </td>

@@ -1528,6 +1553,13 @@ from pyspark.sql import *
   TimestampType()
   </td>
 </tr>
+<tr>
+  <td> <b>DateType</b> </td>
+  <td> datetime.date </td>
+  <td>
+  DateType()
+  </td>
+</tr>
 <tr>
   <td> <b>ArrayType</b> </td>
   <td> list, tuple, or array </td>
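
A short hedged sketch of the two configuration routes the guide mentions (setConf on SQLContext, or a SQL SET command), applied here to the new spark.sql.parquet.filterPushdown flag. The application name and local master are illustrative assumptions.

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

val sc = new SparkContext("local", "parquet-config-sketch")
val sqlContext = new SQLContext(sc)

// Programmatic route: setConf on the SQLContext.
sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")

// SQL route: a SET command, e.g. issued from a JDBC/beeline session.
sqlContext.sql("SET spark.sql.parquet.filterPushdown=true")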

examples/src/main/scala/org/apache/spark/examples/sql/hive/HiveFromSpark.scala

Lines changed: 4 additions & 3 deletions

@@ -29,9 +29,10 @@ object HiveFromSpark {
     val sc = new SparkContext(sparkConf)
     val path = s"${System.getenv("SPARK_HOME")}/examples/src/main/resources/kv1.txt"

-    // A local hive context creates an instance of the Hive Metastore in process, storing
-    // the warehouse data in the current directory. This location can be overridden by
-    // specifying a second parameter to the constructor.
+    // A hive context adds support for finding tables in the MetaStore and writing queries
+    // using HiveQL. Users who do not have an existing Hive deployment can still create a
+    // HiveContext. When not configured by the hive-site.xml, the context automatically
+    // creates metastore_db and warehouse in the current directory.
    val hiveContext = new HiveContext(sc)
    import hiveContext._

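A minimal hedged sketch of the behaviour the updated comment describes: creating a HiveContext without any hive-site.xml and issuing HiveQL, with metastore_db and a warehouse directory created automatically in the working directory. The app name and table are illustrative, not taken from the example file.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("HiveContextSketch").setMaster("local"))
val hiveContext = new HiveContext(sc)

// No existing Hive deployment needed: a local metastore_db and warehouse appear in the current directory.
hiveContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
hiveContext.sql("SELECT COUNT(*) FROM src").collect().foreach(println)
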
external/mqtt/pom.xml

Lines changed: 2 additions & 2 deletions

@@ -43,8 +43,8 @@
     </dependency>
     <dependency>
       <groupId>org.eclipse.paho</groupId>
-      <artifactId>mqtt-client</artifactId>
-      <version>0.4.0</version>
+      <artifactId>org.eclipse.paho.client.mqttv3</artifactId>
+      <version>1.0.1</version>
     </dependency>
     <dependency>
       <groupId>org.scalatest</groupId>

make-distribution.sh

Lines changed: 3 additions & 0 deletions

@@ -201,6 +201,9 @@ if [ -e "$FWDIR"/CHANGES.txt ]; then
   cp "$FWDIR/CHANGES.txt" "$DISTDIR"
 fi

+# Copy data files
+cp -r "$FWDIR/data" "$DISTDIR"
+
 # Copy other things
 mkdir "$DISTDIR"/conf
 cp "$FWDIR"/conf/*.template "$DISTDIR"/conf

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SparkSQLParser.scala

Lines changed: 2 additions & 2 deletions

@@ -97,10 +97,10 @@ class SqlLexical(val keywords: Seq[String]) extends StdLexical {

   /** Generate all variations of upper and lower case of a given string */
   def allCaseVersions(s: String, prefix: String = ""): Stream[String] = {
-    if (s == "") {
+    if (s.isEmpty) {
      Stream(prefix)
    } else {
-      allCaseVersions(s.tail, prefix + s.head.toLower) ++
+      allCaseVersions(s.tail, prefix + s.head.toLower) #:::
        allCaseVersions(s.tail, prefix + s.head.toUpper)
    }
  }
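
For context, a self-contained hedged sketch of the function above and of what switching from `++` to `#:::` changes: `#:::` takes its right-hand stream by name, so the recursion into the upper-case branch is deferred until a consumer actually demands it, whereas the argument of `++` is evaluated eagerly at the call site. The wrapping object and the sample input are illustrative.

object CaseVersionsSketch {
  def allCaseVersions(s: String, prefix: String = ""): Stream[String] = {
    if (s.isEmpty) {
      Stream(prefix)
    } else {
      // Lazy concatenation: the toUpper branch is not expanded until it is reached.
      allCaseVersions(s.tail, prefix + s.head.toLower) #:::
        allCaseVersions(s.tail, prefix + s.head.toUpper)
    }
  }

  def main(args: Array[String]): Unit = {
    // 2^n variations exist, but only the requested prefix of the stream is materialized.
    println(allCaseVersions("select").take(3).toList) // List(select, selecT, seleCt)
  }
}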
