@@ -146,7 +146,7 @@ describes the various methods for loading data into a SchemaRDD.
 
 Spark SQL supports two different methods for converting existing RDDs into SchemaRDDs. The first
 method uses reflection to infer the schema of an RDD that contains specific types of objects. This
-reflection based approach leads to more concise code and works well when you already know the schema
+reflection-based approach leads to more concise code and works well when you already know the schema
 while writing your Spark application.
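For illustration, here is a minimal Scala sketch of the reflection-based approach (it assumes an existing `SparkContext` `sc` and `SQLContext` `sqlContext`; the `Person` case class and `people.txt` file are made-up examples):

{% highlight scala %}
// A case class defines the schema that will be inferred through reflection.
case class Person(name: String, age: Int)

// Implicitly convert an RDD of case classes to a SchemaRDD (1.x SchemaRDD API).
import sqlContext.createSchemaRDD

// Hypothetical input file with lines such as "Michael,29".
val people = sc.textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))

// Register the inferred SchemaRDD as a table and query it with SQL.
people.registerTempTable("people")
val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
{% endhighlight %}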
 
 The second method for creating SchemaRDDs is through a programmatic interface that allows you to
@@ -566,7 +566,7 @@ for teenName in teenNames.collect():
 
 ### Configuration
 
-Configuration of Parquet can be done using the `setConf` method on SQLContext or by running
+Configuration of Parquet can be done using the `setConf` method on SQLContext or by running
 `SET key=value` commands using SQL.
 
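For example, a minimal sketch of both styles (the property names are those listed in the table below; the `snappy` value is only an illustration):

{% highlight scala %}
// sqlContext is assumed to be an existing SQLContext or HiveContext.
// Programmatic style: set a Parquet option through setConf.
sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")

// SQL style: the same kind of option can also be set with a SET command.
sqlContext.sql("SET spark.sql.parquet.binaryAsString=true")
{% endhighlight %}
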
 <table class="table">
@@ -575,8 +575,8 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
   <td><code>spark.sql.parquet.binaryAsString</code></td>
   <td>false</td>
   <td>
-    Some other Parquet-producing systems, in particular Impala and older versions of Spark SQL, do
-    not differentiate between binary data and strings when writing out the Parquet schema. This
+    Some other Parquet-producing systems, in particular Impala and older versions of Spark SQL, do
+    not differentiate between binary data and strings when writing out the Parquet schema. This
     flag tells Spark SQL to interpret binary data as a string to provide compatibility with these systems.
   </td>
 </tr>
@@ -591,10 +591,20 @@ Configuration of Parquet can be done using the `setConf` method on SQLContext or
   <td><code>spark.sql.parquet.compression.codec</code></td>
   <td>gzip</td>
   <td>
-    Sets the compression codec use when writing Parquet files. Acceptable values include:
+    Sets the compression codec to use when writing Parquet files. Acceptable values include:
     uncompressed, snappy, gzip, lzo.
   </td>
 </tr>
+<tr>
+  <td><code>spark.sql.parquet.filterPushdown</code></td>
+  <td>false</td>
+  <td>
+    Turn on Parquet filter pushdown optimization. This feature is turned off by default because of a known
+    bug in Parquet 1.6.0rc3 (<a href="https://issues.apache.org/jira/browse/PARQUET-136">PARQUET-136</a>).
+    However, if your table doesn't contain any nullable string or binary columns, it's still safe to turn
+    this feature on.
+  </td>
+</tr>
 <tr>
   <td><code>spark.sql.hive.convertMetastoreParquet</code></td>
   <td>true</td>
@@ -900,7 +910,6 @@ export HIVE_SERVER2_THRIFT_BIND_HOST=<listening-host>
 ./sbin/start-thriftserver.sh \
   --master <master-uri> \
   ...
-```
 {% endhighlight %}
 
 or system properties:
@@ -911,7 +920,6 @@ or system properties:
   --hiveconf hive.server2.thrift.bind.host=<listening-host> \
   --master <master-uri>
   ...
-```
 {% endhighlight %}
 
 Now you can use beeline to test the Thrift JDBC/ODBC server:
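As a rough sketch (assuming the server is listening on the default port 10000 on localhost):

{% highlight bash %}
# Start the beeline client shipped with Spark.
./bin/beeline
# Then connect to the Thrift JDBC/ODBC server from the beeline prompt:
# beeline> !connect jdbc:hive2://localhost:10000
{% endhighlight %}
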
@@ -947,7 +955,7 @@ options.
 
 ## Migration Guide for Shark Users
 
-### Scheduling
+### Scheduling
 To set a [Fair Scheduler](job-scheduling.html#fair-scheduler-pools) pool for a JDBC client session,
 users can set the `spark.sql.thriftserver.scheduler.pool` variable:
 
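For example, from the JDBC session (the pool name `accounting` is only a placeholder for one of your configured Fair Scheduler pools):

{% highlight sql %}
SET spark.sql.thriftserver.scheduler.pool=accounting;
{% endhighlight %}
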
@@ -994,7 +1002,7 @@ Several caching related features are not supported yet:
 ## Compatibility with Apache Hive
 
 Spark SQL is designed to be compatible with the Hive Metastore, SerDes and UDFs. Currently Spark
-SQL is based on Hive 0.12.0.
+SQL is based on Hive 0.12.0 and 0.13.1.
 
 #### Deploying in Existing Hive Warehouses
 
@@ -1033,6 +1041,7 @@ Spark SQL supports the vast majority of Hive features, such as:
 * Sampling
 * Explain
 * Partitioned tables
+* Views
 * All Hive DDL Functions, including:
   * `CREATE TABLE`
   * `CREATE TABLE AS SELECT`
@@ -1048,6 +1057,7 @@ Spark SQL supports the vast majority of Hive features, such as:
   * `STRING`
   * `BINARY`
   * `TIMESTAMP`
+  * `DATE`
   * `ARRAY<>`
   * `MAP<>`
   * `STRUCT<>`
@@ -1148,6 +1158,7 @@ evaluated by the SQL execution engine. A full list of the functions supported c
 * Datetime type
   - `TimestampType`: Represents values comprising values of fields year, month, day,
     hour, minute, and second.
+  - `DateType`: Represents values comprising values of fields year, month, and day.
 * Complex types
   - `ArrayType(elementType, containsNull)`: Represents values comprising a sequence of
     elements with the type of `elementType`. `containsNull` is used to indicate if
@@ -1255,6 +1266,13 @@ import org.apache.spark.sql._
   TimestampType
   </td>
 </tr>
+<tr>
+  <td> <b>DateType</b> </td>
+  <td> java.sql.Date </td>
+  <td>
+  DateType
+  </td>
+</tr>
 <tr>
   <td> <b>ArrayType</b> </td>
   <td> scala.collection.Seq </td>
@@ -1381,6 +1399,13 @@ please use factory methods provided in
   DataType.TimestampType
   </td>
 </tr>
+<tr>
+  <td> <b>DateType</b> </td>
+  <td> java.sql.Date </td>
+  <td>
+  DataType.DateType
+  </td>
+</tr>
 <tr>
   <td> <b>ArrayType</b> </td>
   <td> java.util.List </td>
@@ -1528,6 +1553,13 @@ from pyspark.sql import *
   TimestampType()
   </td>
 </tr>
+<tr>
+  <td> <b>DateType</b> </td>
+  <td> datetime.date </td>
+  <td>
+  DateType()
+  </td>
+</tr>
 <tr>
   <td> <b>ArrayType</b> </td>
   <td> list, tuple, or array </td>