12 changes: 6 additions & 6 deletions dev/deps/spark-deps-hadoop-2-hive-2.3
@@ -225,12 +225,12 @@ orc-shims/1.7.4//orc-shims-1.7.4.jar
 oro/2.0.8//oro-2.0.8.jar
 osgi-resource-locator/1.0.3//osgi-resource-locator-1.0.3.jar
 paranamer/2.8//paranamer-2.8.jar
-parquet-column/1.12.2//parquet-column-1.12.2.jar
-parquet-common/1.12.2//parquet-common-1.12.2.jar
-parquet-encoding/1.12.2//parquet-encoding-1.12.2.jar
-parquet-format-structures/1.12.2//parquet-format-structures-1.12.2.jar
-parquet-hadoop/1.12.2//parquet-hadoop-1.12.2.jar
-parquet-jackson/1.12.2//parquet-jackson-1.12.2.jar
+parquet-column/1.12.3//parquet-column-1.12.3.jar
+parquet-common/1.12.3//parquet-common-1.12.3.jar
+parquet-encoding/1.12.3//parquet-encoding-1.12.3.jar
+parquet-format-structures/1.12.3//parquet-format-structures-1.12.3.jar
+parquet-hadoop/1.12.3//parquet-hadoop-1.12.3.jar
+parquet-jackson/1.12.3//parquet-jackson-1.12.3.jar
 pickle/1.2//pickle-1.2.jar
 protobuf-java/2.5.0//protobuf-java-2.5.0.jar
 py4j/0.10.9.5//py4j-0.10.9.5.jar
12 changes: 6 additions & 6 deletions dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -214,12 +214,12 @@ orc-shims/1.7.4//orc-shims-1.7.4.jar
 oro/2.0.8//oro-2.0.8.jar
 osgi-resource-locator/1.0.3//osgi-resource-locator-1.0.3.jar
 paranamer/2.8//paranamer-2.8.jar
-parquet-column/1.12.2//parquet-column-1.12.2.jar
-parquet-common/1.12.2//parquet-common-1.12.2.jar
-parquet-encoding/1.12.2//parquet-encoding-1.12.2.jar
-parquet-format-structures/1.12.2//parquet-format-structures-1.12.2.jar
-parquet-hadoop/1.12.2//parquet-hadoop-1.12.2.jar
-parquet-jackson/1.12.2//parquet-jackson-1.12.2.jar
+parquet-column/1.12.3//parquet-column-1.12.3.jar
+parquet-common/1.12.3//parquet-common-1.12.3.jar
+parquet-encoding/1.12.3//parquet-encoding-1.12.3.jar
+parquet-format-structures/1.12.3//parquet-format-structures-1.12.3.jar
+parquet-hadoop/1.12.3//parquet-hadoop-1.12.3.jar
+parquet-jackson/1.12.3//parquet-jackson-1.12.3.jar
 pickle/1.2//pickle-1.2.jar
 protobuf-java/2.5.0//protobuf-java-2.5.0.jar
 py4j/0.10.9.5//py4j-0.10.9.5.jar
4 changes: 2 additions & 2 deletions docs/sql-data-sources-parquet.md
@@ -257,7 +257,7 @@ REFRESH TABLE my_table;
 
 Since Spark 3.2, columnar encryption is supported for Parquet tables with Apache Parquet 1.12+.
 
-Parquet uses the envelope encryption practice, where file parts are encrypted with "data encryption keys" (DEKs), and the DEKs are encrypted with "master encryption keys" (MEKs). The DEKs are randomly generated by Parquet for each encrypted file/column. The MEKs are generated, stored and managed in a Key Management Service (KMS) of user’s choice. The Parquet Maven [repository](https://repo1.maven.org/maven2/org/apache/parquet/parquet-hadoop/1.12.0/) has a jar with a mock KMS implementation that allows to run column encryption and decryption using a spark-shell only, without deploying a KMS server (download the `parquet-hadoop-tests.jar` file and place it in the Spark `jars` folder):
+Parquet uses envelope encryption: file parts are encrypted with "data encryption keys" (DEKs), and the DEKs are in turn encrypted with "master encryption keys" (MEKs). The DEKs are randomly generated by Parquet for each encrypted file/column. The MEKs are generated, stored and managed in a Key Management Service (KMS) of the user's choice. The Parquet Maven [repository](https://repo1.maven.org/maven2/org/apache/parquet/parquet-hadoop/1.12.3/) has a jar with a mock KMS implementation that allows running column encryption and decryption from a spark-shell alone, without deploying a KMS server (download the `parquet-hadoop-tests.jar` file and place it in the Spark `jars` folder):
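With the mock KMS jar on the classpath, the round trip boils down to a few Hadoop configuration settings plus write/read options. The following spark-shell sketch is illustrative only: the base64 master keys, the column name `square`, the `squaresDF` DataFrame and the output path are placeholder values, and the property names assume Parquet's properties-driven crypto factory.

```scala
// Illustrative spark-shell session; key material, column names and paths
// are placeholders, not values taken from this PR.

// Use the mock, in-memory KMS shipped in parquet-hadoop-tests.jar
// (for demonstration only -- never outside of tests).
sc.hadoopConfiguration.set("parquet.encryption.kms.client.class",
  "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS")

// Explicit base64-encoded master keys -- only the mock KMS takes keys
// this way; a real KMS holds master keys server-side.
sc.hadoopConfiguration.set("parquet.encryption.key.list",
  "keyA:AAECAwQFBgcICQoLDA0ODw==, keyB:AAECAAECAAECAAECAAECAA==")

// Activate Parquet encryption, driven by Hadoop properties.
sc.hadoopConfiguration.set("parquet.crypto.factory.class",
  "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")

// Encrypt column "square" with keyA and the file footers with keyB.
squaresDF.write
  .option("parquet.encryption.column.keys", "keyA:square")
  .option("parquet.encryption.footer.key", "keyB")
  .parquet("/tmp/table.parquet.encrypted")

// Reading back decrypts transparently, since the same Hadoop conf is set.
val df2 = spark.read.parquet("/tmp/table.parquet.encrypted")
```

Note that the DEKs never leave the Parquet layer; only wrap/unwrap requests for them go to the (here mocked) KMS.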

<div class="codetabs">

@@ -349,7 +349,7 @@ df2 = spark.read.parquet("/path/to/table.parquet.encrypted")

#### KMS Client

-The InMemoryKMS class is provided only for illustration and simple demonstration of Parquet encryption functionality. **It should not be used in a real deployment**. The master encryption keys must be kept and managed in a production-grade KMS system, deployed in user's organization. Rollout of Spark with Parquet encryption requires implementation of a client class for the KMS server. Parquet provides a plug-in [interface](https://github.com/apache/parquet-mr/blob/apache-parquet-1.12.0/parquet-hadoop/src/main/java/org/apache/parquet/crypto/keytools/KmsClient.java) for development of such classes,
+The InMemoryKMS class is provided only as a simple demonstration of Parquet encryption functionality. **It should not be used in a real deployment**. The master encryption keys must be kept and managed in a production-grade KMS system deployed in the user's organization. Rolling out Spark with Parquet encryption requires implementing a client class for that KMS server. Parquet provides a plug-in [interface](https://github.com/apache/parquet-mr/blob/1.12.3/parquet-hadoop/src/main/java/org/apache/parquet/crypto/keytools/KmsClient.java) for developing such classes,
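As a rough sketch of what such a client class involves (the class name `MyOrgKmsClient` and the REST-style endpoints in the comments are hypothetical; only the three method signatures come from the Parquet `KmsClient` interface):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.parquet.crypto.keytools.KmsClient

// Hypothetical client for a REST-style KMS; the wrap/unwrap endpoints
// are illustrative, not a real product's API.
class MyOrgKmsClient extends KmsClient {
  private var kmsUrl: String = _
  private var token: String = _

  override def initialize(configuration: Configuration, kmsInstanceID: String,
      kmsInstanceURL: String, accessToken: String): Unit = {
    kmsUrl = kmsInstanceURL
    token = accessToken
  }

  // Ask the KMS to encrypt (wrap) a DEK with the master key it holds;
  // the returned string is stored in the Parquet file metadata.
  override def wrapKey(keyBytes: Array[Byte], masterKeyIdentifier: String): String = {
    // e.g. POST keyBytes to s"$kmsUrl/wrap/$masterKeyIdentifier" with token
    ???
  }

  // Ask the KMS to decrypt (unwrap) a wrapped DEK read from file metadata.
  override def unwrapKey(wrappedKey: String, masterKeyIdentifier: String): Array[Byte] = {
    // e.g. POST wrappedKey to s"$kmsUrl/unwrap/$masterKeyIdentifier" with token
    ???
  }
}
```

Such a class is then registered through the `parquet.encryption.kms.client.class` Hadoop property in place of the mock InMemoryKMS.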

<div data-lang="java" markdown="1">
{% highlight java %}
4 changes: 2 additions & 2 deletions pom.xml
@@ -131,7 +131,7 @@
 <kafka.version>3.2.0</kafka.version>
 <!-- After 10.15.1.3, the minimum required version is JDK9 -->
 <derby.version>10.14.2.0</derby.version>
-<parquet.version>1.12.2</parquet.version>
+<parquet.version>1.12.3</parquet.version>
 <orc.version>1.7.4</orc.version>
 <jetty.version>9.4.46.v20220331</jetty.version>
 <jakartaservlet.version>4.0.3</jakartaservlet.version>
@@ -2357,7 +2357,7 @@
 <groupId>${hive.group}</groupId>
 <artifactId>hive-service-rpc</artifactId>
 </exclusion>
-<!-- parquet-hadoop-bundle:1.8.1 conflict with 1.12.0 -->
+<!-- parquet-hadoop-bundle:1.8.1 conflicts with 1.12.3 -->
 <exclusion>
 <groupId>org.apache.parquet</groupId>
 <artifactId>parquet-hadoop-bundle</artifactId>
@@ -28,7 +28,7 @@ import scala.reflect.runtime.universe.TypeTag
 import org.apache.hadoop.fs.Path
 import org.apache.parquet.filter2.predicate.{FilterApi, FilterPredicate, Operators}
 import org.apache.parquet.filter2.predicate.FilterApi._
-import org.apache.parquet.filter2.predicate.Operators.{Column => _, _}
+import org.apache.parquet.filter2.predicate.Operators.{Column => _, Eq, Gt, GtEq, Lt, LtEq, NotEq, UserDefinedByInstance}
 import org.apache.parquet.hadoop.{ParquetFileReader, ParquetInputFormat, ParquetOutputFormat}
 import org.apache.parquet.hadoop.util.HadoopInputFile
 import org.apache.parquet.schema.MessageType