Merged
Changes from all commits
70 commits
1979169
Revert "[SPARK-41369][CONNECT] Remove unneeded connect server deps"
HyukjinKwon Dec 9, 2022
24a588c
[MINOR][CONNECT][DOCS] Document parallelism=1 in Spark Connect testing
HyukjinKwon Dec 9, 2022
3212fa9
[SPARK-41439][CONNECT][PYTHON] Implement `DataFrame.melt` and `DataFr…
beliefer Dec 9, 2022
bd37b10
[SPARK-41377][BUILD] Fix spark-version-info.properties not found on W…
GauthamBanasandra Dec 9, 2022
83c12c2
[SPARK-39948][BUILD] Exclude hive-vector-code-gen dependency
zhouyifan279 Dec 9, 2022
bc32392
[MINOR][DOC] Fix typo in SqlBaseLexer.g4
Dec 9, 2022
fec210b
[SPARK-41395][SQL] `InterpretedMutableProjection` should use `setDeci…
bersprockets Dec 9, 2022
0cfda39
[SPARK-41446][CONNECT][PYTHON] Make `createDataFrame` support schema …
zhengruifeng Dec 9, 2022
dd0bd07
[SPARK-40270][PS][FOLLOWUP] Skip test_style when pandas <1.3.0
Yikun Dec 9, 2022
be52d67
[SPARK-41458][BUILD][YARN][SHUFFLE] Correctly transform the SPI servi…
pan3793 Dec 9, 2022
acdacf4
[SPARK-41402][SQL][CONNECT] Override prettyName of StringDecode
HyukjinKwon Dec 9, 2022
fc3c0f1
[SPARK-41450][BUILD] Fix shading in `core` module
pan3793 Dec 9, 2022
928eab6
[SPARK-41462][SQL] Date and timestamp type can up cast to TimestampNTZ
gengliangwang Dec 9, 2022
e371c53
[SPARK-41466][BUILD] Change Scala Style configuration to catch AnyFun…
HyukjinKwon Dec 9, 2022
29a7011
[SPARK-41225][CONNECT][PYTHON][FOLLOW-UP] Disable unsupported functions
grundprinzip Dec 9, 2022
9bdaed1
[SPARK-41456][SQL] Improve the performance of try_cast
gengliangwang Dec 9, 2022
deabae7
[SPARK-41329][CONNECT] Resolve circular imports in Spark Connect
HyukjinKwon Dec 10, 2022
435f6b1
[SPARK-41414][CONNECT][PYTHON] Implement date/timestamp functions
xinrong-meng Dec 10, 2022
c78f935
[SPARK-41474][PROTOBUF][BUILD] Exclude `proto` files from `spark-prot…
dongjoon-hyun Dec 10, 2022
3084cc5
[SPARK-41475][CONNECT] Fix lint-scala command error and typo
dengziming Dec 10, 2022
bf4981f
[SPARK-41457][PYTHON][TESTS] Refactor type annotations and dependency…
HyukjinKwon Dec 10, 2022
ef9113f
[SPARK-41467][BUILD] Upgrade httpclient from 4.5.13 to 4.5.14
panbingkun Dec 10, 2022
6972341
[SPARK-41417][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_0019` to `INVALID…
LuciferYang Dec 10, 2022
c4af4b0
[SPARK-41476][INFRA] Prevent `README.md` from triggering CIs
dongjoon-hyun Dec 10, 2022
92655db
[SPARK-41443][SQL] Assign a name to the error class _LEGACY_ERROR_TEM…
panbingkun Dec 10, 2022
d4dca2d
[SPARK-41459][SQL] fix thrift server operation log output is empty
idealspark Dec 10, 2022
1faf26b
[SPARK-41460][CORE] Introduce `IsolatedThreadSafeRpcEndpoint` to exte…
Ngone51 Dec 10, 2022
fcdd6dc
[SPARK-41461][BUILD][CORE] Support user configurable protoc executabl…
Dec 11, 2022
af33722
[SPARK-41404][SQL] Refactor `ColumnVectorUtils#toBatch` to make `Col…
LuciferYang Dec 11, 2022
f92c827
[SPARK-41008][MLLIB] Follow-up isotonic regression features deduplica…
ahmed-mahran Dec 11, 2022
a404261
[SPARK-41479][K8S][DOCS] Add `IPv4 and IPv6` section to K8s document
dongjoon-hyun Dec 11, 2022
7cf348c
[SPARK-41477][CONNECT][PYTHON] Correctly infer the datatype of litera…
zhengruifeng Dec 12, 2022
ef9c8e0
[SPARK-41439][CONNECT][PYTHON][FOLLOWUP] Make unpivot of `connect/dat…
beliefer Dec 12, 2022
7e7bc94
[SPARK-41187][CORE] LiveExecutor MemoryLeak in AppStatusListener when…
Dec 12, 2022
5d52bb3
[SPARK-41486][SQL][TESTS] Upgrade `MySQL` docker image to 8.0.31 to s…
dongjoon-hyun Dec 12, 2022
cd2f786
[SPARK-41463][SQL][TESTS] Ensure error class names contain only capit…
bozhang2820 Dec 12, 2022
b8a91da
[SPARK-41484][CONNECT][PYTHON] Implement `collection` functions: E~M
zhengruifeng Dec 12, 2022
3952833
[MINOR][PYTHON][DOCS] Correct the type hint for `from_csv`
zhengruifeng Dec 12, 2022
63226ca
[SPARK-41492][CONNECT][PYTHON] Implement MISC functions
zhengruifeng Dec 12, 2022
1b2d700
[SPARK-41468][SQL] Fix PlanExpression handling in EquivalentExpressions
peter-toth Dec 12, 2022
1f4c8e4
[SPARK-40775][SQL] Fix duplicate description entries for V2 file scans
Kimahriman Dec 12, 2022
7801666
[SPARK-41448] Make consistent MR job IDs in FileBatchWriter and FileF…
Dec 12, 2022
e43b4be
[SPARK-41378][SQL][FOLLOWUP] DS V2 ColStats follow up
huaxingao Dec 12, 2022
435b57b
[SPARK-41412][CONNECT][FOLLOW-UP] Fix test_cast to pass with ANSI mod…
HyukjinKwon Dec 12, 2022
7cb8288
[SPARK-41491][SQL][TESTS] Update postgres docker image to 15.1
williamhyun Dec 12, 2022
c3f46d5
[SPARK-41360][CORE] Avoid BlockManager re-registration if the executo…
Ngone51 Dec 12, 2022
6034e2c
[SPARK-41378][SQL][FOLLOWUP] Use toAttributeMap before comparison
dongjoon-hyun Dec 12, 2022
d2212e3
[SPARK-41495][CONNECT][PYTHON] Implement `collection` functions: P~Z
zhengruifeng Dec 13, 2022
c16c9b6
[SPARK-41502][K8S][TESTS] Upgrade the minimum Minikube version to 1.28.0
dongjoon-hyun Dec 13, 2022
3275555
[SPARK-41504][K8S][R][TESTS] Update R version to 4.1.2 in Dockerfile …
dongjoon-hyun Dec 13, 2022
f590f87
[SPARK-41461][BUILD][CORE][CONNECT][PROTOBUF] Unify the environment v…
Dec 13, 2022
5720e82
[SPARK-41499][BUILD] Upgrade Protobuf version to 3.21.11
gengliangwang Dec 13, 2022
af8dd41
[SPARK-33782][K8S][CORE] Place spark.files, spark.jars and spark.file…
pralabhkumar Dec 13, 2022
9b69331
[SPARK-41481][CORE][SQL] Reuse `INVALID_TYPED_LITERAL` instead of `_L…
LuciferYang Dec 13, 2022
27f4d1e
[SPARK-41468][SQL][FOLLOWUP] Handle NamedLambdaVariables in Equivalen…
peter-toth Dec 13, 2022
0e2d604
[SPARK-41406][SQL] Refactor error message for `NUM_COLUMNS_MISMATCH` …
panbingkun Dec 13, 2022
3809ccd
[SPARK-41478][SQL] Assign a name to the error class _LEGACY_ERROR_TEM…
panbingkun Dec 13, 2022
e857b7a
[SPARK-39601][YARN] AllocationFailure should not be treated as exitCa…
pan3793 Dec 13, 2022
a2ceff2
[SPARK-41360][CORE][BUILD][FOLLOW-UP] Exclude BlockManagerMessages.Re…
HyukjinKwon Dec 13, 2022
7e9b88b
[SPARK-27561][SQL] Support implicit lateral column alias resolution o…
anchovYu Dec 13, 2022
d00771f
[SPARK-39601][YARN][FOLLOWUP] YarnClusterSchedulerBackend should call…
pan3793 Dec 13, 2022
e2474f6
[SPARK-41482][BUILD] Upgrade dropwizard metrics 4.2.13
LuciferYang Dec 13, 2022
e29ada0
[SPARK-41062][SQL] Rename `UNSUPPORTED_CORRELATED_REFERENCE` to `CORR…
itholic Dec 13, 2022
a75bc84
[SPARK-41412][CONNECT][TESTS][FOLLOW-UP] Exclude binary casting to ma…
HyukjinKwon Dec 13, 2022
cdc73ad
[SPARK-41506][CONNECT][PYTHON] Refactor LiteralExpression to support …
zhengruifeng Dec 14, 2022
ea53dc8
[SPARK-41506][CONNECT][TESTS][FOLLOW-UP] Import BinaryType in pyspark…
HyukjinKwon Dec 14, 2022
1b3a444
[SPARK-27561][SQL][FOLLOWUP] Move the two rules for Later column alia…
gengliangwang Dec 14, 2022
4e8980e
[SPARK-41409][CORE][SQL] Rename `_LEGACY_ERROR_TEMP_1043` to `WRONG_N…
LuciferYang Dec 14, 2022
5b50834
[SPARK-41248][SQL] Add "spark.sql.json.enablePartialResults" to enabl…
sadikovi Dec 14, 2022
0fd1f85
[SPARK-41514][K8S][DOCS] Add `PVC-oriented executor pod allocation` d…
dongjoon-hyun Dec 14, 2022
1 change: 1 addition & 0 deletions appveyor.yml
@@ -28,6 +28,7 @@ only_commits:
files:
- appveyor.yml
- dev/appveyor-install-dependencies.ps1
- build/spark-build-info.ps1
- R/
- sql/core/src/main/scala/org/apache/spark/sql/api/r/
- core/src/main/scala/org/apache/spark/api/r/
46 changes: 46 additions & 0 deletions build/spark-build-info.ps1
@@ -0,0 +1,46 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# This script generates the build info for Spark and places it into the spark-version-info.properties file.
# Arguments:
# ResourceDir - The target directory where the properties file will be created. [./core/target/extra-resources]
# SparkVersion - The current version of Spark
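# Example invocation (illustrative only; the directory and version below are placeholders):
#   .\build\spark-build-info.ps1 core\target\extra-resources 3.4.0-SNAPSHOT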

param(
# The resource directory.
[Parameter(Position = 0)]
[String]
$ResourceDir,

# The Spark version.
[Parameter(Position = 1)]
[String]
$SparkVersion
)

$null = New-Item -Type Directory -Force $ResourceDir
$SparkBuildInfoPath = $ResourceDir.TrimEnd('\').TrimEnd('/') + '\spark-version-info.properties'

$SparkBuildInfoContent =
"version=$SparkVersion
user=$($Env:USERNAME)
revision=$(git rev-parse HEAD)
branch=$(git rev-parse --abbrev-ref HEAD)
date=$([DateTime]::UtcNow | Get-Date -UFormat +%Y-%m-%dT%H:%M:%SZ)
url=$(git config --get remote.origin.url)"

Set-Content -Path $SparkBuildInfoPath -Value $SparkBuildInfoContent
5 changes: 5 additions & 0 deletions common/network-yarn/pom.xml
@@ -136,6 +136,11 @@
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
@@ -70,10 +70,6 @@ case class AvroScan(

override def hashCode(): Int = super.hashCode()

override def description(): String = {
super.description() + ", PushedFilters: " + pushedFilters.mkString("[", ", ", "]")
}

override def getMetaData(): Map[String, String] = {
super.getMetaData() ++ Map("PushedFilters" -> seqToString(pushedFilters))
}
6 changes: 3 additions & 3 deletions connector/connect/README.md
@@ -32,15 +32,15 @@ for example, compiling `connect` module on CentOS 6 or CentOS 7 which the defaul
specifying the user-defined `protoc` and `protoc-gen-grpc-java` binary files as follows:

```bash
export CONNECT_PROTOC_EXEC_PATH=/path-to-protoc-exe
export SPARK_PROTOC_EXEC_PATH=/path-to-protoc-exe
export CONNECT_PLUGIN_EXEC_PATH=/path-to-protoc-gen-grpc-java-exe
./build/mvn -Phive -Puser-defined-protoc clean package
```

or

```bash
export CONNECT_PROTOC_EXEC_PATH=/path-to-protoc-exe
export SPARK_PROTOC_EXEC_PATH=/path-to-protoc-exe
export CONNECT_PLUGIN_EXEC_PATH=/path-to-protoc-gen-grpc-java-exe
./build/sbt -Puser-defined-protoc clean package
```
@@ -82,7 +82,7 @@ To use the release version of Spark Connect:

```bash
# Run all Spark Connect Python tests as a module.
./python/run-tests --module pyspark-connect
./python/run-tests --module pyspark-connect --parallelism 1
```


4 changes: 2 additions & 2 deletions connector/connect/common/pom.xml
@@ -193,7 +193,7 @@
<profile>
<id>user-defined-protoc</id>
<properties>
<connect.protoc.executable.path>${env.CONNECT_PROTOC_EXEC_PATH}</connect.protoc.executable.path>
<spark.protoc.executable.path>${env.SPARK_PROTOC_EXEC_PATH}</spark.protoc.executable.path>
<connect.plugin.executable.path>${env.CONNECT_PLUGIN_EXEC_PATH}</connect.plugin.executable.path>
</properties>
<build>
@@ -203,7 +203,7 @@
<artifactId>protobuf-maven-plugin</artifactId>
<version>0.6.1</version>
<configuration>
<protocExecutable>${connect.protoc.executable.path}</protocExecutable>
<protocExecutable>${spark.protoc.executable.path}</protocExecutable>
<pluginId>grpc-java</pluginId>
<pluginExecutable>${connect.plugin.executable.path}</pluginExecutable>
<protoSourceRoot>src/main/protobuf</protoSourceRoot>
@@ -77,9 +77,7 @@ message Expression {
int32 year_month_interval = 20;
int64 day_time_interval = 21;

Array array = 22;
Struct struct = 23;
Map map = 24;
DataType typed_null = 22;
}

// whether the literal type should be treated as a nullable type. Applies to
@@ -107,25 +105,6 @@ message Expression {
int32 days = 2;
int64 microseconds = 3;
}

message Struct {
// A possibly heterogeneously typed list of literals
repeated Literal fields = 1;
}

message Array {
// A homogeneously typed list of literals
repeated Literal values = 1;
}

message Map {
repeated Pair pairs = 1;

message Pair {
Literal key = 1;
Literal value = 2;
}
}
}

// An unresolved attribute that is not explicitly bound to a specific column, but the column
@@ -20,6 +20,7 @@ syntax = 'proto3';
package spark.connect;

import "spark/connect/expressions.proto";
import "spark/connect/types.proto";

option java_multiple_files = true;
option java_package = "org.apache.spark.connect.proto";
@@ -54,6 +55,7 @@ message Relation {
Tail tail = 22;
WithColumns with_columns = 23;
Hint hint = 24;
Unpivot unpivot = 25;

// NA functions
NAFill fill_na = 90;
@@ -304,6 +306,17 @@ message LocalRelation {
// Local collection data serialized into Arrow IPC streaming format which contains
// the schema of the data.
bytes data = 1;

// (Optional) The user-provided schema.
//
// The server side will update the column names and data types according to this schema.
oneof schema {

DataType datatype = 2;

// The server will use the Catalyst parser to parse this string into a DataType.
string datatype_str = 3;
}
}

// Relation of type [[Sample]] that samples a fraction of the dataset.
@@ -570,3 +583,21 @@ message Hint {
// (Optional) Hint parameters.
repeated Expression.Literal parameters = 3;
}

// Unpivot a DataFrame from wide format to long format, optionally leaving identifier columns set.
message Unpivot {
// (Required) The input relation.
Relation input = 1;

// (Required) Id columns.
repeated Expression ids = 2;

// (Optional) Value columns to unpivot.
repeated Expression values = 3;

// (Required) Name of the variable column.
string variable_column_name = 4;

// (Required) Name of the value column.
string value_column_name = 5;
}
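The `Unpivot` message above carries the same fields as the server-side `Dataset.unpivot`/`melt` API it is expected to map to. A minimal sketch of the wide-to-long transformation it encodes, with made-up column names and data (not part of this diff):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Wide format: one row per id, one column per metric.
val wide = Seq((1, 10.0, 20.0), (2, 30.0, 40.0)).toDF("id", "m1", "m2")

// Long format: one row per (id, metric) pair. The arguments line up with the
// proto fields: ids, values, variable_column_name, value_column_name.
val longDf = wide.unpivot(
  ids = Array(col("id")),
  values = Array(col("m1"), col("m2")),
  variableColumnName = "metric",
  valueColumnName = "value")
// longDf columns: id, metric, value
```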
67 changes: 67 additions & 0 deletions connector/connect/server/pom.xml
@@ -55,6 +55,12 @@
<groupId>org.apache.spark</groupId>
<artifactId>spark-connect-common_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
@@ -106,19 +112,80 @@
<artifactId>spark-tags_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- #if scala-2.13 --><!--
<dependency>
<groupId>org.scala-lang.modules</groupId>
<artifactId>scala-parallel-collections_${scala.binary.version}</artifactId>
</dependency>
--><!-- #endif scala-2.13 -->
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>${guava.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>failureaccess</artifactId>
<version>${guava.failureaccess.version}</version>
</dependency>
<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<version>${protobuf.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-netty</artifactId>
<version>${io.grpc.version}</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-protobuf</artifactId>
<version>${io.grpc.version}</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-services</artifactId>
<version>${io.grpc.version}</version>
</dependency>
<dependency>
<groupId>io.grpc</groupId>
<artifactId>grpc-stub</artifactId>
<version>${io.grpc.version}</version>
</dependency>
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-codec-http2</artifactId>
<version>${netty.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-handler-proxy</artifactId>
<version>${netty.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-transport-native-unix-common</artifactId>
<version>${netty.version}</version>
<scope>provided</scope>
</dependency>
<dependency> <!-- necessary for Java 9+ -->
<groupId>org.apache.tomcat</groupId>
<artifactId>annotations-api</artifactId>
<version>${tomcat.annotations.api.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.scalacheck</groupId>
<artifactId>scalacheck_${scala.binary.version}</artifactId>
@@ -38,9 +38,10 @@ private[spark] object Connect {

val CONNECT_GRPC_ARROW_MAX_BATCH_SIZE =
ConfigBuilder("spark.connect.grpc.arrow.maxBatchSize")
.doc("When using Apache Arrow, limit the maximum size of one arrow batch that " +
"can be sent from server side to client side. Currently, we conservatively use 70% " +
"of it because the size is not accurate but estimated.")
.doc(
"When using Apache Arrow, limit the maximum size of one arrow batch that " +
"can be sent from server side to client side. Currently, we conservatively use 70% " +
"of it because the size is not accurate but estimated.")
.version("3.4.0")
.bytesConf(ByteUnit.MiB)
.createWithDefaultString("4m")
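The batch-size limit documented above is a regular Spark conf, so it can be overridden at startup. A hypothetical override in Scala (the value `16m` is arbitrary; the default remains `4m`):

```scala
import org.apache.spark.SparkConf

// Illustrative only: raise the cap on a single Arrow batch sent to Connect clients.
val conf = new SparkConf()
  .set("spark.connect.grpc.arrow.maxBatchSize", "16m")
```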
@@ -719,6 +719,53 @@ package object dsl {
.build()
}

def unpivot(
ids: Seq[Expression],
values: Seq[Expression],
variableColumnName: String,
valueColumnName: String): Relation = {
Relation
.newBuilder()
.setUnpivot(
Unpivot
.newBuilder()
.setInput(logicalPlan)
.addAllIds(ids.asJava)
.addAllValues(values.asJava)
.setVariableColumnName(variableColumnName)
.setValueColumnName(valueColumnName))
.build()
}

def unpivot(
ids: Seq[Expression],
variableColumnName: String,
valueColumnName: String): Relation = {
Relation
.newBuilder()
.setUnpivot(
Unpivot
.newBuilder()
.setInput(logicalPlan)
.addAllIds(ids.asJava)
.setVariableColumnName(variableColumnName)
.setValueColumnName(valueColumnName))
.build()
}

def melt(
ids: Seq[Expression],
values: Seq[Expression],
variableColumnName: String,
valueColumnName: String): Relation =
unpivot(ids, values, variableColumnName, valueColumnName)

def melt(
ids: Seq[Expression],
variableColumnName: String,
valueColumnName: String): Relation =
unpivot(ids, variableColumnName, valueColumnName)

private def createSetOperation(
left: Relation,
right: Relation,
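The new `unpivot`/`melt` helpers in the test DSL build an `Unpivot` relation through the generated protobuf builders. A minimal sketch of the equivalent construction outside the DSL, where `input`, `idExpr` and `valueExpr` are assumed to already exist in the caller's scope:

```scala
import org.apache.spark.connect.proto.{Expression, Relation, Unpivot}

// Builds the same Unpivot relation the dsl `unpivot(ids, values, ...)` helper emits.
def unpivotRelation(
    input: Relation,
    idExpr: Expression,
    valueExpr: Expression): Relation =
  Relation
    .newBuilder()
    .setUnpivot(
      Unpivot
        .newBuilder()
        .setInput(input)
        .addIds(idExpr)
        .addValues(valueExpr)
        .setVariableColumnName("variable")
        .setValueColumnName("value"))
    .build()
```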