Merged

sync #22

Changes from all commits (172 commits)
bdeb626
[SPARK-32272][SQL] Add SQL standard command SET TIME ZONE
yaooqinn Jul 16, 2020
6be8b93
[SPARK-32234][SQL] Spark sql commands are failing on selecting the or…
SaurabhChawla100 Jul 16, 2020
c1f160e
[SPARK-30648][SQL] Support filters pushdown in JSON datasource
MaxGekk Jul 16, 2020
d5c672a
[SPARK-32315][ML] Provide an explanation error message when calling r…
dzlab Jul 16, 2020
383f5e9
[SPARK-32310][ML][PYSPARK] ML params default value parity in classifi…
huaxingao Jul 16, 2020
fb51925
[SPARK-32335][K8S][TESTS] Remove Python2 test from K8s IT
dongjoon-hyun Jul 16, 2020
9747e8f
[SPARK-31831][SQL][TESTS][FOLLOWUP] Put mocks for HiveSessionImplSuit…
Jul 17, 2020
ea9e8f3
[SPARK-32094][PYTHON] Update cloudpickle to v1.5.0
HyukjinKwon Jul 17, 2020
efa70b8
[SPARK-32145][SQL][FOLLOWUP] Fix type in the error log of SparkOperation
yaooqinn Jul 17, 2020
ffdbbae
[SPARK-32215] Expose a (protected) /workers/kill endpoint on the Mast…
dagrawal3409 Jul 17, 2020
34baed8
[SPARK-30616][SQL] Introduce TTL config option for SQL Metadata Cache
sap1ens Jul 17, 2020
5daf244
[SPARK-32329][TESTS] Rename HADOOP2_MODULE_PROFILES to HADOOP_MODULE_…
williamhyun Jul 17, 2020
3a60b41
[SPARK-32298][ML] tree models prediction optimization
zhengruifeng Jul 17, 2020
7dc1d89
[SPARK-32353][TEST] Update docker/spark-test and clean up unused stuff
williamhyun Jul 17, 2020
0678afe
[SPARK-21040][CORE] Speculate tasks which are running on decommission…
prakharjain09 Jul 17, 2020
f9f9309
[SPARK-31579][SQL] replaced floorDiv to Div
Sudhar287 Jul 18, 2020
ee62482
[SPARK-29292][YARN][K8S][MESOS] Fix Scala 2.13 compilation for remain…
srowen Jul 18, 2020
40ef012
[SPARK-29802][BUILD] Use python3 in build scripts
srowen Jul 19, 2020
c7a68a9
[SPARK-32344][SQL] Unevaluable expr is set to FIRST/LAST ignoreNullsE…
maropu Jul 19, 2020
026b0b9
[SPARK-32253][INFRA] Show errors only for the sbt tests of github act…
gengliangwang Jul 19, 2020
0aca1a6
[SPARK-32276][SQL] Remove redundant sorts before repartition nodes
aokolnychyi Jul 19, 2020
32a0451
[MINOR][DOCS] Fix links to Cloud Storage connectors docs
medb Jul 19, 2020
ef3cad1
[SPARK-29157][SQL][PYSPARK] Add DataFrameWriterV2 to Python API
zero323 Jul 20, 2020
a4ca355
[SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown
holdenk Jul 20, 2020
c2afe1c
[SPARK-32366][DOC] Fix doc link of datetime pattern in 3.0 migration …
gengliangwang Jul 20, 2020
d0c83f3
[SPARK-32302][SQL] Partially push down disjunctive predicates through…
gengliangwang Jul 20, 2020
e0ecb66
[SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side f…
imback82 Jul 20, 2020
fe07521
[SPARK-32330][SQL] Preserve shuffled hash join build side partitioning
c21 Jul 20, 2020
ffdca82
[SPARK-32367][K8S][TESTS] Correct the spelling of parameter in Kubern…
merrily01 Jul 20, 2020
133c5ed
[SPARK-32368][SQL] pathGlobFilter, recursiveFileLookup and basePath s…
HyukjinKwon Jul 20, 2020
7d65cae
[SPARK-32338][SQL] Overload slice to accept Column for start and length
nvander1 Jul 21, 2020
02114f9
[SPARK-32365][SQL] Add a boundary condition for negative index in reg…
beliefer Jul 21, 2020
8a1c24b
[SPARK-32362][SQL][TEST] AdaptiveQueryExecSuite misses verifying AE r…
LantaoJin Jul 21, 2020
1267d80
[MINOR][DOCS] add link for Debugging your Application in running-on-y…
brandonJY Jul 21, 2020
8c7d6f9
[SPARK-32377][SQL] CaseInsensitiveMap should be deterministic for add…
dongjoon-hyun Jul 21, 2020
4da93b0
[SPARK-32363][PYTHON][BUILD] Fix flakiness in pip package testing in …
HyukjinKwon Jul 21, 2020
0432379
[SPARK-24266][K8S] Restart the watcher when we receive a version chan…
stijndehaes Jul 21, 2020
39181ff
[SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if …
c21 Jul 21, 2020
7b9d755
[SPARK-32350][CORE] Add batch-write on LevelDB to improve performance…
Jul 22, 2020
b4a9606
[SPARK-31922][CORE] logDebug "RpcEnv already stopped" error on LocalS…
Ngone51 Jul 22, 2020
29b7eaa
[MINOR][SQL] Fix warning message for ThriftCLIService.GetCrossReferen…
yaooqinn Jul 22, 2020
feca9ed
[MINOR][SQL][TESTS] Create tables once in JDBC tests
MaxGekk Jul 22, 2020
04bf351
[SPARK-21117][SQL][FOLLOWUP] Define prettyName for WidthBucket
maropu Jul 22, 2020
e8c06af
[SPARK-32003][CORE] When external shuffle service is used, unregister…
wypoon Jul 22, 2020
cd16a10
[SPARK-32364][SQL] Use CaseInsensitiveMap for DataFrameReader/Writer …
dongjoon-hyun Jul 22, 2020
184074d
[SPARK-31999][SQL] Add REFRESH FUNCTION command
ulysses-you Jul 22, 2020
b151194
[SPARK-32392][SQL] Reduce duplicate error log for executing sql state…
yaooqinn Jul 23, 2020
4616982
[SPARK-30616][SQL][FOLLOW-UP] Use only config key name in the config doc
ueshin Jul 23, 2020
182566b
[SPARK-32237][SQL] Resolve hint in CTE
LantaoJin Jul 23, 2020
f8d29d3
[SPARK-32217] Plumb whether a worker would also be decommissioned alo…
dagrawal3409 Jul 23, 2020
7b66882
[SPARK-32338][SQL][PYSPARK][FOLLOW-UP] Update slice to accept Column …
ueshin Jul 23, 2020
a71233f
[SPARK-32389][TESTS] Add all hive.execution suite in the parallel tes…
xuanyuanking Jul 23, 2020
aed8dba
[SPARK-32364][SQL][FOLLOWUP] Add toMap to return originalMap and docu…
dongjoon-hyun Jul 23, 2020
aa54dcf
[SPARK-32251][SQL][TESTS][FOLLOWUP] improve SQL keyword test
cloud-fan Jul 23, 2020
a8e3de3
[SPARK-32280][SPARK-32372][SQL] ResolveReferences.dedupRight should o…
Ngone51 Jul 23, 2020
35345e3
[SPARK-32374][SQL] Disallow setting properties when creating temporar…
imback82 Jul 23, 2020
e7fb67c
[SPARK-31418][SCHEDULER] Request more executors in case of dynamic al…
venkata91 Jul 23, 2020
be2eca2
[SPARK-32398][TESTS][CORE][STREAMING][SQL][ML] Update to scalatest 3.…
srowen Jul 23, 2020
658e874
[SPARK-30648][SQL][FOLLOWUP] Refactoring of JsonFilters: move config …
MaxGekk Jul 24, 2020
19e3ed7
[SPARK-32415][SQL][TESTS] Enable tests for JSON option: allowNonNumer…
MaxGekk Jul 24, 2020
84efa04
[SPARK-32308][SQL] Move by-name resolution logic of unionByName from …
viirya Jul 24, 2020
8896f4a
Revert "[SPARK-32253][INFRA] Show errors only for the sbt tests of gi…
gengliangwang Jul 24, 2020
8bc799f
[SPARK-32375][SQL] Basic functionality of table catalog v2 for JDBC
MaxGekk Jul 24, 2020
fa184c3
[SPARK-32408][BUILD] Enable crossPaths back to prevent side effects
HyukjinKwon Jul 24, 2020
d3596c0
[SPARK-32406][SQL] Make RESET syntax support single configuration reset
yaooqinn Jul 24, 2020
64a01c0
[SPARK-32430][SQL] Extend SparkSessionExtensions to inject rules into…
andygrove Jul 24, 2020
e6ef27b
[SPARK-32287][TESTS] Flaky Test: ExecutorAllocationManagerSuite.add e…
tgravescs Jul 24, 2020
b890fdc
[SPARK-32387][SS] Extract UninterruptibleThread runner logic from Kaf…
gaborgsomogyi Jul 24, 2020
8e36a8f
[SPARK-32419][PYTHON][BUILD] Avoid using subshell for Conda env (de)a…
HyukjinKwon Jul 25, 2020
277a406
[SPARK-32422][SQL][TESTS] Use python3 executable instead of python3.6…
HyukjinKwon Jul 25, 2020
be9f03d
[SPARK-32426][SQL] ui shows sql after variable substitution
cxzl25 Jul 25, 2020
f642234
[SPARK-32437][CORE] Improve MapStatus deserialization speed with Roar…
dongjoon-hyun Jul 25, 2020
aab1e09
[SPARK-32434][CORE] Support Scala 2.13 in AbstractCommandBuilder and …
dongjoon-hyun Jul 25, 2020
f9f1867
[SPARK-32436][CORE] Initialize numNonEmptyBlocks in HighlyCompressedM…
dongjoon-hyun Jul 25, 2020
80e8898
[SPARK-32438][CORE][TESTS] Use HashMap.withDefaultValue in RDDSuite
dongjoon-hyun Jul 25, 2020
d1301af
[SPARK-32437][CORE][FOLLOWUP] Update dependency manifest for RoaringB…
dongjoon-hyun Jul 25, 2020
147022a
[SPARK-32440][CORE][TESTS] Make BlockManagerSuite robust from Scala o…
dongjoon-hyun Jul 25, 2020
83ffef7
[SPARK-32441][BUILD][CORE] Update json4s to 3.7.0-M5 for Scala 2.13
dongjoon-hyun Jul 26, 2020
86ead04
[SPARK-32428][EXAMPLES] Make BinaryClassificationMetricsExample cons…
titsuki Jul 26, 2020
7e0c5b3
[SPARK-32442][CORE][TESTS] Fix TaskSetManagerSuite by hiding `o.a.s.F…
dongjoon-hyun Jul 26, 2020
4f79b9f
[SPARK-32447][CORE] Use python3 by default in pyspark and find-spark-…
dongjoon-hyun Jul 26, 2020
70ac594
[SPARK-32450][PYTHON] Upgrade pycodestyle to v2.6.0
viirya Jul 27, 2020
8153f56
[SPARK-32451][R] Support Apache Arrow 1.0.0
dongjoon-hyun Jul 27, 2020
13c64c2
[SPARK-32448][K8S][TESTS] Use single version for exec-maven-plugin/sc…
dongjoon-hyun Jul 27, 2020
01cf8a4
[SPARK-32383][SQL] Preserve hash join (BHJ and SHJ) stream side ordering
c21 Jul 27, 2020
bfa5d57
[SPARK-32452][R][SQL] Bump up the minimum Arrow version as 1.0.0 in S…
HyukjinKwon Jul 27, 2020
99f33ec
[SPARK-32234][FOLLOWUP][SQL] Update the description of utility method
SaurabhChawla100 Jul 27, 2020
6ab29b3
[SPARK-32179][SPARK-32188][PYTHON][DOCS] Replace and redesign the doc…
HyukjinKwon Jul 27, 2020
a82aee0
[SPARK-32435][PYTHON] Remove heapq3 port from Python 3
HyukjinKwon Jul 27, 2020
998086c
[SPARK-30794][CORE] Stage Level scheduling: Add ability to set off he…
Jul 27, 2020
ea58e52
[SPARK-32434][CORE][FOLLOW-UP] Fix load-spark-env.cmd to be able to r…
HyukjinKwon Jul 27, 2020
548b7db
[SPARK-32420][SQL] Add handling for unique key in non-codegen hash join
c21 Jul 27, 2020
d315ebf
[SPARK-32424][SQL] Fix silent data change for timestamp parsing if ov…
yaooqinn Jul 27, 2020
c114066
[SPARK-32443][CORE] Use POSIX-compatible `command -v` in testCommandA…
HyukjinKwon Jul 27, 2020
f7542d3
[SPARK-32457][ML] logParam thresholds in DT/GBT/FM/LR/MLP
zhengruifeng Jul 27, 2020
8de4333
[SPARK-31753][SQL][DOCS] Add missing keywords in the SQL docs
GuoPhilipse Jul 28, 2020
8323c8e
[SPARK-32059][SQL] Allow nested schema pruning thru window/sort plans
Jul 28, 2020
77f2ca6
[MINOR][PYTHON] Fix spacing in error message
hauntsaninja Jul 28, 2020
44a5258
[SPARK-31525][SQL] Return an empty list for df.head() when df is empty
tianshizz Jul 28, 2020
12b9787
[SPARK-32290][SQL] SingleColumn Null Aware Anti Join Optimize
leanken-zz Jul 28, 2020
ca1ecf7
[SPARK-32459][SQL] Support WrappedArray as customCollectionCls in Map…
Ngone51 Jul 28, 2020
c28da67
[SPARK-32382][SQL] Override table renaming in JDBC dialects
MaxGekk Jul 28, 2020
44c868b
[SPARK-32339][ML][DOC] Improve MLlib BLAS native acceleration docs
xwu-intel Jul 28, 2020
a3d8056
[SPARK-32458][SQL][TESTS] Fix incorrectly sized row value reads
mundaym Jul 28, 2020
5491c08
Revert "[SPARK-31525][SQL] Return an empty list for df.head() when df…
HyukjinKwon Jul 29, 2020
b2180c0
[SPARK-32471][SQL][DOCS][TESTS][PYTHON][SS] Describe JSON option `all…
MaxGekk Jul 29, 2020
45b7212
[SPARK-32401][SQL] Migrate function related commands to use Unresolve…
imback82 Jul 29, 2020
26e6574
[SPARK-32283][CORE] Kryo should support multiple user registrators
LantaoJin Jul 29, 2020
77987a2
[SPARK-32473][CORE][TESTS] Use === instead IndexSeqView
dongjoon-hyun Jul 29, 2020
9be0883
[SPARK-32175][CORE] Fix the order between initialization for Executor…
sarutak Jul 29, 2020
5eab8d2
[SPARK-32477][CORE] JsonProtocol.accumulablesToJson should be determi…
dongjoon-hyun Jul 29, 2020
40e6a5b
[SPARK-32449][ML][PYSPARK] Add summary to MultilayerPerceptronClassif…
huaxingao Jul 29, 2020
d897825
[SPARK-32346][SQL] Support filters pushdown in Avro datasource
MaxGekk Jul 29, 2020
9dc0237
[SPARK-32476][CORE] ResourceAllocator.availableAddrs should be determ…
dongjoon-hyun Jul 29, 2020
e926d41
[SPARK-30322][DOCS] Add stage level scheduling docs
tgravescs Jul 29, 2020
a025a89
[SPARK-32332][SQL] Support columnar exchanges
cloud-fan Jul 29, 2020
50911df
[SPARK-32397][BUILD] Allow specifying of time for build to keep time …
holdenk Jul 29, 2020
1638674
[SPARK-32487][CORE] Remove j.w.r.NotFoundException from `import` in […
dongjoon-hyun Jul 30, 2020
08a66f8
[SPARK-32248][BUILD] Recover Java 11 build in Github Actions
dongjoon-hyun Jul 30, 2020
89d9b7c
[SPARK-32010][PYTHON][CORE] Add InheritableThread for local propertie…
HyukjinKwon Jul 30, 2020
81b0785
[SPARK-32455][ML] LogisticRegressionModel prediction optimization
zhengruifeng Jul 30, 2020
99a8555
[SPARK-32431][SQL] Check duplicate nested columns in read from in-bui…
MaxGekk Jul 30, 2020
e1d7321
[SPARK-32478][R][SQL] Error message to show the schema mismatch in ga…
HyukjinKwon Jul 30, 2020
510a165
[SPARK-32412][SQL] Unify error handling for spark thrift server opera…
yaooqinn Jul 30, 2020
30e3042
[SPARK-32488][SQL] Use @parser::members and @lexer::members to avoid …
maropu Jul 30, 2020
1f7fe54
[SPARK-32491][INFRA] Do not install SparkR in test-only mode in testi…
HyukjinKwon Jul 30, 2020
e0c8bd0
[SPARK-32493][INFRA] Manually install R instead of using setup-r in G…
HyukjinKwon Jul 30, 2020
12f443c
[SPARK-32496][INFRA] Include GitHub Action file as the changes in tes…
HyukjinKwon Jul 30, 2020
7437720
[SPARK-32227] Fix regression bug in load-spark-env.cmd with Spark 3.0.0
Jul 30, 2020
32f4ef0
[SPARK-32497][INFRA] Installs qpdf package for CRAN check in GitHub A…
HyukjinKwon Jul 30, 2020
7cf3b54
[SPARK-32489][CORE] Pass `core` module UTs in Scala 2.13
dongjoon-hyun Jul 30, 2020
366a178
[SPARK-32199][SPARK-32198] Reduce job failures during decommissioning
dagrawal3409 Jul 30, 2020
6032c5b
[SPARK-32417] Fix flakyness of BlockManagerDecommissionIntegrationSuite
dagrawal3409 Jul 30, 2020
9d7b1d9
[SPARK-32175][SPARK-32175][FOLLOWUP] Remove flaky test added in
sarutak Jul 31, 2020
f602782
[SPARK-32482][SS][TESTS] Eliminate deprecated poll(long) API calls to…
gaborgsomogyi Jul 31, 2020
ae82768
[SPARK-32421][SQL] Add code-gen for shuffled hash join
c21 Jul 31, 2020
813532d
[SPARK-32468][SS][TESTS] Fix timeout config issue in Kafka connector …
gaborgsomogyi Jul 31, 2020
8014b0b
[SPARK-32160][CORE][PYSPARK] Add a config to switch allow/disallow to…
ueshin Jul 31, 2020
f480040
[SPARK-32406][SQL][FOLLOWUP] Make RESET fail against static and core …
yaooqinn Jul 31, 2020
4eaf3a0
[SPARK-31418][CORE][FOLLOW-UP][MINOR] Fix log messages to print stage…
venkata91 Jul 31, 2020
354313b
[SPARK-31894][SS][FOLLOW-UP] Rephrase the config doc
xuanyuanking Jul 31, 2020
1c6dff7
[SPARK-32083][SQL] AQE coalesce should at least return one partition
cloud-fan Jul 31, 2020
71aea02
[SPARK-32467][UI] Avoid encoding URL twice on https redirect
gengliangwang Aug 1, 2020
0693d8b
[SPARK-32490][BUILD] Upgrade netty-all to 4.1.51.Final
LuciferYang Aug 2, 2020
713124d
[SPARK-32274][SQL] Make SQL cache serialization pluggable
revans2 Aug 3, 2020
fda397d
[SPARK-32510][SQL] Check duplicate nested columns in read from JDBC d…
MaxGekk Aug 3, 2020
7a09e71
[SPARK-32509][SQL] Ignore unused DPP True Filter in Canonicalization
prakharjain09 Aug 3, 2020
42f9ee4
[SPARK-24884][SQL] Support regexp function regexp_extract_all
beliefer Aug 3, 2020
3deb59d
[SPARK-31709][SQL] Proper base path for database/table location when …
yaooqinn Aug 3, 2020
7f5326c
[SPARK-32492][SQL] Fulfill missing column meta information COLUMN_SIZ…
yaooqinn Aug 3, 2020
c6109ba
[SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/R…
maropu Aug 3, 2020
bc78859
[SPARK-32310][ML][PYSPARK] ML params default value parity in feature …
huaxingao Aug 3, 2020
f3b10f5
[SPARK-32290][SQL][FOLLOWUP] Add version for the SQL config `spark.sq…
MaxGekk Aug 3, 2020
9bbe8c7
[MINOR][SQL] Fix versions in the SQL migration guide for Spark 3.1
MaxGekk Aug 4, 2020
7deb67c
[SPARK-32160][CORE][PYSPARK][FOLLOWUP] Change the config name to swit…
ueshin Aug 4, 2020
1597d8f
[SPARK-30276][SQL] Support Filter expression allows simultaneous use …
beliefer Aug 4, 2020
005ef3a
[SPARK-32468][SS][TESTS][FOLLOWUP] Provide "default.api.timeout.ms" a…
HeartSaVioR Aug 4, 2020
7fec6e0
[SPARK-32524][SQL][TESTS] CachedBatchSerializerSuite should clean up …
dongjoon-hyun Aug 4, 2020
6d69068
[SPARK-32521][SQL] Bug-fix: WithFields Expression should not be foldable
fqaiser94 Aug 4, 2020
171b7d5
[SPARK-23431][CORE] Expose stage level peak executor metrics via REST…
imback82 Aug 4, 2020
7eb6f45
[SPARK-32499][SQL] Use `{}` in conversions maps and structs to strings
MaxGekk Aug 4, 2020
0660a05
[SPARK-32525][DOCS] The layout of monitoring.html is broken
sarutak Aug 4, 2020
15b7333
[SPARK-32507][DOCS][PYTHON] Add main page for PySpark documentation
HyukjinKwon Aug 5, 2020
b14a1e2
[SPARK-32402][SQL] Implement ALTER TABLE in JDBC Table Catalog
huaxingao Aug 5, 2020
3a437ed
[SPARK-32501][SQL] Convert null to "null" in structs, maps and arrays…
MaxGekk Aug 5, 2020
1b6f482
[SPARK-32492][SQL][FOLLOWUP][TEST-MAVEN] Fix jenkins maven jobs
yaooqinn Aug 5, 2020
4a0427c
[SPARK-32485][SQL][TEST] Fix endianness issues in tests in RecordBina…
mundaym Aug 5, 2020
42219af
[SPARK-32543][R] Remove arrow::as_tibble usage in SparkR
HyukjinKwon Aug 5, 2020
c1d17df
[SPARK-32529][CORE] Fix Historyserver log scan aborted by application…
yanxiaole Aug 5, 2020
375d348
[SPARK-31197][CORE] Shutdown executor once we are done decommissioning
holdenk Aug 5, 2020
7f275ee
[SPARK-32518][CORE] CoarseGrainedSchedulerBackend.maxNumConcurrentTas…
Ngone51 Aug 6, 2020
e93b8f0
[SPARK-32539][INFRA] Disallow `FileSystem.get(Configuration conf)` in…
gengliangwang Aug 6, 2020
56 changes: 45 additions & 11 deletions .github/workflows/master.yml
@@ -154,15 +154,18 @@ jobs:
python3.8 -m pip install numpy pyarrow pandas scipy
python3.8 -m pip list
# SparkR
- name: Install R 3.6
uses: r-lib/actions/setup-r@v1
- name: Install R 4.0
if: contains(matrix.modules, 'sparkr')
with:
r-version: 3.6
run: |
sudo sh -c "echo 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/' >> /etc/apt/sources.list"
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0xE298A3A825C0D65DFD57CBB651716619E084DAB9" | sudo apt-key add
sudo apt-get update
sudo apt-get install -y r-base r-base-dev libcurl4-openssl-dev
- name: Install R packages
if: contains(matrix.modules, 'sparkr')
run: |
sudo apt-get install -y libcurl4-openssl-dev
# qpdf is required to reduce the size of PDFs to make CRAN check pass. See SPARK-32497.
sudo apt-get install -y libcurl4-openssl-dev qpdf
sudo Rscript -e "install.packages(c('knitr', 'rmarkdown', 'testthat', 'devtools', 'e1071', 'survival', 'arrow', 'roxygen2'), repos='https://cloud.r-project.org/')"
# Show installed packages in R.
sudo Rscript -e 'pkg_list <- as.data.frame(installed.packages()[, c(1,3:4)]); pkg_list[is.na(pkg_list$Priority), 1:2, drop = FALSE]'
@@ -200,11 +203,15 @@ jobs:
architecture: x64
- name: Install Python linter dependencies
run: |
pip3 install flake8 sphinx numpy
- name: Install R 3.6
uses: r-lib/actions/setup-r@v1
with:
r-version: 3.6
# TODO(SPARK-32407): Sphinx 3.1+ does not correctly index nested classes.
# See also https://github.com/sphinx-doc/sphinx/issues/7551.
pip3 install flake8 'sphinx<3.1.0' numpy pydata_sphinx_theme
- name: Install R 4.0
run: |
sudo sh -c "echo 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/' >> /etc/apt/sources.list"
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0xE298A3A825C0D65DFD57CBB651716619E084DAB9" | sudo apt-key add
sudo apt-get update
sudo apt-get install -y r-base r-base-dev libcurl4-openssl-dev
- name: Install R linter dependencies and SparkR
run: |
sudo apt-get install -y libcurl4-openssl-dev
@@ -218,7 +225,9 @@
- name: Install dependencies for documentation generation
run: |
sudo apt-get install -y libcurl4-openssl-dev pandoc
pip install sphinx mkdocs numpy
# TODO(SPARK-32407): Sphinx 3.1+ does not correctly index nested classes.
# See also https://github.com/sphinx-doc/sphinx/issues/7551.
pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme
gem install jekyll jekyll-redirect-from rouge
sudo Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2'), repos='https://cloud.r-project.org/')"
- name: Scala linter
@@ -237,3 +246,28 @@
run: |
cd docs
jekyll build

java11:
name: Java 11 build
runs-on: ubuntu-latest
steps:
- name: Checkout Spark repository
uses: actions/checkout@v2
- name: Cache Maven local repository
uses: actions/cache@v2
with:
path: ~/.m2/repository
key: java11-maven-${{ hashFiles('**/pom.xml') }}
restore-keys: |
java11-maven-
- name: Install Java 11
uses: actions/setup-java@v1
with:
java-version: 11
- name: Build with Maven
run: |
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g -Dorg.slf4j.simpleLogger.defaultLogLevel=WARN"
export MAVEN_CLI_OPTS="--no-transfer-progress"
mkdir -p ~/.m2
./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Djava.version=11 install
rm -rf ~/.m2/repository/org/apache/spark
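
The new java11 job runs a full Maven install under JDK 11 with the same profiles used elsewhere in CI. A minimal local reproduction, assuming a Spark checkout and JDK 11 as the active JDK (the flags are copied from the step above):

  # Sketch: reproduce the java11 CI job locally
  export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g -Dorg.slf4j.simpleLogger.defaultLogLevel=WARN"
  export MAVEN_CLI_OPTS="--no-transfer-progress"
  ./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pmesos -Pkubernetes \
    -Phive -Phive-thriftserver -Phadoop-cloud -Djava.version=11 install
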
1 change: 1 addition & 0 deletions .gitignore
@@ -64,6 +64,7 @@ python/lib/pyspark.zip
python/.eggs/
python/deps
python/docs/_site/
python/docs/source/reference/api/
python/test_coverage/coverage_data
python/test_coverage/htmlcov
python/pyspark/python
5 changes: 2 additions & 3 deletions LICENSE
@@ -222,14 +222,13 @@ external/spark-ganglia-lgpl/src/main/java/com/codahale/metrics/ganglia/GangliaRe
Python Software Foundation License
----------------------------------

pyspark/heapq3.py
python/docs/_static/copybutton.js
python/docs/source/_static/copybutton.js

BSD 3-Clause
------------

python/lib/py4j-*-src.zip
python/pyspark/cloudpickle.py
python/pyspark/cloudpickle/*.py
python/pyspark/join.py
core/src/main/resources/org/apache/spark/ui/static/d3.min.js

6 changes: 0 additions & 6 deletions LICENSE-binary
@@ -557,12 +557,6 @@ jakarta.ws.rs:jakarta.ws.rs-api https://github.com/eclipse-ee4j/jaxrs-api
org.glassfish.hk2.external:jakarta.inject


Python Software Foundation License
----------------------------------

pyspark/heapq3.py


Public Domain
-------------

2 changes: 1 addition & 1 deletion R/pkg/DESCRIPTION
@@ -23,7 +23,7 @@ Suggests:
testthat,
e1071,
survival,
arrow (>= 0.15.1)
arrow (>= 1.0.0)
Collate:
'schema.R'
'generics.R'
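
This bump raises the SparkR floor for the Arrow R package to 1.0.0. A quick sketch to satisfy and verify the new minimum, assuming R and a CRAN mirror are available:

  # Install the arrow package from CRAN and check its version (expect >= 1.0.0)
  Rscript -e 'install.packages("arrow", repos = "https://cloud.r-project.org/")'
  Rscript -e 'packageVersion("arrow")'
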
9 changes: 2 additions & 7 deletions R/pkg/R/DataFrame.R
@@ -1233,13 +1233,8 @@ setMethod("collect",
port = port, blocking = TRUE, open = "wb", timeout = connectionTimeout)
output <- tryCatch({
doServerAuth(conn, authSecret)
arrowTable <- arrow::read_arrow(readRaw(conn))
# Arrow drops `as_tibble` since 0.14.0, see ARROW-5190.
if (exists("as_tibble", envir = asNamespace("arrow"))) {
as.data.frame(arrow::as_tibble(arrowTable), stringsAsFactors = stringsAsFactors)
} else {
as.data.frame(arrowTable, stringsAsFactors = stringsAsFactors)
}
arrowTable <- arrow::read_ipc_stream(readRaw(conn))
as.data.frame(arrowTable, stringsAsFactors = stringsAsFactors)
}, finally = {
close(conn)
})
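
The collect() path can drop the as_tibble() compatibility branch because arrow::read_ipc_stream() returns an object that coerces directly to a data.frame. A standalone sketch of the new read path, with a hypothetical file name:

  # Read an Arrow IPC stream and coerce it, mirroring the new collect() logic;
  # /tmp/batches.arrow is a placeholder for any Arrow streaming-format file
  Rscript -e 'tbl <- arrow::read_ipc_stream("/tmp/batches.arrow"); str(as.data.frame(tbl, stringsAsFactors = FALSE))'
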
13 changes: 1 addition & 12 deletions R/pkg/R/deserialize.R
@@ -233,24 +233,13 @@ readMultipleObjectsWithKeys <- function(inputCon) {

readDeserializeInArrow <- function(inputCon) {
if (requireNamespace("arrow", quietly = TRUE)) {
# Arrow drops `as_tibble` since 0.14.0, see ARROW-5190.
useAsTibble <- exists("as_tibble", envir = asNamespace("arrow"))


# Currently, there looks no way to read batch by batch by socket connection in R side,
# See ARROW-4512. Therefore, it reads the whole Arrow streaming-formatted binary at once
# for now.
dataLen <- readInt(inputCon)
arrowData <- readBin(inputCon, raw(), as.integer(dataLen), endian = "big")
batches <- arrow::RecordBatchStreamReader$create(arrowData)$batches()

if (useAsTibble) {
as_tibble <- get("as_tibble", envir = asNamespace("arrow"))
# Read all groupped batches. Tibble -> data.frame is cheap.
lapply(batches, function(batch) as.data.frame(as_tibble(batch)))
} else {
lapply(batches, function(batch) as.data.frame(batch))
}
lapply(batches, function(batch) as.data.frame(batch))
} else {
stop("'arrow' package should be installed.")
}
18 changes: 18 additions & 0 deletions R/pkg/tests/fulltests/test_sparkSQL_arrow.R
@@ -312,4 +312,22 @@ test_that("Arrow optimization - unsupported types", {
})
})

test_that("SPARK-32478: gapply() Arrow optimization - error message for schema mismatch", {
skip_if_not_installed("arrow")
df <- createDataFrame(list(list(a = 1L, b = "a")))

conf <- callJMethod(sparkSession, "conf")
arrowEnabled <- sparkR.conf("spark.sql.execution.arrow.sparkr.enabled")[[1]]

callJMethod(conf, "set", "spark.sql.execution.arrow.sparkr.enabled", "true")
tryCatch({
expect_error(
count(gapply(df, "a", function(key, group) { group }, structType("a int, b int"))),
"expected IntegerType, IntegerType, got IntegerType, StringType")
},
finally = {
callJMethod(conf, "set", "spark.sql.execution.arrow.sparkr.enabled", arrowEnabled)
})
})

sparkR.session.stop()
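
The new test exercises the clearer schema-mismatch message under Arrow-optimized gapply(). Outside the test harness the same switch is an ordinary session conf; a sketch, assuming SparkR is on the R library path:

  # Start a SparkR session with Arrow optimization enabled for gapply()/dapply()
  Rscript -e 'library(SparkR); sparkR.session(sparkConfig = list(spark.sql.execution.arrow.sparkr.enabled = "true"))'
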
4 changes: 2 additions & 2 deletions bin/find-spark-home
@@ -33,9 +33,9 @@ elif [ ! -f "$FIND_SPARK_HOME_PYTHON_SCRIPT" ]; then
export SPARK_HOME="$(cd "$(dirname "$0")"/..; pwd)"
else
# We are pip installed, use the Python script to resolve a reasonable SPARK_HOME
# Default to standard python interpreter unless told otherwise
# Default to standard python3 interpreter unless told otherwise
if [[ -z "$PYSPARK_DRIVER_PYTHON" ]]; then
PYSPARK_DRIVER_PYTHON="${PYSPARK_PYTHON:-"python"}"
PYSPARK_DRIVER_PYTHON="${PYSPARK_PYTHON:-"python3"}"
fi
export SPARK_HOME=$($PYSPARK_DRIVER_PYTHON "$FIND_SPARK_HOME_PYTHON_SCRIPT")
fi
4 changes: 2 additions & 2 deletions bin/find-spark-home.cmd
@@ -20,8 +20,8 @@ rem
rem Path to Python script finding SPARK_HOME
set FIND_SPARK_HOME_PYTHON_SCRIPT=%~dp0find_spark_home.py

rem Default to standard python interpreter unless told otherwise
set PYTHON_RUNNER=python
rem Default to standard python3 interpreter unless told otherwise
set PYTHON_RUNNER=python3
rem If PYSPARK_DRIVER_PYTHON is set, it overwrites the python version
if not "x%PYSPARK_DRIVER_PYTHON%"=="x" (
set PYTHON_RUNNER=%PYSPARK_DRIVER_PYTHON%
56 changes: 28 additions & 28 deletions bin/load-spark-env.cmd
@@ -21,42 +21,42 @@ rem This script loads spark-env.cmd if it exists, and ensures it is only loaded
rem spark-env.cmd is loaded from SPARK_CONF_DIR if set, or within the current directory's
rem conf\ subdirectory.

set SPARK_ENV_CMD=spark-env.cmd
if [%SPARK_ENV_LOADED%] == [] (
if not defined SPARK_ENV_LOADED (
set SPARK_ENV_LOADED=1

if [%SPARK_CONF_DIR%] == [] (
set SPARK_CONF_DIR=%~dp0..\conf
)

set SPARK_ENV_CMD=%SPARK_CONF_DIR%\%SPARK_ENV_CMD%
if exist %SPARK_ENV_CMD% (
call %SPARK_ENV_CMD%
)
call :LoadSparkEnv
)

rem Setting SPARK_SCALA_VERSION if not already set.

rem TODO: revisit for Scala 2.13 support
set SPARK_SCALA_VERSION=2.12
rem if [%SPARK_SCALA_VERSION%] == [] (
rem set SCALA_VERSION_1=2.12
rem set SCALA_VERSION_2=2.11
rem
rem set ASSEMBLY_DIR1=%SPARK_HOME%\assembly\target\scala-%SCALA_VERSION_1%
rem set ASSEMBLY_DIR2=%SPARK_HOME%\assembly\target\scala-%SCALA_VERSION_2%
rem set ENV_VARIABLE_DOC=https://spark.apache.org/docs/latest/configuration.html#environment-variables
rem if exist %ASSEMBLY_DIR2% if exist %ASSEMBLY_DIR1% (
rem echo "Presence of build for multiple Scala versions detected (%ASSEMBLY_DIR1% and %ASSEMBLY_DIR2%)."
rem echo "Remove one of them or, set SPARK_SCALA_VERSION=%SCALA_VERSION_1% in %SPARK_ENV_CMD%."
rem echo "Visit %ENV_VARIABLE_DOC% for more details about setting environment variables in spark-env.cmd."
rem echo "Either clean one of them or, set SPARK_SCALA_VERSION in spark-env.cmd."
rem exit 1
rem )
rem if exist %ASSEMBLY_DIR1% (
rem set SPARK_SCALA_VERSION=%SCALA_VERSION_1%
rem ) else (
rem set SPARK_SCALA_VERSION=%SCALA_VERSION_2%
rem )
rem )
set SCALA_VERSION_1=2.13
set SCALA_VERSION_2=2.12

set ASSEMBLY_DIR1=%SPARK_HOME%\assembly\target\scala-%SCALA_VERSION_1%
set ASSEMBLY_DIR2=%SPARK_HOME%\assembly\target\scala-%SCALA_VERSION_2%
set ENV_VARIABLE_DOC=https://spark.apache.org/docs/latest/configuration.html#environment-variables

if not defined SPARK_SCALA_VERSION (
if exist %ASSEMBLY_DIR2% if exist %ASSEMBLY_DIR1% (
echo Presence of build for multiple Scala versions detected ^(%ASSEMBLY_DIR1% and %ASSEMBLY_DIR2%^).
echo Remove one of them or, set SPARK_SCALA_VERSION=%SCALA_VERSION_1% in spark-env.cmd.
echo Visit %ENV_VARIABLE_DOC% for more details about setting environment variables in spark-env.cmd.
echo Either clean one of them or, set SPARK_SCALA_VERSION in spark-env.cmd.
exit 1
)
if exist %ASSEMBLY_DIR1% (
set SPARK_SCALA_VERSION=%SCALA_VERSION_1%
) else (
set SPARK_SCALA_VERSION=%SCALA_VERSION_2%
)
)
exit /b 0

:LoadSparkEnv
if exist "%SPARK_CONF_DIR%\spark-env.cmd" (
call "%SPARK_CONF_DIR%\spark-env.cmd"
)
42 changes: 20 additions & 22 deletions bin/load-spark-env.sh
@@ -43,25 +43,23 @@ fi

# Setting SPARK_SCALA_VERSION if not already set.

# TODO: revisit for Scala 2.13 support
export SPARK_SCALA_VERSION=2.12
#if [ -z "$SPARK_SCALA_VERSION" ]; then
# SCALA_VERSION_1=2.12
# SCALA_VERSION_2=2.11
#
# ASSEMBLY_DIR_1="${SPARK_HOME}/assembly/target/scala-${SCALA_VERSION_1}"
# ASSEMBLY_DIR_2="${SPARK_HOME}/assembly/target/scala-${SCALA_VERSION_2}"
# ENV_VARIABLE_DOC="https://spark.apache.org/docs/latest/configuration.html#environment-variables"
# if [[ -d "$ASSEMBLY_DIR_1" && -d "$ASSEMBLY_DIR_2" ]]; then
# echo "Presence of build for multiple Scala versions detected ($ASSEMBLY_DIR_1 and $ASSEMBLY_DIR_2)." 1>&2
# echo "Remove one of them or, export SPARK_SCALA_VERSION=$SCALA_VERSION_1 in ${SPARK_ENV_SH}." 1>&2
# echo "Visit ${ENV_VARIABLE_DOC} for more details about setting environment variables in spark-env.sh." 1>&2
# exit 1
# fi
#
# if [[ -d "$ASSEMBLY_DIR_1" ]]; then
# export SPARK_SCALA_VERSION=${SCALA_VERSION_1}
# else
# export SPARK_SCALA_VERSION=${SCALA_VERSION_2}
# fi
#fi
if [ -z "$SPARK_SCALA_VERSION" ]; then
SCALA_VERSION_1=2.13
SCALA_VERSION_2=2.12

ASSEMBLY_DIR_1="${SPARK_HOME}/assembly/target/scala-${SCALA_VERSION_1}"
ASSEMBLY_DIR_2="${SPARK_HOME}/assembly/target/scala-${SCALA_VERSION_2}"
ENV_VARIABLE_DOC="https://spark.apache.org/docs/latest/configuration.html#environment-variables"
if [[ -d "$ASSEMBLY_DIR_1" && -d "$ASSEMBLY_DIR_2" ]]; then
echo "Presence of build for multiple Scala versions detected ($ASSEMBLY_DIR_1 and $ASSEMBLY_DIR_2)." 1>&2
echo "Remove one of them or, export SPARK_SCALA_VERSION=$SCALA_VERSION_1 in ${SPARK_ENV_SH}." 1>&2
echo "Visit ${ENV_VARIABLE_DOC} for more details about setting environment variables in spark-env.sh." 1>&2
exit 1
fi

if [[ -d "$ASSEMBLY_DIR_1" ]]; then
export SPARK_SCALA_VERSION=${SCALA_VERSION_1}
else
export SPARK_SCALA_VERSION=${SCALA_VERSION_2}
fi
fi
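
With the block restored and retargeted at Scala 2.13/2.12, SPARK_SCALA_VERSION is derived from whichever assembly directory exists, and the script aborts when both are present. In that ambiguous case the version can be pinned manually, for example:

  # Pin the Scala build explicitly; autodetection only runs when this is unset
  export SPARK_SCALA_VERSION=2.12   # or 2.13 if that assembly was built
  ./bin/spark-shell
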
4 changes: 2 additions & 2 deletions bin/pyspark
@@ -37,9 +37,9 @@ if [[ -n "$IPYTHON" || -n "$IPYTHON_OPTS" ]]; then
exit 1
fi

# Default to standard python interpreter unless told otherwise
# Default to standard python3 interpreter unless told otherwise
if [[ -z "$PYSPARK_PYTHON" ]]; then
PYSPARK_PYTHON=python
PYSPARK_PYTHON=python3
fi
if [[ -z "$PYSPARK_DRIVER_PYTHON" ]]; then
PYSPARK_DRIVER_PYTHON=$PYSPARK_PYTHON
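
Both launchers now default to python3 instead of python, but PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON still take precedence when set. For example (interpreter names here are illustrative):

  # Run workers under a specific interpreter and the driver under IPython
  export PYSPARK_PYTHON=python3.8
  export PYSPARK_DRIVER_PYTHON=ipython
  ./bin/pyspark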