Changes from all commits
194 commits
b632e77
[SPARK-25317][CORE] Avoid perf regression in Murmur3 Hash on UTF8String
mgaido91 Sep 6, 2018
085f731
[SPARK-25268][GRAPHX] run Parallel Personalized PageRank throws seria…
shahidki31 Sep 6, 2018
f2d5022
[SPARK-25328][PYTHON] Add an example for having two columns as the gr…
HyukjinKwon Sep 6, 2018
3682d29
[SPARK-25072][PYSPARK] Forbid extra value for custom Row
Sep 6, 2018
a7cfe51
[SPARK-25108][SQL] Fix the show method to display the wide character …
xuejianbest Sep 6, 2018
ff832be
[SPARK-25208][SQL][FOLLOW-UP] Reduce code size.
ueshin Sep 7, 2018
24a3261
[SPARK-25330][BUILD][BRANCH-2.3] Revert Hadoop 2.7 to 2.7.3
wangyum Sep 7, 2018
3644c84
[SPARK-22357][CORE][FOLLOWUP] SparkContext.binaryFiles ignore minPart…
srowen Sep 7, 2018
f9b476c
[SPARK-25237][SQL] Remove updateBytesReadWithFileSize in FileScanRDD
Sep 7, 2018
872bad1
[SPARK-25267][SQL][TEST] Disable ConvertToLocalRelation in the test c…
dilipbiswal Sep 7, 2018
95a48b9
[SPARK-21786][SQL][FOLLOWUP] Add compressionCodec test for CTAS
fjh100456 Sep 7, 2018
80567fa
[MINOR][SS] Fix kafka-0-10-sql trivials
dongjinleekr Sep 7, 2018
904192a
[SPARK-25345][ML] Deprecate public APIs from ImageSchema
WeichenXu123 Sep 8, 2018
8f7d8a0
[SPARK-25375][SQL][TEST] Reenable qualified perm. function checks in …
dongjoon-hyun Sep 8, 2018
a00a160
Revert [SPARK-10399] [SPARK-23879] [SPARK-23762] [SPARK-25317]
gatorsmile Sep 9, 2018
6b7ea78
[MINOR][ML] Remove `BisectingKMeansModel.setDistanceMeasure` method
WeichenXu123 Sep 9, 2018
c1c1bda
[SPARK-25368][SQL] Incorrect predicate pushdown returns wrong result
wangyum Sep 9, 2018
0782dfa
[SPARK-25175][SQL] Field resolution should fail if there is ambiguity…
seancxmao Sep 10, 2018
c9ca359
[SPARK-25313][SQL][FOLLOW-UP] Fix InsertIntoHiveDirCommand output sch…
wangyum Sep 10, 2018
67bc7ef
[SPARK-24849][SPARK-24911][SQL][FOLLOW-UP] Converting a value of Stru…
gatorsmile Sep 10, 2018
5d98c31
[SPARK-25278][SQL] Avoid duplicated Exec nodes when the same logical …
mgaido91 Sep 10, 2018
ffd036a
[SPARK-23672][PYTHON] Document support for nested return types in sca…
holdenk Sep 10, 2018
fb4965a
[SPARK-25371][SQL] struct() should allow being called with 0 args
mgaido91 Sep 11, 2018
b7efca7
[SPARK-17916][SPARK-25241][SQL][FOLLOW-UP] Fix empty string being par…
mmolimar Sep 11, 2018
0b8bfbe
[SPARK-25389][SQL] INSERT OVERWRITE DIRECTORY STORED AS should preven…
dongjoon-hyun Sep 11, 2018
4414e02
[SPARK-25221][DEPLOY] Consistent trailing whitespace treatment of con…
gerashegalov Sep 11, 2018
16127e8
[SPARK-24889][CORE] Update block info when unpersist rdds
viirya Sep 11, 2018
99b37a9
[SPARK-25398] Minor bugs from comparing unrelated types
srowen Sep 11, 2018
3a6ef8b
Revert "[SPARK-23820][CORE] Enable use of long form of callsite in logs"
srowen Sep 11, 2018
0dbf145
[SPARK-25399][SS] Continuous processing state should not affect micro…
mukulmurthy Sep 11, 2018
40e4db0
[SPARK-25402][SQL] Null handling in BooleanSimplification
gatorsmile Sep 12, 2018
071babb
[SPARK-25352][SQL] Perform ordered global limit when limit number is …
viirya Sep 12, 2018
4c1428f
[SPARK-25363][SQL] Fix schema pruning in where clause by ignoring unn…
viirya Sep 12, 2018
15d2e9d
[SPARK-24882][SQL] Revert [] improve data source v2 API from branch 2.4
cloud-fan Sep 12, 2018
71f7013
[SPARK-23820][CORE] Enable use of long form of callsite in logs
michaelmior Sep 13, 2018
776dc42
[SPARK-25387][SQL] Fix for NPE caused by bad CSV input
MaxGekk Sep 13, 2018
6f4d647
[SPARK-25357][SQL] Add metadata to SparkPlanInfo to dump more informa…
LantaoJin Sep 13, 2018
ae5c7bb
[SPARK-25238][PYTHON] lint-python: Fix W605 warnings for pycodestyle 2.4
srowen Sep 13, 2018
abb5196
[SPARK-25295][K8S] Fix executor names collision
Sep 13, 2018
e7f511a
[SPARK-25352][SQL][FOLLOWUP] Add helper method and address style issue
viirya Sep 13, 2018
cc19f42
[SPARK-25170][DOC] Add list and short description of Spark Executor T…
LucaCanali Sep 13, 2018
35a84ba
[SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, move calls to…
mallman Sep 13, 2018
9273be0
[SPARK-25400][CORE][TEST] Increase test timeouts
squito Sep 13, 2018
1220ab8
Preparing Spark release v2.4.0-rc1
cloud-fan Sep 14, 2018
8cdf7f4
Preparing development version 2.4.1-SNAPSHOT
cloud-fan Sep 14, 2018
59054fa
[SPARK-25431][SQL][EXAMPLES] Fix function examples and unify the form…
ueshin Sep 14, 2018
d3f5475
[SPARK-25238][PYTHON] lint-python: Upgrade pycodestyle to v2.4.0
Sep 15, 2018
ae2ca0e
Revert "[SPARK-25431][SQL][EXAMPLES] Fix function examples and unify …
ueshin Sep 15, 2018
b40e5fe
[SPARK-25438][SQL][TEST] Fix FilterPushdownBenchmark to use the same …
dongjoon-hyun Sep 16, 2018
b839721
[SPARK-25439][TESTS][SQL] Fixes TPCHQuerySuite datatype of customer.c…
npoggi Sep 16, 2018
60af706
[SPARK-24418][FOLLOWUP][DOC] Update docs to show Scala 2.11.12
dongjoon-hyun Sep 16, 2018
1cb1e43
[MINOR][DOCS] Axe deprecated doc refs
Sep 16, 2018
fb1539a
[SPARK-22713][CORE][TEST][FOLLOWUP] Fix flaky ExternalAppendOnlyMapSu…
dongjoon-hyun Sep 17, 2018
e368efc
[SPARK-24685][BUILD][FOLLOWUP] Fix the nonexist profile name in relea…
jerryshao Sep 17, 2018
43c9b10
[SPARK-23425][SQL][FOLLOWUP] Support wildcards in HDFS path for loadt…
sujith71955 Sep 17, 2018
56f7068
[SPARK-25427][SQL][TEST] Add BloomFilter creation test cases
dongjoon-hyun Sep 17, 2018
d05596e
[SPARK-25431][SQL][EXAMPLES] Fix function examples and the example re…
ueshin Sep 17, 2018
08f7b14
[SPARK-24654][BUILD][FOLLOWUP] Update, fix LICENSE and NOTICE, and sp…
srowen Sep 17, 2018
963af13
[PYSPARK] Updates to pyspark broadcast
squito Aug 14, 2018
80e317b
[PYSPARK][SQL] Updates to RowQueue
squito Sep 6, 2018
7beb341
[CORE] Updates to remote cache reads
squito Aug 22, 2018
c403751
[SPARK-25443][BUILD] fix issues when building docs with release scrip…
cloud-fan Sep 18, 2018
ffd448b
[SPARK-24151][SQL] Case insensitive resolution of CURRENT_DATE and CU…
jamesthomp Sep 18, 2018
cc3fbea
[SPARK-25445][BUILD] the release script should be able to publish a s…
cloud-fan Sep 18, 2018
8a2992e
[SPARK-19550][DOC][FOLLOW-UP] Update tuning.md to use JDK8
wangyum Sep 18, 2018
67f2cb6
[SPARK-25291][K8S] Fixing Flakiness of Executor Pod tests
ifilonenko Sep 18, 2018
76514a0
[SPARK-25456][SQL][TEST] Fix PythonForeachWriterSuite
squito Sep 18, 2018
ba8560a
[SPARK-23200] Reset Kubernetes-specific config on Checkpoint restore
ssaavedra Sep 19, 2018
00ede12
[SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema
rxin Sep 19, 2018
f11f445
[SPARK-24626] Add statistics prefix to parallelFileListingInStatsComp…
rxin Sep 19, 2018
538ae62
[SPARK-25445][BUILD][FOLLOWUP] Resolve issues in release-build.sh for…
gengliangwang Sep 19, 2018
9fefb47
Revert "[SPARK-23173][SQL] rename spark.sql.fromJsonForceNullableSchema"
dongjoon-hyun Sep 19, 2018
83a75a8
[SPARK-22666][ML][FOLLOW-UP] Improve testcase to tolerate different s…
WeichenXu123 Sep 19, 2018
9031c78
[SPARK-25021][K8S][BACKPORT] Add spark.executor.pyspark.memory limit …
ifilonenko Sep 19, 2018
a9a8d3a
[SPARK-25425][SQL][BACKPORT-2.4] Extra options should override sessio…
MaxGekk Sep 19, 2018
99ae693
[SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Pyt…
BryanCutler Sep 20, 2018
535bf1c
[SPARK-24157][SS][FOLLOWUP] Rename to spark.sql.streaming.noDataMicro…
rxin Sep 20, 2018
06efed2
[SPARK-24341][FOLLOWUP][DOCS] Add migration note for IN subqueries be…
mgaido91 Sep 20, 2018
dfcff38
[SPARK-4502][SQL] Rename to spark.sql.optimizer.nestedSchemaPruning.e…
rxin Sep 20, 2018
e07042a
[MINOR][PYTHON][TEST] Use collect() instead of show() to make the out…
HyukjinKwon Sep 20, 2018
b3bdfd7
Revert [SPARK-19355][SPARK-25352]
viirya Sep 20, 2018
78dd1d8
[SPARK-25417][SQL] ArrayContains function may return incorrect result…
dilipbiswal Sep 20, 2018
fc03672
[MINOR][PYTHON] Use a helper in `PythonUtils` instead of direct acces…
HyukjinKwon Sep 20, 2018
c67c597
[SPARK-25450][SQL] PushProjectThroughUnion rule uses the same exprId …
maryannxue Sep 20, 2018
43c62e7
[SPARK-24918][CORE] Executor Plugin API
NiharS Sep 20, 2018
51f3659
[SPARK-24777][SQL] Add write benchmark for AVRO
gengliangwang Sep 21, 2018
5d74449
Revert "[SPARK-23715][SQL] the input of to/from_utc_timestamp can not…
gatorsmile Sep 21, 2018
aff6aed
[SPARK-25384][SQL] Clarify fromJsonForceNullableSchema will be remove…
rxin Sep 21, 2018
e425462
[SPARK-23549][SQL] Rename config spark.sql.legacy.compareDateTimestam…
rxin Sep 21, 2018
604828e
[SPARK-25469][SQL] Eval methods of Concat, Reverse and ElementAt shou…
mn-mikke Sep 21, 2018
ce66361
[SPARK-19724][SQL] allowCreatingManagedTableUsingNonemptyLocation sho…
rxin Sep 21, 2018
138a631
[SPARK-25321][ML] Revert SPARK-14681 to avoid API breaking change
WeichenXu123 Sep 21, 2018
1303eb5
[SPARK-25321][ML] Fix local LDA model constructor
WeichenXu123 Sep 21, 2018
c64e750
[MINOR][PYSPARK] Always Close the tempFile in _serialize_to_jvm
gatorsmile Sep 23, 2018
36e7c8f
[SPARKR] Match pyspark features in SparkR communication protocol
HyukjinKwon Sep 24, 2018
13bc58d
[SPARK-21318][SQL] Improve exception message thrown by `lookupFunction`
stanzhai Sep 24, 2018
51d5378
[SPARK-25416][SQL] ArrayPosition function may return incorrect result…
dilipbiswal Sep 24, 2018
ec38428
[SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS sources do not respect…
HyukjinKwon Sep 24, 2018
ffc081c
[SPARK-25502][CORE][WEBUI] Empty Page when page number exceeds the re…
shahidki31 Sep 24, 2018
e4c03e8
[SPARK-25503][CORE][WEBUI] Total task message in stage page is ambiguous
shahidki31 Sep 25, 2018
4ca4ef7
[SPARK-25519][SQL] ArrayRemove function may return incorrect result w…
dilipbiswal Sep 25, 2018
a709718
[SPARK-23907][SQL] Revert regr_* functions entirely
rxin Sep 25, 2018
544f86a
[SPARK-25495][SS] FetchedData.reset should reset all fields
zsxwing Sep 25, 2018
f91247f
[SPARK-25422][CORE] Don't memory map blocks streamed to disk.
squito Sep 26, 2018
3f20305
[SPARK-24324][PYTHON][FOLLOW-UP] Rename the Conf to spark.sql.legacy.…
gatorsmile Sep 26, 2018
d44b863
[SPARK-20937][DOCS] Describe spark.sql.parquet.writeLegacyFormat prop…
seancxmao Sep 26, 2018
9969827
[SPARK-25509][CORE] Windows doesn't support POSIX permissions
Sep 26, 2018
dc60476
[SPARK-25318] Add exception handling when wrapping the input stream d…
Sep 26, 2018
8d17200
[SPARK-24519][CORE] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_COMPRESS …
rxin Sep 26, 2018
2ff91f2
[SPARK-25454][SQL] add a new config for picking minimum precision for…
cloud-fan Sep 27, 2018
7656358
[SPARK-25540][SQL][PYSPARK] Make HiveContext in PySpark behave as the…
ueshin Sep 27, 2018
f12769e
[SPARK-25536][CORE] metric value for METRIC_OUTPUT_RECORDS_WRITTEN is…
shahidki31 Sep 27, 2018
01c000b
Revert "[SPARK-25540][SQL][PYSPARK] Make HiveContext in PySpark behav…
HyukjinKwon Sep 27, 2018
0cf4c5b
[SPARK-25468][WEBUI] Highlight current page index in the spark UI
Sep 27, 2018
0b4e581
[SPARK-23715][SQL][DOC] improve document for from/to_utc_timestamp
cloud-fan Sep 27, 2018
53eb858
[SPARK-25314][SQL] Fix Python UDF accessing attributes from both side…
xuanyuanking Sep 27, 2018
3c78ea2
[SPARK-25522][SQL] Improve type promotion for input arguments of elem…
dilipbiswal Sep 27, 2018
42f25f3
Preparing Spark release v2.4.0-rc2
cloud-fan Sep 27, 2018
659ecb5
Preparing development version 2.4.1-SNAPSHOT
cloud-fan Sep 27, 2018
0256f8a
[SPARK-25546][CORE] Don't cache value of EVENT_LOG_CALLSITE_LONG_FORM.
Sep 27, 2018
a43a082
[SPARK-25533][CORE][WEBUI] AppSummary should hold the information abo…
shahidki31 Sep 26, 2018
b2a1e2f
[SPARK-25505][SQL] The output order of grouping columns in Pivot is d…
maryannxue Sep 28, 2018
81391c2
[SPARK-23285][DOC][FOLLOWUP] Fix missing markup tag
dongjoon-hyun Sep 28, 2018
7614313
[SPARK-25542][CORE][TEST] Move flaky test in OpenHashMapSuite to Open…
viirya Sep 28, 2018
ec2c17a
[SPARK-25570][SQL][TEST] Replace 2.3.1 with 2.3.2 in HiveExternalCata…
dongjoon-hyun Sep 29, 2018
a14306b
[SPARK-25262][DOC][FOLLOWUP] Fix link tags in html table
viirya Sep 29, 2018
fef3027
[SPARK-25572][SPARKR] test only if not cran
felixcheung Sep 29, 2018
6f510c6
[SPARK-25568][CORE] Continue to update the remaining accumulators whe…
zsxwing Sep 30, 2018
8e6fb47
[CORE][MINOR] Fix obvious error and compiling for Scala 2.12.7
da-liii Sep 30, 2018
c886f05
[SPARK-25543][K8S] Print debug message iff execIdsRemovedInThisRound …
ScrapCodes Sep 30, 2018
7b1094b
[SPARK-25505][SQL][FOLLOWUP] Fix for attributes cosmetically differen…
mgaido91 Oct 1, 2018
82990e5
[SPARK-25453][SQL][TEST][.FFFFFFFFF] OracleIntegrationSuite IllegalAr…
seancxmao Oct 1, 2018
426c2bd
[SPARK-23401][PYTHON][TESTS] Add more data types for PandasUDFTests
alex7c4 Oct 1, 2018
ad7b3f6
[SPARK-25578][BUILD] Update to Scala 2.12.7
srowen Oct 2, 2018
ea4068a
[SPARK-25583][DOC] Add history-server related configuration in the do…
shahidki31 Oct 2, 2018
443d12d
[SPARK-25538][SQL] Zero-out all bytes when writing decimal
mgaido91 Oct 3, 2018
0763b75
[SPARK-25601][PYTHON] Register Grouped aggregate UDF Vectorized UDFs …
HyukjinKwon Oct 4, 2018
c9bb83a
[SPARK-25602][SQL] SparkPlan.getByteArrayRdd should not consume the i…
cloud-fan Oct 4, 2018
2c700ee
[SPARK-25521][SQL] Job id showing null in the logs when insert into c…
sujith71955 Oct 5, 2018
0a70afd
[SPARK-25644][SS] Fix java foreachBatch in DataStreamWriter
zsxwing Oct 5, 2018
a2991d2
[SPARK-25646][K8S] Fix docker-image-tool.sh on dev build.
Oct 6, 2018
48e2e6f
[SPARK-25644][SS][FOLLOWUP][BUILD] Fix Scala 2.12 build error due to …
dongjoon-hyun Oct 6, 2018
c8b9409
[SPARK-25671] Build external/spark-ganglia-lgpl in Jenkins Test
gatorsmile Oct 6, 2018
4214ddd
[SPARK-25673][BUILD] Remove Travis CI which enables Java lint check
HyukjinKwon Oct 8, 2018
692ddb3
[SPARK-25591][PYSPARK][SQL] Avoid overwriting deserialized accumulator
viirya Oct 8, 2018
193ce77
[SPARK-25677][DOC] spark.io.compression.codec = org.apache.spark.io.Z…
shivusondur Oct 8, 2018
4baa4d4
[SPARK-25639][DOCS] Added docs for foreachBatch, python foreach and m…
tdas Oct 8, 2018
404c840
[SPARK-25669][SQL] Check CSV header only when it exists
MaxGekk Oct 9, 2018
8e4a99b
Preparing Spark release v2.4.0-rc3
cloud-fan Oct 10, 2018
71b8739
Preparing development version 2.4.1-SNAPSHOT
cloud-fan Oct 10, 2018
cd40655
[SPARK-25636][CORE] spark-submit cuts off the failure reason when the…
Oct 10, 2018
e80ab13
[SPARK-25674][SQL] If the records are incremented by more than 1 at a…
10110346 Oct 11, 2018
1961f8e
[SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF does not stab…
maryannxue Oct 12, 2018
3dba5d4
[SPARK-25708][SQL] HAVING without GROUP BY means global aggregate
cloud-fan Oct 12, 2018
bb211cf
[SPARK-25697][CORE] When zstd compression enabled, InProgress applica…
shahidki31 Oct 12, 2018
1a33544
[SPARK-25660][SQL] Fix for the backward slash as CSV fields delimiter
MaxGekk Oct 12, 2018
0f58b98
[STREAMING][DOC] Fix typo & formatting for JavaDoc
mastloui-msft Oct 12, 2018
5554a33
[SPARK-25714] Fix Null Handling in the Optimizer rule BooleanSimplifi…
gatorsmile Oct 13, 2018
765cbca
[MINOR] Fix code comment in BooleanSimplification.
gatorsmile Oct 13, 2018
6634819
[SPARK-25718][SQL] Detect recursive reference in Avro schema and thro…
gengliangwang Oct 13, 2018
c4efcf1
[SPARK-25714][SQL][FOLLOWUP] improve the comment inside BooleanSimpli…
cloud-fan Oct 13, 2018
883ca3f
[SPARK-25726][SQL][TEST] Fix flaky test in SaveIntoDataSourceCommandS…
dongjoon-hyun Oct 14, 2018
3e776d7
[SPARK-25727][SQL] Add outputOrdering to otherCopyArgs in InMemoryRel…
gatorsmile Oct 14, 2018
b6e4aca
[SPARK-25700][SQL][BRANCH-2.4] Partially revert append mode support i…
HyukjinKwon Oct 15, 2018
d64b355
[SPARK-25738][SQL] Fix LOAD DATA INPATH for hdfs port
squito Oct 16, 2018
8bc7ab0
[SPARK-25674][FOLLOW-UP] Update the stats for each ColumnarBatch
gatorsmile Oct 16, 2018
77156f8
[SPARK-25736][SQL][TEST] add tests to verify the behavior of multi-co…
cloud-fan Oct 16, 2018
144cb94
[SPARK-25579][SQL] Use quoted attribute names if needed in pushed ORC…
dongjoon-hyun Oct 16, 2018
3591bd2
[SQL][CATALYST][MINOR] update some error comments
Oct 17, 2018
362103b
[SPARK-25754][DOC] Change CDN for MathJax
gengliangwang Oct 17, 2018
b698bd4
[SPARK-21402][SQL] Fix java array of structs deserialization
vofque Oct 17, 2018
ac9a6f0
[SPARK-25741][WEBUI] Long URLs are not rendered properly in web UI
gengliangwang Oct 17, 2018
71a6a9c
[SPARK-25758][ML] Deprecate computeCost on BisectingKMeans
mgaido91 Oct 18, 2018
7153551
[SPARK-24499][SQL][DOC] Split the page of sql-programming-guide.html …
xuanyuanking Oct 18, 2018
fd5b247
[SPARK-24499][DOC][FOLLOW-UP] Split the page of sql-programming-guide…
gatorsmile Oct 18, 2018
36307b1
[SPARK-25764][ML][EXAMPLES] Update BisectingKMeans example to use Clu…
mgaido91 Oct 19, 2018
9ed2e42
[MINOR][DOC] Spacing items in migration guide for readability and con…
HyukjinKwon Oct 19, 2018
df60d9f
[SPARK-25044][FOLLOW-UP] Change ScalaUDF constructor signature
maryannxue Oct 19, 2018
6a06b8c
[SPARK-25768][SQL] fix constant argument expecting UDAFs
peter-toth Oct 19, 2018
8926c4a
fix security issue of zinc
cloud-fan Oct 19, 2018
1ff8dd4
Preparing Spark release v2.4.0-rc4
cloud-fan Oct 19, 2018
9c0c6d4
Preparing development version 2.4.1-SNAPSHOT
cloud-fan Oct 19, 2018
1001d23
[SPARK-25704][CORE] Allocate a bit less than Int.MaxValue
squito Oct 19, 2018
432697c
Revert "[SPARK-25758][ML] Deprecate computeCost on BisectingKMeans"
gatorsmile Oct 19, 2018
e3a60b0
Revert "[SPARK-25764][ML][EXAMPLES] Update BisectingKMeans example to…
cloud-fan Oct 20, 2018
d6a02c5
[SPARK-24499][SQL][DOC][FOLLOWUP] Fix some broken links
dilipbiswal Oct 20, 2018
869242c
[MINOR][DOC] Update the building doc to use Maven 3.5.4 and Java 8 only
dongjoon-hyun Oct 20, 2018
0239277
[DOC][MINOR] Fix minor error in the code of graphx guide
WeichenXu123 Oct 20, 2018
c21d7e1
fix security issue of zinc(simplier version)
cloud-fan Oct 19, 2018
e69e2bf
Preparing Spark release v2.4.0-rc4
cloud-fan Oct 22, 2018
f33d888
Preparing development version 2.4.1-SNAPSHOT
cloud-fan Oct 22, 2018
b9b594a
[SPARK-25795][R][EXAMPLE] Fix CSV SparkR SQL Example
dongjoon-hyun Oct 22, 2018
4099565
[SPARK-24499][SQL][DOC][FOLLOW-UP] Fix spelling in doc
kiszk Oct 23, 2018
d5e6948
[SPARK-25805][SQL][TEST] Fix test for SPARK-25159
squito Oct 23, 2018
50 changes: 0 additions & 50 deletions .travis.yml

This file was deleted.

2 changes: 1 addition & 1 deletion R/pkg/DESCRIPTION
@@ -1,6 +1,6 @@
Package: SparkR
Type: Package
Version: 2.4.0
Version: 2.4.1
Title: R Frontend for Apache Spark
Description: Provides an R Frontend for Apache Spark.
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
1 change: 0 additions & 1 deletion R/pkg/R/DataFrame.R
@@ -503,7 +503,6 @@ setMethod("createOrReplaceTempView",
#' @param x A SparkDataFrame
#' @param tableName A character vector containing the name of the table
#'
#' @family SparkDataFrame functions
#' @seealso \link{createOrReplaceTempView}
#' @rdname registerTempTable-deprecated
#' @name registerTempTable
1 change: 0 additions & 1 deletion R/pkg/R/catalog.R
@@ -69,7 +69,6 @@ createExternalTable <- function(x, ...) {
#' @param ... additional named parameters as options for the data source.
#' @return A SparkDataFrame.
#' @rdname createTable
#' @seealso \link{createExternalTable}
#' @examples
#'\dontrun{
#' sparkR.session()
43 changes: 31 additions & 12 deletions R/pkg/R/context.R
@@ -167,18 +167,30 @@ parallelize <- function(sc, coll, numSlices = 1) {
# 2-tuples of raws
serializedSlices <- lapply(slices, serialize, connection = NULL)

# The PRC backend cannot handle arguments larger than 2GB (INT_MAX)
# The RPC backend cannot handle arguments larger than 2GB (INT_MAX)
# If serialized data is safely less than that threshold we send it over the PRC channel.
# Otherwise, we write it to a file and send the file name
if (objectSize < sizeLimit) {
jrdd <- callJStatic("org.apache.spark.api.r.RRDD", "createRDDFromArray", sc, serializedSlices)
} else {
fileName <- writeToTempFile(serializedSlices)
jrdd <- tryCatch(callJStatic(
"org.apache.spark.api.r.RRDD", "createRDDFromFile", sc, fileName, as.integer(numSlices)),
finally = {
file.remove(fileName)
})
if (callJStatic("org.apache.spark.api.r.RUtils", "getEncryptionEnabled", sc)) {
# the length of slices here is the parallelism to use in the jvm's sc.parallelize()
parallelism <- as.integer(numSlices)
jserver <- newJObject("org.apache.spark.api.r.RParallelizeServer", sc, parallelism)
authSecret <- callJMethod(jserver, "secret")
port <- callJMethod(jserver, "port")
conn <- socketConnection(port = port, blocking = TRUE, open = "wb", timeout = 1500)
doServerAuth(conn, authSecret)
writeToConnection(serializedSlices, conn)
jrdd <- callJMethod(jserver, "getResult")
} else {
fileName <- writeToTempFile(serializedSlices)
jrdd <- tryCatch(callJStatic(
"org.apache.spark.api.r.RRDD", "createRDDFromFile", sc, fileName, as.integer(numSlices)),
finally = {
file.remove(fileName)
})
}
}

RDD(jrdd, "byte")
@@ -194,14 +206,21 @@ getMaxAllocationLimit <- function(sc) {
))
}

writeToConnection <- function(serializedSlices, conn) {
tryCatch({
for (slice in serializedSlices) {
writeBin(as.integer(length(slice)), conn, endian = "big")
writeBin(slice, conn, endian = "big")
}
}, finally = {
close(conn)
})
}

writeToTempFile <- function(serializedSlices) {
fileName <- tempfile()
conn <- file(fileName, "wb")
for (slice in serializedSlices) {
writeBin(as.integer(length(slice)), conn, endian = "big")
writeBin(slice, conn, endian = "big")
}
close(conn)
writeToConnection(serializedSlices, conn)
fileName
}

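For clarity, the new writeToConnection() helper above length-prefixes every serialized slice with a 4-byte big-endian integer followed by the raw bytes. A minimal, purely illustrative R sketch of a matching reader (the real consumer is the JVM-side RParallelizeServer; readSlices is a hypothetical name):

```r
# Hypothetical reader for the framing produced by writeToConnection():
# <4-byte big-endian length><slice bytes>, repeated until end of stream.
readSlices <- function(conn) {
  slices <- list()
  repeat {
    len <- readBin(conn, what = "integer", n = 1, size = 4, endian = "big")
    if (length(len) == 0) break                       # end of stream
    slices[[length(slices) + 1]] <- readBin(conn, what = "raw", n = len)
  }
  slices
}
```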
26 changes: 20 additions & 6 deletions R/pkg/R/functions.R
@@ -2203,9 +2203,16 @@ setMethod("from_json", signature(x = "Column", schema = "characterOrstructType")
})

#' @details
#' \code{from_utc_timestamp}: Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a
#' time in UTC, and renders that time as a timestamp in the given time zone. For example, 'GMT+1'
#' would yield '2017-07-14 03:40:00.0'.
#' \code{from_utc_timestamp}: This is a common function for databases supporting TIMESTAMP WITHOUT
#' TIMEZONE. This function takes a timestamp which is timezone-agnostic, and interprets it as a
#' timestamp in UTC, and renders that timestamp as a timestamp in the given time zone.
#' However, timestamp in Spark represents number of microseconds from the Unix epoch, which is not
#' timezone-agnostic. So in Spark this function just shift the timestamp value from UTC timezone to
#' the given timezone.
#' This function may return confusing result if the input is a string with timezone, e.g.
#' (\code{2018-03-13T06:18:23+00:00}). The reason is that, Spark firstly cast the string to
#' timestamp according to the timezone in the string, and finally display the result by converting
#' the timestamp to string according to the session local timezone.
#'
#' @rdname column_datetime_diff_functions
#'
@@ -2261,9 +2268,16 @@ setMethod("next_day", signature(y = "Column", x = "character"),
})

#' @details
#' \code{to_utc_timestamp}: Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a
#' time in the given time zone, and renders that time as a timestamp in UTC. For example, 'GMT+1'
#' would yield '2017-07-14 01:40:00.0'.
#' \code{to_utc_timestamp}: This is a common function for databases supporting TIMESTAMP WITHOUT
#' TIMEZONE. This function takes a timestamp which is timezone-agnostic, and interprets it as a
#' timestamp in the given timezone, and renders that timestamp as a timestamp in UTC.
#' However, timestamp in Spark represents number of microseconds from the Unix epoch, which is not
#' timezone-agnostic. So in Spark this function just shift the timestamp value from the given
#' timezone to UTC timezone.
#' This function may return confusing result if the input is a string with timezone, e.g.
#' (\code{2018-03-13T06:18:23+00:00}). The reason is that, Spark firstly cast the string to
#' timestamp according to the timezone in the string, and finally display the result by converting
#' the timestamp to string according to the session local timezone.
#'
#' @rdname column_datetime_diff_functions
#' @aliases to_utc_timestamp to_utc_timestamp,Column,character-method
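A hedged SparkR sketch of the shift semantics described in the updated documentation above, assuming an active SparkR session whose local time zone is UTC (the rendered strings depend on the session time zone, which is exactly the caveat the new docs add); the expected values come from the examples in the replaced doc text:

```r
# Illustrative only: a timestamp string without a time zone suffix.
df <- createDataFrame(data.frame(t = "2017-07-14 02:40:00", stringsAsFactors = FALSE))
head(select(df,
            from_utc_timestamp(df$t, "GMT+1"),   # expected: 2017-07-14 03:40:00
            to_utc_timestamp(df$t, "GMT+1")))    # expected: 2017-07-14 01:40:00
```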
32 changes: 32 additions & 0 deletions R/pkg/tests/fulltests/test_Serde.R
@@ -124,3 +124,35 @@ test_that("SerDe of list of lists", {
})

sparkR.session.stop()

# Note that this test should be at the end of tests since the configruations used here are not
# specific to sessions, and the Spark context is restarted.
test_that("createDataFrame large objects", {
for (encryptionEnabled in list("true", "false")) {
# To simulate a large object scenario, we set spark.r.maxAllocationLimit to a smaller value
conf <- list(spark.r.maxAllocationLimit = "100",
spark.io.encryption.enabled = encryptionEnabled)

suppressWarnings(sparkR.session(master = sparkRTestMaster,
sparkConfig = conf,
enableHiveSupport = FALSE))

sc <- getSparkContext()
actual <- callJStatic("org.apache.spark.api.r.RUtils", "getEncryptionEnabled", sc)
expected <- as.logical(encryptionEnabled)
expect_equal(actual, expected)

tryCatch({
# suppress warnings from dot in the field names. See also SPARK-21536.
df <- suppressWarnings(createDataFrame(iris, numPartitions = 3))
expect_equal(getNumPartitions(df), 3)
expect_equal(dim(df), dim(iris))

df <- createDataFrame(cars, numPartitions = 3)
expect_equal(collect(df), cars)
},
finally = {
sparkR.stop()
})
}
})
12 changes: 0 additions & 12 deletions R/pkg/tests/fulltests/test_sparkSQL.R
@@ -316,18 +316,6 @@ test_that("create DataFrame from RDD", {
unsetHiveContext()
})

test_that("createDataFrame uses files for large objects", {
# To simulate a large file scenario, we set spark.r.maxAllocationLimit to a smaller value
conf <- callJMethod(sparkSession, "conf")
callJMethod(conf, "set", "spark.r.maxAllocationLimit", "100")
df <- suppressWarnings(createDataFrame(iris, numPartitions = 3))
expect_equal(getNumPartitions(df), 3)

# Resetting the conf back to default value
callJMethod(conf, "set", "spark.r.maxAllocationLimit", toString(.Machine$integer.max / 10))
expect_equal(dim(df), dim(iris))
})

test_that("read/write csv as DataFrame", {
if (windows_with_hadoop()) {
csvPath <- tempfile(pattern = "sparkr-test", fileext = ".csv")
83 changes: 44 additions & 39 deletions R/pkg/tests/run-all.R
@@ -18,50 +18,55 @@
library(testthat)
library(SparkR)

# Turn all warnings into errors
options("warn" = 2)
# SPARK-25572
if (identical(Sys.getenv("NOT_CRAN"), "true")) {

if (.Platform$OS.type == "windows") {
Sys.setenv(TZ = "GMT")
}
# Turn all warnings into errors
options("warn" = 2)

# Setup global test environment
# Install Spark first to set SPARK_HOME
if (.Platform$OS.type == "windows") {
Sys.setenv(TZ = "GMT")
}

# NOTE(shivaram): We set overwrite to handle any old tar.gz files or directories left behind on
# CRAN machines. For Jenkins we should already have SPARK_HOME set.
install.spark(overwrite = TRUE)
# Setup global test environment
# Install Spark first to set SPARK_HOME

sparkRDir <- file.path(Sys.getenv("SPARK_HOME"), "R")
sparkRWhitelistSQLDirs <- c("spark-warehouse", "metastore_db")
invisible(lapply(sparkRWhitelistSQLDirs,
function(x) { unlink(file.path(sparkRDir, x), recursive = TRUE, force = TRUE)}))
sparkRFilesBefore <- list.files(path = sparkRDir, all.files = TRUE)
# NOTE(shivaram): We set overwrite to handle any old tar.gz files or directories left behind on
# CRAN machines. For Jenkins we should already have SPARK_HOME set.
install.spark(overwrite = TRUE)

sparkRTestMaster <- "local[1]"
sparkRTestConfig <- list()
if (identical(Sys.getenv("NOT_CRAN"), "true")) {
sparkRTestMaster <- ""
} else {
# Disable hsperfdata on CRAN
old_java_opt <- Sys.getenv("_JAVA_OPTIONS")
Sys.setenv("_JAVA_OPTIONS" = paste("-XX:-UsePerfData", old_java_opt))
tmpDir <- tempdir()
tmpArg <- paste0("-Djava.io.tmpdir=", tmpDir)
sparkRTestConfig <- list(spark.driver.extraJavaOptions = tmpArg,
spark.executor.extraJavaOptions = tmpArg)
}
sparkRDir <- file.path(Sys.getenv("SPARK_HOME"), "R")
sparkRWhitelistSQLDirs <- c("spark-warehouse", "metastore_db")
invisible(lapply(sparkRWhitelistSQLDirs,
function(x) { unlink(file.path(sparkRDir, x), recursive = TRUE, force = TRUE)}))
sparkRFilesBefore <- list.files(path = sparkRDir, all.files = TRUE)

test_package("SparkR")
sparkRTestMaster <- "local[1]"
sparkRTestConfig <- list()
if (identical(Sys.getenv("NOT_CRAN"), "true")) {
sparkRTestMaster <- ""
} else {
# Disable hsperfdata on CRAN
old_java_opt <- Sys.getenv("_JAVA_OPTIONS")
Sys.setenv("_JAVA_OPTIONS" = paste("-XX:-UsePerfData", old_java_opt))
tmpDir <- tempdir()
tmpArg <- paste0("-Djava.io.tmpdir=", tmpDir)
sparkRTestConfig <- list(spark.driver.extraJavaOptions = tmpArg,
spark.executor.extraJavaOptions = tmpArg)
}

if (identical(Sys.getenv("NOT_CRAN"), "true")) {
# set random seed for predictable results. mostly for base's sample() in tree and classification
set.seed(42)
# for testthat 1.0.2 later, change reporter from "summary" to default_reporter()
testthat:::run_tests("SparkR",
file.path(sparkRDir, "pkg", "tests", "fulltests"),
NULL,
"summary")
}
test_package("SparkR")

if (identical(Sys.getenv("NOT_CRAN"), "true")) {
# set random seed for predictable results. mostly for base's sample() in tree and classification
set.seed(42)
# for testthat 1.0.2 later, change reporter from "summary" to default_reporter()
testthat:::run_tests("SparkR",
file.path(sparkRDir, "pkg", "tests", "fulltests"),
NULL,
"summary")
}

SparkR:::uninstallDownloadedSpark()
SparkR:::uninstallDownloadedSpark()

}
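Since the whole body of run-all.R is now gated on NOT_CRAN, running the full SparkR suite locally needs that variable set. A hedged sketch, where the SPARK_HOME path is an assumption to adjust for your checkout:

```r
# Illustrative only: run the SparkR tests outside of CRAN checks.
Sys.setenv(NOT_CRAN = "true")
Sys.setenv(SPARK_HOME = "/path/to/spark")  # assumed location of the Spark checkout
source(file.path(Sys.getenv("SPARK_HOME"), "R", "pkg", "tests", "run-all.R"))
```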
2 changes: 1 addition & 1 deletion assembly/README
@@ -9,4 +9,4 @@ This module is off by default. To activate it specify the profile in the command

If you need to build an assembly for a different version of Hadoop the
hadoop-version system property needs to be set as in this example:
-Dhadoop.version=2.7.7
-Dhadoop.version=2.7.3
2 changes: 1 addition & 1 deletion assembly/pom.xml
@@ -21,7 +21,7 @@
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.11</artifactId>
<version>2.4.0-SNAPSHOT</version>
<version>2.4.1-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>

2 changes: 2 additions & 0 deletions bin/docker-image-tool.sh
@@ -54,6 +54,8 @@ function build {
img_path=$IMG_PATH
--build-arg
spark_jars=assembly/target/scala-$SPARK_SCALA_VERSION/jars
--build-arg
k8s_tests=resource-managers/kubernetes/integration-tests/tests
)
else
# Not passed as an argument to docker, but used to validate the Spark directory.
4 changes: 4 additions & 0 deletions build/mvn
@@ -153,6 +153,7 @@ if [ -n "${ZINC_INSTALL_FLAG}" -o -z "`"${ZINC_BIN}" -status -port ${ZINC_PORT}`
export ZINC_OPTS=${ZINC_OPTS:-"$_COMPILE_JVM_OPTS"}
"${ZINC_BIN}" -shutdown -port ${ZINC_PORT}
"${ZINC_BIN}" -start -port ${ZINC_PORT} \
-server 127.0.0.1 -idle-timeout 30m \
-scala-compiler "${SCALA_COMPILER}" \
-scala-library "${SCALA_LIBRARY}" &>/dev/null
fi
@@ -164,3 +165,6 @@

# Last, call the `mvn` command as usual
"${MVN_BIN}" -DzincPort=${ZINC_PORT} "$@"

# Try to shut down zinc explicitly
"${ZINC_BIN}" -shutdown -port ${ZINC_PORT}
2 changes: 1 addition & 1 deletion common/kvstore/pom.xml
@@ -22,7 +22,7 @@
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.11</artifactId>
<version>2.4.0-SNAPSHOT</version>
<version>2.4.1-SNAPSHOT</version>
<relativePath>../../pom.xml</relativePath>
</parent>

2 changes: 1 addition & 1 deletion common/network-common/pom.xml
@@ -22,7 +22,7 @@
<parent>
<groupId>org.apache.spark</groupId>
<artifactId>spark-parent_2.11</artifactId>
<version>2.4.0-SNAPSHOT</version>
<version>2.4.1-SNAPSHOT</version>
<relativePath>../../pom.xml</relativePath>
</parent>

@@ -36,7 +36,10 @@
*/
public abstract class ManagedBuffer {

/** Number of bytes of the data. */
/**
* Number of bytes of the data. If this buffer will decrypt for all of the views into the data,
* this is the size of the decrypted data.
*/
public abstract long size();

/**