Merged
Commits
201 commits
8ac0910
[SPARK-17848][ML] Move LabelCol datatype cast into Predictor.fit
zhengruifeng Nov 1, 2016
8cdf143
[SPARK-18103][FOLLOW-UP][SQL][MINOR] Rename `MetadataLogFileCatalog` …
lw-lin Nov 1, 2016
8a538c9
[SPARK-18189][SQL] Fix serialization issue in KeyValueGroupedDataset
seyfe Nov 1, 2016
d0272b4
[SPARK-18148][SQL] Misleading Error Message for Aggregation Without W…
jiangxb1987 Nov 1, 2016
cfac17e
[SPARK-18167] Disable flaky SQLQuerySuite test
ericl Nov 1, 2016
01dd008
[SPARK-17764][SQL] Add `to_json` supporting to convert nested struct …
HyukjinKwon Nov 1, 2016
6e62981
[SPARK-17350][SQL] Disable default use of KryoSerializer in Thrift Se…
JoshRosen Nov 1, 2016
b929537
[SPARK-18182] Expose ReplayListenerBus.read() overload which takes st…
JoshRosen Nov 1, 2016
91c33a0
[SPARK-18088][ML] Various ChiSqSelector cleanups
jkbradley Nov 2, 2016
77a9816
[SPARK-18025] Use commit protocol API in structured streaming
rxin Nov 2, 2016
ad4832a
[SPARK-18216][SQL] Make Column.expr public
rxin Nov 2, 2016
1ecfafa
[SPARK-17838][SPARKR] Check named arguments for options and use forma…
HyukjinKwon Nov 2, 2016
1bbf9ff
[SPARK-17992][SQL] Return all partitions from HiveShim when Hive thro…
Nov 2, 2016
620da3b
[SPARK-17475][STREAMING] Delete CRC files if the filesystem doesn't u…
frreiss Nov 2, 2016
abefe2e
[SPARK-18183][SPARK-18184] Fix INSERT [INTO|OVERWRITE] TABLE ... PART…
ericl Nov 2, 2016
a36653c
[SPARK-18192] Support all file formats in structured streaming
rxin Nov 2, 2016
85c5424
[SPARK-18144][SQL] logging StreamingQueryListener$QueryStartedEvent
CodingCat Nov 2, 2016
2dc0480
[SPARK-17532] Add lock debugging info to thread dumps.
rdblue Nov 2, 2016
bcbe444
[MINOR] Use <= for clarity in Pi examples' Monte Carlo process
mrydzy Nov 2, 2016
98ede49
[SPARK-18198][DOC][STREAMING] Highlight code snippets
lw-lin Nov 2, 2016
70a5db7
[SPARK-18204][WEBUI] Remove SparkUI.appUIAddress
jaceklaskowski Nov 2, 2016
9c8deef
[SPARK-18076][CORE][SQL] Fix default Locale used in DateFormat, Numbe…
srowen Nov 2, 2016
f151bd1
[SPARK-16839][SQL] Simplify Struct creation code path
Nov 2, 2016
4af0ce2
[SPARK-17683][SQL] Support ArrayType in Literal.apply
maropu Nov 2, 2016
742e0fe
[SPARK-17895] Improve doc for rangeBetween and rowsBetween
david-weiluo-ren Nov 2, 2016
02f2031
[SPARK-14393][SQL] values generated by non-deterministic functions sh…
mengxr Nov 2, 2016
3c24299
[SPARK-18160][CORE][YARN] spark.files & spark.jars should not be pass…
zjffdu Nov 2, 2016
37d9522
[SPARK-17058][BUILD] Add maven snapshots-and-staging profile to build…
steveloughran Nov 2, 2016
fd90541
[SPARK-18214][SQL] Simplify RuntimeReplaceable type coercion
rxin Nov 2, 2016
3a1bc6f
[SPARK-17470][SQL] unify path for data source table and locationUri f…
cloud-fan Nov 3, 2016
7eb2ca8
[SPARK-17963][SQL][DOCUMENTATION] Add examples (extend) in each expre…
HyukjinKwon Nov 3, 2016
9ddec86
[SPARK-18175][SQL] Improve the test case coverage of implicit type ca…
gatorsmile Nov 3, 2016
d24e736
[SPARK-18200][GRAPHX] Support zero as an initial capacity in OpenHashSet
dongjoon-hyun Nov 3, 2016
96cc1b5
[SPARK-17122][SQL] support drop current database
adrian-wang Nov 3, 2016
937af59
[SPARK-18219] Move commit protocol API (internal) from sql/core to co…
rxin Nov 3, 2016
0ea5d5b
[SQL] minor - internal doc improvement for InsertIntoTable.
rxin Nov 3, 2016
9dc9f9a
[SPARK-18177][ML][PYSPARK] Add missing 'subsamplingRate' of pyspark G…
zhengruifeng Nov 3, 2016
66a99f4
[SPARK-17981][SPARK-17957][SQL] Fix Incorrect Nullability Setting to …
gatorsmile Nov 3, 2016
27daf6b
[SPARK-17949][SQL] A JVM object based aggregate operator
liancheng Nov 3, 2016
b17057c
[SPARK-18244][SQL] Rename partitionProviderIsHive -> tracksPartitions…
rxin Nov 3, 2016
1629331
[SPARK-18237][HIVE] hive.exec.stagingdir have no effect
Nov 3, 2016
098e4ca
[SPARK-18099][YARN] Fail if same files added to distributed cache for…
kishorvpatil Nov 3, 2016
67659c9
[SPARK-18212][SS][KAFKA] increase executor poll timeout
koeninger Nov 3, 2016
e892025
[SPARKR][TEST] remove unnecessary suppressWarnings
wangmiao1981 Nov 3, 2016
f22954a
[SPARK-18257][SS] Improve error reporting for FileStressSuite
rxin Nov 3, 2016
dc4c600
[SPARK-18138][DOCS] Document that Java 7, Python 2.6, Scala 2.10, Had…
srowen Nov 4, 2016
aa412c5
[SPARK-18259][SQL] Do not capture Throwable in QueryExecution
hvanhovell Nov 4, 2016
a08463b
[SPARK-14393][SQL][DOC] update doc for python and R
felixcheung Nov 4, 2016
27602c3
[SPARK-18200][GRAPHX][FOLLOW-UP] Support zero as an initial capacity …
dongjoon-hyun Nov 4, 2016
14f235d
Closing some stale/invalid pull requests
rxin Nov 4, 2016
a42d738
[SPARK-18197][CORE] Optimise AppendOnlyMap implementation
a-roberts Nov 4, 2016
550cd56
[SPARK-17337][SQL] Do not pushdown predicates through filters with p…
hvanhovell Nov 4, 2016
4cee2ce
[SPARK-18167] Re-enable the non-flaky parts of SQLQuerySuite
ericl Nov 4, 2016
0e3312e
[SPARK-18256] Improve the performance of event log replay in HistoryS…
JoshRosen Nov 5, 2016
0f7c9e8
[SPARK-18189] [SQL] [Followup] Move test from ReplSuite to prevent ja…
rxin Nov 5, 2016
8a9ca19
[SPARK-17710][FOLLOW UP] Add comments to state why 'Utils.classForNam…
weiqingy Nov 5, 2016
6e27018
[SPARK-18260] Make from_json null safe
brkyvz Nov 5, 2016
95ec4e2
[SPARK-17183][SPARK-17983][SPARK-18101][SQL] put hive serde table sch…
cloud-fan Nov 5, 2016
e2648d3
[SPARK-18287][SQL] Move hash expressions from misc.scala into hash.scala
rxin Nov 5, 2016
a87471c
[SPARK-18192][MINOR][FOLLOWUP] Missed json test in FileStreamSinkSuite
HyukjinKwon Nov 5, 2016
fb0d608
[SPARK-17849][SQL] Fix NPE problem when using grouping sets
Nov 5, 2016
9a87c31
[SPARK-17964][SPARKR] Enable SparkR with Mesos client mode and cluste…
susanxhuynh Nov 5, 2016
15d3926
[MINOR][DOCUMENTATION] Fix some minor descriptions in functions consi…
HyukjinKwon Nov 6, 2016
23ce0d1
[SPARK-18276][ML] ML models should copy the training summary and set …
sethah Nov 6, 2016
340f09d
[SPARK-17854][SQL] rand/randn allows null/long as input seed
HyukjinKwon Nov 6, 2016
b89d055
[SPARK-18210][ML] Pipeline.copy does not create an instance with the …
wojtek-szymanski Nov 6, 2016
556a3b7
[SPARK-18269][SQL] CSV datasource should read null properly when sche…
HyukjinKwon Nov 7, 2016
46b2e49
[SPARK-18173][SQL] data source tables should support truncating parti…
cloud-fan Nov 7, 2016
07ac3f0
[SPARK-18167][SQL] Disable flaky hive partition pruning test.
rxin Nov 7, 2016
9db06c4
[SPARK-18296][SQL] Use consistent naming for expression test suites
rxin Nov 7, 2016
57626a5
[SPARK-16904][SQL] Removal of Hive Built-in Hash Functions and TestHi…
gatorsmile Nov 7, 2016
a814eea
[SPARK-18125][SQL] Fix a compilation error in codegen due to splitExp…
viirya Nov 7, 2016
daa975f
[SPARK-18291][SPARKR][ML] SparkR glm predict should output original l…
yanboliang Nov 7, 2016
b06c23d
[SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whethe…
tdas Nov 7, 2016
0d95662
[SPARK-17108][SQL] Fix BIGINT and INT comparison failure in spark sql
weiqingy Nov 7, 2016
8f0ea01
[SPARK-14914][CORE] Fix Resource not closed after using, mostly for u…
HyukjinKwon Nov 7, 2016
19cf208
[SPARK-17490][SQL] Optimize SerializeFromObject() for a primitive array
kiszk Nov 7, 2016
3a710b9
[SPARK-18236] Reduce duplicate objects in Spark UI and HistoryServer
JoshRosen Nov 8, 2016
3eda057
[SPARK-18295][SQL] Make to_json function null safe (matching it to fr…
HyukjinKwon Nov 8, 2016
9b0593d
[SPARK-18086] Add support for Hive session vars.
rdblue Nov 8, 2016
c1a0c66
[SPARK-18261][STRUCTURED STREAMING] Add statistics to MemorySink for …
lw-lin Nov 8, 2016
1da64e1
[SPARK-18217][SQL] Disallow creating permanent views based on tempora…
gatorsmile Nov 8, 2016
6f36971
[SPARK-16575][CORE] partition calculation mismatch with sc.binaryFiles
fidato13 Nov 8, 2016
47731e1
[SPARK-18207][SQL] Fix a compilation error due to HashExpression.doGe…
kiszk Nov 8, 2016
c291bd2
[SPARK-18137][SQL] Fix RewriteDistinctAggregates UnresolvedException …
Nov 8, 2016
ee2e741
[SPARK-13770][DOCUMENTATION][ML] Document the ML feature Interaction
Nov 8, 2016
b1033fb
[MINOR][DOC] Unify example marks
zhengruifeng Nov 8, 2016
344dcad
[SPARK-17868][SQL] Do not use bitmasks during parsing and analysis of…
jiangxb1987 Nov 8, 2016
73feaa3
[SPARK-18346][SQL] TRUNCATE TABLE should fail if no partition is matc…
cloud-fan Nov 8, 2016
9c41969
[SPARK-18191][CORE] Port RDD API to use commit protocol
jiangxb1987 Nov 8, 2016
245e5a2
[SPARK-18357] Fix yarn files/archive broken issue andd unit tests
kishorvpatil Nov 8, 2016
26e1c53
[SPARK-17748][ML] Minor cleanups to one-pass linear regression with e…
jkbradley Nov 8, 2016
b6de0c9
[SPARK-18280][CORE] Fix potential deadlock in `StandaloneSchedulerBac…
zsxwing Nov 8, 2016
6f7ecb0
[SPARK-18342] Make rename failures fatal in HDFSBackedStateStore
brkyvz Nov 8, 2016
55964c1
[SPARK-18239][SPARKR] Gradient Boosted Tree for R
felixcheung Nov 9, 2016
4afa39e
[SPARK-18333][SQL] Revert hacks in parquet and orc reader to support …
ericl Nov 9, 2016
b9192bb
[SPARK-18368] Fix regexp_replace with task serialization.
rdblue Nov 9, 2016
e256392
[SPARK-17659][SQL] Partitioned View is Not Supported By SHOW CREATE T…
gatorsmile Nov 9, 2016
02c5325
[SPARK-18292][SQL] LogicalPlanToSQLSuite should not use resource depe…
dongjoon-hyun Nov 9, 2016
205e6d5
[SPARK-18338][SQL][TEST-MAVEN] Fix test case initialization order und…
liancheng Nov 9, 2016
06a13ec
[SPARK-16808][CORE] History Server main page does not honor APPLICATI…
vijoshi Nov 9, 2016
4763661
Revert "[SPARK-18368] Fix regexp_replace with task serialization."
yhuai Nov 9, 2016
d4028de
[SPARK-18368][SQL] Fix regexp replace when serialized
rdblue Nov 9, 2016
d8b81f7
[SPARK-18370][SQL] Add table information to InsertIntoHadoopFsRelatio…
hvanhovell Nov 9, 2016
64fbdf1
[SPARK-18191][CORE][FOLLOWUP] Call `setConf` if `OutputFormat` is `Co…
jiangxb1987 Nov 9, 2016
3f62e1b
[SPARK-17829][SQL] Stable format for offset log
Nov 9, 2016
6021c95
[SPARK-18147][SQL] do not fail for very complex aggregator result type
cloud-fan Nov 10, 2016
cc86fcd
[MINOR][PYSPARK] Improve error message when running PySpark with diff…
viirya Nov 10, 2016
96a5910
[SPARK-18268][ML][MLLIB] ALS fail with better message if ratings is e…
techaddict Nov 10, 2016
22a9d06
[SPARK-14914][CORE] Fix Resource not closed after using, for unit tes…
wangmiao1981 Nov 10, 2016
16eaad9
[SPARK-18262][BUILD][SQL] JSON.org license is now CatX
srowen Nov 10, 2016
b533fa2
[SPARK-17993][SQL] Fix Parquet log output redirection
Nov 10, 2016
2f7461f
[SPARK-17990][SPARK-18302][SQL] correct several partition related beh…
cloud-fan Nov 10, 2016
e0deee1
[SPARK-18403][SQL] Temporarily disable flaky ObjectHashAggregateSuite
liancheng Nov 10, 2016
a335634
[SPARK-18185] Fix all forms of INSERT / OVERWRITE TABLE for Datasourc…
ericl Nov 11, 2016
5ddf694
[SPARK-18401][SPARKR][ML] SparkR random forest should support output …
yanboliang Nov 11, 2016
4f15d94
[SPARK-13331] AES support for over-the-wire encryption
Nov 11, 2016
a531fe1
[SPARK-17843][WEB UI] Indicate event logs pending for processing on h…
vijoshi Nov 11, 2016
d42bb7c
[SPARK-17982][SQL] SQLBuilder should wrap the generated SQL with pare…
dongjoon-hyun Nov 11, 2016
6e95325
[SPARK-18387][SQL] Add serialization to checkEvaluation.
rdblue Nov 11, 2016
ba23f76
[SPARK-18264][SPARKR] build vignettes with package, update vignettes …
felixcheung Nov 11, 2016
46b2550
[SPARK-18060][ML] Avoid unnecessary computation for MLOR
sethah Nov 12, 2016
3af8945
[SPARK-16759][CORE] Add a configuration property to pass caller conte…
weiqingy Nov 12, 2016
bc41d99
[SPARK-18375][SPARK-18383][BUILD][CORE] Upgrade netty to 4.0.42.Final
witgo Nov 12, 2016
22cb3a0
[SPARK-14077][ML][FOLLOW-UP] Minor refactor and cleanup for NaiveBayes
yanboliang Nov 12, 2016
1386fd2
[SPARK-18418] Fix flags for make_binary_release for hadoop profile
holdenk Nov 12, 2016
b91a51b
[SPARK-18426][STRUCTURED STREAMING] Python Documentation Fix for Stru…
Nov 14, 2016
07be232
[SPARK-18412][SPARKR][ML] Fix exception for some SparkR ML algorithms…
yanboliang Nov 14, 2016
f95b124
[SPARK-18382][WEBUI] "run at null:-1" in UI when no file/line info in…
srowen Nov 14, 2016
ae6cddb
[SPARK-18166][MLLIB] Fix Poisson GLM bug due to wrong requirement of …
actuaryzhang Nov 14, 2016
637a0bb
[SPARK-18396][HISTORYSERVER] Duration" column makes search result con…
WangTaoTheTonic Nov 14, 2016
9d07cee
[SPARK-18432][DOC] Changed HDFS default block size from 64MB to 128MB
moomindani Nov 14, 2016
bdfe60a
[SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store
tdas Nov 14, 2016
89d1fa5
[SPARK-17510][STREAMING][KAFKA] config max rate on a per-partition basis
koeninger Nov 14, 2016
7593445
[SPARK-11496][GRAPHX][FOLLOWUP] Add param checking for runParallelPer…
zhengruifeng Nov 14, 2016
bd85603
[SPARK-17348][SQL] Incorrect results from subquery transformation
nsyca Nov 14, 2016
c071878
[SPARK-18124] Observed delay based Event Time Watermarks
marmbrus Nov 15, 2016
c31def1
[SPARK-18428][DOC] Update docs for GraphX
zhengruifeng Nov 15, 2016
86430cc
[SPARK-18430][SQL] Fixed Exception Messages when Hitting an Invocatio…
gatorsmile Nov 15, 2016
d89bfc9
[SPARK-18232][MESOS] Support CNI
Nov 15, 2016
33be4da
[SPARK-18427][DOC] Update docs of mllib.KMeans
zhengruifeng Nov 15, 2016
f14ae49
[SPARK-18300][SQL] Do not apply foldable propagation with expand as a…
hvanhovell Nov 15, 2016
745ab8b
[SPARK-18379][SQL] Make the parallelism of parallelPartitionDiscovery…
Nov 15, 2016
6f9e598
[SPARK-13027][STREAMING] Added batch time as a parameter to updateSta…
Nov 15, 2016
2afdaa9
[SPARK-18337] Complete mode memory sinks should be able to recover fr…
brkyvz Nov 15, 2016
5bcb9a7
[SPARK-18417][YARN] Define 'spark.yarn.am.port' in yarn config object
weiqingy Nov 15, 2016
1ae4652
[SPARK-18440][STRUCTURED STREAMING] Pass correct query execution to F…
tdas Nov 15, 2016
503378f
[SPARK-18423][STREAMING] ReceiverTracker should close checkpoint dir …
HyukjinKwon Nov 15, 2016
3ce057d
[SPARK-17732][SQL] ALTER TABLE DROP PARTITION should support comparators
dongjoon-hyun Nov 15, 2016
4b35d13
[SPARK-18300][SQL] Fix scala 2.10 build for FoldablePropagation
hvanhovell Nov 16, 2016
4ac9759
[SPARK-18377][SQL] warehouse path should be a static conf
cloud-fan Nov 16, 2016
95eb06b
[SPARK-18438][SPARKR][ML] spark.mlp should support RFormula.
yanboliang Nov 16, 2016
74f5c21
[SPARK-18433][SQL] Improve DataSource option keys to be more case-ins…
dongjoon-hyun Nov 16, 2016
3e01f12
[DOC][MINOR] Kafka doc: breakup into lines
lw-lin Nov 16, 2016
43a2689
[SPARK-18400][STREAMING] NPE when resharding Kinesis Stream
srowen Nov 16, 2016
e614577
[SPARK-18410][STREAMING] Add structured kafka example
uncleGen Nov 16, 2016
241e04b
[MINOR][DOC] Fix typos in the 'configuration', 'monitoring' and 'sql-…
weiqingy Nov 16, 2016
c68f1a3
[SPARK-18434][ML] Add missing ParamValidations for ML algos
zhengruifeng Nov 16, 2016
a75e3fe
[SPARK-18446][ML][DOCS] Add links to API docs for ML algos
zhengruifeng Nov 16, 2016
7569cf6
[SPARK-18420][BUILD] Fix the errors caused by lint check in Java
Nov 16, 2016
608ecc5
[SPARK-18415][SQL] Weird Plan Output when CTE used in RunnableCommand
gatorsmile Nov 16, 2016
0048ce7
[SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to b…
tdas Nov 16, 2016
bb6cdfd
[SPARK-18461][DOCS][STRUCTUREDSTREAMING] Added more information about…
tdas Nov 16, 2016
a36a76a
[SPARK-1267][SPARK-18129] Allow PySpark to be pip installed
holdenk Nov 16, 2016
2ca8ae9
[SPARK-18186] Migrate HiveUDAFFunction to TypedImperativeAggregate fo…
liancheng Nov 16, 2016
5558998
[YARN][DOC] Increasing NodeManager's heap size with External Shuffle …
Devian-ua Nov 16, 2016
170eeb3
[SPARK-18442][SQL] Fix nullability of WrapOption.
ueshin Nov 17, 2016
07b3f04
[SPARK-18464][SQL] support old table which doesn't store schema in me…
cloud-fan Nov 17, 2016
a3cac7b
[YARN][DOC] Remove non-Yarn specific configurations from running-on-y…
weiqingy Nov 17, 2016
49b6f45
[SPARK-18365][DOCS] Improve Sample Method Documentation
bllchmbrs Nov 17, 2016
de77c67
[SPARK-17462][MLLIB]use VersionUtils to parse Spark version strings
Nov 17, 2016
cdaf4ce
[SPARK-18480][DOCS] Fix wrong links for ML guide docs
zhengruifeng Nov 17, 2016
b0aa1aa
[SPARK-18490][SQL] duplication nodename extrainfo for ShuffleExchange
Nov 17, 2016
ce13c26
[SPARK-18360][SQL] default table path of tables in default database s…
cloud-fan Nov 18, 2016
d9dd979
[SPARK-18462] Fix ClassCastException in SparkListenerDriverAccumUpdat…
JoshRosen Nov 18, 2016
7aee59f
Merge branch 'master' into rk/merge
Nov 18, 2016
51baca2
[SPARK-18187][SQL] CompactibleFileStreamLog should not use "compactIn…
Nov 18, 2016
795e9fc
[SPARK-18457][SQL] ORC and other columnar formats using HiveShim read…
aray Nov 18, 2016
40d59ff
[SPARK-18422][CORE] Fix wholeTextFiles test to pass on Windows in Jav…
HyukjinKwon Nov 18, 2016
b5d2cb2
Merge branch 'master' into rk/merge
Nov 18, 2016
e5f5c29
[SPARK-18477][SS] Enable interrupts for HDFS in HDFSMetadataLog
zsxwing Nov 19, 2016
6f7ff75
[SPARK-18505][SQL] Simplify AnalyzeColumnCommand
rxin Nov 19, 2016
2a40de4
[SPARK-18497][SS] Make ForeachSink support watermark
zsxwing Nov 19, 2016
db9fb9b
[SPARK-18448][CORE] SparkSession should implement java.lang.AutoClose…
srowen Nov 19, 2016
d5b1d5f
[SPARK-18445][BUILD][DOCS] Fix the markdown for `Note:`/`NOTE:`/`Note…
HyukjinKwon Nov 19, 2016
8b1e108
[SPARK-18353][CORE] spark.rpc.askTimeout defalut value is not 120s
srowen Nov 19, 2016
ded5fef
[SPARK-18448][CORE] Fix @since 2.1.0 on new SparkSession.close() method
srowen Nov 19, 2016
ea77c81
[SPARK-17062][MESOS] add conf option to mesos dispatcher
skonto Nov 20, 2016
856e004
[SPARK-18456][ML][FOLLOWUP] Use matrix abstraction for coefficients i…
sethah Nov 20, 2016
d93b655
[SPARK-18458][CORE] Fix signed integer overflow problem at an express…
kiszk Nov 20, 2016
bce9a03
[SPARK-18508][SQL] Fix documentation error for DateDiff
rxin Nov 20, 2016
a64f25d
[SQL] Fix documentation for Concat and ConcatWs
rxin Nov 20, 2016
7ca7a63
[SPARK-15214][SQL] Code-generation for Generate
hvanhovell Nov 20, 2016
c528812
[SPARK-3359][BUILD][DOCS] Print examples and disable group and tparam…
HyukjinKwon Nov 20, 2016
6659ae5
Fix Mesos build break for Scala 2.10.
rxin Nov 20, 2016
b625a36
[HOTFIX][SQL] Fix DDLSuite failure.
rxin Nov 21, 2016
6585479
[SPARK-18467][SQL] Extracts method for preparing arguments from Stati…
ueshin Nov 21, 2016
e811fbf
[SPARK-18282][ML][PYSPARK] Add python clustering summaries for GMM an…
sethah Nov 21, 2016
9f262ae
[SPARK-18398][SQL] Fix nullabilities of MapObjects and ExternalMapToC…
ueshin Nov 21, 2016
07beb5d
[SPARK-18413][SQL] Add `maxConnections` JDBCOption
dongjoon-hyun Nov 21, 2016
189d143
Merge branch 'master' into rk/merge
Nov 21, 2016
2 changes: 2 additions & 0 deletions .gitignore
@@ -57,6 +57,8 @@ project/plugins/project/build.properties
project/plugins/src_managed/
project/plugins/target/
python/lib/pyspark.zip
python/deps
python/pyspark/python
reports/
scalastyle-on-compile.generated.xml
scalastyle-output.xml
3 changes: 0 additions & 3 deletions NOTICE
@@ -421,9 +421,6 @@ Copyright (c) 2011, Terrence Parr.
This product includes/uses ASM (http://asm.ow2.org/),
Copyright (c) 2000-2007 INRIA, France Telecom.

This product includes/uses org.json (http://www.json.org/java/index.html),
Copyright (c) 2002 JSON.org

This product includes/uses JLine (http://jline.sourceforge.net/),
Copyright (c) 2002-2006, Marc Prud'hommeaux <mwp1@cornell.edu>.

91 changes: 91 additions & 0 deletions R/CRAN_RELEASE.md
@@ -0,0 +1,91 @@
# SparkR CRAN Release

To release SparkR as a package to CRAN, we use the `devtools` package. Please work with the
`dev@spark.apache.org` community and the R package maintainer on this.

### Release

First, check that the `Version:` field in the `pkg/DESCRIPTION` file is updated. Also, check for stale files not under source control.

Note that while `check-cran.sh` runs `R CMD check`, it does so with `--no-manual --no-vignettes`, which skips a few vignette and PDF checks - it is therefore preferable to run `R CMD check` on a manually built source package before uploading a release.
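
A minimal sketch of that manual check (an illustration, not part of the release scripts; it assumes you run it from the `SPARK_HOME/R` directory and that 2.1.0 matches the `Version:` field in `pkg/DESCRIPTION`):

```sh
# Build the source package from pkg/ and run the full CRAN-style check,
# including the manual and vignette checks that check-cran.sh skips.
R CMD build pkg
R CMD check --as-cran SparkR_2.1.0.tar.gz
```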

To upload a release, we need to update `cran-comments.md`. This should generally contain the results from running the `check-cran.sh` script along with comments on the status of any `WARNING` (there should not be any) or `NOTE`. As part of `check-cran.sh` and the release process, the vignettes are built - make sure `SPARK_HOME` is set and the Spark jars are accessible.
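
Concretely, one way to get to that state (a sketch; it assumes the checkout has already been built so that the assembly jars `check-cran.sh` looks for exist):

```sh
# From the root of the Spark checkout: build the jars, then run the CRAN
# check script, which also builds the vignettes.
./build/mvn -DskipTests package
./R/check-cran.sh
```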

Once everything is in place, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::release(); .libPaths(paths)
```

For more information please refer to http://r-pkgs.had.co.nz/release.html#release-check

### Testing: build package manually

To build the package manually, for example to inspect the resulting `.tar.gz` file content, we also use the `devtools` package.

The source package is what gets released to CRAN; CRAN then builds platform-specific binary packages from it.

#### Build source package

To build the source package locally without releasing to CRAN, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg"); .libPaths(paths)
```

(http://r-pkgs.had.co.nz/vignettes.html#vignette-workflow-2)

Similarly, the source package is also created by `check-cran.sh` with `R CMD build pkg`.

For example, this should be the content of the source package:

```sh
DESCRIPTION R inst tests
NAMESPACE build man vignettes

inst/doc/
sparkr-vignettes.html
sparkr-vignettes.Rmd
sparkr-vignettes.Rman

build/
vignette.rds

man/
*.Rd files...

vignettes/
sparkr-vignettes.Rmd
```

#### Test source package

To install, run this:

```sh
R CMD INSTALL SparkR_2.1.0.tar.gz
```

Replace "2.1.0" with the version of SparkR being released.
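
If you prefer not to type the version by hand, one option (a sketch that mirrors the `grep` used in `check-cran.sh`; it assumes you run it from the `SPARK_HOME/R` directory where the tarball was built) is:

```sh
# Read the version from the DESCRIPTION file and install the matching tarball.
VERSION=$(grep Version pkg/DESCRIPTION | awk '{print $NF}')
R CMD INSTALL SparkR_"$VERSION".tar.gz
```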

This command installs SparkR to the default libPaths. Once that is done, you should be able to start R and run:

```R
library(SparkR)
vignette("sparkr-vignettes", package="SparkR")
```

#### Build binary package

To build the binary package locally, run in R under the `SPARK_HOME/R` directory:

```R
paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); devtools::build("pkg", binary = TRUE); .libPaths(paths)
```

For example, this should be the content of the binary package:

```sh
DESCRIPTION Meta R html tests
INDEX NAMESPACE help profile worker
```
8 changes: 4 additions & 4 deletions R/README.md
@@ -6,7 +6,7 @@ SparkR is an R package that provides a light-weight frontend to use Spark from R

Libraries of SparkR need to be created in `$SPARK_HOME/R/lib`. This can be done by running the script `$SPARK_HOME/R/install-dev.sh`.
By default the above script uses the system-wide installation of R. However, this can be changed to any user-installed location of R by setting the environment variable `R_HOME` to the full path of the base directory where R is installed, before running the install-dev.sh script.
Example:
Example:
```bash
# where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript
export R_HOME=/home/username/R
@@ -46,19 +46,19 @@ Sys.setenv(SPARK_HOME="/Users/username/spark")
# This line loads SparkR from the installed directory
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sc <- sparkR.init(master="local")
sparkR.session()
```

#### Making changes to SparkR

The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
If you only make R file changes (i.e. no Scala changes) then you can just re-install the R package using `R/install-dev.sh` and test your changes.
Once you have made your changes, please include unit tests for them and run existing unit tests using the `R/run-tests.sh` script as described below.

#### Generating documentation

The SparkR documentation (Rd files and HTML files) is not a part of the source repository. To generate it you can run the script `R/create-docs.sh`. This script uses `devtools` and `knitr` to generate the docs, and these packages need to be installed on the machine before using the script. Also, you may need to install these [prerequisites](https://github.com/apache/spark/tree/master/docs#prerequisites). See also `R/DOCUMENTATION.md`

### Examples, Unit tests

SparkR comes with several sample programs in the `examples/src/main/r` directory.
33 changes: 27 additions & 6 deletions R/check-cran.sh
@@ -36,11 +36,27 @@ if [ ! -z "$R_HOME" ]
fi
echo "USING R_HOME = $R_HOME"

# Build the latest docs
# Build the latest docs, but not the vignettes, which are built with the package next
$FWDIR/create-docs.sh

# Build a zip file containing the source package
"$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg
# Build source package with vignettes
SPARK_HOME="$(cd "${FWDIR}"/..; pwd)"
. "${SPARK_HOME}"/bin/load-spark-env.sh
if [ -f "${SPARK_HOME}/RELEASE" ]; then
SPARK_JARS_DIR="${SPARK_HOME}/jars"
else
SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
fi

if [ -d "$SPARK_JARS_DIR" ]; then
# Build a zip file containing the source package with vignettes
SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD build $FWDIR/pkg

find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
else
echo "Error Spark JARs not found in $SPARK_HOME"
exit 1
fi

# Run check as-cran.
VERSION=`grep Version $FWDIR/pkg/DESCRIPTION | awk '{print $NF}'`
@@ -54,11 +70,16 @@ fi

if [ -n "$NO_MANUAL" ]
then
CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual"
CRAN_CHECK_OPTIONS=$CRAN_CHECK_OPTIONS" --no-manual --no-vignettes"
fi

echo "Running CRAN check with $CRAN_CHECK_OPTIONS options"

"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz

if [ -n "$NO_TESTS" ] && [ -n "$NO_MANUAL" ]
then
"$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
else
# This will run tests and/or build vignettes, and require SPARK_HOME
SPARK_HOME="${SPARK_HOME}" "$R_SCRIPT_PATH/"R CMD check $CRAN_CHECK_OPTIONS SparkR_"$VERSION".tar.gz
fi
popd > /dev/null
19 changes: 1 addition & 18 deletions R/create-docs.sh
@@ -20,7 +20,7 @@
# Script to create API docs and vignettes for SparkR
# This requires `devtools`, `knitr` and `rmarkdown` to be installed on the machine.

# After running this script the html docs can be found in
# After running this script the html docs can be found in
# $SPARK_HOME/R/pkg/html
# The vignettes can be found in
# $SPARK_HOME/R/pkg/vignettes/sparkr_vignettes.html
@@ -52,21 +52,4 @@ Rscript -e 'libDir <- "../../lib"; library(SparkR, lib.loc=libDir); library(knit

popd

# Find Spark jars.
if [ -f "${SPARK_HOME}/RELEASE" ]; then
SPARK_JARS_DIR="${SPARK_HOME}/jars"
else
SPARK_JARS_DIR="${SPARK_HOME}/assembly/target/scala-$SPARK_SCALA_VERSION/jars"
fi

# Only create vignettes if Spark JARs exist
if [ -d "$SPARK_JARS_DIR" ]; then
# render creates SparkR vignettes
Rscript -e 'library(rmarkdown); paths <- .libPaths(); .libPaths(c("lib", paths)); Sys.setenv(SPARK_HOME=tools::file_path_as_absolute("..")); render("pkg/vignettes/sparkr-vignettes.Rmd"); .libPaths(paths)'

find pkg/vignettes/. -not -name '.' -not -name '*.Rmd' -not -name '*.md' -not -name '*.pdf' -not -name '*.html' -delete
else
echo "Skipping R vignettes as Spark JARs not found in $SPARK_HOME"
fi

popd
9 changes: 6 additions & 3 deletions R/pkg/DESCRIPTION
@@ -1,8 +1,8 @@
Package: SparkR
Type: Package
Title: R Frontend for Apache Spark
Version: 2.0.0
Date: 2016-08-27
Version: 2.1.0
Date: 2016-11-06
Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
email = "shivaram@cs.berkeley.edu"),
person("Xiangrui", "Meng", role = "aut",
@@ -18,7 +18,9 @@ Depends:
Suggests:
testthat,
e1071,
survival
survival,
knitr,
rmarkdown
Description: The SparkR package provides an R frontend for Apache Spark.
License: Apache License (== 2.0)
Collate:
@@ -48,3 +50,4 @@ Collate:
'utils.R'
'window.R'
RoxygenNote: 5.0.1
VignetteBuilder: knitr
9 changes: 7 additions & 2 deletions R/pkg/NAMESPACE
@@ -45,7 +45,8 @@ exportMethods("glm",
"spark.als",
"spark.kstest",
"spark.logit",
"spark.randomForest")
"spark.randomForest",
"spark.gbt")

# Job group lifecycle management methods
export("setJobGroup",
@@ -353,7 +354,9 @@ export("as.DataFrame",
"read.ml",
"print.summary.KSTest",
"print.summary.RandomForestRegressionModel",
"print.summary.RandomForestClassificationModel")
"print.summary.RandomForestClassificationModel",
"print.summary.GBTRegressionModel",
"print.summary.GBTClassificationModel")

export("structField",
"structField.jobj",
@@ -380,6 +383,8 @@ S3method(print, summary.GeneralizedLinearRegressionModel)
S3method(print, summary.KSTest)
S3method(print, summary.RandomForestRegressionModel)
S3method(print, summary.RandomForestClassificationModel)
S3method(print, summary.GBTRegressionModel)
S3method(print, summary.GBTClassificationModel)
S3method(structField, character)
S3method(structField, jobj)
S3method(structType, jobj)
14 changes: 8 additions & 6 deletions R/pkg/R/DataFrame.R
@@ -788,7 +788,7 @@ setMethod("write.json",
function(x, path, mode = "error", ...) {
write <- callJMethod(x@sdf, "write")
write <- setWriteOptions(write, mode = mode, ...)
invisible(callJMethod(write, "json", path))
invisible(handledCallJMethod(write, "json", path))
})

#' Save the contents of SparkDataFrame as an ORC file, preserving the schema.
@@ -819,7 +819,7 @@ setMethod("write.orc",
function(x, path, mode = "error", ...) {
write <- callJMethod(x@sdf, "write")
write <- setWriteOptions(write, mode = mode, ...)
invisible(callJMethod(write, "orc", path))
invisible(handledCallJMethod(write, "orc", path))
})

#' Save the contents of SparkDataFrame as a Parquet file, preserving the schema.
Expand Down Expand Up @@ -851,7 +851,7 @@ setMethod("write.parquet",
function(x, path, mode = "error", ...) {
write <- callJMethod(x@sdf, "write")
write <- setWriteOptions(write, mode = mode, ...)
invisible(callJMethod(write, "parquet", path))
invisible(handledCallJMethod(write, "parquet", path))
})

#' @rdname write.parquet
@@ -895,7 +895,7 @@ setMethod("write.text",
function(x, path, mode = "error", ...) {
write <- callJMethod(x@sdf, "write")
write <- setWriteOptions(write, mode = mode, ...)
invisible(callJMethod(write, "text", path))
invisible(handledCallJMethod(write, "text", path))
})

#' Distinct
@@ -936,7 +936,9 @@ setMethod("unique",

#' Sample
#'
#' Return a sampled subset of this SparkDataFrame using a random seed.
#' Return a sampled subset of this SparkDataFrame using a random seed.
#' Note: this is not guaranteed to provide exactly the fraction specified
#' of the total count of the given SparkDataFrame.
#'
#' @param x A SparkDataFrame
#' @param withReplacement Sampling with replacement or not
@@ -3342,7 +3344,7 @@ setMethod("write.jdbc",
jprops <- varargsToJProperties(...)
write <- callJMethod(x@sdf, "write")
write <- callJMethod(write, "mode", jmode)
invisible(callJMethod(write, "jdbc", url, tableName, jprops))
invisible(handledCallJMethod(write, "jdbc", url, tableName, jprops))
})

#' randomSplit
17 changes: 9 additions & 8 deletions R/pkg/R/SQLContext.R
@@ -350,7 +350,7 @@ read.json.default <- function(path, ...) {
paths <- as.list(suppressWarnings(normalizePath(path)))
read <- callJMethod(sparkSession, "read")
read <- callJMethod(read, "options", options)
sdf <- callJMethod(read, "json", paths)
sdf <- handledCallJMethod(read, "json", paths)
dataFrame(sdf)
}

@@ -422,7 +422,7 @@ read.orc <- function(path, ...) {
path <- suppressWarnings(normalizePath(path))
read <- callJMethod(sparkSession, "read")
read <- callJMethod(read, "options", options)
sdf <- callJMethod(read, "orc", path)
sdf <- handledCallJMethod(read, "orc", path)
dataFrame(sdf)
}

@@ -444,7 +444,7 @@ read.parquet.default <- function(path, ...) {
paths <- as.list(suppressWarnings(normalizePath(path)))
read <- callJMethod(sparkSession, "read")
read <- callJMethod(read, "options", options)
sdf <- callJMethod(read, "parquet", paths)
sdf <- handledCallJMethod(read, "parquet", paths)
dataFrame(sdf)
}

@@ -496,7 +496,7 @@ read.text.default <- function(path, ...) {
paths <- as.list(suppressWarnings(normalizePath(path)))
read <- callJMethod(sparkSession, "read")
read <- callJMethod(read, "options", options)
sdf <- callJMethod(read, "text", paths)
sdf <- handledCallJMethod(read, "text", paths)
dataFrame(sdf)
}

@@ -914,12 +914,13 @@ read.jdbc <- function(url, tableName,
} else {
numPartitions <- numToInt(numPartitions)
}
sdf <- callJMethod(read, "jdbc", url, tableName, as.character(partitionColumn),
numToInt(lowerBound), numToInt(upperBound), numPartitions, jprops)
sdf <- handledCallJMethod(read, "jdbc", url, tableName, as.character(partitionColumn),
numToInt(lowerBound), numToInt(upperBound), numPartitions, jprops)
} else if (length(predicates) > 0) {
sdf <- callJMethod(read, "jdbc", url, tableName, as.list(as.character(predicates)), jprops)
sdf <- handledCallJMethod(read, "jdbc", url, tableName, as.list(as.character(predicates)),
jprops)
} else {
sdf <- callJMethod(read, "jdbc", url, tableName, jprops)
sdf <- handledCallJMethod(read, "jdbc", url, tableName, jprops)
}
dataFrame(sdf)
}