163 commits
04d417a
[SPARK-17830][SQL] Annotate remaining SQL APIs with InterfaceStability
rxin Oct 13, 2016
84f149e
[SPARK-17827][SQL] maxColLength type should be Int for String and Binary
robbinspg Oct 13, 2016
08eac35
[SPARK-17834][SQL] Fetch the earliest offsets manually in KafkaSource…
zsxwing Oct 13, 2016
7106866
[SPARK-17731][SQL][STREAMING] Metrics for structured streaming
tdas Oct 13, 2016
adc1124
[SPARK-17661][SQL] Consolidate various listLeafFiles implementations
petermaxlee Oct 13, 2016
9dc0ca0
[SPARK-17368][SQL] Add support for value class serialization and dese…
jodersky Oct 14, 2016
44cbb61
[SPARK-15957][FOLLOW-UP][ML][PYSPARK] Add Python API for RFormula for…
yanboliang Oct 14, 2016
8543996
[SPARK-17927][SQL] Remove dead code in WriterContainer.
rxin Oct 14, 2016
6c29b3d
[SPARK-17925][SQL] Break fileSourceInterfaces.scala into multiple pieces
rxin Oct 14, 2016
2fb12b0
[SPARK-17903][SQL] MetastoreRelation should talk to external catalog …
cloud-fan Oct 14, 2016
1db8fea
[SPARK-15402][ML][PYSPARK] PySpark ml.evaluation should support save/…
yanboliang Oct 14, 2016
a1b136d
[SPARK-14634][ML] Add BisectingKMeansSummary
zhengruifeng Oct 14, 2016
c8b612d
[SPARK-17870][MLLIB][ML] Change statistic to pValue for SelectKBest a…
Oct 14, 2016
28b645b
[SPARK-17855][CORE] Remove query string from jar url
invkrh Oct 14, 2016
7486442
[SPARK-17073][SQL][FOLLOWUP] generate column-level statistics
Oct 14, 2016
a0ebcb3
[DOC] Fix typo in sql hive doc
dhruve Oct 14, 2016
fa37877
Typo: form -> from
ash211 Oct 14, 2016
05800b4
[TEST] Ignore flaky test in StreamingQueryListenerSuite
tdas Oct 14, 2016
de1c1ca
[SPARK-17941][ML][TEST] Logistic regression tests should use sample w…
sethah Oct 14, 2016
7ab8624
[SPARK-17620][SQL] Determine Serde by hive.default.fileformat when Cr…
dilipbiswal Oct 14, 2016
522dd0d
Revert "[SPARK-17620][SQL] Determine Serde by hive.default.fileformat…
yhuai Oct 14, 2016
da9aeb0
[SPARK-17863][SQL] should not add column into Distinct
Oct 14, 2016
5aeb738
[SPARK-16063][SQL] Add storageLevel to Dataset
Oct 14, 2016
f00df40
[SPARK-11775][PYSPARK][SQL] Allow PySpark to register Java UDF
zjffdu Oct 14, 2016
72adfbf
[SPARK-17900][SQL] Graduate a list of Spark SQL APIs to stable
rxin Oct 14, 2016
2d96d35
[SPARK-17946][PYSPARK] Python crossJoin API similar to Scala
srinathshankar Oct 15, 2016
6ce1b67
[SPARK-16980][SQL] Load only catalog table partition metadata require…
Oct 15, 2016
36d81c2
[SPARK-17953][DOCUMENTATION] Fix typo in SparkSession scaladoc
tae-jun Oct 15, 2016
ed14633
[SPARK-17637][SCHEDULER] Packed scheduling for Spark tasks across exe…
Oct 16, 2016
72a6e7a
Revert "[SPARK-17637][SCHEDULER] Packed scheduling for Spark tasks ac…
rxin Oct 16, 2016
59e3eb5
[SPARK-17819][SQL] Support default database in connection URIs for Sp…
dongjoon-hyun Oct 17, 2016
e18d02c
[SPARK-17947][SQL] Add Doc and Comment about spark.sql.debug
gatorsmile Oct 17, 2016
56b0f5f
[MINOR][SQL] Add prettyName for current_database function
weiqingy Oct 17, 2016
e3bf37f
Fix example of tf_idf with minDocFreq
maximerihouey Oct 17, 2016
c7ac027
[SPARK-17839][CORE] Use Nio's directbuffer instead of BufferedInputSt…
Oct 17, 2016
d88a1ba
[SPARK-17751][SQL] Remove spark.sql.eagerAnalysis and Output the Plan…
gatorsmile Oct 17, 2016
813ab5e
[SPARK-17620][SQL] Determine Serde by hive.default.fileformat when Cr…
dilipbiswal Oct 18, 2016
8daa1a2
[SPARK-17974] Refactor FileCatalog classes to simplify the inheritanc…
ericl Oct 18, 2016
1c5a7d7
Revert "[SPARK-17974] Refactor FileCatalog classes to simplify the in…
rxin Oct 18, 2016
7d878cf
[SQL][STREAMING][TEST] Fix flaky tests in StreamingQueryListenerSuite
lw-lin Oct 18, 2016
a9e79a4
[SQL][STREAMING][TEST] Follow up to remove Option.contains for Scala …
tdas Oct 18, 2016
e59df62
[SPARK-17899][SQL][FOLLOW-UP] debug mode should work for corrupted table
cloud-fan Oct 18, 2016
3768653
[SPARK-17388] [SQL] Support for inferring type date/timestamp/decimal…
HyukjinKwon Oct 18, 2016
231f39e
[SPARK-17711] Compress rolled executor log
loneknightpy Oct 18, 2016
4ef39c2
[SPARK-17974] try 2) Refactor FileCatalog classes to simplify the inh…
ericl Oct 18, 2016
bfe7885
[SPARK-17985][CORE] Bump commons-lang3 version to 3.5.
ueshin Oct 18, 2016
20dd110
[MINOR][DOC] Add more built-in sources in sql-programming-guide.md
weiqingy Oct 18, 2016
4518642
[SPARK-17930][CORE] The SerializerInstance instance used when deseria…
witgo Oct 18, 2016
b3130c7
[SPARK-17955][SQL] Make DataFrameReader.jdbc call DataFrameReader.for…
HyukjinKwon Oct 18, 2016
cd662bc
Revert "[SPARK-17985][CORE] Bump commons-lang3 version to 3.5."
rxin Oct 18, 2016
cd106b0
[SPARK-17841][STREAMING][KAFKA] drain commitQueue
koeninger Oct 18, 2016
1e35e96
[SPARK-17817] [PYSPARK] [FOLLOWUP] PySpark RDD Repartitioning Results…
viirya Oct 18, 2016
941b3f9
[SPARK-17731][SQL][STREAMING][FOLLOWUP] Refactored StreamingQueryList…
tdas Oct 19, 2016
5f20ae0
[SPARK-17980][SQL] Fix refreshByPath for converted Hive tables
ericl Oct 19, 2016
2629cd7
[SPARK-17711][TEST-HADOOP2.2] Fix hadoop2.2 compilation error
loneknightpy Oct 19, 2016
4329c5c
[SPARK-17873][SQL] ALTER TABLE RENAME TO should allow users to specif…
cloud-fan Oct 19, 2016
f39852e
[SPARK-18001][DOCUMENT] fix broke link to SparkDataFrame
Wenpei Oct 19, 2016
9540357
[SPARK-17985][CORE] Bump commons-lang3 version to 3.5.
ueshin Oct 19, 2016
444c2d2
[SPARK-10541][WEB UI] Allow ApplicationHistoryProviders to provide th…
ajbozarth Oct 19, 2016
4b2011e
[SPARK-17989][SQL] Check ascendingOrder type in sort_array function r…
HyukjinKwon Oct 20, 2016
f313117
[SPARK-18012][SQL] Simplify WriterContainer
rxin Oct 20, 2016
3975516
[SPARK-18003][SPARK CORE] Fix bug of RDD zipWithIndex & zipWithUnique…
WeichenXu123 Oct 20, 2016
4bd17c4
[SPARK-17991][SQL] Enable metastore partition pruning by default.
ericl Oct 20, 2016
c2c107a
[SPARK-11653][DEPLOY] Allow spark-daemon.sh to run in the foreground
mikejihbe Oct 20, 2016
986a3b8
[SPARK-17796][SQL] Support wildcard character in filename for LOAD DA…
dongjoon-hyun Oct 20, 2016
e895bc2
[SPARK-17860][SQL] SHOW COLUMN's database conflict check should respe…
dilipbiswal Oct 20, 2016
fb0894b
[SPARK-17698][SQL] Join predicates should not contain filter clauses
tejasapatil Oct 20, 2016
84b245f
[SPARK-15780][SQL] Support mapValues on KeyValueGroupedDataset
koertkuipers Oct 20, 2016
947f4f2
[SPARK-17999][KAFKA][SQL] Add getPreferredLocations for KafkaSourceRDD
jerryshao Oct 20, 2016
7f9ec19
[SPARK-18021][SQL] Refactor file name specification for data sources
rxin Oct 20, 2016
2d14ab7
[DOCS] Update docs to not suggest to package Spark before running tests.
markgrover Oct 20, 2016
1bb99c4
[SPARK-18030][TESTS] Adds more checks to collect more info about File…
zsxwing Oct 21, 2016
3180272
[SPARKR] fix warnings
felixcheung Oct 21, 2016
57e97fc
[SPARK-18029][SQL] PruneFileSourcePartitions should not change the ou…
cloud-fan Oct 21, 2016
595893d
[SPARK-17960][PYSPARK][UPGRADE TO PY4J 0.10.4]
jagadeesanas2 Oct 21, 2016
a8ea4da
[SPARK-17331][FOLLOWUP][ML][CORE] Avoid allocating 0-length arrays
zhengruifeng Oct 21, 2016
3a23751
[SPARK-13275][WEB UI] Visually clarified executors start time in time…
ajbozarth Oct 21, 2016
b3b4b95
[SPARK-18034] Upgrade to MiMa 0.1.11 to fix flakiness
JoshRosen Oct 21, 2016
4efdc76
[SPARK-17674][SPARKR] check for warning in test output
felixcheung Oct 21, 2016
e21e1c9
[SPARK-18013][SPARKR] add crossJoin API
felixcheung Oct 21, 2016
e371040
[SPARK-17811] SparkR cannot parallelize data.frame with NA or NULL in…
falaki Oct 21, 2016
7a531e3
[SPARK-17926][SQL][STREAMING] Added json for statuses
tdas Oct 21, 2016
c1f344f
[SPARK-17929][CORE] Fix deadlock when CoarseGrainedSchedulerBackend r…
scwf Oct 21, 2016
1405702
[SPARK-18044][STREAMING] FileStreamSource should not infer partitions…
cloud-fan Oct 21, 2016
268ccb9
[SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for str…
koeninger Oct 21, 2016
c9720b2
[STREAMING][KAFKA][DOC] clarify kafka settings needed for larger batches
koeninger Oct 21, 2016
3fbf5a5
[SPARK-18042][SQL] OutputWriter should expose file path written
rxin Oct 22, 2016
7178c56
[SPARK-16606][MINOR] Tiny follow-up to , to correct more instances of…
srowen Oct 22, 2016
625fddd
[SPARK-17944][DEPLOY] sbin/start-* scripts use of `hostname -f` fail …
JnyJny Oct 22, 2016
01b26a0
[SPARK-17898][DOCS] repositories needs username and password
srowen Oct 22, 2016
ab3363e
[SPARK-17986][ML] SQLTransformer should remove temporary tables
drewrobb Oct 22, 2016
3eca283
[SPARK-17994][SQL] Add back a file status cache for catalog tables
ericl Oct 22, 2016
5fa9f87
[SPARK-17123][SQL] Use type-widened encoder for DataFrame rather than…
HyukjinKwon Oct 22, 2016
4f1dcd3
[SPARK-18051][SPARK CORE] fix bug of custom PartitionCoalescer causin…
WeichenXu123 Oct 22, 2016
bc167a2
[SPARK-928][CORE] Add support for Unsafe-based serializer in Kryo
techaddict Oct 22, 2016
eff4aed
[SPARK-18035][SQL] Introduce performant and memory efficient APIs to …
tejasapatil Oct 23, 2016
21c7539
[SPARK-18038][SQL] Move output partitioning definition from UnaryNode…
tejasapatil Oct 23, 2016
b158256
[SPARK-18045][SQL][TESTS] Move `HiveDataFrameAnalyticsSuite` to packa…
jiangxb1987 Oct 23, 2016
a81fba0
[SPARK-18058][SQL] Comparing column types ignoring Nullability in Uni…
CodingCat Oct 23, 2016
3a423f5
[SPARKR][BRANCH-2.0] R merge API doc and example fix
felixcheung Oct 23, 2016
c64a8ff
[SPARK-18049][MLLIB][TEST] Add missing tests for truePositiveRate and…
zhengruifeng Oct 24, 2016
4ecbe1b
[SPARK-17810][SQL] Default spark.sql.warehouse.dir is relative to loc…
srowen Oct 24, 2016
81d6933
[SPARK-17894][CORE] Ensure uniqueness of TaskSetManager name.
erenavsarogullari Oct 24, 2016
407c3ce
[SPARK-17624][SQL][STREAMING][TEST] Fixed flaky StateStoreSuite.maint…
tdas Oct 25, 2016
84a3399
[SPARK-18028][SQL] simplify TableFileCatalog
cloud-fan Oct 25, 2016
d479c52
[SPARK-17409][SQL][FOLLOW-UP] Do Not Optimize Query in CTAS More Than…
gatorsmile Oct 25, 2016
483c37c
[SPARK-17894][HOTFIX] Fix broken build from
kayousterhout Oct 25, 2016
78d740a
[SPARK-17748][ML] One pass solver for Weighted Least Squares with Ela…
sethah Oct 25, 2016
6f31833
[SPARK-18026][SQL] should not always lowercase partition columns of p…
cloud-fan Oct 25, 2016
38cdd6c
[SPARK-14634][ML][FOLLOWUP] Delete superfluous line in BisectingKMeans
zhengruifeng Oct 25, 2016
ac8ff92
[SPARK-17748][FOLLOW-UP][ML] Fix build error for Scala 2.10.
yanboliang Oct 25, 2016
c5fe3dd
[SPARK-18010][CORE] Reduce work performed for building up the applica…
vijoshi Oct 25, 2016
a21791e
[SPARK-18070][SQL] binary operator should not consider nullability wh…
cloud-fan Oct 25, 2016
2c7394a
[SPARK-18019][ML] Add instrumentation to GBTs
sethah Oct 25, 2016
c329a56
[SPARK-16988][SPARK SHELL] spark history server log needs to be fixed…
Oct 25, 2016
12b3e8d
[SPARK-18007][SPARKR][ML] update SparkR MLP - add initalWeights param…
WeichenXu123 Oct 26, 2016
93b8ad1
[SPARK-17693][SQL] Fixed Insert Failure To Data Source Tables when th…
gatorsmile Oct 26, 2016
6c7d094
[SPARK-18022][SQL] java.lang.NullPointerException instead of real exc…
srowen Oct 26, 2016
2978136
[SPARK-18027][YARN] .sparkStaging not clean on RM ApplicationNotFound…
srowen Oct 26, 2016
5d0f81d
[SPARK-4411][WEB UI] Add "kill" link for jobs in the UI
ajbozarth Oct 26, 2016
402205d
[SPARK-17802] Improved caller context logging.
lins05 Oct 26, 2016
3c02357
[SPARK-17733][SQL] InferFiltersFromConstraints rule never terminates …
jiangxb1987 Oct 26, 2016
4bee954
[SPARK-18093][SQL] Fix default value test in SQLConfSuite to work rega…
markgrover Oct 26, 2016
312ea3f
[SPARK-17748][FOLLOW-UP][ML] Reorg variables of WeightedLeastSquares.
yanboliang Oct 26, 2016
7ac70e7
[SPARK-13747][SQL] Fix concurrent executions in ForkJoinPool for SQL
zsxwing Oct 26, 2016
fa7d9d7
[SPARK-18063][SQL] Failed to infer constraints over multiple aliases
jiangxb1987 Oct 26, 2016
7d10631
[SPARK-18104][DOC] Don't build KafkaSource doc
zsxwing Oct 26, 2016
ea3605e
[MINOR][ML] Refactor clustering summary.
yanboliang Oct 26, 2016
fb0a8a8
[SPARK-17961][SPARKR][SQL] Add storageLevel to DataFrame for SparkR
WeichenXu123 Oct 26, 2016
dcdda19
[SPARK-14300][DOCS][MLLIB] Scala MLlib examples code merge and clean up
keypointt Oct 26, 2016
5b7d403
[SPARK-18094][SQL][TESTS] Move group analytics test cases from `SQLQu…
jiangxb1987 Oct 26, 2016
29cea8f
[SPARK-17157][SPARKR] Add multiclass logistic regression SparkR Wrapper
wangmiao1981 Oct 26, 2016
a76846c
[SPARK-18126][SPARK-CORE] getIteratorZipWithIndex accepts negative va…
Oct 26, 2016
5b27598
[SPARK-16963][STREAMING][SQL] Changes to Source trait and related imp…
frreiss Oct 27, 2016
f1aeed8
[SPARK-17770][CATALYST] making ObjectType public
Oct 27, 2016
dd4f088
[SPARK-18009][SQL] Fix ClassCastException while calling toLocalIterat…
dilipbiswal Oct 27, 2016
d3b4831
[SPARK-18132] Fix checkstyle
yhuai Oct 27, 2016
1dbe989
[SPARK-17157][SPARKR][FOLLOW-UP] doc fixes
felixcheung Oct 27, 2016
44c8bfd
[SQL][DOC] updating doc for JSON source to link to jsonlines.org
felixcheung Oct 27, 2016
701a9d3
[SPARK-CORE][TEST][MINOR] Fix the wrong comment in test
wangmiao1981 Oct 27, 2016
1042325
[SPARK-17813][SQL][KAFKA] Maximum data per trigger
koeninger Oct 27, 2016
0b076d4
[SPARK-17219][ML] enhanced NaN value handling in Bucketizer
Oct 27, 2016
79fd0cc
[SPARK-16963][SQL] Fix test "StreamExecution metadata garbage collect…
zsxwing Oct 27, 2016
ccb1154
[SPARK-17970][SQL] store partition spec in metastore for data source …
ericl Oct 27, 2016
ab5f938
[SPARK-18121][SQL] Unable to query global temp views when hive suppor…
skambha Oct 28, 2016
569788a
[SPARK-18109][ML] Add instrumentation to GMM
zhengruifeng Oct 28, 2016
e9746f8
[SPARK-18133][EXAMPLES][ML] Python ML Pipeline Example has syntax e…
jagadeesanas2 Oct 28, 2016
ac26e9c
[SPARK-5992][ML] Locality Sensitive Hashing
Yunni Oct 28, 2016
59cccbd
[SPARK-18164][SQL] ForeachSink should fail the Spark job if `process`…
zsxwing Oct 29, 2016
d2d438d
[SPARK-18167][SQL] Add debug code for SQLQuerySuite flakiness when me…
ericl Oct 29, 2016
505b927
[SPARK-16312][FOLLOW-UP][STREAMING][KAFKA][DOC] Add java code snippet…
lw-lin Oct 30, 2016
a489567
[SPARK-3261][MLLIB] KMeans clusterer can return duplicate cluster cen…
srowen Oct 30, 2016
3ad99f1
[SPARK-18146][SQL] Avoid using Union to chain together create table a…
ericl Oct 30, 2016
90d3b91
[SPARK-18103][SQL] Rename *FileCatalog to *FileIndex
ericl Oct 30, 2016
8ae2da0
[SPARK-18106][SQL] ANALYZE TABLE should raise a ParseException for in…
dongjoon-hyun Oct 30, 2016
2881a2d
[SPARK-17919] Make timeout to RBackend configurable in SparkR
falaki Oct 30, 2016
b6879b8
[SPARK-16137][SPARKR] randomForest for R
felixcheung Oct 30, 2016
7c37869
[SPARK-18110][PYTHON][ML] add missing parameter in Python for RandomF…
felixcheung Oct 30, 2016
d2923f1
[SPARK-18143][SQL] Ignore Structured Streaming event logs to avoid br…
zsxwing Oct 31, 2016
26b07f1
[BUILD] Close stale Pull Requests.
srowen Oct 31, 2016
1350845
Merge branch 'master' into palantir-master
Oct 31, 2016
ee3953d
fix merge
Oct 31, 2016
08fac96
parquet bump
Oct 31, 2016
2 changes: 1 addition & 1 deletion LICENSE
@@ -263,7 +263,7 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
(New BSD license) Protocol Buffer Java API (org.spark-project.protobuf:protobuf-java:2.4.1-shaded - http://code.google.com/p/protobuf)
(The BSD License) Fortran to Java ARPACK (net.sourceforge.f2j:arpack_combined_all:0.1 - http://f2j.sourceforge.net)
(The BSD License) xmlenc Library (xmlenc:xmlenc:0.52 - http://xmlenc.sourceforge.net)
(The New BSD License) Py4J (net.sf.py4j:py4j:0.10.3 - http://py4j.sourceforge.net/)
(The New BSD License) Py4J (net.sf.py4j:py4j:0.10.4 - http://py4j.sourceforge.net/)
(Two-clause BSD-style license) JUnit-Interface (com.novocode:junit-interface:0.10 - https://github.com/szeiger/junit-interface/)
(BSD licence) sbt and sbt-launch-lib.bash
(BSD 3 Clause) d3.min.js (https://github.com/mbostock/d3/blob/master/LICENSE)
14 changes: 11 additions & 3 deletions R/pkg/NAMESPACE
@@ -3,7 +3,7 @@
importFrom("methods", "setGeneric", "setMethod", "setOldClass")
importFrom("methods", "is", "new", "signature", "show")
importFrom("stats", "gaussian", "setNames")
importFrom("utils", "download.file", "packageVersion", "untar")
importFrom("utils", "download.file", "object.size", "packageVersion", "untar")

# Disable native libraries till we figure out how to package it
# See SPARKR-7839
@@ -43,7 +43,9 @@ exportMethods("glm",
"spark.isoreg",
"spark.gaussianMixture",
"spark.als",
"spark.kstest")
"spark.kstest",
"spark.logit",
"spark.randomForest")

# Job group lifecycle management methods
export("setJobGroup",
@@ -71,6 +73,7 @@ exportMethods("arrange",
"covar_samp",
"covar_pop",
"createOrReplaceTempView",
"crossJoin",
"crosstab",
"dapply",
"dapplyCollect",
@@ -123,6 +126,7 @@ exportMethods("arrange",
"selectExpr",
"show",
"showDF",
"storageLevel",
"subset",
"summarize",
"summary",
@@ -347,7 +351,9 @@ export("as.DataFrame",
"uncacheTable",
"print.summary.GeneralizedLinearRegressionModel",
"read.ml",
"print.summary.KSTest")
"print.summary.KSTest",
"print.summary.RandomForestRegressionModel",
"print.summary.RandomForestClassificationModel")

export("structField",
"structField.jobj",
@@ -372,6 +378,8 @@ S3method(print, structField)
S3method(print, structType)
S3method(print, summary.GeneralizedLinearRegressionModel)
S3method(print, summary.KSTest)
S3method(print, summary.RandomForestRegressionModel)
S3method(print, summary.RandomForestClassificationModel)
S3method(structField, character)
S3method(structField, jobj)
S3method(structType, jobj)
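The NAMESPACE changes above export new ML wrappers (spark.logit, spark.randomForest) and their summary print methods. A minimal SparkR sketch of how they are typically invoked, assuming a running session; the data frame, formula, and argument names follow the usual SparkR wrapper convention rather than anything shown in this diff.

# Hedged sketch: exercising the newly exported wrappers on a small local data frame.
sparkR.session()
localDF <- data.frame(label = c(0, 0, 1, 1),
                      f1 = c(1.0, 1.5, 3.2, 4.1),
                      f2 = c(2.0, 0.5, 1.1, 3.3))
df <- createDataFrame(localDF)

rfModel <- spark.randomForest(df, label ~ f1 + f2, type = "regression")
summary(rfModel)                 # printed via print.summary.RandomForestRegressionModel

logitModel <- spark.logit(df, label ~ f1 + f2)
head(predict(logitModel, df))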
92 changes: 76 additions & 16 deletions R/pkg/R/DataFrame.R
@@ -365,7 +365,7 @@ setMethod("colnames<-",

# Check if the column names have . in it
if (any(regexec(".", value, fixed = TRUE)[[1]][1] != -1)) {
stop("Colum names cannot contain the '.' symbol.")
stop("Column names cannot contain the '.' symbol.")
}

sdf <- callJMethod(x@sdf, "toDF", as.list(value))
@@ -633,7 +633,7 @@ setMethod("persist",
#' @param ... further arguments to be passed to or from other methods.
#'
#' @family SparkDataFrame functions
#' @rdname unpersist-methods
#' @rdname unpersist
#' @aliases unpersist,SparkDataFrame-method
#' @name unpersist
#' @export
@@ -654,6 +654,32 @@ setMethod("unpersist",
x
})

#' StorageLevel
#'
#' Get storagelevel of this SparkDataFrame.
#'
#' @param x the SparkDataFrame to get the storageLevel.
#'
#' @family SparkDataFrame functions
#' @rdname storageLevel
#' @aliases storageLevel,SparkDataFrame-method
#' @name storageLevel
#' @export
#' @examples
#'\dontrun{
#' sparkR.session()
#' path <- "path/to/file.json"
#' df <- read.json(path)
#' persist(df, "MEMORY_AND_DISK")
#' storageLevel(df)
#'}
#' @note storageLevel since 2.1.0
setMethod("storageLevel",
signature(x = "SparkDataFrame"),
function(x) {
storageLevelToString(callJMethod(x@sdf, "storageLevel"))
})

#' Repartition
#'
#' The following options for repartition are possible:
@@ -735,7 +761,8 @@ setMethod("toJSON",

#' Save the contents of SparkDataFrame as a JSON file
#'
#' Save the contents of a SparkDataFrame as a JSON file (one object per line). Files written out
#' Save the contents of a SparkDataFrame as a JSON file (\href{http://jsonlines.org/}{
#' JSON Lines text format or newline-delimited JSON}). Files written out
#' with this method can be read back in as a SparkDataFrame using read.json().
#'
#' @param x A SparkDataFrame
@@ -2271,12 +2298,13 @@ setMethod("dropDuplicates",

#' Join
#'
#' Join two SparkDataFrames based on the given join expression.
#' Joins two SparkDataFrames based on the given join expression.
#'
#' @param x A SparkDataFrame
#' @param y A SparkDataFrame
#' @param joinExpr (Optional) The expression used to perform the join. joinExpr must be a
#' Column expression. If joinExpr is omitted, join() will perform a Cartesian join
#' Column expression. If joinExpr is omitted, the default, inner join is attempted and an error is
#' thrown if it would be a Cartesian Product. For Cartesian join, use crossJoin instead.
#' @param joinType The type of join to perform. The following join types are available:
#' 'inner', 'outer', 'full', 'fullouter', leftouter', 'left_outer', 'left',
#' 'right_outer', 'rightouter', 'right', and 'leftsemi'. The default joinType is "inner".
@@ -2285,23 +2313,24 @@ setMethod("dropDuplicates",
#' @aliases join,SparkDataFrame,SparkDataFrame-method
#' @rdname join
#' @name join
#' @seealso \link{merge}
#' @seealso \link{merge} \link{crossJoin}
#' @export
#' @examples
#'\dontrun{
#' sparkR.session()
#' df1 <- read.json(path)
#' df2 <- read.json(path2)
#' join(df1, df2) # Performs a Cartesian
#' join(df1, df2, df1$col1 == df2$col2) # Performs an inner join based on expression
#' join(df1, df2, df1$col1 == df2$col2, "right_outer")
#' join(df1, df2) # Attempts an inner join
#' }
#' @note join since 1.4.0
setMethod("join",
signature(x = "SparkDataFrame", y = "SparkDataFrame"),
function(x, y, joinExpr = NULL, joinType = NULL) {
if (is.null(joinExpr)) {
sdf <- callJMethod(x@sdf, "crossJoin", y@sdf)
# this may not fail until the planner checks for Cartesian join later on.
sdf <- callJMethod(x@sdf, "join", y@sdf)
} else {
if (class(joinExpr) != "Column") stop("joinExpr must be a Column")
if (is.null(joinType)) {
@@ -2322,22 +2351,52 @@ setMethod("join",
dataFrame(sdf)
})

#' CrossJoin
#'
#' Returns Cartesian Product on two SparkDataFrames.
#'
#' @param x A SparkDataFrame
#' @param y A SparkDataFrame
#' @return A SparkDataFrame containing the result of the join operation.
#' @family SparkDataFrame functions
#' @aliases crossJoin,SparkDataFrame,SparkDataFrame-method
#' @rdname crossJoin
#' @name crossJoin
#' @seealso \link{merge} \link{join}
#' @export
#' @examples
#'\dontrun{
#' sparkR.session()
#' df1 <- read.json(path)
#' df2 <- read.json(path2)
#' crossJoin(df1, df2) # Performs a Cartesian
#' }
#' @note crossJoin since 2.1.0
setMethod("crossJoin",
signature(x = "SparkDataFrame", y = "SparkDataFrame"),
function(x, y) {
sdf <- callJMethod(x@sdf, "crossJoin", y@sdf)
dataFrame(sdf)
})

#' Merges two data frames
#'
#' @name merge
#' @param x the first data frame to be joined
#' @param y the second data frame to be joined
#' @param x the first data frame to be joined.
#' @param y the second data frame to be joined.
#' @param by a character vector specifying the join columns. If by is not
#' specified, the common column names in \code{x} and \code{y} will be used.
#' If by or both by.x and by.y are explicitly set to NULL or of length 0, the Cartesian
#' Product of x and y will be returned.
#' @param by.x a character vector specifying the joining columns for x.
#' @param by.y a character vector specifying the joining columns for y.
#' @param all a boolean value setting \code{all.x} and \code{all.y}
#' if any of them are unset.
#' @param all.x a boolean value indicating whether all the rows in x should
#' be including in the join
#' be including in the join.
#' @param all.y a boolean value indicating whether all the rows in y should
#' be including in the join
#' @param sort a logical argument indicating whether the resulting columns should be sorted
#' be including in the join.
#' @param sort a logical argument indicating whether the resulting columns should be sorted.
#' @param suffixes a string vector of length 2 used to make colnames of
#' \code{x} and \code{y} unique.
#' The first element is appended to each colname of \code{x}.
@@ -2351,20 +2410,21 @@ setMethod("join",
#' @family SparkDataFrame functions
#' @aliases merge,SparkDataFrame,SparkDataFrame-method
#' @rdname merge
#' @seealso \link{join}
#' @seealso \link{join} \link{crossJoin}
#' @export
#' @examples
#'\dontrun{
#' sparkR.session()
#' df1 <- read.json(path)
#' df2 <- read.json(path2)
#' merge(df1, df2) # Performs a Cartesian
#' merge(df1, df2) # Performs an inner join by common columns
#' merge(df1, df2, by = "col1") # Performs an inner join based on expression
#' merge(df1, df2, by.x = "col1", by.y = "col2", all.y = TRUE)
#' merge(df1, df2, by.x = "col1", by.y = "col2", all.x = TRUE)
#' merge(df1, df2, by.x = "col1", by.y = "col2", all.x = TRUE, all.y = TRUE)
#' merge(df1, df2, by.x = "col1", by.y = "col2", all = TRUE, sort = FALSE)
#' merge(df1, df2, by = "col1", all = TRUE, suffixes = c("-X", "-Y"))
#' merge(df1, df2, by = NULL) # Performs a Cartesian join
#' }
#' @note merge since 1.5.0
setMethod("merge",
@@ -2401,7 +2461,7 @@ setMethod("merge",
joinY <- by
} else {
# if by or both by.x and by.y have length 0, use Cartesian Product
joinRes <- join(x, y)
joinRes <- crossJoin(x, y)
return (joinRes)
}

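The DataFrame.R changes above tighten join(), add crossJoin() and storageLevel(), and route merge() with a length-0 "by" through crossJoin. A short sketch of the new behavior as described by the updated documentation, assuming a session and two hypothetical data frames:

df1 <- createDataFrame(data.frame(col1 = 1:3, a = letters[1:3]))
df2 <- createDataFrame(data.frame(col2 = 2:4, b = letters[2:4]))

join(df1, df2, df1$col1 == df2$col2)   # explicit join expression, unchanged
# join(df1, df2)                       # now an inner join attempt; the planner may reject
#                                      # it if it amounts to a Cartesian product
crossJoin(df1, df2)                    # explicit Cartesian product
merge(df1, df2, by = NULL)             # length-0 'by' now delegates to crossJoin

persist(df1, "MEMORY_AND_DISK")
storageLevel(df1)                      # returns the storage level as a string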
2 changes: 1 addition & 1 deletion R/pkg/R/RDD.R
@@ -261,7 +261,7 @@ setMethod("persistRDD",
#' cache(rdd) # rdd@@env$isCached == TRUE
#' unpersistRDD(rdd) # rdd@@env$isCached == FALSE
#'}
#' @rdname unpersist-methods
#' @rdname unpersist
#' @aliases unpersist,RDD-method
#' @noRd
setMethod("unpersistRDD",
Expand Down
3 changes: 2 additions & 1 deletion R/pkg/R/SQLContext.R
@@ -324,7 +324,8 @@ setMethod("toDF", signature(x = "RDD"),

#' Create a SparkDataFrame from a JSON file.
#'
#' Loads a JSON file (one object per line), returning the result as a SparkDataFrame
#' Loads a JSON file (\href{http://jsonlines.org/}{JSON Lines text format or newline-delimited JSON}
#' ), returning the result as a SparkDataFrame
#' It goes through the entire dataset once to determine the schema.
#'
#' @param path Path of file to read. A vector of multiple paths is allowed.
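The read.json doc change above points at the JSON Lines format. A small hedged illustration of what such a file looks like and how it is read; the path and file contents are hypothetical:

# people.jsonl holds one JSON object per line (newline-delimited JSON), e.g.:
#   {"name": "Alice", "age": 30}
#   {"name": "Bob"}
df <- read.json("path/to/people.jsonl")
printSchema(df)   # the schema is inferred by scanning the dataset once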
20 changes: 17 additions & 3 deletions R/pkg/R/backend.R
@@ -108,13 +108,27 @@ invokeJava <- function(isStatic, objId, methodName, ...) {
conn <- get(".sparkRCon", .sparkREnv)
writeBin(requestMessage, conn)

# TODO: check the status code to output error information
returnStatus <- readInt(conn)
handleErrors(returnStatus, conn)

# Backend will send +1 as keep alive value to prevent various connection timeouts
# on very long running jobs. See spark.r.heartBeatInterval
while (returnStatus == 1) {
returnStatus <- readInt(conn)
handleErrors(returnStatus, conn)
}

readObject(conn)
}

# Helper function to check for returned errors and print appropriate error message to user
handleErrors <- function(returnStatus, conn) {
if (length(returnStatus) == 0) {
stop("No status is returned. Java SparkR backend might have failed.")
}
if (returnStatus != 0) {

# 0 is success and +1 is reserved for heartbeats. Other negative values indicate errors.
if (returnStatus < 0) {
stop(readString(conn))
}
readObject(conn)
}
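The invokeJava change above keeps reading while the backend sends the +1 keep-alive status, so long-running jobs no longer hit connection timeouts. The interval is governed by spark.r.heartBeatInterval, per the comment in the diff; a hedged one-line sketch of raising it, with the unit of the value assumed:

sparkR.session(sparkConfig = list(spark.r.heartBeatInterval = "300"))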
2 changes: 1 addition & 1 deletion R/pkg/R/client.R
@@ -19,7 +19,7 @@

# Creates a SparkR client connection object
# if one doesn't already exist
connectBackend <- function(hostname, port, timeout = 6000) {
connectBackend <- function(hostname, port, timeout) {
if (exists(".sparkRcon", envir = .sparkREnv)) {
if (isOpen(.sparkREnv[[".sparkRCon"]])) {
cat("SparkRBackend client connection already exists\n")
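The client.R change drops the hard-coded default of 6000 so the caller supplies the backend connection timeout. SPARK-17919 exposes it as a SparkR configuration; a hedged sketch, with the key name spark.r.backendConnectionTimeout assumed from that ticket rather than shown in this diff:

sparkR.session(sparkConfig = list(spark.r.backendConnectionTimeout = "6000"))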