Commit 2a81517
Author: Hemant Bhanawat
Parents: 696cc71 + f4be094

    Merge branch 'master' into pluggableScheduler

    Conflicts:
        core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala

1,502 files changed (+65,132 / -32,864 lines)


LICENSE

Lines changed: 3 additions & 3 deletions
```diff
@@ -238,6 +238,7 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
      (BSD 3 Clause) netlib core (com.github.fommil.netlib:core:1.1.2 - https://github.com/fommil/netlib-java/core)
      (BSD 3 Clause) JPMML-Model (org.jpmml:pmml-model:1.2.7 - https://github.com/jpmml/jpmml-model)
      (BSD License) AntLR Parser Generator (antlr:antlr:2.7.7 - http://www.antlr.org/)
+     (BSD License) ANTLR 4.5.2-1 (org.antlr:antlr4:4.5.2-1 - http://www.antlr.org/)
      (BSD licence) ANTLR ST4 4.0.4 (org.antlr:ST4:4.0.4 - http://www.stringtemplate.org)
      (BSD licence) ANTLR StringTemplate (org.antlr:stringtemplate:3.2.1 - http://www.stringtemplate.org)
      (BSD License) Javolution (javolution:javolution:5.5.1 - http://javolution.org)
@@ -256,9 +257,8 @@ The text of each license is also included at licenses/LICENSE-[project].txt.
      (BSD-style) scalacheck (org.scalacheck:scalacheck_2.11:1.10.0 - http://www.scalacheck.org)
      (BSD-style) spire (org.spire-math:spire_2.11:0.7.1 - http://spire-math.org)
      (BSD-style) spire-macros (org.spire-math:spire-macros_2.11:0.7.1 - http://spire-math.org)
-     (New BSD License) Kryo (com.esotericsoftware.kryo:kryo:2.21 - http://code.google.com/p/kryo/)
-     (New BSD License) MinLog (com.esotericsoftware.minlog:minlog:1.2 - http://code.google.com/p/minlog/)
-     (New BSD License) ReflectASM (com.esotericsoftware.reflectasm:reflectasm:1.07 - http://code.google.com/p/reflectasm/)
+     (New BSD License) Kryo (com.esotericsoftware:kryo:3.0.3 - https://github.com/EsotericSoftware/kryo)
+     (New BSD License) MinLog (com.esotericsoftware:minlog:1.3.0 - https://github.com/EsotericSoftware/minlog)
      (New BSD license) Protocol Buffer Java API (com.google.protobuf:protobuf-java:2.5.0 - http://code.google.com/p/protobuf)
      (New BSD license) Protocol Buffer Java API (org.spark-project.protobuf:protobuf-java:2.4.1-shaded - http://code.google.com/p/protobuf)
      (The BSD License) Fortran to Java ARPACK (net.sourceforge.f2j:arpack_combined_all:0.1 - http://f2j.sourceforge.net)
```

NOTICE

Lines changed: 0 additions & 1 deletion
```diff
@@ -48,7 +48,6 @@ Eclipse Public License 1.0

 The following components are provided under the Eclipse Public License 1.0. See project link for details.

-     (Eclipse Public License - Version 1.0) mqtt-client (org.eclipse.paho:mqtt-client:0.4.0 - http://www.eclipse.org/paho/mqtt-client)
      (Eclipse Public License v1.0) Eclipse JDT Core (org.eclipse.jdt:core:3.1.1 - http://www.eclipse.org/jdt/)

 ========================================================================
```

R/README.md

Lines changed: 5 additions & 5 deletions
````diff
@@ -40,7 +40,7 @@ To set other options like driver memory, executor memory etc. you can pass in the
 If you wish to use SparkR from RStudio or other R frontends you will need to set some environment variables which point SparkR to your Spark installation. For example
 ```
 # Set this to where Spark is installed
-Sys.setenv(SPARK_HOME="/Users/shivaram/spark")
+Sys.setenv(SPARK_HOME="/Users/username/spark")
 # This line loads SparkR from the installed directory
 .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
 library(SparkR)
@@ -51,7 +51,7 @@ sc <- sparkR.init(master="local")

 The [instructions](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark) for making contributions to Spark also apply to SparkR.
 If you only make R file changes (i.e. no Scala changes) then you can just re-install the R package using `R/install-dev.sh` and test your changes.
-Once you have made your changes, please include unit tests for them and run existing unit tests using the `run-tests.sh` script as described below.
+Once you have made your changes, please include unit tests for them and run existing unit tests using the `R/run-tests.sh` script as described below.

 #### Generating documentation

@@ -60,17 +60,17 @@ The SparkR documentation (Rd files and HTML files) are not a part of the source
 ### Examples, Unit tests

 SparkR comes with several sample programs in the `examples/src/main/r` directory.
-To run one of them, use `./bin/sparkR <filename> <args>`. For example:
+To run one of them, use `./bin/spark-submit <filename> <args>`. For example:

-    ./bin/sparkR examples/src/main/r/dataframe.R
+    ./bin/spark-submit examples/src/main/r/dataframe.R

 You can also run the unit-tests for SparkR by running (you need to install the [testthat](http://cran.r-project.org/web/packages/testthat/index.html) package first):

     R -e 'install.packages("testthat", repos="http://cran.us.r-project.org")'
     ./R/run-tests.sh

 ### Running on YARN
-The `./bin/spark-submit` and `./bin/sparkR` can also be used to submit jobs to YARN clusters. You will need to set YARN conf dir before doing so. For example on CDH you can run
+The `./bin/spark-submit` can also be used to submit jobs to YARN clusters. You will need to set YARN conf dir before doing so. For example on CDH you can run
 ```
 export YARN_CONF_DIR=/etc/hadoop/conf
 ./bin/spark-submit --master yarn examples/src/main/r/dataframe.R
````
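Pulled together, the RStudio setup described in the first hunk is the following sequence; a minimal sketch using the commit-era `sparkR.init` entry point visible in the hunk context, with the SPARK_HOME path as a placeholder:

```r
# Bootstrap SparkR from RStudio or another R frontend, per the README above.
# Set this to where Spark is installed (placeholder path).
Sys.setenv(SPARK_HOME = "/Users/username/spark")

# Make the bundled SparkR package visible to this R session.
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)

# Start a local Spark context (pre-SparkSession API, as in the hunk context).
sc <- sparkR.init(master = "local")
```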

R/pkg/DESCRIPTION

Lines changed: 3 additions & 1 deletion
```diff
@@ -11,7 +11,9 @@ Depends:
     R (>= 3.0),
     methods,
 Suggests:
-    testthat
+    testthat,
+    e1071,
+    survival
 Description: R frontend for Spark
 License: Apache License (== 2.0)
 Collate:
```
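The two new Suggests entries line up with this commit's new ML wrappers: e1071 provides R's reference `naiveBayes` and survival provides `survreg`, the functions whose names the new SparkR generics (see R/pkg/R/generics.R below) take over.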

R/pkg/NAMESPACE

Lines changed: 6 additions & 2 deletions
```diff
@@ -15,7 +15,9 @@ exportMethods("glm",
               "predict",
               "summary",
               "kmeans",
-              "fitted")
+              "fitted",
+              "naiveBayes",
+              "survreg")

 # Job group lifecycle management methods
 export("setJobGroup",
@@ -263,6 +265,7 @@ exportMethods("%in%",
               "var_samp",
               "weekofyear",
               "when",
+              "window",
               "year")

 exportClasses("GroupedData")
@@ -289,7 +292,8 @@ export("as.DataFrame",
        "tableToDF",
        "tableNames",
        "tables",
-       "uncacheTable")
+       "uncacheTable",
+       "print.summary.GeneralizedLinearRegressionModel")

 export("structField",
        "structField.jobj",
```

R/pkg/R/functions.R

Lines changed: 63 additions & 0 deletions
```diff
@@ -2131,6 +2131,69 @@ setMethod("from_unixtime", signature(x = "Column"),
             column(jc)
           })

+#' window
+#'
+#' Bucketize rows into one or more time windows given a timestamp specifying column. Window
+#' starts are inclusive but the window ends are exclusive, e.g. 12:05 will be in the window
+#' [12:05,12:10) but not in [12:00,12:05). Windows can support microsecond precision. Windows in
+#' the order of months are not supported.
+#'
+#' The time column must be of TimestampType.
+#'
+#' Durations are provided as strings, e.g. '1 second', '1 day 12 hours', '2 minutes'. Valid
+#' interval strings are 'week', 'day', 'hour', 'minute', 'second', 'millisecond', 'microsecond'.
+#' If the `slideDuration` is not provided, the windows will be tumbling windows.
+#'
+#' The startTime is the offset with respect to 1970-01-01 00:00:00 UTC with which to start
+#' window intervals. For example, in order to have hourly tumbling windows that start 15 minutes
+#' past the hour, e.g. 12:15-13:15, 13:15-14:15... provide `startTime` as `15 minutes`.
+#'
+#' The output column will be a struct called 'window' by default with the nested columns 'start'
+#' and 'end'.
+#'
+#' @family datetime_funcs
+#' @rdname window
+#' @name window
+#' @export
+#' @examples
+#'\dontrun{
+#' # One minute windows every 15 seconds 10 seconds after the minute, e.g. 09:00:10-09:01:10,
+#' # 09:00:25-09:01:25, 09:00:40-09:01:40, ...
+#' window(df$time, "1 minute", "15 seconds", "10 seconds")
+#'
+#' # One minute tumbling windows 15 seconds after the minute, e.g. 09:00:15-09:01:15,
+#' # 09:01:15-09:02:15...
+#' window(df$time, "1 minute", startTime = "15 seconds")
+#'
+#' # Thirty second windows every 10 seconds, e.g. 09:00:00-09:00:30, 09:00:10-09:00:40, ...
+#' window(df$time, "30 seconds", "10 seconds")
+#'}
+setMethod("window", signature(x = "Column"),
+          function(x, windowDuration, slideDuration = NULL, startTime = NULL) {
+            stopifnot(is.character(windowDuration))
+            if (!is.null(slideDuration) && !is.null(startTime)) {
+              stopifnot(is.character(slideDuration) && is.character(startTime))
+              jc <- callJStatic("org.apache.spark.sql.functions",
+                                "window",
+                                x@jc, windowDuration, slideDuration, startTime)
+            } else if (!is.null(slideDuration)) {
+              stopifnot(is.character(slideDuration))
+              jc <- callJStatic("org.apache.spark.sql.functions",
+                                "window",
+                                x@jc, windowDuration, slideDuration)
+            } else if (!is.null(startTime)) {
+              stopifnot(is.character(startTime))
+              jc <- callJStatic("org.apache.spark.sql.functions",
+                                "window",
+                                x@jc, windowDuration, windowDuration, startTime)
+            } else {
+              jc <- callJStatic("org.apache.spark.sql.functions",
+                                "window",
+                                x@jc, windowDuration)
+            }
+            column(jc)
+          })
+
 #' locate
 #'
 #' Locate the position of the first occurrence of substr.
```
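The roxygen examples above show `window` in isolation; in queries it is typically used as a grouping column. A minimal usage sketch, not part of the commit, assuming an active SparkR session and a hypothetical DataFrame `df` with a TimestampType column `time` and a numeric column `value`:

```r
# Hypothetical illustration: count events and average `value` per
# 10-minute tumbling window. `df`, `time`, and `value` are assumed names;
# window() here is the SparkR Column function added in this diff.
library(SparkR)

grouped <- groupBy(df, window(df$time, "10 minutes"))
result <- agg(grouped, count(df$value), avg(df$value))

# The grouping column is a struct named 'window' with 'start' and 'end'
# fields, per the documentation above.
head(result)
```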

R/pkg/R/generics.R

Lines changed: 12 additions & 0 deletions
```diff
@@ -1152,6 +1152,10 @@ setGeneric("var_samp", function(x) { standardGeneric("var_samp") })
 #' @export
 setGeneric("weekofyear", function(x) { standardGeneric("weekofyear") })

+#' @rdname window
+#' @export
+setGeneric("window", function(x, ...) { standardGeneric("window") })
+
 #' @rdname year
 #' @export
 setGeneric("year", function(x) { standardGeneric("year") })
@@ -1175,3 +1179,11 @@ setGeneric("kmeans")
 #' @rdname fitted
 #' @export
 setGeneric("fitted")
+
+#' @rdname naiveBayes
+#' @export
+setGeneric("naiveBayes", function(formula, data, ...) { standardGeneric("naiveBayes") })
+
+#' @rdname survreg
+#' @export
+setGeneric("survreg", function(formula, data, ...) { standardGeneric("survreg") })
```
