Closed
342 commits
608353c
[SPARK-9404][SPARK-9542][SQL] unsafe array data and map data
cloud-fan Aug 3, 2015
98d6d9c
[SPARK-9549][SQL] fix bugs in expressions
yjshen Aug 3, 2015
1ebd41b
[SPARK-9240] [SQL] Hybrid aggregate operator using unsafe row
yhuai Aug 3, 2015
95dccc6
[SPARK-8873] [MESOS] Clean up shuffle files if external shuffle servi…
tnachen Aug 3, 2015
137f478
[SPARK-9551][SQL] add a cheap version of copy for UnsafeRow to reuse …
cloud-fan Aug 3, 2015
191bf26
[SPARK-9518] [SQL] cleanup generated UnsafeRowJoiner and fix bug
Aug 3, 2015
8be198c
Two minor comments from code review on 191bf2689.
rxin Aug 3, 2015
69f5a7c
[SPARK-9528] [ML] Changed RandomForestClassifier to extend Probabilis…
jkbradley Aug 3, 2015
b41a327
[SPARK-1855] Local checkpointing
Aug 3, 2015
dfe7bd1
[SPARK-9511] [SQL] Fixed Table Name Parsing
Aug 3, 2015
7a9d09f
[SQL][minor] Simplify UnsafeRow.calculateBitSetWidthInBytes.
rxin Aug 3, 2015
703e44b
[SPARK-9554] [SQL] Enables in-memory partition pruning by default
liancheng Aug 3, 2015
ff9169a
[SPARK-5133] [ML] Added featureImportance to RandomForestClassifier a…
jkbradley Aug 3, 2015
ba1c4e1
[SPARK-9558][DOCS]Update docs to follow the increase of memory defaults.
sarutak Aug 3, 2015
8ca287e
[SPARK-9191] [ML] [Doc] Add ml.PCA user guide and code examples
yanboliang Aug 3, 2015
e4765a4
[SPARK-9544] [MLLIB] add Python API for RFormula
mengxr Aug 3, 2015
702aa9d
[SPARK-8735] [SQL] Expose memory usage for shuffles, joins and aggreg…
Aug 3, 2015
b2e4b85
Revert "[SPARK-9372] [SQL] Filter nulls in join keys"
rxin Aug 3, 2015
a2409d1
[SPARK-8064] [SQL] Build against Hive 1.2.1
steveloughran Aug 3, 2015
13675c7
[SPARK-8874] [ML] Add missing methods in Word2Vec
MechCoder Aug 3, 2015
7abaaad
Add a prerequisites section for building docs
shivaram Aug 4, 2015
b79b4f5
[SPARK-9483] Fix UTF8String.getPrefix for big-endian.
mtbrandy Aug 4, 2015
1633d0a
[SPARK-9263] Added flags to exclude dependencies when using --packages
brkyvz Aug 4, 2015
3b0e444
[SPARK-8416] highlight and topping the executor threads in thread dum…
CodingCat Aug 4, 2015
5eb89f6
[SPARK-9577][SQL] Surface concrete iterator types in various sort cla…
rxin Aug 4, 2015
0afa6fb
[SPARK-9521] [DOCS] Addendum. Require Maven 3.3.3+ in the build
srowen Aug 4, 2015
9e952ec
[SPARK-3190] [GRAPHX] Fix VertexRDD.count() overflow regression
ankurdave Aug 4, 2015
76d7409
[SPARK-9534] [BUILD] Enable javac lint for scalac parity; fix a lot o…
srowen Aug 4, 2015
b211cbc
[SPARK-8064] [BUILD] Follow-up. Undo change from SPARK-9507 that was …
tedyu Aug 4, 2015
cb7fa0a
[SPARK-2016] [WEBUI] RDD partition table pagination for the RDD Page
carsonwang Aug 4, 2015
d702d53
[SPARK-9583] [BUILD] Do not print mvn debug messages to stdout.
Aug 4, 2015
b1f88a3
[SPARK-8244] [SQL] string function: find in set
tarekbecker Aug 4, 2015
73dedb5
[SPARK-8246] [SQL] Implement get_json_object
Aug 4, 2015
b5034c9
[SPARK-9541] [SQL] DataTimeUtils cleanup
yjshen Aug 4, 2015
6a0f8b9
[SPARK-9562] Change reference to amplab/spark-ec2 from mesos/
shivaram Aug 4, 2015
34a0eb2
[SPARK-9512][SQL] Revert SPARK-9251, Allow evaluation while sorting
marmbrus Aug 4, 2015
5a23213
[SPARK-8069] [ML] Add multiclass thresholds for ProbabilisticClassifier
holdenk Aug 4, 2015
a0cc017
[SPARK-9606] [SQL] Ignore flaky thrift server tests
marmbrus Aug 4, 2015
f4b1ac0
[SPARK-9553][SQL] remove the no-longer-necessary createCode and creat…
cloud-fan Aug 4, 2015
ab8ee1a
[SPARK-9452] [SQL] Support records larger than page size in UnsafeExt…
JoshRosen Aug 4, 2015
9d668b7
[SPARK-9602] remove "Akka/Actor" words from comments
CodingCat Aug 4, 2015
e375456
[SPARK-9447] [ML] [PYTHON] Added HasRawPredictionCol, HasProbabilityC…
jkbradley Aug 4, 2015
ecd4163
Adding a comment
mccheah Aug 4, 2015
1833d9c
[SPARK-9582] [ML] LDA cleanups
jkbradley Aug 4, 2015
571d5b5
[SPARK-6485] [MLLIB] [PYTHON] Add CoordinateMatrix/RowMatrix/IndexedR…
dusenberrymw Aug 4, 2015
b77d3b9
[SPARK-9586] [ML] Update BinaryClassificationEvaluator to use setRawP…
jkbradley Aug 4, 2015
7c8fc1f
[SPARK-9598][SQL] do not expose generic getter in internal row
cloud-fan Aug 5, 2015
629e26f
[SPARK-9609] [MLLIB] Fix spelling of Strategy.defaultStrategy
Aug 5, 2015
d92fa14
[SPARK-8601] [ML] Add an option to disable standardization for linear…
holdenk Aug 5, 2015
a7fe48f
[SPARK-9432][SQL] Audit expression unit tests to make sure we pass th…
yjshen Aug 5, 2015
c9a4c36
[SPARK-8313] R Spark packages support
brkyvz Aug 5, 2015
6f8f0e2
[SPARK-7119] [SQL] Give script a default serde with the user specific…
zhichao-li Aug 5, 2015
2b67fdb
[SPARK-9513] [SQL] [PySpark] Add python API for DataFrame functions
Aug 5, 2015
d34bac0
[SPARK-9504] [STREAMING] [TESTS] Fix o.a.s.streaming.StreamingContext…
zsxwing Aug 5, 2015
f7abd6b
Update docs/README.md to put all prereqs together.
rxin Aug 5, 2015
a02bcf2
[SPARK-9540] [MLLIB] optimize PrefixSpan implementation
mengxr Aug 5, 2015
d345485
[SPARK-8231] [SQL] Add array_contains
Aug 5, 2015
781c8d7
[SPARK-9119] [SPARK-8359] [SQL] match Decimal.precision/scale with De…
Aug 5, 2015
c2a71f0
[SPARK-9217] [STREAMING] Make the kinesis receiver reliable by record…
tdas Aug 5, 2015
1d1a76c
[SPARK-9581][SQL] Add unit test for JSON UDT
drubbo Aug 5, 2015
d8ef538
Closes #7917
rxin Aug 5, 2015
6d8a6e4
[SPARK-9360] [SQL] Support BinaryType in PrefixComparators for Unsafe…
maropu Aug 5, 2015
1bf608b
[SPARK-9601] [DOCS] Fix JavaPairDStream signature for stream-stream a…
namitk Aug 5, 2015
1b0317f
[SPARK-8861][SPARK-8862][SQL] Add basic instrumentation to each Spark…
zsxwing Aug 5, 2015
84ca318
[SPARK-9628][SQL]Rename int to SQLDate, long to SQLTimestamp for bett…
yjshen Aug 5, 2015
26b06f1
[HOTFIX] Add static import to fix build break from #7676.
JoshRosen Aug 5, 2015
e27a8c4
[SPARK-9607] [SPARK-9608] fix zinc-port handling in build/mvn
ryan-williams Aug 5, 2015
70112ff
[SPARK-9593] [SQL] Fixes Hadoop shims loading
liancheng Aug 5, 2015
eb8bfa3
[SPARK-9618] [SQL] Use the specified schema when reading Parquet files
Aug 5, 2015
519cf6d
[SPARK-9381] [SQL] Migrate JSON data source to the new partitioning d…
chenghao-intel Aug 5, 2015
34dcf10
[SPARK-6486] [MLLIB] [PYTHON] Add BlockMatrix to PySpark.
dusenberrymw Aug 5, 2015
23d9822
[SPARK-9141] [SQL] Remove project collapsing from DataFrame API
marmbrus Aug 5, 2015
7a969a6
[SPARK-9519] [YARN] Confirm stop sc successfully when application was…
Sephiroth-Lin Aug 5, 2015
1f8c364
[SPARK-9141] [SQL] [MINOR] Fix comments of PR #7920
yhuai Aug 5, 2015
e1e0587
[SPARK-9403] [SQL] Add codegen support in In and InSet
viirya Aug 5, 2015
eb5b8f4
Closes #7778 since it is done as #7893.
rxin Aug 5, 2015
5f0fb64
[SPARK-9649] Fix flaky test MasterSuite - randomize ports
Aug 5, 2015
f9c2a2a
Closes #7474 since it's marked as won't fix.
rxin Aug 5, 2015
dac090d
[SPARK-9657] Fix return type of getMaxPatternLength
Aug 5, 2015
9c87892
[SPARK-9054] [SQL] Rename RowOrdering to InterpretedOrdering; use new…
JoshRosen Aug 5, 2015
a018b85
[SPARK-5895] [ML] Add VectorSlicer - updated
yinxusen Aug 6, 2015
8c320e4
[SPARK-6591] [SQL] Python data source load options should auto conver…
yjshen Aug 6, 2015
4399b7b
[SPARK-9651] Fix UnsafeExternalSorterSuite.
Aug 6, 2015
4581bad
[SPARK-9611] [SQL] Fixes a few corner cases when we spill a UnsafeFix…
yhuai Aug 6, 2015
119b590
[SPARK-6923] [SPARK-7550] [SQL] Persists data source relations in Hiv…
chenghao-intel Aug 6, 2015
9270bd0
[SPARK-9674][SQL] Remove GeneratedAggregate.
rxin Aug 6, 2015
d5a9af3
[SPARK-9664] [SQL] Remove UDAFRegistration and add apply to UserDefin…
yhuai Aug 6, 2015
aead18f
[SPARK-8266] [SQL] add function translate
zhichao-li Aug 6, 2015
5b965d6
[SPARK-9644] [SQL] Support update DecimalType with precision > 18 in …
Aug 6, 2015
93085c9
[SPARK-9482] [SQL] Fix thread-safey issue of using UnsafeProjection i…
Aug 6, 2015
9f94c85
[SPARK-9593] [SQL] [HOTFIX] Makes the Hadoop shims loading fix more r…
liancheng Aug 6, 2015
c5c6ade
[SPARK-9112] [ML] Implement Stats for LogisticRegression
MechCoder Aug 6, 2015
076ec05
[SPARK-9533] [PYSPARK] [ML] Add missing methods in Word2Vec ML
MechCoder Aug 6, 2015
98e6946
[SPARK-9615] [SPARK-9616] [SQL] [MLLIB] Bugs related to FrequentItems…
brkyvz Aug 6, 2015
5e1b0ef
[SPARK-9659][SQL] Rename inSet to isin to match Pandas function.
rxin Aug 6, 2015
6e009cb
[SPARK-9632][SQL] update InternalRow.toSeq to make it accept data typ…
cloud-fan Aug 6, 2015
2eca46a
Revert "[SPARK-9632][SQL] update InternalRow.toSeq to make it accept …
davies Aug 6, 2015
cdd53b7
[SPARK-9632] [SQL] [HOT-FIX] Fix build.
yhuai Aug 6, 2015
0d7aac9
[SPARK-9641] [DOCS] spark.shuffle.service.port is not documented
srowen Aug 6, 2015
a1bbf1b
[SPARK-8978] [STREAMING] Implements the DirectKafkaRateController
nraychaudhuri Aug 6, 2015
1f62f10
[SPARK-9632][SQL] update InternalRow.toSeq to make it accept data typ…
cloud-fan Aug 6, 2015
54c0789
[SPARK-9493] [ML] add featureIndex to handle vector features in Isoto…
mengxr Aug 6, 2015
abfedb9
[SPARK-9211] [SQL] [TEST] normalize line separators before generating…
ckadner Aug 6, 2015
21fdfd7
[SPARK-9548][SQL] Add a destructive iterator for BytesToBytesMap
viirya Aug 6, 2015
0a07830
[SPARK-9556] [SPARK-9619] [SPARK-9624] [STREAMING] Make BlockGenerato…
tdas Aug 6, 2015
1723e34
[DOCS] [STREAMING] make the existing parameter docs for OffsetRange ac…
koeninger Aug 6, 2015
3462090
[SPARK-9639] [STREAMING] Fix a potential NPE in Streaming JobScheduler
zsxwing Aug 6, 2015
3504bf3
[SPARK-9630] [SQL] Clean up new aggregate operators (SPARK-9240 follo…
yhuai Aug 6, 2015
e234ea1
[SPARK-9645] [YARN] [CORE] Allow shuffle service to read shuffle files.
Aug 6, 2015
681e302
[SPARK-9633] [BUILD] SBT download locations outdated; need an update
srowen Aug 6, 2015
baf4587
[SPARK-9691] [SQL] PySpark SQL rand function treats seed 0 as no seed
yhuai Aug 7, 2015
4e70e82
[SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen…
Aug 7, 2015
0867b23
[SPARK-9650][SQL] Fix quoting behavior on interpolated column names
marmbrus Aug 7, 2015
49b1504
Revert "[SPARK-9228] [SQL] use tungsten.enabled in public for both of…
davies Aug 7, 2015
b878253
[SPARK-9692] Remove SqlNewHadoopRDD's generated Tuple2 and Interrupti…
rxin Aug 7, 2015
014a9f9
[SPARK-9709] [SQL] Avoid starving unsafe operators that use sort
Aug 7, 2015
17284db
[SPARK-9228] [SQL] use tungsten.enabled in public for both of codegen…
Aug 7, 2015
fe12277
Fix doc typo
zjffdu Aug 7, 2015
672f467
[SPARK-8057][Core]Call TaskAttemptContext.getTaskAttemptID using Refl…
zsxwing Aug 7, 2015
f0cda58
[SPARK-7550] [SQL] [MINOR] Fixes logs when persisting DataFrames
liancheng Aug 7, 2015
7aaed1b
[SPARK-8862][SQL]Support multiple SQLContexts in Web UI
zsxwing Aug 7, 2015
4309262
[SPARK-9700] Pick default page size more intelligently.
rxin Aug 7, 2015
15bd6f3
[SPARK-9453] [SQL] support records larger than page size in UnsafeShu…
Aug 7, 2015
e57d6b5
[SPARK-9683] [SQL] copy UTF8String when convert unsafe array/map to safe
cloud-fan Aug 7, 2015
ebfd91c
[SPARK-9467][SQL]Add SQLMetric to specialize accumulators to avoid bo…
zsxwing Aug 7, 2015
76eaa70
[SPARK-9674][SPARK-9667] Remove SparkSqlSerializer2
rxin Aug 7, 2015
2432c2e
[SPARK-8382] [SQL] Improve Analysis Unit test framework
cloud-fan Aug 7, 2015
9897cc5
[SPARK-9736] [SQL] JoinedRow.anyNull should delegate to the underlyin…
rxin Aug 7, 2015
aeddeaf
[SPARK-9667][SQL] followup: Use GenerateUnsafeProjection.canSupport t…
rxin Aug 7, 2015
05d04e1
[SPARK-9733][SQL] Improve physical plan explain for data sources
rxin Aug 7, 2015
881548a
[SPARK-9674] Re-enable ignored test in SQLQuerySuite
Aug 7, 2015
e2fbbe7
[SPARK-8481] [MLLIB] GaussianMixtureModel predict accepting single ve…
dkobylarz Aug 7, 2015
902334f
[SPARK-9748] [MLLIB] Centriod typo in KMeansModel
BertrandDechoux Aug 7, 2015
49702bd
[SPARK-8890] [SQL] Fallback on sorting when writing many dynamic part…
marmbrus Aug 7, 2015
cd540c1
[SPARK-9756] [ML] Make constructors in ML decision trees private
Aug 8, 2015
85be65b
[SPARK-9719] [ML] Clean up Naive Bayes doc
Aug 8, 2015
998f4ff
[SPARK-9754][SQL] Remove TypeCheck in debug package.
rxin Aug 8, 2015
c564b27
[SPARK-9753] [SQL] TungstenAggregate should also accept InternalRow i…
yhuai Aug 8, 2015
ef062c1
[SPARK-9731] Standalone scheduling incorrect cores if spark.executor.…
carsonwang Aug 8, 2015
11caf1c
[SPARK-4176] [SQL] [MINOR] Should use unscaled Long to write decimals…
liancheng Aug 8, 2015
106c078
[SPARK-9738] [SQL] remove FromUnsafe and add its codegen version to G…
cloud-fan Aug 8, 2015
74a6541
[SPARK-4561] [PYSPARK] [SQL] turn Row into dict recursively
Aug 8, 2015
ac507a0
[SPARK-6902] [SQL] [PYSPARK] Row should be read-only
Aug 8, 2015
23695f1
[SPARK-9728][SQL]Support CalendarIntervalType in HiveQL
yjshen Aug 8, 2015
a3aec91
[SPARK-9486][SQL] Add data source aliasing for external packages
Aug 8, 2015
25c363e
[MINOR] inaccurate comments for showString()
CodingCat Aug 9, 2015
3ca995b
[SPARK-6212] [SQL] The EXPLAIN output of CTAS only shows the analyzed…
yjshen Aug 9, 2015
e9c3693
[SPARK-9752][SQL] Support UnsafeRow in Sample operator.
rxin Aug 9, 2015
68ccc6e
[SPARK-8930] [SQL] Throw a AnalysisException with meaningful messages…
yjshen Aug 9, 2015
86fa4ba
[SPARK-9737] [YARN] Add the suggested configuration when required exe…
watermen Aug 9, 2015
a863348
Disable JobGeneratorSuite "Do not clear received block data too soon".
rxin Aug 9, 2015
23cf5af
[SPARK-9703] [SQL] Refactor EnsureRequirements to avoid certain unnec…
JoshRosen Aug 9, 2015
4602561
[CORE] [SPARK-9760] Use Option instead of Some for Ivy repos
shivaram Aug 9, 2015
be80def
[SPARK-9777] [SQL] Window operator can accept UnsafeRows
yhuai Aug 10, 2015
e3fef0f
[SPARK-9743] [SQL] Fixes JSONRelation refreshing
liancheng Aug 10, 2015
0f3366a
[SPARK-9710] [TEST] Fix RPackageUtilsSuite when R is not available.
Aug 10, 2015
00b655c
[SPARK-9755] [MLLIB] Add docs to MultivariateOnlineSummarizer methods
Aug 10, 2015
d285212
Fixed AtmoicReference<> Example
lababidi Aug 10, 2015
0fe6674
[SPARK-9784] [SQL] Exchange.isUnsafe should check whether codegen and…
JoshRosen Aug 10, 2015
40ed2af
[SPARK-9763][SQL] Minimize exposure of internal SQL classes.
rxin Aug 10, 2015
fe2fb7f
[SPARK-9620] [SQL] generated UnsafeProjection should support many col…
Aug 10, 2015
c4fd2a2
[SPARK-9759] [SQL] improve decimal.times() and cast(int, decimalType)
Aug 10, 2015
853809e
[SPARK-5155] [PYSPARK] [STREAMING] Mqtt streaming support in Python
prabeesh Aug 10, 2015
3c9802d
[SPARK-9801] [STREAMING] Check if file exists before deleting tempora…
viadea Aug 11, 2015
071bbad
[SPARK-9340] [SQL] Fixes converting unannotated Parquet lists
dguy Aug 11, 2015
91e9389
[SPARK-9729] [SPARK-9363] [SQL] Use sort merge join for left and righ…
JoshRosen Aug 11, 2015
0f90d60
[SPARK-9640] [STREAMING] [TEST] Do not run Python Kinesis tests when …
tdas Aug 11, 2015
55752d8
[SPARK-9810] [BUILD] Remove individual commit messages from the squas…
rxin Aug 11, 2015
600031e
[SPARK-9727] [STREAMING] [BUILD] Updated streaming kinesis SBT projec…
tdas Aug 11, 2015
d378396
[SPARK-9815] Rename PlatformDependent.UNSAFE -> Platform.
rxin Aug 11, 2015
dfe347d
[SPARK-9785] [SQL] HashPartitioning compatibility should consider exp…
JoshRosen Aug 11, 2015
bce7279
Fix comment error
zjffdu Aug 11, 2015
8cad854
[SPARK-8345] [ML] Add an SQL node as a feature transformer
yanboliang Aug 11, 2015
dbd778d
[SPARK-8764] [ML] string indexer should take option to handle unseen …
holdenk Aug 11, 2015
5b8bb1b
[SPARK-9572] [STREAMING] [PYSPARK] Added StreamingContext.getActiveOr…
tdas Aug 11, 2015
5831294
[SPARK-9646] [SQL] Add metrics for all join and aggregate operators
zsxwing Aug 11, 2015
520ad44
[SPARK-9750] [MLLIB] Improve equals on SparseMatrix and DenseMatrix
Aug 11, 2015
2a3be4d
[SPARK-7726] Add import so Scaladoc doesn't fail.
pwendell Aug 11, 2015
00c0272
[SPARK-9814] [SQL] EqualNotNull not passing to data sources
HyukjinKwon Aug 11, 2015
f16bc68
[SPARK-9824] [CORE] Fix the issue that InternalAccumulator leaks Weak…
zsxwing Aug 11, 2015
423cdfd
Closes #1290
mengxr Aug 11, 2015
be3e271
[SPARK-9788] [MLLIB] Fix LDA Binary Compatibility
Aug 11, 2015
017b5de
[SPARK-8925] [MLLIB] Add @since tags to mllib.util
sthota2014 Aug 11, 2015
736af95
[HOTFIX] Fix style error caused by 017b5de
Aug 11, 2015
5a5bbc2
[SPARK-9074] [LAUNCHER] Allow arbitrary Spark args to be set.
Aug 11, 2015
afa757c
[SPARK-9849] [SQL] DirectParquetOutputCommitter qualified name should…
rxin Aug 12, 2015
ca8f70e
[SPARK-9649] Fix flaky test MasterSuite again - disable REST
Aug 12, 2015
3ef0f32
[SPARK-1517] Refactor release scripts to facilitate nightly publishing
pwendell Aug 12, 2015
74a293f
[SPARK-9713] [ML] Document SparkR MLlib glm() integration in Spark 1.5
ericl Aug 12, 2015
c3e9a12
[SPARK-9831] [SQL] fix serialization with empty broadcast
Aug 12, 2015
b1581ac
[SPARK-9854] [SQL] RuleExecutor.timeMap should be thread-safe
JoshRosen Aug 12, 2015
b85f9a2
[SPARK-8366] maxNumExecutorsNeeded should properly handle failed tasks
XuTingjun Aug 12, 2015
a807fcb
[SPARK-9806] [WEB UI] Don't share ReplayListenerBus between multiple …
Aug 12, 2015
4e3f4b9
[SPARK-9829] [WEBUI] Display the update value for peak execution memory
zsxwing Aug 12, 2015
bab8923
[SPARK-9426] [WEBUI] Job page DAG visualization is not shown
carsonwang Aug 12, 2015
5c99d8b
[SPARK-8798] [MESOS] Allow additional uris to be fetched with mesos
tnachen Aug 12, 2015
741a29f
[SPARK-9575] [MESOS] Add docuemntation around Mesos shuffle service.
tnachen Aug 12, 2015
9d08224
[SPARK-9182] [SQL] Filters are not passed through to jdbc source
yjshen Aug 12, 2015
3ecb379
[SPARK-9407] [SQL] Relaxes Parquet ValidTypeMap to allow ENUM predica…
liancheng Aug 12, 2015
2e68066
[SPARK-8625] [CORE] Propagate user exceptions in tasks back to driver
tomwhite Aug 12, 2015
be5d191
[SPARK-9795] Dynamic allocation: avoid double counting when killing s…
Aug 12, 2015
66d87c1
[SPARK-7583] [MLLIB] User guide update for RegexTokenizer
hhbyyh Aug 12, 2015
e011079
[SPARK-9747] [SQL] Avoid starving an unsafe operator in aggregation
Aug 12, 2015
57ec27d
[SPARK-9804] [HIVE] Use correct value for isSrcLocal parameter.
Aug 12, 2015
70fe558
[SPARK-9847] [ML] Modified copyValues to distinguish between default,…
jkbradley Aug 12, 2015
60103ec
[SPARK-9726] [PYTHON] PySpark DF join no longer accepts on=None
btashton Aug 12, 2015
762bacc
[SPARK-9766] [ML] [PySpark] check and add miss docs for PySpark ML
yanboliang Aug 12, 2015
551def5
[SPARK-9789] [ML] Added logreg threshold param back
jkbradley Aug 12, 2015
6f60298
[SPARK-8967] [DOC] add Since annotation
mengxr Aug 12, 2015
a17384f
[SPARK-9907] [SQL] Python crc32 is mistakenly calling md5
rxin Aug 12, 2015
738f353
[SPARK-9092] Fixed incompatibility when both num-executors and dynami…
Aug 12, 2015
ab7e721
[SPARK-9826] [CORE] Fix cannot use custom classes in log4j.properties
michellemay Aug 12, 2015
7035d88
[SPARK-9894] [SQL] Json writer should handle MapData.
yhuai Aug 12, 2015
caa14d9
[SPARK-9913] [MLLIB] LDAUtils should be private
mengxr Aug 12, 2015
6e409bc
[SPARK-9909] [ML] [TRIVIAL] move weightCol to shared params
holdenk Aug 12, 2015
e6aef55
[SPARK-9912] [MLLIB] QRDecomposition should use QType and RType for t…
mengxr Aug 13, 2015
fc1c7fd
[SPARK-9915] [ML] stopWords should use StringArrayParam
mengxr Aug 13, 2015
660e6dc
[SPARK-9449] [SQL] Include MetastoreRelation's inputFiles
marmbrus Aug 13, 2015
8ce6096
[SPARK-9780] [STREAMING] [KAFKA] prevent NPE if KafkaRDD instantiation …
koeninger Aug 13, 2015
0d1d146
[SPARK-9724] [WEB UI] Avoid unnecessary redirects in the Spark Web UI.
Aug 13, 2015
f4bc01f
[SPARK-9855] [SPARKR] Add expression functions into SparkR whose para…
yu-iskw Aug 13, 2015
7b13ed2
[SPARK-9870] Disable driver UI and Master REST server in SparkSubmitS…
JoshRosen Aug 13, 2015
7c35746
[SPARK-9827] [SQL] fix fd leak in UnsafeRowSerializer
Aug 13, 2015
4413d08
[SPARK-9908] [SQL] When spark.sql.tungsten.enabled is false, broadcas…
yhuai Aug 13, 2015
d2d5e7f
[SPARK-9704] [ML] Made ProbabilisticClassifier, Identifiable, VectorU…
jkbradley Aug 13, 2015
d7053be
[SPARK-9903] [MLLIB] skip local processing in PrefixSpan if there are…
mengxr Aug 13, 2015
2fb4901
[SPARK-9916] [BUILD] [SPARKR] removed left-over sparkr.zip copy/creat…
brkyvz Aug 13, 2015
2278219
[SPARK-9920] [SQL] The simpleString of TungstenAggregate does not sho…
yhuai Aug 13, 2015
a8ab263
[SPARK-9832] [SQL] add a thread-safe lookup for BytesToBytseMap
Aug 13, 2015
5fc058a
[SPARK-9917] [ML] add getMin/getMax and doc for originalMin/origianlM…
mengxr Aug 13, 2015
df54389
[SPARK-8922] [DOCUMENTATION, MLLIB] Add @since tags to mllib.evaluation
mosessky Aug 13, 2015
d7eb371
[SPARK-9914] [ML] define setters explicitly for Java and use setParam…
mengxr Aug 13, 2015
d0b1891
[SPARK-9927] [SQL] Revert 8049 since it's pushing wrong filter down
yjshen Aug 13, 2015
68f9957
[SPARK-9918] [MLLIB] remove runs from k-means and rename epsilon to tol
mengxr Aug 13, 2015
84a2791
[SPARK-9885] [SQL] Also pass barrierPrefixes and sharedPrefixes to Is…
yhuai Aug 13, 2015
6993031
[SPARK-9757] [SQL] Fixes persistence of Parquet relation with decimal…
liancheng Aug 13, 2015
2932e25
[SPARK-9073] [ML] spark.ml Models copy() should call setParent when t…
Lewuathe Aug 13, 2015
7a539ef
[SPARK-8965] [DOCS] Add ml-guide Python Example: Estimator, Transform…
Rosstin Aug 13, 2015
4b70798
[MINOR] [ML] change MultilayerPerceptronClassifierModel to Multilayer…
yanboliang Aug 13, 2015
65fec79
[MINOR] [DOC] fix mllib pydoc warnings
mengxr Aug 13, 2015
8815ba2
[SPARK-9649] Fix MasterSuite, third time's a charm
Aug 13, 2015
864de8e
[SPARK-9661] [MLLIB] [ML] Java compatibility
MechCoder Aug 13, 2015
a8d2f4c
[SPARK-9942] [PYSPARK] [SQL] ignore exceptions while try to import pa…
Aug 13, 2015
c2520f5
[SPARK-9935] [SQL] EqualNotNull not processed in ORC
HyukjinKwon Aug 13, 2015
6c5858b
[SPARK-9922] [ML] rename StringIndexerReverse to IndexToString
mengxr Aug 13, 2015
17c3f3d
Using ExternalList[_] in KryoSerializer. Clean up SpillableCollection…
mccheah Aug 17, 2015
3d066fc
Fixing unit test
mccheah Aug 18, 2015
083f9e2
Merge branch 'master' into external-group-by
mccheah Aug 25, 2015
4c05110
Fix a whole ton of Scalastyle errors
mccheah Aug 25, 2015
8f5d5e3
Continuing to sanitize unit tests
mccheah Aug 25, 2015
2 changes: 2 additions & 0 deletions .rat-excludes
@@ -93,3 +93,5 @@ INDEX
.lintr
gen-java.*
.*avpr
org.apache.spark.sql.sources.DataSourceRegister
.*parquet
5 changes: 0 additions & 5 deletions R/install-dev.bat
@@ -25,8 +25,3 @@ set SPARK_HOME=%~dp0..
MKDIR %SPARK_HOME%\R\lib

R.exe CMD INSTALL --library="%SPARK_HOME%\R\lib" %SPARK_HOME%\R\pkg\

rem Zip the SparkR package so that it can be distributed to worker nodes on YARN
pushd %SPARK_HOME%\R\lib
%JAVA_HOME%\bin\jar.exe cfM "%SPARK_HOME%\R\lib\sparkr.zip" SparkR
popd
4 changes: 0 additions & 4 deletions R/install-dev.sh
@@ -42,8 +42,4 @@ Rscript -e ' if("devtools" %in% rownames(installed.packages())) { library(devtoo
# Install SparkR to $LIB_DIR
R CMD INSTALL --library=$LIB_DIR $FWDIR/pkg/

# Zip the SparkR package so that it can be distributed to worker nodes on YARN
cd $LIB_DIR
jar cfM "$LIB_DIR/sparkr.zip" SparkR

popd > /dev/null
1 change: 1 addition & 0 deletions R/pkg/DESCRIPTION
@@ -29,6 +29,7 @@ Collate:
'client.R'
'context.R'
'deserialize.R'
'functions.R'
'mllib.R'
'serialize.R'
'sparkR.R'
11 changes: 10 additions & 1 deletion R/pkg/NAMESPACE
@@ -12,7 +12,8 @@ export("print.jobj")

# MLlib integration
exportMethods("glm",
"predict")
"predict",
"summary")

# Job group lifecycle management methods
export("setJobGroup",
@@ -28,6 +29,7 @@ exportMethods("arrange",
"count",
"crosstab",
"describe",
"dim",
"distinct",
"dropna",
"dtypes",
@@ -44,11 +46,16 @@ exportMethods("arrange",
"isLocal",
"join",
"limit",
"merge",
"names",
"ncol",
"nrow",
"orderBy",
"mutate",
"names",
"persist",
"printSchema",
"rbind",
"registerTempTable",
"rename",
"repartition",
@@ -63,8 +70,10 @@ exportMethods("arrange",
"show",
"showDF",
"summarize",
"summary",
"take",
"unionAll",
"unique",
"unpersist",
"where",
"withColumn",
116 changes: 116 additions & 0 deletions R/pkg/R/DataFrame.R
@@ -255,6 +255,16 @@ setMethod("names",
columns(x)
})

#' @rdname columns
setMethod("names<-",
signature(x = "DataFrame"),
function(x, value) {
if (!is.null(value)) {
sdf <- callJMethod(x@sdf, "toDF", listToSeq(as.list(value)))
dataFrame(sdf)
}
})

#' Register Temporary Table
#'
#' Registers a DataFrame as a Temporary Table in the SQLContext
@@ -473,6 +483,18 @@ setMethod("distinct",
dataFrame(sdf)
})

#' @title Distinct rows in a DataFrame
#'
#' @description Returns a new DataFrame containing distinct rows in this DataFrame
#'
#' @rdname unique
#' @aliases unique
setMethod("unique",
signature(x = "DataFrame"),
function(x) {
distinct(x)
})

#' Sample
#'
#' Return a sampled subset of this DataFrame using a random seed.
@@ -534,6 +556,58 @@ setMethod("count",
callJMethod(x@sdf, "count")
})

#' @title Number of rows for a DataFrame
#' @description Returns the number of rows in a DataFrame
#'
#' @name nrow
#'
#' @rdname nrow
#' @aliases count
setMethod("nrow",
signature(x = "DataFrame"),
function(x) {
count(x)
})

#' Returns the number of columns in a DataFrame
#'
#' @param x a SparkSQL DataFrame
#'
#' @rdname ncol
#' @export
#' @examples
#'\dontrun{
#' sc <- sparkR.init()
#' sqlContext <- sparkRSQL.init(sc)
#' path <- "path/to/file.json"
#' df <- jsonFile(sqlContext, path)
#' ncol(df)
#' }
setMethod("ncol",
signature(x = "DataFrame"),
function(x) {
length(columns(x))
})

#' Returns the dimensions (number of rows and columns) of a DataFrame
#' @param x a SparkSQL DataFrame
#'
#' @rdname dim
#' @export
#' @examples
#'\dontrun{
#' sc <- sparkR.init()
#' sqlContext <- sparkRSQL.init(sc)
#' path <- "path/to/file.json"
#' df <- jsonFile(sqlContext, path)
#' dim(df)
#' }
setMethod("dim",
signature(x = "DataFrame"),
function(x) {
c(count(x), ncol(x))
})

#' Collects all the elements of a Spark DataFrame and coerces them into an R data.frame.
#'
#' @param x A SparkSQL DataFrame
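
Reviewer note: the `dim` method added in this hunk returns the row count followed by the column count. The following is a minimal sketch of the same convention on a local base R `data.frame`, not on a SparkR DataFrame. The names `local_df` and `dim_sketch` are illustrative assumptions and are not part of this change.

```r
# Local base R data.frame standing in for a distributed DataFrame (assumption).
local_df <- data.frame(id = 1:3, label = c("a", "b", "c"))

# Same shape as the method in the diff: row count first, then column count.
dim_sketch <- function(x) {
  c(nrow(x), length(names(x)))
}

result <- dim_sketch(local_df)
print(result)
```

On a real SparkR DataFrame the row count comes from `count(x)`, which presumably triggers a job, so calling `dim` is likely not as cheap as it is for a local data.frame.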
@@ -1205,6 +1279,15 @@ setMethod("join",
dataFrame(sdf)
})

#' @rdname merge
#' @aliases join
setMethod("merge",
signature(x = "DataFrame", y = "DataFrame"),
function(x, y, joinExpr = NULL, joinType = NULL, ...) {
join(x, y, joinExpr, joinType)
})


#' UnionAll
#'
#' Return a new DataFrame containing the union of rows in this DataFrame
@@ -1231,6 +1314,22 @@ setMethod("unionAll",
dataFrame(unioned)
})

#' @title Union two or more DataFrames
#'
#' @description Returns a new DataFrame containing rows of all parameters.
#'
#' @rdname rbind
#' @aliases unionAll
setMethod("rbind",
signature(... = "DataFrame"),
function(x, ..., deparse.level = 1) {
if (nargs() == 3) {
unionAll(x, ...)
} else {
unionAll(x, Recall(..., deparse.level = 1))
}
})

#' Intersect
#'
#' Return a new DataFrame containing rows only in both this DataFrame
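
Reviewer note: the `rbind` method added in this hunk applies the two-argument `unionAll` to any number of inputs by recursing on the remaining arguments with `Recall`. The sketch below shows that recursion pattern in plain base R. `union_two` is a hypothetical stand-in for `unionAll` that concatenates vectors, and the base case is detected with `length(list(...))` rather than the `nargs()` check used in the diff.

```r
# Hypothetical stand-in for unionAll: combines exactly two inputs.
union_two <- function(a, b) c(a, b)

# Fold the binary operation over any number of arguments by recursing on the tail.
union_many <- function(x, ...) {
  rest <- list(...)
  if (length(rest) == 0) {
    return(x)
  }
  if (length(rest) == 1) {
    return(union_two(x, rest[[1]]))
  }
  # Recall re-invokes the enclosing function with the remaining arguments.
  union_two(x, Recall(...))
}

result <- union_many(1:2, 3:4, 5:6)
print(result)
```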
@@ -1322,9 +1421,11 @@ setMethod("write.df",
"org.apache.spark.sql.parquet")
}
allModes <- c("append", "overwrite", "error", "ignore")
# nolint start
if (!(mode %in% allModes)) {
stop('mode should be one of "append", "overwrite", "error", "ignore"')
}
# nolint end
jmode <- callJStatic("org.apache.spark.sql.api.r.SQLUtils", "saveMode", mode)
options <- varargsToEnv(...)
if (!is.null(path)) {
@@ -1384,9 +1485,11 @@ setMethod("saveAsTable",
"org.apache.spark.sql.parquet")
}
allModes <- c("append", "overwrite", "error", "ignore")
# nolint start
if (!(mode %in% allModes)) {
stop('mode should be one of "append", "overwrite", "error", "ignore"')
}
# nolint end
jmode <- callJStatic("org.apache.spark.sql.api.r.SQLUtils", "saveMode", mode)
options <- varargsToEnv(...)
callJMethod(df@sdf, "saveAsTable", tableName, source, jmode, options)
@@ -1430,6 +1533,19 @@ setMethod("describe",
dataFrame(sdf)
})

#' @title Summary
#'
#' @description Computes statistics for numeric columns of the DataFrame
#'
#' @rdname summary
#' @aliases describe
setMethod("summary",
signature(x = "DataFrame"),
function(x) {
describe(x)
})


#' dropna
#'
#' Returns a new DataFrame omitting rows with null values.
13 changes: 8 additions & 5 deletions R/pkg/R/RDD.R
@@ -85,7 +85,9 @@ setMethod("initialize", "PipelinedRDD", function(.Object, prev, func, jrdd_val)

isPipelinable <- function(rdd) {
e <- rdd@env
# nolint start
!(e$isCached || e$isCheckpointed)
# nolint end
}

if (!inherits(prev, "PipelinedRDD") || !isPipelinable(prev)) {
@@ -97,7 +99,8 @@ setMethod("initialize", "PipelinedRDD", function(.Object, prev, func, jrdd_val)
# prev_serializedMode is used during the delayed computation of JRDD in getJRDD
} else {
pipelinedFunc <- function(partIndex, part) {
func(partIndex, prev@func(partIndex, part))
f <- prev@func
func(partIndex, f(partIndex, part))
}
.Object@func <- cleanClosure(pipelinedFunc)
.Object@prev_jrdd <- prev@prev_jrdd # maintain the pipeline
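
Reviewer note: `pipelinedFunc` in this hunk composes the previous RDD's per-partition function with the new one, binding the previous function to a local name before calling it. The sketch below shows the same composition in plain R. The names `compose_partition_funcs`, `double_all` and `add_index` are hypothetical, and no S4 slots or closure cleaning are involved.

```r
# Hypothetical per-partition functions; each takes (partIndex, part).
double_all <- function(partIndex, part) lapply(part, function(v) v * 2)
add_index <- function(partIndex, part) lapply(part, function(v) v + partIndex)

# Build one function that runs prev_func first and feeds its result to func.
compose_partition_funcs <- function(prev_func, func) {
  function(partIndex, part) {
    f <- prev_func
    func(partIndex, f(partIndex, part))
  }
}

pipelined <- compose_partition_funcs(double_all, add_index)
result <- pipelined(10, list(1, 2, 3))
print(unlist(result))
```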
@@ -841,7 +844,7 @@ setMethod("sampleRDD",
if (withReplacement) {
count <- rpois(1, fraction)
if (count > 0) {
res[(len + 1):(len + count)] <- rep(list(elem), count)
res[ (len + 1) : (len + count) ] <- rep(list(elem), count)
len <- len + count
}
} else {
@@ -1261,12 +1264,12 @@ setMethod("pipeRDD",
signature(x = "RDD", command = "character"),
function(x, command, env = list()) {
func <- function(part) {
trim.trailing.func <- function(x) {
trim_trailing_func <- function(x) {
sub("[\r\n]*$", "", toString(x))
}
input <- unlist(lapply(part, trim.trailing.func))
input <- unlist(lapply(part, trim_trailing_func))
res <- system2(command, stdout = TRUE, input = input, env = env)
lapply(res, trim.trailing.func)
lapply(res, trim_trailing_func)
}
lapplyPartition(x, func)
})
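
Reviewer note: the renamed helper `trim_trailing_func` strips trailing carriage returns and newlines from each element. The sketch below runs the same helper body on a local list, so no Spark context or `system2` call is needed. The input values are made up for illustration.

```r
# Same helper body as in the diff: drop any trailing \r or \n characters.
trim_trailing_func <- function(x) {
  sub("[\r\n]*$", "", toString(x))
}

# Illustrative inputs with different line endings (assumption).
lines_in <- list("alpha\r\n", "beta\n", "gamma")
result <- unlist(lapply(lines_in, trim_trailing_func))
print(result)
```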
4 changes: 3 additions & 1 deletion R/pkg/R/backend.R
@@ -110,6 +110,8 @@ invokeJava <- function(isStatic, objId, methodName, ...) {

# TODO: check the status code to output error information
returnStatus <- readInt(conn)
stopifnot(returnStatus == 0)
if (returnStatus != 0) {
stop(readString(conn))
}
readObject(conn)
}
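
Reviewer note: this hunk replaces `stopifnot(returnStatus == 0)` with an explicit `stop(...)` carrying the string read from the connection, so the caller sees the backend's message rather than a generic assertion failure. The sketch below shows that pattern without a socket. `check_status` is a hypothetical helper, and its `status` and `message` arguments stand in for the values the real code reads from `conn`.

```r
# Hypothetical helper: raise an error carrying the supplied message on non-zero status.
check_status <- function(status, message) {
  if (status != 0) {
    stop(message)
  }
  invisible(TRUE)
}

# Capture the error message a caller would see for a failing status.
err <- tryCatch(
  check_status(1L, "java.lang.IllegalArgumentException: bad input"),
  error = function(e) conditionMessage(e)
)
print(err)
```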