forked from apache/spark
Merged Apache branch-1.6 #159
## What changes were proposed in this pull request?

TaskContext supports a task completion callback, which gets called regardless of task failures. However, there is no way for the listener to know if there was an error. This patch adds a new listener that gets called when a task fails.

## How was this patch tested?

New unit test case and integration test case covering the code path.

Author: Davies Liu <[email protected]>

Closes apache#11478 from davies/add_failure_1.6.
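The completion-vs-failure listener distinction can be sketched in plain Python (hypothetical names and structure; not Spark's actual TaskContext implementation):

```python
class TaskContextSketch:
    """Minimal sketch: completion listeners always run, while the new
    failure listeners additionally receive the error that occurred."""

    def __init__(self):
        self._on_completion = []
        self._on_failure = []

    def add_task_completion_listener(self, fn):
        self._on_completion.append(fn)

    def add_task_failure_listener(self, fn):
        self._on_failure.append(fn)

    def run_task(self, task):
        try:
            return task()
        except Exception as error:
            for fn in self._on_failure:
                fn(error)          # failure listeners see the error
            raise
        finally:
            for fn in self._on_completion:
                fn()               # called on success AND on failure
```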
In order to tell the OutputStream whether the task has failed, we should call the failure callbacks BEFORE calling writer.close(). Added new unit tests.

Author: Davies Liu <[email protected]>

Closes apache#11450 from davies/callback.
Fix race conditions when cleaning up files. Tested with existing tests.

Author: Davies Liu <[email protected]>

Closes apache#11507 from davies/flaky.

(cherry picked from commit d062587)
Signed-off-by: Davies Liu <[email protected]>

Conflicts:
	sql/hive/src/test/scala/org/apache/spark/sql/sources/CommitFailureTestRelationSuite.scala
…cled

## What changes were proposed in this pull request?

`sendRpcSync` should copy the response content because the underlying buffer will be recycled and reused.

## How was this patch tested?

Jenkins unit tests.

Author: Shixiong Zhu <[email protected]>

Closes apache#11499 from zsxwing/SPARK-13652.

(cherry picked from commit 465c665)
Signed-off-by: Shixiong Zhu <[email protected]>
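The recycled-buffer hazard can be demonstrated in Python with a memoryview into a pooled bytearray (an analogy only; the actual patch copies the content of a network response buffer):

```python
pool = bytearray(b"hello")   # stands in for a pooled, reusable buffer
view = memoryview(pool)

aliased = view               # unsafe: still points into the pooled buffer
copied = bytes(view)         # safe: defensive copy, as the fix makes

pool[:] = b"XXXXX"           # the transport layer recycles the buffer

# `copied` still holds b"hello"; `aliased` now reads the recycled contents
```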
cc jkbradley

Author: Yu ISHIKAWA <[email protected]>

Closes apache#9535 from yu-iskw/SPARK-11515.

(cherry picked from commit 574571c)
Signed-off-by: Sean Owen <[email protected]>
… string datatypes to Oracle VARCHAR datatype mapping

A test suite was added for the bug fix SPARK-12941, covering the mapping of StringType to the corresponding Oracle datatype. Manual tests done.

Author: thomastechs <[email protected]>
Author: THOMAS SEBASTIAN <[email protected]>

Closes apache#11489 from thomastechs/thomastechs-12941-master-new.

(cherry picked from commit f6ac7c3)
Signed-off-by: Yin Huai <[email protected]>

Conflicts:
	sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala
…DataFrames

## What changes were proposed in this pull request?

Change line 113 of QuantileDiscretizer.scala to `val requiredSamples = math.max(numBins * numBins, 10000.0)` so that `requiredSamples` is a `Double`. This will fix the division in line 114, which currently results in zero if `requiredSamples < dataset.count`.

## How was this patch tested?

Manual tests. I was having problems using QuantileDiscretizer with my dataset, and after making this change QuantileDiscretizer behaves as expected.

Author: Oliver Pierson <[email protected]>
Author: Oliver Pierson <[email protected]>

Closes apache#11319 from oliverpierson/SPARK-13444.
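The bug boils down to integer division. In Scala, dividing two integers truncates, like Python's `//`; a hedged standalone illustration (the variable names mirror the patch, the numbers are made up):

```python
num_bins = 200
count = 10_000_000               # stands in for dataset.count

# Before the fix: requiredSamples is an integer, so the division truncates.
required_samples_int = max(num_bins * num_bins, 10000)
fraction_buggy = required_samples_int // count     # truncates to 0

# After the fix: 10000.0 forces a Double, so the division keeps the fraction.
required_samples = max(num_bins * num_bins, 10000.0)
fraction_fixed = required_samples / count          # 0.004
```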
…ionSerializer.loads

## What changes were proposed in this pull request?

Set the function's module name to `__main__` if it's missing in `TransformFunctionSerializer.loads`.

## How was this patch tested?

Manually tested in the shell.

Before this patch:

```
>>> from pyspark.streaming import StreamingContext
>>> from pyspark.streaming.util import TransformFunction
>>> ssc = StreamingContext(sc, 1)
>>> func = TransformFunction(sc, lambda x: x, sc.serializer)
>>> func.rdd_wrapper(lambda x: x)
TransformFunction(<function <lambda> at 0x106ac8b18>)
>>> bytes = bytearray(ssc._transformerSerializer.serializer.dumps((func.func, func.rdd_wrap_func, func.deserializers)))
>>> func2 = ssc._transformerSerializer.loads(bytes)
>>> print(func2.func.__module__)
None
>>> print(func2.rdd_wrap_func.__module__)
None
```

After this patch:

```
>>> from pyspark.streaming import StreamingContext
>>> from pyspark.streaming.util import TransformFunction
>>> ssc = StreamingContext(sc, 1)
>>> func = TransformFunction(sc, lambda x: x, sc.serializer)
>>> func.rdd_wrapper(lambda x: x)
TransformFunction(<function <lambda> at 0x108bf1b90>)
>>> bytes = bytearray(ssc._transformerSerializer.serializer.dumps((func.func, func.rdd_wrap_func, func.deserializers)))
>>> func2 = ssc._transformerSerializer.loads(bytes)
>>> print(func2.func.__module__)
__main__
>>> print(func2.rdd_wrap_func.__module__)
__main__
```

Author: Shixiong Zhu <[email protected]>

Closes apache#11535 from zsxwing/loads-module.

(cherry picked from commit ee913e6)
Signed-off-by: Davies Liu <[email protected]>
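The core of the fix can be sketched standalone (a hypothetical helper, not PySpark's actual `TransformFunctionSerializer` code):

```python
def fix_module(func):
    """Sketch of the idea: if a deserialized function lost its module
    name, default it to __main__ so it can be looked up again."""
    if func.__module__ is None:
        func.__module__ = "__main__"
    return func

def example():
    pass

example.__module__ = None   # simulate a function deserialized without a module
fix_module(example)
# example.__module__ is now "__main__"
```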
…tly refers to StatefulNetworkWordCount

## What changes were proposed in this pull request?

The reference to StatefulNetworkWordCount.scala in the updateStateByKey documentation should be removed until there is an example for updateStateByKey.

## How was this patch tested?

Tested the new documentation with a jekyll build.

Author: rmishra <[email protected]>

Closes apache#11545 from rishitesh/SPARK-13705.

(cherry picked from commit 4b13896)
Signed-off-by: Sean Owen <[email protected]>
…-hive and spark-hiveserver (branch 1.6)

## What changes were proposed in this pull request?

This is just the patch of apache#11449 cherry-picked to branch-1.6; the enforcer and dep/ diffs are cut.

Modifies the dependency declarations of all the hive artifacts to explicitly exclude the groovy-all JAR. This stops the groovy classes *and everything else in that uber-JAR* from getting into the spark-assembly JAR.

## How was this patch tested?

1. A pre-patch build was made: `mvn clean install -Pyarn,hive,hive-thriftserver`
2. spark-assembly expanded, observed to have the org.codehaus.groovy packages and JARs
3. A maven dependency tree was created: `mvn dependency:tree -Pyarn,hive,hive-thriftserver -Dverbose > target/dependencies.txt`
4. This text file was examined to confirm that groovy was being imported as a dependency of `org.spark-project.hive`
5. Patch applied
6. Repeated step 1: clean build of the project with `-Pyarn,hive,hive-thriftserver` set
7. Examined the created spark-assembly, verified no org.codehaus packages
8. Verified that the maven dependency tree no longer references groovy

The `master` version updates the dependency files and an enforcer rule to keep groovy out; this patch strips it out.

Author: Steve Loughran <[email protected]>

Closes apache#11473 from steveloughran/fixes/SPARK-13599-groovy+branch-1.6.
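The exclusion pattern described above looks roughly like this in a POM (an illustrative fragment; the `hive-exec` coordinates are an example, not the exact artifacts touched by the patch):

```xml
<dependency>
  <groupId>org.spark-project.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <exclusions>
    <exclusion>
      <groupId>org.codehaus.groovy</groupId>
      <artifactId>groovy-all</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```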
The description of "spark.memory.offHeap.size" in the current document does not clearly state that the memory is counted in bytes. This PR contains a small fix for this tiny documentation issue.

Author: CodingCat <[email protected]>

Closes apache#11561 from CodingCat/master.

(cherry picked from commit a3ec50a)
Signed-off-by: Shixiong Zhu <[email protected]>
## What changes were proposed in this pull request?

Adding the hive-cli classes to the classloader.

## How was this patch tested?

The Hive VersionsSuite tests were run.

This is my original work and I license the work to the project under the project's open source license.

Author: Tim Preece <[email protected]>

Closes apache#11495 from preecet/master.

(cherry picked from commit 46f25c2)
Signed-off-by: Michael Armbrust <[email protected]>
…ient as it's in driver

## What changes were proposed in this pull request?

AppClient runs on the driver side. It should not call `Utils.tryOrExit`, as that will send the exception to SparkUncaughtExceptionHandler and call `System.exit`. This PR just removes `Utils.tryOrExit`.

## How was this patch tested?

Manual tests.

Author: Shixiong Zhu <[email protected]>

Closes apache#11566 from zsxwing/SPARK-13711.
When generating Graphviz DOT files in the SQL query visualization we need to escape double-quotes inside node labels. This is a followup to apache#11309, which fixed a similar graph in Spark Core's DAG visualization. Author: Josh Rosen <[email protected]> Closes apache#11587 from JoshRosen/graphviz-escaping. (cherry picked from commit 81f54ac) Signed-off-by: Josh Rosen <[email protected]>
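The escaping rule can be sketched with a small hypothetical helper (not Spark's actual code): backslashes must be doubled first, then embedded double quotes escaped, so the label stays valid inside a double-quoted DOT string.

```python
def escape_dot_label(label):
    """Escape a string for use inside a double-quoted Graphviz DOT label."""
    return label.replace("\\", "\\\\").replace('"', '\\"')

# e.g. a SQL plan node whose description contains quotes
node = 'n1 [label="{}"];'.format(escape_dot_label('Filter ("x" > 1)'))
```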
## What changes were proposed in this pull request?

If a job is being scheduled in one thread which has a dependency on an RDD currently executing a shuffle in another thread, Spark would throw a NullPointerException. This patch synchronizes access to `mapStatuses` and skips null status entries (which are in-progress shuffle tasks).

## How was this patch tested?

Our client code unit test suite, which was reliably reproducing the race condition with 10 threads, shows that this fixes it. I have not found a minimal test case to add to Spark, but I will attempt to do so if desired. The same test case was tripping up on SPARK-4454, which was fixed by making other DAGScheduler code thread-safe.

cc shivaram srowen

Author: Andy Sloane <[email protected]>

Closes apache#11505 from a1k0n/SPARK-13631.

(cherry picked from commit cbff280)
Signed-off-by: Sean Owen <[email protected]>
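The combination of "synchronize access" and "skip null entries" can be sketched in Python (hypothetical names; the real code guards `mapStatuses` in the JVM):

```python
import threading

map_statuses = {}            # shared between scheduler threads
lock = threading.Lock()

def finished_statuses():
    """Snapshot the map under the lock, skipping in-progress (None)
    entries instead of dereferencing them and crashing."""
    with lock:
        return {k: v for k, v in map_statuses.items() if v is not None}
```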
## What changes were proposed in this pull request?

If there are many branches in a CaseWhen expression, the generated code could go above the 64K limit for a single Java method and will fail to compile. This PR changes it to fall back to interpreted mode if there are more than 20 branches.

## How was this patch tested?

Added tests.

Author: Davies Liu <[email protected]>

Closes apache#11606 from davies/fix_when_16.
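The fallback decision is just a threshold check; a sketch with hypothetical names (the 20-branch limit comes from the patch):

```python
MAX_CODEGEN_BRANCHES = 20   # threshold used by the patch

def choose_eval_mode(num_branches):
    """Pick interpreted evaluation when the generated Java method for a
    CaseWhen would risk exceeding the JVM's 64K bytecode limit."""
    if num_branches > MAX_CODEGEN_BRANCHES:
        return "interpreted"
    return "codegen"
```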
## What changes were proposed in this pull request?

A very minor change for using `BigDecimal.decimal(f: Float)` instead of `BigDecimal(f: Float)`. The latter is deprecated and can result in inconsistencies due to an implicit conversion to `Double`.

## How was this patch tested?

N/A

cc yhuai

Author: Sameer Agarwal <[email protected]>

Closes apache#11597 from sameeragarwal/bigdecimal.

(cherry picked from commit 926e9c4)
Signed-off-by: Yin Huai <[email protected]>
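Python's `decimal` module shows the same pitfall the commit is avoiding (an analogy; the commit itself concerns Scala's `BigDecimal`): constructing a decimal from a binary float preserves the float's exact binary expansion, while constructing from the digits preserves what the user wrote.

```python
from decimal import Decimal

exact_binary = Decimal(1.1)    # keeps the float's binary expansion, not "1.1"
from_string = Decimal("1.1")   # Decimal('1.1'), as the user intended
```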
This reverts commit 926e9c4.
Update snappy to 1.1.2.1 to pull in a single fix: the OOM fix we already worked around. Supersedes apache#11524. Tested with Jenkins tests.

Author: Sean Owen <[email protected]>

Closes apache#11631 from srowen/SPARK-13663.

(cherry picked from commit 927e22e)
Signed-off-by: Sean Owen <[email protected]>
## What changes were proposed in this pull request?

Today, Spark 1.6.1 and updated docs were released. Unfortunately, there is obsolete hive version information in the docs: [Building Spark](http://spark.apache.org/docs/latest/building-spark.html#building-with-hive-and-jdbc-support). This PR fixes the following two lines.

```
-By default Spark will build with Hive 0.13.1 bindings.
+By default Spark will build with Hive 1.2.1 bindings.
-# Apache Hadoop 2.4.X with Hive 13 support
+# Apache Hadoop 2.4.X with Hive 1.2.1 support
```

The `sql/README.md` file also describes this.

## How was this patch tested?

Manual.

Author: Dongjoon Hyun <[email protected]>

Closes apache#11639 from dongjoon-hyun/fix_doc_hive_version.

(cherry picked from commit 88fa866)
Signed-off-by: Reynold Xin <[email protected]>
Author: Oscar D. Lara Yejas <[email protected]> Author: Oscar D. Lara Yejas <[email protected]> Closes apache#11220 from olarayej/SPARK-13312-3. (cherry picked from commit 416e71a) Signed-off-by: Shivaram Venkataraman <[email protected]>
…ions

## What changes were proposed in this pull request?

Currently, when a java.net.BindException is thrown, it displays the following message:

java.net.BindException: Address already in use: Service '$serviceName' failed after 16 retries!

This change adds port configuration suggestions to the BindException. For example, for the UI it now displays:

java.net.BindException: Address already in use: Service 'SparkUI' failed after 16 retries! Consider explicitly setting the appropriate port for 'SparkUI' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.

## How was this patch tested?

Manual tests.

Author: Bjorn Jonsson <[email protected]>

Closes apache#11644 from bjornjon/master.

(cherry picked from commit 515e4af)
Signed-off-by: Sean Owen <[email protected]>
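The retry-then-enriched-error flow can be sketched in Python (hypothetical signature; the message text mirrors the example above):

```python
def bind_with_retries(bind, service_name, start_port, max_retries, config_key):
    """Try successive ports; on exhaustion raise an error that suggests
    the relevant port configuration, as the patch does."""
    for offset in range(max_retries + 1):
        try:
            return bind(start_port + offset)
        except OSError:
            continue
    raise OSError(
        "Address already in use: Service '%s' failed after %d retries! "
        "Consider explicitly setting the appropriate port for '%s' "
        "(for example %s for %s) to an available port or increasing "
        "spark.port.maxRetries."
        % (service_name, max_retries, service_name, config_key, service_name))
```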
## What changes were proposed in this pull request?

Fix a typo in DataSourceRegister.

## How was this patch tested?

Found while going through the latest code.

Author: Jacky Li <[email protected]>

Closes apache#11686 from jackylk/patch-12.

(cherry picked from commit f3daa09)
Signed-off-by: Reynold Xin <[email protected]>
## What changes were proposed in this pull request?

When studying Spark, many users just copy examples from the documentation and paste them into their terminals, and the missing backslashes lead them to run into shell errors. The added backslashes avoid that problem for Spark users with that behavior.

## How was this patch tested?

I generated the documentation locally using jekyll and checked the generated pages.

Author: Daniel Santana <[email protected]>

Closes apache#11699 from danielsan/master.

(cherry picked from commit 9f13f0f)
Signed-off-by: Andrew Or <[email protected]>
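The missing-backslash problem is plain shell behavior and can be demonstrated directly (a generic `echo` stands in for the multi-line spark-submit examples in the docs):

```shell
# Without the trailing backslashes, each line below would be run as a
# separate command (and the continuation lines would fail as unknown
# commands); with them, the shell parses one single command.
echo one \
  two \
  three
```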
## What changes were proposed in this pull request?

JavaUtils.java has methods to convert time and byte strings for internal use. This change renames a variable used in byteStringAs() from timeError to byteError.

Author: Bjorn Jonsson <[email protected]>

Closes apache#11695 from bjornjon/master.

(cherry picked from commit e06493c)
Signed-off-by: Andrew Or <[email protected]>