
Conversation

@markhamstra

No description provided.

drcrallen and others added 17 commits September 28, 2016 14:39
From the original commit message:

This PR also fixes a regression caused by [SPARK-10987] whereby submitting a shutdown causes a race between the local shutdown procedure and the notification of the scheduler driver disconnection. If the scheduler driver disconnection wins the race, the coarse executor incorrectly exits with status 1 (instead of the proper status 0).

Author: Charles Allen <charlesallen-net.com>

(cherry picked from commit 2eaeafe)

Author: Charles Allen <[email protected]>

Closes apache#15270 from vanzin/SPARK-17696.
…atrix with SparseVector

This backport PR includes only the changes relevant to mllib, but is otherwise identical to apache#15296

jkbradley

Author: Bjarne Fruergaard <[email protected]>

Closes apache#15311 from bwahlgreen/bugfix-spark-17721-1.6.
This backports apache@733cbaa
to branch-1.6. It's a pretty simple patch and would be nice to have for Spark 1.6.3.

Unit tests

Author: Burak Yavuz <[email protected]>

Closes apache#15380 from brkyvz/bp-SPARK-15062.

Signed-off-by: Michael Armbrust <[email protected]>
## What changes were proposed in this pull request?

This is the patch for 1.6. It only adds Spark conf `spark.files.ignoreCorruptFiles` because SQL just uses HadoopRDD directly in 1.6. `spark.files.ignoreCorruptFiles` is `true` by default.
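
As a hedged illustration (the app name and input path below are hypothetical), the flag can be set on a `SparkConf` before the context is created; since it defaults to `true` in this backport, setting it to `false` restores fail-fast behaviour on corrupt input:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: spark.files.ignoreCorruptFiles defaults to true in this backport,
// so corrupt files are skipped silently; set it to false to fail fast instead.
val conf = new SparkConf()
  .setAppName("ignore-corrupt-files-demo")
  .set("spark.files.ignoreCorruptFiles", "false")
val sc = new SparkContext(conf)

// Reading a glob that may include truncated or corrupt gzip parts (path is hypothetical).
sc.textFile("hdfs:///data/logs/*.gz").count()
```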

## How was this patch tested?

The added test.

Author: Shixiong Zhu <[email protected]>

Closes apache#15454 from zsxwing/SPARK-17850-1.6.
…cala-2.11 repl

## What changes were proposed in this pull request?

The Spark 1.6 Scala-2.11 repl doesn't honor the "spark.replClassServer.port" configuration, so users cannot set a fixed port number through "spark.replClassServer.port".
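
A hedged way to check the behaviour (the port number below is arbitrary): launch the 2.11 repl with the setting and confirm from inside the shell that the conf was propagated:

```scala
// Hypothetical check, run inside a spark-shell launched with:
//   bin/spark-shell --conf spark.replClassServer.port=35000
// The conf should be visible on the SparkContext; with the fix, the Scala 2.11 repl's
// class server also binds to this port instead of a random one.
sc.getConf.get("spark.replClassServer.port")  // expected: "35000"
```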

## How was this patch tested?

N/A

Author: jerryshao <[email protected]>

Closes apache#15253 from jerryshao/SPARK-17678.
…m empty string to interval type

## What changes were proposed in this pull request?
This change adds a check in the castToInterval method of the Cast expression so that, if the converted value is null, the isNull variable is set to true.

Previously, the expression Cast(Literal(), CalendarIntervalType) threw a NullPointerException for the reason mentioned above.
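
A hedged sketch of the behaviour in question, using an empty-string literal through the Catalyst API (written in the style of `CastSuite`, not copied from the patch):

```scala
import org.apache.spark.sql.catalyst.expressions.{Cast, Literal}
import org.apache.spark.sql.types.CalendarIntervalType

// Before the fix this evaluation threw a NullPointerException; with the fix the
// cast of an empty string to an interval simply evaluates to null.
val result = Cast(Literal(""), CalendarIntervalType).eval()
assert(result == null)
```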

## How was this patch tested?
Added test case in CastSuite.scala

JIRA entry for details: https://issues.apache.org/jira/browse/SPARK-17884

Author: prigarg <[email protected]>

Closes apache#15479 from priyankagargnitk/cast_empty_string_bug.
…ld not depends on local timezone

## What changes were proposed in this pull request?

Backport of apache#13784 to `branch-1.6`.

## How was this patch tested?

Existing tests.

Author: Davies Liu <[email protected]>

Closes apache#15554 from srowen/SPARK-16078.
…executor loss

## What changes were proposed in this pull request?

_This is the branch-1.6 version of the master patch, apache#15986; the original description follows:_

This patch fixes a critical resource leak in the TaskScheduler which could cause RDDs and ShuffleDependencies to be kept alive indefinitely if an executor with running tasks is permanently lost and the associated stage fails.

This problem was originally identified by analyzing the heap dump of a driver belonging to a cluster that had run out of shuffle space. This dump contained several `ShuffleDependency` instances that were retained by `TaskSetManager`s inside the scheduler but were not otherwise referenced. Each of these `TaskSetManager`s was considered a "zombie" but had no running tasks and therefore should have been cleaned up. However, these zombie task sets were still referenced by the `TaskSchedulerImpl.taskIdToTaskSetManager` map.

Entries are added to the `taskIdToTaskSetManager` map when tasks are launched and are removed inside of `TaskScheduler.statusUpdate()`, which is invoked by the scheduler backend while processing `StatusUpdate` messages from executors. The problem with this design is that a completely dead executor will never send a `StatusUpdate`. There is [some code](https://github.com/apache/spark/blob/072f4c518cdc57d705beec6bcc3113d9a6740819/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L338) in `statusUpdate` which handles tasks that exit with the `TaskState.LOST` state (which is supposed to correspond to a task failure triggered by total executor loss), but this state only seems to be used in Mesos fine-grained mode. There doesn't seem to be any code which performs per-task state cleanup for tasks that were running on an executor that completely disappears without sending any sort of final death message. The `executorLost` and [`removeExecutor`](https://github.com/apache/spark/blob/072f4c518cdc57d705beec6bcc3113d9a6740819/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L527) methods don't appear to perform any cleanup of the `taskId -> *` mappings, causing the leaks observed here.

This patch's fix is to maintain an `executorId -> running task id` mapping so that these `taskId -> *` maps can be properly cleaned up following an executor loss.
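
A minimal sketch of that bookkeeping idea (class, field, and method names here are illustrative placeholders, not the actual `TaskSchedulerImpl` members):

```scala
import scala.collection.mutable

// Illustrative only: track which task ids are running on each executor so that
// every taskId -> * entry can be dropped when the executor is lost.
class SchedulerBookkeeping[TSM] {
  private val taskIdToTaskSetManager = mutable.HashMap.empty[Long, TSM]
  private val executorIdToRunningTaskIds = mutable.HashMap.empty[String, mutable.HashSet[Long]]

  def taskLaunched(taskId: Long, executorId: String, manager: TSM): Unit = {
    taskIdToTaskSetManager(taskId) = manager
    executorIdToRunningTaskIds.getOrElseUpdate(executorId, mutable.HashSet.empty) += taskId
  }

  // Called when an executor reports via statusUpdate() that a task has finished.
  def taskFinished(taskId: Long, executorId: String): Unit = {
    taskIdToTaskSetManager.remove(taskId)
    executorIdToRunningTaskIds.get(executorId).foreach(_ -= taskId)
  }

  // Called on executor loss: without this step the taskId -> * entries for a
  // dead executor would never be removed, which is the leak described above.
  def executorLost(executorId: String): Unit = {
    executorIdToRunningTaskIds.remove(executorId).foreach { taskIds =>
      taskIds.foreach(taskIdToTaskSetManager.remove)
    }
  }
}
```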

There are some potential corner-case interactions that I'm concerned about here, especially some details in [the comment](https://github.com/apache/spark/blob/072f4c518cdc57d705beec6bcc3113d9a6740819/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L523) in `removeExecutor`, so I'd appreciate a very careful review of these changes.

## How was this patch tested?

I added a new unit test to `TaskSchedulerImplSuite`.

/cc kayousterhout and markhamstra, who reviewed apache#15986.

Author: Josh Rosen <[email protected]>

Closes apache#16070 from JoshRosen/fix-leak-following-total-executor-loss-1.6.
…functions

No tests existed for JDBCRDD#compileFilter.
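
For context, a minimal hypothetical sketch of the kind of filter-to-SQL compilation that `JDBCRDD#compileFilter` performs (the function below is illustrative, not Spark's implementation, and ignores quoting/escaping):

```scala
import org.apache.spark.sql.sources.{EqualTo, Filter, GreaterThan, IsNull, LessThan}

// Illustrative only: translate a data-source Filter into a WHERE-clause fragment,
// returning None for filters that cannot be pushed down to the JDBC source.
def compileFilterSketch(filter: Filter): Option[String] = filter match {
  case EqualTo(attr, value)     => Some(s"$attr = '$value'")
  case GreaterThan(attr, value) => Some(s"$attr > '$value'")
  case LessThan(attr, value)    => Some(s"$attr < '$value'")
  case IsNull(attr)             => Some(s"$attr IS NULL")
  case _                        => None
}

compileFilterSketch(GreaterThan("age", 21))  // Some("age > '21'")
```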

Author: Takeshi YAMAMURO <linguin.m.sgmail.com>

Closes apache#10409 from maropu/AddTestsInJdbcRdd.

(cherry picked from commit 8c1b867)

Author: Takeshi YAMAMURO <[email protected]>

Closes apache#16124 from dongjoon-hyun/SPARK-12446-BRANCH-1.6.
@markhamstra markhamstra merged commit 6e1c2ad into alteryx:csd-1.6 Dec 7, 2016
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Nov 7, 2017
* Fix pom versioning

* fix k8s versions in pom

* Change pom string to 2.1.0-k8s-0.1.0-SNAPSHOT