
Conversation

@philipphoffmann
Contributor

What changes were proposed in this pull request?

This commit introduces a setting for configuring the Mesos framework failover timeout (`spark.mesos.failoverTimeout`). The default timeout is 10 seconds.

Previously, the timeout was either set to 0 (which causes Mesos to kill all tasks when the framework disconnects) or to Integer.MAX_VALUE (which effectively leaves them running forever), depending on how the driver was launched.
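To illustrate the pre-patch behavior described above, here is a minimal Python sketch (hypothetical, not the actual Scala code in Spark): the failover timeout was hard-coded per launch mode, with no way to tune it.

```python
# Hypothetical illustration of the pre-patch behavior: the failover timeout
# was hard-coded depending on how the driver was launched.

INT_MAX = 2**31 - 1  # stand-in for Java's Integer.MAX_VALUE

def old_failover_timeout(cluster_mode):
    # 0  => Mesos kills all tasks as soon as the framework disconnects;
    # INT_MAX => tasks are left registered essentially forever.
    return INT_MAX if cluster_mode else 0

print(old_failover_timeout(False))  # 0
print(old_failover_timeout(True))   # 2147483647
```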

How was this patch tested?

unit test

@SparkQA

SparkQA commented Sep 2, 2016

Test build #64852 has finished for PR 14936 at commit c15f207.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@philipphoffmann
Contributor Author

rebase to master

@SparkQA

SparkQA commented Sep 24, 2016

Test build #65873 has finished for PR 14936 at commit 95cce37.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

Fix the parameters indent

Contributor

This new flag needs to get added to the documentation as well

@philipphoffmann
Contributor Author

  • fixed indent
  • added documentation

any objections on the default timeout?

@SparkQA

SparkQA commented Oct 10, 2016

Test build #66638 has finished for PR 14936 at commit 6441f87.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 10, 2016

Test build #66639 has finished for PR 14936 at commit ee10ad3.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tnachen
Contributor

tnachen commented Oct 10, 2016

Hmm, it is currently Integer.MAX_VALUE because we assume the cluster scheduler to be long-lived; without setting it to a large value, Mesos will automatically terminate the framework when it disconnects. Currently most Spark jobs don't specify it, so when a job disconnects it is simply removed.

I think we should keep the same semantics and not impose a single default value for everything: leave it at 0 in the coarse-grained scheduler and default to Int.MaxValue in the cluster scheduler. The user can always override it either way.
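The semantics proposed above can be sketched as follows (a hypothetical Python sketch, not the actual Scala implementation; the scheduler names are illustrative labels): each scheduler keeps its historical default, and an explicit `spark.mesos.failoverTimeout` always wins.

```python
# Sketch of the agreed semantics: per-scheduler historical defaults,
# with an explicit user setting taking precedence over either.

INT_MAX = 2**31 - 1  # stand-in for Java's Integer.MAX_VALUE

DEFAULTS = {
    "coarse-grained": 0.0,        # kill tasks on disconnect (historical behavior)
    "cluster": float(INT_MAX),    # long-lived dispatcher: effectively never time out
}

def resolve_failover_timeout(conf, scheduler):
    """Return the failover timeout in seconds for the given scheduler."""
    explicit = conf.get("spark.mesos.failoverTimeout")
    if explicit is not None:
        return float(explicit)   # user override wins, no matter the scheduler
    return DEFAULTS[scheduler]

print(resolve_failover_timeout({}, "coarse-grained"))  # 0.0
print(resolve_failover_timeout({"spark.mesos.failoverTimeout": "21600"}, "cluster"))  # 21600.0
```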

@philipphoffmann
Contributor Author

Alright, I changed the implementation to keep the existing defaults.

@SparkQA

SparkQA commented Oct 15, 2016

Test build #67010 has finished for PR 14936 at commit dec052c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tnachen
Contributor

tnachen commented Dec 6, 2016

@philipphoffmann Sorry for the long delay, one last ask. Can you add a simple unit test to verify it works?
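A unit test in the spirit of this request could look roughly like the following Python sketch (hypothetical: `FrameworkInfo` and `build_framework_info` are stand-ins, not real Spark or Mesos APIs): build the framework descriptor from a conf and assert the configured timeout landed in it.

```python
# Hypothetical unit-test sketch: verify that a configured
# spark.mesos.failoverTimeout is propagated into the framework descriptor.
from dataclasses import dataclass

@dataclass
class FrameworkInfo:          # stand-in for the Mesos FrameworkInfo protobuf
    name: str
    failover_timeout: float

def build_framework_info(conf):
    # Stand-in for the scheduler code under test.
    return FrameworkInfo(
        name=conf.get("spark.app.name", "spark"),
        failover_timeout=float(conf.get("spark.mesos.failoverTimeout", 0)),
    )

def test_failover_timeout_is_propagated():
    info = build_framework_info({"spark.mesos.failoverTimeout": "3600"})
    assert info.failover_timeout == 3600.0

test_failover_timeout_is_propagated()
print("ok")  # prints "ok" if the assertion holds
```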

@philipphoffmann
Contributor Author

will do ...

@HyukjinKwon
Member

HyukjinKwon commented Feb 9, 2017

(@philipphoffmann gentle ping)

@philipphoffmann
Contributor Author

@HyukjinKwon sorry, I should find some time tomorrow ...

@philipphoffmann
Contributor Author

rebase to master; added tests for all three schedulers

@SparkQA

SparkQA commented Feb 10, 2017

Test build #72702 has finished for PR 14936 at commit ca7fdad.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 10, 2017

Test build #72704 has finished for PR 14936 at commit 7383817.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 10, 2017

Test build #72707 has finished for PR 14936 at commit 2602f7f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

This commit introduces a setting for configuring the Mesos framework
failover timeout (`spark.mesos.failoverTimeout`).
@philipphoffmann force-pushed the mesos-failover-timeout branch from 2602f7f to bd9ace4 on March 14, 2017 14:52
@philipphoffmann
Contributor Author

rebase to master

@SparkQA

SparkQA commented Mar 14, 2017

Test build #74529 has finished for PR 14936 at commit bd9ace4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 21, 2018

Test build #87573 has finished for PR 14936 at commit bd9ace4.

  • This patch fails PySpark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Jan 18, 2020
@github-actions github-actions bot closed this Jan 19, 2020
dongjoon-hyun pushed a commit that referenced this pull request Jan 23, 2024
### What changes were proposed in this pull request?
This PR aims to upgrade Arrow from 14.0.2 to 15.0.0; this version fixes the compatibility issue with Netty 4.1.104.Final (GH-39265).

Additionally, since the `arrow-vector` module uses `eclipse-collections` to replace `netty-common` as a compile-level dependency, Apache Spark has added a dependency on `eclipse-collections` after upgrading to use Arrow 15.0.0.

### Why are the changes needed?
The new version brings the following major changes:

Bug Fixes
GH-34610 - [Java] Fix valueCount and field name when loading/transferring NullVector
GH-38242 - [Java] Fix incorrect internal struct accounting for DenseUnionVector#getBufferSizeFor
GH-38254 - [Java] Add reusable buffer getters to char/binary vectors
GH-38366 - [Java] Fix Murmur hash on buffers less than 4 bytes
GH-38387 - [Java] Fix JDK8 compilation issue with TestAllTypes
GH-38614 - [Java] Add VarBinary and VarCharWriter helper methods to more writers
GH-38725 - [Java] decompression in Lz4CompressionCodec.java does not set writer index

New Features and Improvements
GH-38511 - [Java] Add getTransferPair(Field, BufferAllocator, CallBack) for StructVector and MapVector
GH-14936 - [Java] Remove netty dependency from arrow-vector
GH-38990 - [Java] Upgrade to flatc version 23.5.26
GH-39265 - [Java] Make it run well with the netty newest version 4.1.104

The full release notes as follows:

- https://arrow.apache.org/release/15.0.0.html

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #44797 from LuciferYang/SPARK-46718.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
