[SPARK-7877][MESOS] Allow configuration of framework timeout #14936
|
Test build #64852 has finished for PR 14936 at commit
|
c15f207 to
95cce37
Compare
|
rebase to master |
|
Test build #65873 has finished for PR 14936 at commit
|
Fix the parameters indent
This new flag needs to get added to the documentation as well
95cce37 to
6441f87
Compare
any objections on the default timeout? |
|
Test build #66638 has finished for PR 14936 at commit
|
6441f87 to
ee10ad3
Compare
|
Test build #66639 has finished for PR 14936 at commit
|
|
Hmm, it is currently Integer.MAX because we assume the cluster scheduler is long-lived, and unless the timeout is set to a large value, Mesos will automatically terminate the framework when it is disconnected. I think no Spark jobs currently specify it, so when a job disconnects it is simply removed. I think we should keep the same semantics and not impose one default value for everything: leave it at 0 in the coarse-grained scheduler and default to Int.MAX in the cluster scheduler. But the user can always override it no matter what. |
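To make the proposed semantics concrete, here is a minimal, self-contained sketch of the rule described in this comment: each scheduler keeps its own default, and a user-supplied `spark.mesos.failoverTimeout` always takes precedence. The class name, method name, and constants are hypothetical and are not taken from the Spark code base. A plain `Map` stands in for the Spark configuration, and the value is treated as seconds, matching the PR description.

```java
import java.util.Map;

/** Hypothetical sketch; none of these names come from the Spark code base. */
public class FailoverTimeoutSketch {
    // The configuration key introduced by this PR.
    public static final String KEY = "spark.mesos.failoverTimeout";

    // Existing defaults as described in the comment:
    // 0 for the coarse-grained scheduler, Integer.MAX_VALUE for the cluster scheduler.
    public static final double COARSE_GRAINED_DEFAULT = 0.0;
    public static final double CLUSTER_DEFAULT = Integer.MAX_VALUE;

    /**
     * Returns the user-supplied timeout (in seconds) if the key is set,
     * otherwise the default of the calling scheduler.
     */
    public static double resolveFailoverTimeout(Map<String, String> conf,
                                                double schedulerDefault) {
        String raw = conf.get(KEY);
        return raw == null ? schedulerDefault : Double.parseDouble(raw.trim());
    }

    public static void main(String[] args) {
        Map<String, String> unset = Map.of();
        Map<String, String> overridden = Map.of(KEY, "10");

        // Without an explicit setting, each scheduler keeps its own default.
        System.out.println(resolveFailoverTimeout(unset, COARSE_GRAINED_DEFAULT));
        System.out.println(resolveFailoverTimeout(unset, CLUSTER_DEFAULT));

        // An explicit setting wins regardless of which scheduler asks.
        System.out.println(resolveFailoverTimeout(overridden, COARSE_GRAINED_DEFAULT));
        System.out.println(resolveFailoverTimeout(overridden, CLUSTER_DEFAULT));
    }
}
```

Passing the default in as a parameter keeps the lookup identical across schedulers, so the only per-scheduler difference is the fallback value.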
ee10ad3 to
dec052c
Compare
|
Alright, I changed the implementation to keep the existing defaults. |
|
Test build #67010 has finished for PR 14936 at commit
|
|
@philipphoffmann Sorry for the long delay, one last ask. Can you add a simple unit test to verify it works? |
|
will do ... |
|
(@philipphoffmann gentle ping) |
|
@HyukjinKwon sry, should find some time tomorrow ... |
dec052c to
ca7fdad
Compare
|
rebase to master; added tests for all three schedulers |
ca7fdad to
7383817
Compare
|
Test build #72702 has finished for PR 14936 at commit
|
|
Test build #72704 has finished for PR 14936 at commit
|
7383817 to
2602f7f
Compare
|
Test build #72707 has finished for PR 14936 at commit
|
This commit introduces a setting for configuring the Mesos framework failover timeout (`spark.mesos.failoverTimeout`).
2602f7f to
bd9ace4
Compare
|
rebase to master |
|
Test build #74529 has finished for PR 14936 at commit
|
|
Test build #87573 has finished for PR 14936 at commit
|
|
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
This commit introduces a setting for configuring the Mesos framework failover timeout (`spark.mesos.failoverTimeout`). The default timeout is 10 seconds. Before, the timeout was set either to 0 (which causes Mesos to kill all tasks) or to Integer.MAX (which more or less leaves them hanging forever), depending on how the driver was launched.
How was this patch tested?
unit test