
Conversation

@sunpe

@sunpe sunpe commented Jun 30, 2021

SPARK-35949

What changes were proposed in this pull request?

Since v3.1, the spark context is closed after the main method returns. In some cases it is necessary to keep the spark context alive, such as when the app starts as a server. This PR adds a keep-spark-context-alive arg to control whether the spark context should be kept alive after the main method returns.

Why are the changes needed?

Due to pr c625eb4#diff-f8564df81d845c0cd2f621bc2ed22761cbf9731f28cb2828d9cbd0491f4e7584, in client mode the spark context is stopped as soon as the application starts. In some cases it is necessary to keep the spark context alive, such as when the app starts as a server.

Does this PR introduce any user-facing change?

Yes. Added a keep-spark-context-alive arg to keep the spark context alive until the app exits. Usage: spark-submit --keep-spark-context-alive true
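
To make the proposal concrete, below is a minimal, self-contained sketch of the gating logic such a flag would control (the keep-spark-context-alive flag was never merged; SubmitArgs, keepSparkContextAlive, and the helper predicates are simplified stand-ins, not Spark's real SparkSubmitArguments):

```scala
// Illustrative sketch only: these names are simplified stand-ins for Spark internals.
object KeepAliveSketch {
  final case class SubmitArgs(
      keepSparkContextAlive: Boolean, // hypothetical value of --keep-spark-context-alive
      primaryResource: String,
      mainClass: String)

  private def isShell(primaryResource: String): Boolean =
    primaryResource == "spark-shell"

  private def isSqlShell(mainClass: String): Boolean =
    mainClass == "org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver"

  private def isThriftServer(mainClass: String): Boolean =
    mainClass == "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2"

  /** True when SparkSubmit should stop the active SparkContext after main() returns. */
  def shouldStopContext(args: SubmitArgs): Boolean =
    !args.keepSparkContextAlive &&
      !isShell(args.primaryResource) &&
      !isSqlShell(args.mainClass) &&
      !isThriftServer(args.mainClass)
}
```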

How was this patch tested?

Manually tested.

@github-actions github-actions bot added the CORE label Jun 30, 2021
@sunpe sunpe closed this Jun 30, 2021
@sunpe sunpe reopened this Jun 30, 2021
@AmplabJenkins

Can one of the admins verify this patch?

@HyukjinKwon
Member

HyukjinKwon commented Jul 1, 2021

cc @kotlovs, @dongjoon-hyun, @mridulm FYI

@kotlovs
Contributor

kotlovs commented Jul 1, 2021

@sunpe Could you please explain in more detail what the problem is with client mode? This code is called after the main() method exits, when the application is moving towards termination.
As far as I know, in the k8s driver pod spark-submit is executed with --deploy-mode client, and if I understand correctly, the proposed changes would bring back the SPARK-34674 problem.

@mridulm
Contributor

mridulm commented Jul 1, 2021

Agree with @kotlovs - this is happening when the application is terminating.
Projects like Apache Livy already depend on client mode, so I would like to understand what change in behavior is expected here.

@sunpe
Author

sunpe commented Jul 1, 2021

Hello @kotlovs, @mridulm.

Issue SPARK-34674 reported that the spark context could not be closed on k8s. But that change closes the context for every app that is not a shell or the thrift server.

In my case, I use Spring Boot and Spark together to create a web app. The app waits for users' requests and runs jobs on Spark. I register the SparkSession object as a Spring bean, like this:

```java
@Bean
@ConditionalOnMissingBean(SparkSession.class)
public SparkSession sparkSession(SparkConf conf) {
    return SparkSession.builder()
            .enableHiveSupport()
            .config(conf)
            .getOrCreate();
}
```

I then package the Spring application as a jar and start it with the spark-submit command. Shortly after the application starts up, the spark context is stopped.

For testing, I added 3 log statements in core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala:

```scala
try {
  logInfo("## app start ")
  app.start(childArgs.toArray, sparkConf)
  logInfo("## app start over ")
} catch {
  case t: Throwable =>
    throw findCause(t)
} finally {
  logInfo("## finally clean spark context ")
  if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
      !isThriftServer(args.mainClass)) {
    try {
      SparkContext.getActive.foreach(_.stop())
    } catch {
      case e: Throwable => logError(s"Failed to close SparkContext: $e")
    }
  }
}
```

When the application starts, the logs look like this. Soon after the app has started, the spark context is stopped:

## app start

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::                (v2.5.2)

...
21/07/01 15:05:10 INFO YarnClientSchedulerBackend: Application application_1619358582322_6529891 has started running.
21/07/01 15:05:10 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40461.
...
21/07/01 15:05:10 INFO SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
21/07/01 15:05:10 INFO Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
21/07/01 15:05:10 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
21/07/01 15:05:11 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000000000(ns)
21/07/01 15:05:11 INFO WelcomePageHandlerMapping: Adding welcome page template: index
21/07/01 15:05:12 INFO Http11NioProtocol: Starting ProtocolHandler ["http-nio-9000"]
21/07/01 15:05:12 INFO TomcatWebServer: Tomcat started on port(s): 9000 (http) with context path ''
21/07/01 15:05:12 INFO SpringApplication: Started application in 2939.592 seconds (JVM running for 2942.937)
## app start over
## finally clean spark context
21/07/01 15:05:12 INFO AbstractConnector: Stopped Spark@23cd4246{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
21/07/01 15:05:12 INFO SparkUI: Stopped Spark web UI at http://xxxxxxxxx:4040
21/07/01 15:05:12 INFO YarnClientSchedulerBackend: Interrupting monitor thread
21/07/01 15:05:12 INFO YarnClientSchedulerBackend: Shutting down all executors
21/07/01 15:05:12 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
21/07/01 15:05:12 INFO YarnClientSchedulerBackend: YARN client scheduler backend Stopped
21/07/01 15:05:12 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/07/01 15:05:13 INFO MemoryStore: MemoryStore cleared
21/07/01 15:05:13 INFO BlockManager: BlockManager stopped
21/07/01 15:05:13 INFO BlockManagerMaster: BlockManagerMaster stopped
21/07/01 15:05:13 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/07/01 15:05:13 INFO SparkContext: Successfully stopped SparkContext

And after the fix:

## app start

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::                (v2.5.2)

 ....
21/07/01 16:05:51 INFO YarnClientSchedulerBackend: Application application_1619358582322_6532933 has started running.
21/07/01 16:05:51 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42549.
21/07/01 16:05:51 INFO NettyBlockTransferService: Server created on xxxxxxxxx:42549
21/07/01 16:05:51 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/07/01 16:05:51 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, xxxxxxx, 42549, None)
21/07/01 16:05:51 INFO BlockManagerMasterEndpoint: Registering block manager xxxxxxxxx:42549 with 5.2 GiB RAM, BlockManagerId(driver, xxxxxxx, 42549, None)
21/07/01 16:05:51 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, xxxxxxxx, 42549, None)
21/07/01 16:05:51 INFO BlockManager: external shuffle service port = 7337
21/07/01 16:05:51 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, xxxxxxxx, 42549, None)
21/07/01 16:05:51 INFO ServerInfo: Adding filter to /metrics/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
21/07/01 16:05:51 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@60510791{/metrics/json,null,AVAILABLE,@Spark}
21/07/01 16:05:52 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM)
21/07/01 16:05:54 INFO Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
21/07/01 16:05:54 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after waiting maxRegisteredResourcesWaitingTime: 30000000000(ns)
21/07/01 16:05:55 INFO WelcomePageHandlerMapping: Adding welcome page template: index
21/07/01 16:05:55 INFO Http11NioProtocol: Starting ProtocolHandler ["http-nio-9000"]
21/07/01 16:05:55 INFO TomcatWebServer: Tomcat started on port(s): 9000 (http) with context path ''
21/07/01 16:05:55 INFO SpringApplication: Started application in 848.362 seconds (JVM running for 851.343)
## app start over
## finally clean spark context

@kotlovs
Contributor

kotlovs commented Jul 1, 2021

Thanks for the detailed explanation. Now I understand your problem. You create a server inside the app and await users' requests.
A similar thing occurs when the ThriftServer is started, and that is the reason these conditions (like !isThriftServer(args.mainClass)) were added. But as I understand it, unlike the ThriftServer, your case can't be detected in SparkSubmit.
I would not bind the closing of the context to k8s, even if there is no such problem in other modes like yarn.
Instead, I would suggest introducing another spark application arg, for example --isServer (false by default). Then you would pass this flag in your spark-submit command (spark-submit --is-server ...).
This flag would be checked before closing the spark context:

```scala
if (!args.isServer && !isShell(args.primaryResource) &&
    !isSqlShell(args.mainClass) && !isThriftServer(args.mainClass)) {
  try {
    SparkContext.getActive.foreach(_.stop())
    ...
```

I just think it would be a more universal solution. I already have code adding such a flag in one of my branches; I could open that PR if this solution is acceptable.

@sunpe
Author

sunpe commented Jul 2, 2021

Hello @kotlovs. Thanks for the reply. After discussing with my partner, I think adding another arg is a good solution.
I will finish it in a few days. Thanks for your suggestion.

@github-actions github-actions bot added the PYTHON label Jul 2, 2021
@sunpe sunpe changed the title [SPARK-35949][CORE]Fixes bug for sparkContext stopped on client mode [SPARK-35949][CORE]Add is-server arg for to prevent closing spark context on server mode Jul 2, 2021
@sunpe sunpe changed the title [SPARK-35949][CORE]Add is-server arg for to prevent closing spark context on server mode [SPARK-35949][CORE]Add is-server arg for to prevent closing spark context when starting as a server. Jul 2, 2021
@sunpe sunpe changed the title [SPARK-35949][CORE]Add is-server arg for to prevent closing spark context when starting as a server. [SPARK-35949][CORE]Add keep-saprk-context-alive arg for to prevent closing spark context after invoking main for some case Jul 5, 2021
@HyukjinKwon HyukjinKwon changed the title [SPARK-35949][CORE]Add keep-saprk-context-alive arg for to prevent closing spark context after invoking main for some case [SPARK-35949][CORE]Add keep-spark-context-alive arg for to prevent closing spark context after invoking main for some case Jul 5, 2021
@sunpe
Author

sunpe commented Jul 6, 2021

Hello @kotlovs. I added an arg called keep-spark-context-alive to control whether the spark context should be kept alive after the main method is invoked.

sunpeng01 and others added 5 commits July 6, 2021 11:55
…psWithState in Structured Streaming

This PR aims to add support for specifying a user defined initial state for arbitrary structured streaming stateful processing using [flat]MapGroupsWithState operator.

Users can load previous state of their stateful processing as an initial state instead of redoing the entire processing once again.

Yes this PR introduces new API
```
  def mapGroupsWithState[S: Encoder, U: Encoder](
      timeoutConf: GroupStateTimeout,
      initialState: KeyValueGroupedDataset[K, S])(
      func: (K, Iterator[V], GroupState[S]) => U): Dataset[U]

  def flatMapGroupsWithState[S: Encoder, U: Encoder](
      outputMode: OutputMode,
      timeoutConf: GroupStateTimeout,
      initialState: KeyValueGroupedDataset[K, S])(
      func: (K, Iterator[V], GroupState[S]) => Iterator[U])

```

Through unit tests in FlatMapGroupsWithStateSuite

Closes #33093 from rahulsmahadev/flatMapGroupsWithState.

Authored-by: Rahul Mahadev <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
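
As a side note, here is a hedged usage sketch of the initial-state overload described in the commit message above (not part of that commit); the source format, path, and field names are assumptions for illustration only, written in spark-shell style:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

case class Click(user: String, count: Long)
case class RunningCount(total: Long)

val spark = SparkSession.builder().appName("initial-state-sketch").getOrCreate()
import spark.implicits._

// Previously materialized per-user totals, reloaded and keyed the same way as the stream.
val initialState = spark.read.parquet("/tmp/prev-user-totals") // assumed location
  .as[(String, RunningCount)]
  .groupByKey(_._1)
  .mapValues(_._2)

// Placeholder streaming source; any keyed event stream works the same way.
val clicks = spark.readStream.format("rate").load()
  .selectExpr("cast(value as string) as user", "value as count")
  .as[Click]

// Counting continues from the loaded state instead of starting from zero.
val totals = clicks
  .groupByKey(_.user)
  .mapGroupsWithState(GroupStateTimeout.NoTimeout, initialState) {
    (user: String, events: Iterator[Click], state: GroupState[RunningCount]) =>
      val previous = state.getOption.map(_.total).getOrElse(0L)
      val updated = RunningCount(previous + events.map(_.count).sum)
      state.update(updated)
      (user, updated.total)
  }
```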
@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Oct 25, 2021
@github-actions github-actions bot closed this Oct 26, 2021
@clementguillot

Hello, this keep-spark-context-alive could be very useful when working with DI and injected services that use SparkSession.

Any chance to see this PR merged?

@sunpe
Author

sunpe commented Sep 19, 2022

Hello @clementguillot. Thank you for your attention. This bug has been fixed in v3.2.0. Please refer to this code https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L963 for more detail.
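
For reference, the guard at the linked line looks roughly like this (pieced together from the snippets quoted earlier in this thread; see the link for the exact code):

```scala
// Since v3.2.0 the stop is limited to k8s masters, still excluding shells and the thrift server.
if (args.master.startsWith("k8s") && !isShell(args.primaryResource) &&
    !isSqlShell(args.mainClass) && !isThriftServer(args.mainClass)) {
  try {
    SparkContext.getActive.foreach(_.stop())
  } catch {
    case e: Throwable => logError(s"Failed to close SparkContext: $e")
  }
}
```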

@clementguillot

Hello @sunpe, thank you for your very fast answer.

Please let me give you some more context: I am using Spark v3.3.0 on K8s, deployed with the Spark on K8s operator.
My Spark driver is a Spring Boot application, very similar to the situation you described above, but instead of starting a web server, I subscribe to a RabbitMQ queue.
Basically, I would like to have the SparkSession available as a @Bean in my services.

I tried the following:

  1. Activate --verbose
  2. Rebuild from source (v3.3.0), adding a simple logWarning before SparkContext.getActive.foreach(_.stop())

Here are my findings:

Spark Operator pod submits the application using this command:

submission.go:65] spark-submit arguments: [/opt/spark/bin/spark-submit
--class org.test.myapplication.MyApplication
--master k8s://https://10.32.0.1:443
--deploy-mode cluster
--conf spark.kubernetes.namespace=my-app-namespace
--conf spark.app.name=my-application
--conf spark.kubernetes.driver.pod.name=my-application-driver
--conf spark.kubernetes.container.image=repo/my-application:latest
--conf spark.kubernetes.container.image.pullPolicy=Always
--conf spark.kubernetes.submission.waitAppCompletion=false
--conf spark.kubernetes.driver.label.sparkoperator.k8s.io/app-name=my-application
--conf spark.kubernetes.driver.label.sparkoperator.k8s.io/launched-by-spark-operator=true
--conf spark.kubernetes.driver.label.sparkoperator.k8s.io/submission-id=2b225916-cc12-47cc-a898-8549301fdce4
--conf spark.driver.cores=1
--conf spark.kubernetes.driver.request.cores=200m
--conf spark.driver.memory=512m
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-operator-spark
--conf spark.kubernetes.executor.label.sparkoperator.k8s.io/app-name=my-application
--conf spark.kubernetes.executor.label.sparkoperator.k8s.io/launched-by-spark-operator=true
--conf spark.kubernetes.executor.label.sparkoperator.k8s.io/submission-id=2b225916-cc12-47cc-a898-8549301fdce4
--conf spark.executor.instances=1
--conf spark.executor.cores=1
--conf spark.executor.memory=512m
--conf spark.kubernetes.executor.label.app.kubernetes.io/instance=my-app-namespace-my-application
local:///opt/spark/work-dir/my-application-0.0.1-SNAPSHOT-all.jar]

When Driver pod starts, I have the following logs:

(spark.driver.memory,512m)
(spark.driver.port,7078)
(spark.executor.cores,1)
(spark.executor.instances,1)
(spark.executor.memory,512m)
[…]
(spark.kubernetes.resource.type,java)
(spark.kubernetes.submission.waitAppCompletion,false)
(spark.kubernetes.submitInDriver,true)
(spark.master,k8s://https://10.32.0.1:443)

As you can see, args.master starts with k8s.
Once the application has started and the main() thread is released, my custom log is printed, the SparkContext is closed, and the executor is stopped.
As I understand from the source code, my primary resource is not spark-shell, and the main class is neither org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver nor org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.
It produces the following logs:

2022-09-19 13:05:18.621 INFO  30 --- [           main] o.s.a.r.c.CachingConnectionFactory       : Created new connection: rabbitConnectionFactory#1734b1a:0/SimpleConnection@66859ea9 [delegate=amqp://[email protected]:5672/, localPort= 49038]
2022-09-19 13:05:18.850 INFO  30 --- [           main] MyApplication : Started MyApplication in 18.787 seconds (JVM running for 23.051)
Warning: SparkContext is going to be stopped!
2022-09-19 13:05:18.903 INFO  30 --- [           main] o.s.j.s.AbstractConnector                : Stopped Spark@45d389f2{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
2022-09-19 13:05:18.914 INFO  30 --- [           main] o.a.s.u.SparkUI                          : Stopped Spark web UI at http://my-app-ee12f78355d97dc2-driver-svc.my-app-namespace.svc:4040
2022-09-19 13:05:18.927 INFO  30 --- [           main] .s.c.k.KubernetesClusterSchedulerBackend : Shutting down all executors
2022-09-19 13:05:18.928 INFO  30 --- [rainedScheduler] chedulerBackend$KubernetesDriverEndpoint : Asking each executor to shut down
2022-09-19 13:05:18.938 DEBUG 30 --- [ rpc-server-4-1] o.a.s.n.u.NettyLogger                    : [id: 0x07e91ece, L:/100.64.15.15:7078 - R:/100.64.15.215:40996] WRITE: MessageWithHeader [headerLength: 13, bodyLength: 198]
2022-09-19 13:05:18.939 DEBUG 30 --- [ rpc-server-4-1] o.a.s.n.u.NettyLogger                    : [id: 0x07e91ece, L:/100.64.15.15:7078 - R:/100.64.15.215:40996] FLUSH
2022-09-19 13:05:18.952 WARN  30 --- [           main] .s.s.c.k.ExecutorPodsWatchSnapshotSource : Kubernetes client has been closed.
2022-09-19 13:05:22.190 DEBUG 30 --- [ rpc-server-4-1] o.a.s.n.u.NettyLogger                    : [id: 0x07e91ece, L:/100.64.15.15:7078 - R:/100.64.15.215:40996] READ 266B

Do you have a clue why the context is still being closed?

@sunpe
Author

sunpe commented Sep 20, 2022

Hello @clementguillot. I understand the problem now. This bug was introduced by commit c625eb4 in v3.1. In your case, starting a server with Spark on k8s still has a problem because of commit fd3e9ce: the check if (args.master.startsWith("k8s") && !isShell(args.primaryResource) && !isSqlShell(args.mainClass) && !isThriftServer(args.mainClass)) { in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L963 does not help in your case, since your master does start with k8s. In my view, the quickest workaround is to change the Spark version back to v3.0.

In my view, we should delete the code at SparkSubmit.scala L963-L969 and stop the spark context in a signal (shutdown) hook instead. @dongjoon-hyun
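
One possible reading of that suggestion, sketched from the application side (an assumption, not an agreed design): the driver program registers a JVM shutdown hook and stops the session on termination, instead of relying on SparkSubmit stopping it right after main() returns.

```scala
import org.apache.spark.sql.SparkSession

object ServerDriver {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().getOrCreate()

    // Stop the context only when the JVM actually shuts down (SIGTERM, System.exit, ...).
    sys.addShutdownHook {
      spark.stop()
    }

    // ... start the long-running server here and return; the context stays alive ...
  }
}
```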

