[SPARK-39468][CORE] Improve RpcAddress to add [] to IPv6 if needed
#36868
Conversation
Could you review this, @Ngone51?
```scala
test("SPARK-39468: IPv6 hostPort") {
  val address = RpcAddress("::1", 1234)
```
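For context, here is a minimal standalone sketch of the behavior this test exercises: a bare IPv6 literal gets wrapped in brackets so that `hostPort` renders as `[::1]:1234`. The bracket-wrapping rule is an assumption about what `Utils.addBracketsIfNeeded` does; these are illustrative classes, not the actual Spark code.

```scala
object IPv6Sketch {
  // Assumed semantics of Utils.addBracketsIfNeeded: wrap a bare IPv6
  // literal (contains ':' but is not already bracketed) in "[...]".
  def addBracketsIfNeeded(host: String): String =
    if (host.contains(":") && !host.startsWith("[")) s"[$host]" else host

  // Simplified stand-in for the real RpcAddress under review.
  case class RpcAddress(_host: String, port: Int) {
    val host: String = addBracketsIfNeeded(_host)
    def hostPort: String = host + ":" + port
  }

  def main(args: Array[String]): Unit = {
    println(RpcAddress("::1", 1234).hostPort)   // [::1]:1234
    println(RpcAddress("1.2.3.4", 80).hostPort) // 1.2.3.4:80 (IPv4 unchanged)
  }
}
```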
This is the main use case of this PR.
cc @cloud-fan and @HyukjinKwon, too
Thank you so much, @cloud-fan. Merged to master.
```diff
-private[spark] case class RpcAddress(host: String, port: Int) {
+private[spark] case class RpcAddress(_host: String, port: Int) {
+  val host: String = Utils.addBracketsIfNeeded(_host)
```
Can this be a `lazy val` instead of a `val`?
We end up almost doubling the size of the serialized instance of this object otherwise.
Ya, of course, we can. Thank you for the review, @mridulm.
Actually, in the first commit I used `def host` to save the size. Later I switched to `val` in the second commit.
If the memory consumption of `RpcAddress` instances matters, we can revert back to `def host` instead of `lazy val`.
Among the three options, which one do you prefer, @mridulm and @cloud-fan?
- `def host` (initial commit)
- `lazy val host` (new alternative)
- `val host` (as-is)
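To make the trade-off concrete, here is a sketch comparing Java-serialized sizes of the three variants (illustrative standalone classes, not the actual Spark `RpcAddress`): the `val` variant carries the derived, bracketed copy of the host string in the payload, while `def` stores nothing extra and a not-yet-forced `lazy val` serializes only an empty slot.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical variants of the class under discussion.
case class HostVal(_host: String, port: Int) { val host: String = s"[${_host}]" }
case class HostLazyVal(_host: String, port: Int) { lazy val host: String = s"[${_host}]" }
case class HostDef(_host: String, port: Int) { def host: String = s"[${_host}]" }

object SerializedSizeSketch {
  // Java-serialize an object and return the byte count.
  def serializedSize(o: AnyRef): Int = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(o)
    out.close()
    bytes.size()
  }

  def main(args: Array[String]): Unit = {
    // `val` serializes both _host and the derived "[::1]" string.
    println(serializedSize(HostVal("::1", 1234)))
    // `lazy val` (never accessed here) and `def` do not carry the derived string.
    println(serializedSize(HostLazyVal("::1", 1234)))
    println(serializedSize(HostDef("::1", 1234)))
  }
}
```

Exact byte counts depend on the JVM's serialization metadata, but the `val` variant is strictly larger than the `def` variant.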
I prefer `lazy val`: it keeps the serialized size the same, while the runtime cost of accessing `host` stays almost the same as before (`def host` would increase it).
Thank you, @mridulm. Here is the followup.
### What changes were proposed in this pull request?

This PR aims to use `lazy val host` instead of `val host`.

### Why are the changes needed?

To address the review comments about `RpcAddress` object size.
- #36868 (comment)
- #36868 (comment)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #36882 from dongjoon-hyun/SPARK-39468-2.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Resolves kubeflow#1344

Spark 3.4 supports IPv6:
- apache/spark#36868

So I want to make the operator support IPv6. I can confirm that this can submit the Spark job in an IPv6-only environment, although it is necessary to add the following environment variables to the operator:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-on-k8s-spark-operator
spec:
  template:
    spec:
      containers:
        - name: spark-operator
          env:
            - name: _JAVA_OPTIONS
              value: "-Djava.net.preferIPv6Addresses=true"
            - name: KUBERNETES_DISABLE_HOSTNAME_VERIFICATION
              value: "true"
```

Signed-off-by: Peter McClonski <[email protected]>
### What changes were proposed in this pull request?

This PR aims to extend the IPv6 support in `RpcAddress` additionally, for the case where the input doesn't have `[]` properly.

### Why are the changes needed?

Note that Apache Spark already depends on `java.net.URI`'s `getHost` and `getPort`, which assume `[]`-style IPv6 addresses. This PR additionally handles the case where the given host string doesn't have `[]`.
- `RpcAddress.fromURIString` (spark/core/src/main/scala/org/apache/spark/rpc/RpcAddress.scala, lines 40 to 43 in 683179c)
- `Utils.extractHostPortFromSparkUrl` (spark/core/src/main/scala/org/apache/spark/util/Utils.scala, lines 2520 to 2524 in 683179c)

We need to handle the Java URI IPv6 style additionally.

### Does this PR introduce any user-facing change?

No. This is a `private[spark]` class.

### How was this patch tested?

Pass the CIs with newly added test cases.
This is also tested manually on an IPv6-only environment with the following command.
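The `java.net.URI` behavior the description refers to can be seen directly: with brackets, `getHost`/`getPort` parse the IPv6 authority, and `getHost` keeps the brackets; without brackets, the host is not recognized and `getHost` returns null. The `spark://` scheme here is just an example URI string.

```scala
object UriIPv6Demo {
  def main(args: Array[String]): Unit = {
    // Bracketed IPv6: host and port parse cleanly.
    val ok = new java.net.URI("spark://[::1]:1234")
    println(ok.getHost) // [::1] -- getHost keeps the brackets
    println(ok.getPort) // 1234

    // Bare IPv6 literal: the authority is not parsed as host:port.
    val bad = new java.net.URI("spark://::1:1234")
    println(bad.getHost) // null
    println(bad.getPort) // -1
  }
}
```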