@dongjoon-hyun dongjoon-hyun commented Jun 14, 2022

What changes were proposed in this pull request?

This PR aims to extend the IPv6 support in RpcAddress to also handle host strings that are not already enclosed in [].

Why are the changes needed?

Note that Apache Spark already depends on java.net.URI's getHost and getPort, which assume []-style IPv6 addresses. This PR additionally handles the case where the given host string doesn't have [].

We also need to handle the java.net.URI style of IPv6 addresses:

jshell> var uri = new java.net.URI("https://[::1]:80")
uri ==> https://[::1]:80

jshell> uri.getHost()
$4 ==> "[::1]"

jshell> uri.getPort()
$5 ==> 80
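
To illustrate the idea, here is a minimal sketch of bracket normalization followed by a java.net.URI round trip. The body of `addBracketsIfNeeded` below is an assumption made for illustration and is not necessarily how Spark's `Utils.addBracketsIfNeeded` is implemented. The object name `UriBracketSketch`, the `parse` helper, the `spark://` scheme, and port 7077 are also illustrative choices.

```scala
import java.net.URI

object UriBracketSketch {
  // Assumed behavior: wrap the host in [] only when it looks like a bare
  // IPv6 literal (contains ':' but no '[' yet). Other hosts pass through.
  def addBracketsIfNeeded(host: String): String =
    if (host.contains(":") && !host.contains("[")) s"[$host]" else host

  // Build a URI from the normalized host and read host/port back via
  // java.net.URI, which expects []-style IPv6 literals.
  def parse(host: String, port: Int): (String, Int) = {
    val uri = new URI(s"spark://${addBracketsIfNeeded(host)}:$port")
    (uri.getHost, uri.getPort)
  }

  def main(args: Array[String]): Unit = {
    println(parse("::1", 7077))       // bare IPv6 literal, brackets are added
    println(parse("[::1]", 7077))     // already bracketed, left unchanged
    println(parse("localhost", 7077)) // plain hostname, left unchanged
  }
}
```

With this sketch, a bare `::1` and an already bracketed `[::1]` should both come back from `getHost` as `[::1]`, matching the jshell output above.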

Does this PR introduce any user-facing change?

No. This is a private[spark] class.

How was this patch tested?

Pass the CIs with the newly added test cases.

This was also tested manually in an IPv6-only environment with the following command.

$ SERIAL_SBT_TESTS=1 SPARK_LOCAL_HOSTNAME='[2600:.(omitted)..:60cd]' build/sbt "core/test" -Djava.net.preferIPv6Addresses=true -Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest
...
[info] Run completed in 18 minutes, 43 seconds.
[info] Total number of tests run: 2950
[info] Suites: completed 284, aborted 0
[info] Tests: succeeded 2950, failed 0, canceled 4, ignored 8, pending 0
[info] All tests passed.
[info] Passed: Total 3214, Failed 0, Errors 0, Passed 3214, Ignored 8, Canceled 4
[success] Total time: 1189 s (19:49), completed Jun 14, 2022, 4:45:55 PM

@github-actions github-actions bot added the CORE label Jun 14, 2022
@dongjoon-hyun dongjoon-hyun marked this pull request as draft June 14, 2022 18:18
@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review June 14, 2022 19:23
@dongjoon-hyun
Member Author

Could you review this, @Ngone51 ?

}

test("SPARK-39468: IPv6 hostPort") {
val address = RpcAddress("::1", 1234)
Member Author

This is the main use case of this PR.
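
As a rough sketch of what this use case checks, the simplified stand-in below normalizes the host on construction and builds a host:port string from it. `SketchAddress` is a hypothetical class, not the real RpcAddress. The `hostPort` member is assumed from the test name, and the bracket logic is an assumed approximation of `Utils.addBracketsIfNeeded`.

```scala
object RpcAddressSketch {
  // Assumed approximation of the bracket helper: only bare IPv6 literals
  // (':' present, '[' absent) get wrapped.
  def addBracketsIfNeeded(addr: String): String =
    if (addr.contains(":") && !addr.contains("[")) "[" + addr + "]" else addr

  // Hypothetical, simplified stand-in for RpcAddress.
  final case class SketchAddress(_host: String, port: Int) {
    // Normalized host, always in []-style for IPv6 literals.
    val host: String = addBracketsIfNeeded(_host)
    // host:port string built from the normalized host.
    def hostPort: String = host + ":" + port
  }

  def main(args: Array[String]): Unit = {
    val address = SketchAddress("::1", 1234)
    assert(address.host == "[::1]")
    assert(address.port == 1234)
    assert(address.hostPort == "[::1]:1234")
  }
}
```

Without the normalization, the host:port string would be `::1:1234`, where the port cannot be told apart from the address.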

@dongjoon-hyun dongjoon-hyun changed the title from "[SPARK-39468][CORE] Support IPv6 in RpcAddress" to "[SPARK-39468][CORE] Improve RpcAddress to add [] to IPv6 if needed" Jun 14, 2022
@dongjoon-hyun
Member Author

cc @cloud-fan and @HyukjinKwon , too

@dongjoon-hyun
Member Author

Thank you so much, @cloud-fan . Merged to master

- private[spark] case class RpcAddress(host: String, port: Int) {
+ private[spark] case class RpcAddress(_host: String, port: Int) {
+
+   val host: String = Utils.addBracketsIfNeeded(_host)
Contributor

Can this be a lazy val instead of a val?
Otherwise, we end up almost doubling the size of the serialized instance of this object.

Member Author

@dongjoon-hyun dongjoon-hyun Jun 15, 2022

Yes, of course we can. Thank you for the review, @mridulm.

Actually, in the first commit, I used def host to save the size. Later, I switched to val in the second commit.

If the memory consumption of RpcAddress instances matters, we can revert to def host instead of using lazy val.

Among the three options, which one do you prefer, @mridulm and @cloud-fan?

  1. def host (Initial commit)
  2. lazy val host (New alternative)
  3. val host (AS-IS)

Contributor

I prefer lazy val. It keeps the serialized size the same, while keeping the runtime cost of accessing host almost the same as before (def host would increase that cost).
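
For reference, the evaluation behavior of the three options can be sketched as follows. This only counts how often the normalization runs and does not measure serialized size. All names here (`HostOptionsSketch`, `WithDef`, `WithLazyVal`, `WithVal`, `countCalls`) are hypothetical, and the bracket logic is an assumed stand-in for `Utils.addBracketsIfNeeded`.

```scala
object HostOptionsSketch {
  // Counts how many times the (assumed) normalization logic runs.
  var calls = 0

  def addBrackets(addr: String): String = {
    calls += 1
    if (addr.contains(":") && !addr.contains("[")) s"[$addr]" else addr
  }

  // Option 1: def recomputes the value on every access.
  case class WithDef(_host: String, port: Int) {
    def host: String = addBrackets(_host)
  }

  // Option 2: lazy val computes once, on first access.
  case class WithLazyVal(_host: String, port: Int) {
    lazy val host: String = addBrackets(_host)
  }

  // Option 3: val computes once, eagerly at construction.
  case class WithVal(_host: String, port: Int) {
    val host: String = addBrackets(_host)
  }

  // Returns how many normalization calls happened while running body.
  def countCalls[A](body: => A): Int = {
    val before = calls
    body
    calls - before
  }

  def main(args: Array[String]): Unit = {
    println(countCalls { val a = WithDef("::1", 1); a.host + a.host })     // 2
    println(countCalls { WithLazyVal("::1", 1) })                          // 0
    println(countCalls { val a = WithLazyVal("::1", 1); a.host + a.host }) // 1
    println(countCalls { WithVal("::1", 1) })                              // 1
  }
}
```

So def pays the normalization cost on each access, val pays it once per instance even if host is never read, and lazy val pays it at most once and only when host is first read.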

HyukjinKwon pushed a commit that referenced this pull request Jun 16, 2022
### What changes were proposed in this pull request?

This PR aims to use `lazy val host` instead of `val host`.

### Why are the changes needed?

To address the review comments about `RpcAddress` object size.
- #36868 (comment)
- #36868 (comment)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #36882 from dongjoon-hyun/SPARK-39468-2.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
LittleWat added a commit to LittleWat/spark-on-k8s-operator that referenced this pull request Aug 30, 2023
Resolves kubeflow#1344

Spark 3.4 supports IPv6:
- apache/spark#36868

So I want to make the operator support IPv6.

I can confirm that, with this change, the operator can submit a Spark job in an IPv6-only environment.

However, it is necessary to add the following environment variables to the operator:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-on-k8s-spark-operator
spec:
  template:
    spec:
      containers:
      - name: spark-operator
        env:
        - name: _JAVA_OPTIONS
          value: "-Djava.net.preferIPv6Addresses=true"
        - name: KUBERNETES_DISABLE_HOSTNAME_VERIFICATION
          value: "true"

```
liyinan926 pushed a commit to kubeflow/spark-operator that referenced this pull request Oct 26, 2023
peter-mcclonski pushed a commit to TechnologyBrewery/spark-on-k8s-operator that referenced this pull request Apr 16, 2024
Signed-off-by: Peter McClonski <[email protected]>
sigmarkarl pushed a commit to spotinst/spark-on-k8s-operator that referenced this pull request Aug 7, 2024
jbhalodia-slack pushed a commit to jbhalodia-slack/spark-operator that referenced this pull request Oct 4, 2024