Conversation

@attilapiros
Contributor

@attilapiros attilapiros commented Jul 13, 2020

What changes were proposed in this pull request?

Fixing an inconsistency between Spark memory configs and JVM options by appending the default unit "m" before setting the -Xmx/-Xms JVM options when no suffix is provided.

Why are the changes needed?

Spark's maximum memory can be configured in several ways:

  • via a Spark config
  • via a command-line argument
  • via environment variables

Memory can be configured separately for the driver and for the executors. All of these follow the format of JVM memory configurations in that they use the very same size unit suffixes ("k", "m", "g" or "t"), but there is an inconsistency regarding the default unit. When no suffix is given, the amount is passed as-is to the JVM (to the -Xmx and -Xms options), where these memory options use bytes as the default unit. See the following example:

The following examples show how to set the maximum allowed size of allocated memory to 80 MB using various units:

-Xmx83886080
-Xmx81920k
-Xmx80m

Spark's memory configs, however, use MiB as the default unit.
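Conceptually, the fix appends the default suffix before building the JVM option. A minimal sketch in Scala, assuming a helper along the lines of the addDefaultMSuffixIfNeeded shown in the review diff further below (the body here is illustrative, not the actual implementation):

```scala
// Append the default "m" suffix when the value is a bare number, so that a
// suffix-less amount is interpreted as MiB (matching Spark's configs)
// instead of bytes (the JVM's default for -Xmx/-Xms).
def addDefaultMSuffixIfNeeded(mem: String): String =
  if (mem != null && mem.nonEmpty && mem.forall(_.isDigit)) mem + "m"
  else mem
```

For example, addDefaultMSuffixIfNeeded("500") returns "500m", while "512m" and "2g" pass through unchanged.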

Does this PR introduce any user-facing change?

Yes. Before this PR, when no suffix was given, the -Xmx and -Xms JVM options could have been configured to 1/1,048,576 (i.e. 1/1024²) of the intended amount because of the conversion difference between the units (bytes vs. mebibytes).

This could happen only in client mode, so the following are affected:

  • all the use cases where some kind of REPL is started (spark-shell, spark-sql, pyspark, sparkR and beeline), as a result of the changes done in SparkClassCommandBuilder
  • application submissions via spark-submit (only in client mode), as a result of the changes done in SparkSubmitCommandBuilder

The fact that the minimum memory required to start an application is 471859200 bytes (450 MiB; a number already long and inconvenient to calculate with) decreases the number of affected cases.

How was this patch tested?

With a unit test

See SparkSubmitCommandBuilderSuite.

With manual testing

By executing a Spark example.

Without my change and without an explicit unit suffix:

$ ./bin/spark-submit  --driver-memory 500 --class org.apache.spark.examples.SparkPi examples/target/original-spark-examples_2.12-3.1.0-SNAPSHOT.jar 1000
Error occurred during initialization of VM
Too small initial heap

You can see from the error above that it is not Spark's own memory check that is triggered, unlike in the counter-test (still without my change, but with a suffix given and a too-low memory set):

$  ./bin/spark-submit  --driver-memory 300m --class org.apache.spark.examples.SparkPi examples/target/original-spark-examples_2.12-3.1.0-SNAPSHOT.jar 1000
...
20/07/13 20:51:15 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: System memory 301465600 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration.
	at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:221)

And after this PR, running the first case (without an explicit unit suffix):

$ ./bin/spark-submit  --driver-memory 500 --class org.apache.spark.examples.SparkPi examples/target/original-spark-examples_2.12-3.1.0-SNAPSHOT.jar 1000
...
20/07/13 20:57:12 INFO TaskSetManager: Finished task 995.0 in stage 0.0 (TID 995) in 419 ms on 192.168.1.210 (executor driver) (999/1000)
20/07/13 20:57:12 INFO Executor: Finished task 999.0 in stage 0.0 (TID 999). 914 bytes result sent to driver
20/07/13 20:57:12 INFO TaskSetManager: Finished task 999.0 in stage 0.0 (TID 999) in 242 ms on 192.168.1.210 (executor driver) (1000/1000)
20/07/13 20:57:12 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
20/07/13 20:57:12 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 45.892 s
20/07/13 20:57:12 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
20/07/13 20:57:12 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
20/07/13 20:57:12 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 45.967945 s
Pi is roughly 3.1417498714174985
20/07/13 20:57:12 INFO SparkUI: Stopped Spark web UI at http://192.168.1.210:4040
20/07/13 20:57:12 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/07/13 20:57:12 INFO MemoryStore: MemoryStore cleared
20/07/13 20:57:12 INFO BlockManager: BlockManager stopped
20/07/13 20:57:12 INFO BlockManagerMaster: BlockManagerMaster stopped
20/07/13 20:57:12 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/07/13 20:57:12 INFO SparkContext: Successfully stopped SparkContext
20/07/13 20:57:12 INFO ShutdownHookManager: Shutdown hook called
20/07/13 20:57:12 INFO ShutdownHookManager: Deleting directory /private/var/folders/t_/fr_vqcyx23vftk81ftz1k5hw0000gn/T/spark-02c73ca4-ff94-422f-a8dc-39d88365d391
20/07/13 20:57:12 INFO ShutdownHookManager: Deleting directory /private/var/folders/t_/fr_vqcyx23vftk81ftz1k5hw0000gn/T/spark-fa723796-b3a8-4d5b-94c7-65752cd653bd

And even jps shows the correct value:

$ jps -lvV
25269 org.apache.spark.deploy.SparkSubmit -Xmx500m
...

@SparkQA

SparkQA commented Jul 13, 2020

Test build #125787 has finished for PR 29090 at commit dfbce91.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

retest this please

@SparkQA

SparkQA commented Jul 14, 2020

Test build #125801 has finished for PR 29090 at commit dfbce91.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 14, 2020

Test build #125826 has finished for PR 29090 at commit 2ddbe0c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@attilapiros attilapiros changed the title [WIP][SPARK-32293] Fix inconsistency between Spark memory configs and JVM option [SPARK-32293] Fix inconsistency between Spark memory configs and JVM option Jul 14, 2020
  }
  if (executorMemory != null
-     && Try(JavaUtils.byteStringAsBytes(executorMemory)).getOrElse(-1L) <= 0) {
+     && Try(JavaUtils.byteStringAsMb(executorMemory)).getOrElse(-1L) <= 0) {
Member

Since this changes executorMemory, do we need to update line 248 for driverMemory together, maybe?

Member

BTW, do we need to change this line? It seems to ensure that the value is non-negative. Just for my understanding, could you give me an example which gives a different result before and after this PR?

Contributor Author

You are right, there is no difference regarding the behaviour, but reading the code and seeing byteStringAsBytes called with these configs gives the false impression that they are in bytes. I think it is worth changing them to byteStringAsMb.
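For illustration, a small sketch using Spark's JavaUtils shows why the behaviour is the same: both parsers accept a bare number and return the same positive value, only the implied unit differs, so the <= 0 check is unaffected:

```scala
import org.apache.spark.network.util.JavaUtils

// "500" is valid for both parsers: byteStringAsBytes treats it as 500 bytes,
// byteStringAsMb treats it as 500 MiB, but both return a positive 500L.
val asBytes = JavaUtils.byteStringAsBytes("500") // 500L (bytes)
val asMb    = JavaUtils.byteStringAsMb("500")    // 500L (MiB)
```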

Contributor

In theory, the only difference would be if the user set the memory to < 1 MB.
That is ridiculous enough to ignore as a valid use case :-)

Member

@dongjoon-hyun dongjoon-hyun left a comment

Some people may consider the proposed changes a kind of breaking change. Could you explicitly elaborate on those cases which previously worked but will fail after this PR?

cc @gatorsmile

dongjoon-hyun pushed a commit that referenced this pull request Jul 14, 2020
…llation in PIP test

### What changes were proposed in this pull request?

Currently the Jenkins PIP packaging test fails intermittently as below:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- #29099 (comment) (amp-jenkins-worker-04)
- #29090 (comment) (amp-jenkins-worker-03)

It seems like the previous editable-mode installation affects other PRs.

This PR simply works around the problem by removing the symbolic link left by the previous editable installation. This is a common workaround, to my knowledge.

### Why are the changes needed?

To recover the Jenkins build.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Jenkins build will test it out.

Closes #29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@attilapiros
Contributor Author

@dongjoon-hyun Sure, it can happen. I have updated the PR description; see the section Does this PR introduce any user-facing change?, which lists the possibly affected use cases.

As far as I can see, k8s is not affected, as SparkContext sets the SPARK_EXECUTOR_MEMORY environment variable with the correct suffix:

executorEnvs("SPARK_EXECUTOR_MEMORY") = executorMemory + "m"

As an alternative solution we could just update our documentation, emphasizing the importance of specifying the suffix when a memory config is set.

@SparkQA

SparkQA commented Jul 15, 2020

Test build #125885 has finished for PR 29090 at commit cc495c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shaneknapp
Contributor

test this please

@shaneknapp
Contributor

btw I'm testing the R upgrade and the k8s integration tests

@SparkQA

SparkQA commented Jul 21, 2020

Test build #126267 has finished for PR 29090 at commit cc495c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Contributor

holdenk commented Jul 27, 2020

This looks reasonable to me; my only concern is if someone has a script that's been using the default behaviour of k already, and this change could result in an unpleasant surprise (e.g. a job failure). Do we think folks have not been using the current default behaviour? Would a better default be to throw an explicit exception if there are no units?

@attilapiros
Contributor Author

Thanks @holdenk for looking into this.

What about logging a warning when no unit is given?

Like:
"Memory setting without explicit unit (${value}) is taken to be in MB by default! For details check SPARK-32293."

This way, in case of a problem, we provide an indication of the root cause.
This error would mostly surface at the beginning of the application, as after multiplying a number by 1024 the result will be quite huge, and this will trigger an allocation which is hard to satisfy (not impossible, but in client mode going up from, for example, 1 GB to 1 TB is huge).

The exception in these cases will be thrown by failed memory allocations.

@holdenk
Contributor

holdenk commented Jul 29, 2020

That sounds good to me.

Contributor

@tgravescs tgravescs left a comment

I agree with others that it could be considered a breaking change.
Can we make sure to put this in the release notes as a change in behavior? And I assume we are only pulling this into 3.1.0 and not 3.0.1, correct?
We can log a message, but most of the time, if it doesn't fail, people won't notice. I would generally expect that if people were specifying the size in bytes, large enough for it to be useful and work, then it would fail anyway because it would be too large once we add in "m".

  <td>
  Amount of memory to use per executor process, in the same format as JVM memory strings with
- a size unit suffix ("k", "m", "g" or "t") (e.g. <code>512m</code>, <code>2g</code>).
+ a size unit suffix ("k", "m", "g" or "t") (e.g. <code>512m</code>, <code>2g</code>) using
Contributor

It seems we are a bit inconsistent across the documentation as well (pyspark.memory, memoryOverhead). Other memory settings just say MiB unless otherwise specified, but don't mention the suffix options. I wonder if we should make them all consistent. Note one of the YARN configs says: "Use lower-case suffixes, e.g. k, m, g, t, and p, for kibi-, mebi-, gibi-, tebi-, and pebibytes, respectively." but again doesn't say m is the default.


  String mem = firstNonEmpty(memKey != null ? System.getenv(memKey) : null, DEFAULT_MEM);
- cmd.add("-Xmx" + mem);
+ cmd.add("-Xmx" + addDefaultMSuffixIfNeeded(mem));
Contributor

We should update the standalone docs as well for --memory (SPARK_WORKER_MEMORY and SPARK_DAEMON_MEMORY) to say they default to m, and ideally make them consistent with the docs above.

Contributor

Note we should test those as well if you haven't already

Contributor Author

Thanks, I really focused on the -Xmx and -Xms settings, but now I see there is another error at:

val executorMemory = conf.getSizeAsBytes(config.EXECUTOR_MEMORY.key)
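
Presumably the counterpart fix is to read the config with its documented MiB default; a hedged sketch (assuming SparkConf.getSizeAsMb is the appropriate reader here, which this thread does not confirm):

```scala
// EXECUTOR_MEMORY is documented in MiB, so read it with a MiB default unit
// instead of falling back to bytes for suffix-less values; this mirrors the
// byteStringAsBytes -> byteStringAsMb change discussed above.
val executorMemoryMb = conf.getSizeAsMb(config.EXECUTOR_MEMORY.key)
```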

@tgravescs
Contributor

Sorry, I just read @holdenk's comment:

if someone has a script that's been using the default behaviour of k already

I thought the default is bytes in most cases; was there somewhere we were using k? If not, I'm less worried, as I mentioned above, because I think if someone specifies the size in bytes, large enough for it to be useful and work, then most times it is going to fail anyway as too large.

@attilapiros
Contributor Author

Regarding the logging I see a problem: this is the place where we put together the string which will be executed to start Spark. There is no logger/logging around, so I can only use stderr.
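
A minimal sketch of what that stderr-based warning could look like, combined with the suffix normalization (the helper name and placement are assumptions for illustration):

```scala
// Hypothetical helper near the launcher's command building: warn on stderr
// (no logger is available there) and append the default "m" suffix.
def normalizeMemoryArg(value: String): String =
  if (value.nonEmpty && value.forall(_.isDigit)) {
    System.err.println(s"Memory setting without explicit unit ($value) is " +
      "taken to be in MB by default! For details check SPARK-32293.")
    value + "m"
  } else {
    value
  }
```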

@attilapiros
Contributor Author

To avoid any misunderstanding: there is still some work and testing to be done on this, but for the next 2-3 weeks I will be away from GitHub, so please do not expect progress on this PR.

@SparkQA

SparkQA commented Aug 3, 2020

Test build #126984 has finished for PR 29090 at commit 8dfe643.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…llation in PIP test

### What changes were proposed in this pull request?

Currently the Jenkins PIP packaging test fails as below intermediately:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

Seems like the previous installation of editable mode affects other PRs.

This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.

### Why are the changes needed?

To recover the Jenkins build.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…llation in PIP test

### What changes were proposed in this pull request?

Currently the Jenkins PIP packaging test fails as below intermediately:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

Seems like the previous installation of editable mode affects other PRs.

This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.

### Why are the changes needed?

To recover the Jenkins build.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…llation in PIP test

Currently the Jenkins PIP packaging test fails as below intermediately:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

Seems like the previous installation of editable mode affects other PRs.

This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.

To recover the Jenkins build.

No, dev-only.

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…llation in PIP test

### What changes were proposed in this pull request?

Currently the Jenkins PIP packaging test fails as below intermediately:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

Seems like the previous installation of editable mode affects other PRs.

This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.

### Why are the changes needed?

To recover the Jenkins build.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…llation in PIP test

Currently the Jenkins PIP packaging test fails as below intermediately:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

Seems like the previous installation of editable mode affects other PRs.

This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.

To recover the Jenkins build.

No, dev-only.

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…llation in PIP test

Currently the Jenkins PIP packaging test fails as below intermediately:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

Seems like the previous installation of editable mode affects other PRs.

This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.

To recover the Jenkins build.

No, dev-only.

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…llation in PIP test

Currently the Jenkins PIP packaging test fails as below intermediately:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

Seems like the previous installation of editable mode affects other PRs.

This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.

To recover the Jenkins build.

No, dev-only.

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…llation in PIP test

Currently the Jenkins PIP packaging test fails as below intermediately:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

Seems like the previous installation of editable mode affects other PRs.

This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.

To recover the Jenkins build.

No, dev-only.

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…llation in PIP test

Currently the Jenkins PIP packaging test fails as below intermediately:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

Seems like the previous installation of editable mode affects other PRs.

This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.

To recover the Jenkins build.

No, dev-only.

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…llation in PIP test

Currently the Jenkins PIP packaging test fails as below intermediately:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

Seems like the previous installation of editable mode affects other PRs.

This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.

To recover the Jenkins build.

No, dev-only.

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…llation in PIP test

Currently the Jenkins PIP packaging test fails as below intermediately:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

Seems like the previous installation of editable mode affects other PRs.

This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.

To recover the Jenkins build.

No, dev-only.

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 18, 2020
…llation in PIP test

Currently the Jenkins PIP packaging test fails as below intermediately:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

Seems like the previous installation of editable mode affects other PRs.

This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.

To recover the Jenkins build.

No, dev-only.

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…llation in PIP test

Currently the Jenkins PIP packaging test fails as below intermediately:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

Seems like the previous installation of editable mode affects other PRs.

This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.

To recover the Jenkins build.

No, dev-only.

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…llation in PIP test

### What changes were proposed in this pull request?

Currently the Jenkins PIP packaging test fails as below intermediately:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

Seems like the previous installation of editable mode affects other PRs.

This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.

### Why are the changes needed?

To recover the Jenkins build.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…llation in PIP test

Currently the Jenkins PIP packaging test fails as below intermediately:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

Seems like the previous installation of editable mode affects other PRs.

This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.

To recover the Jenkins build.

No, dev-only.

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…llation in PIP test

Currently the Jenkins PIP packaging test fails as below intermediately:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

Seems like the previous installation of editable mode affects other PRs.

This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.

To recover the Jenkins build.

No, dev-only.

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon added a commit to HyukjinKwon/spark that referenced this pull request Aug 19, 2020
…llation in PIP test

### What changes were proposed in this pull request?

Currently the Jenkins PIP packaging test fails as below intermediately:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

Seems like the previous installation of editable mode affects other PRs.

This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge.

### Why are the changes needed?

To recover the Jenkins build.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Contributor

@mridulm mridulm left a comment

Thanks for fixing this @attilapiros !

```
 }
 if (executorMemory != null
-    && Try(JavaUtils.byteStringAsBytes(executorMemory)).getOrElse(-1L) <= 0) {
+    && Try(JavaUtils.byteStringAsMb(executorMemory)).getOrElse(-1L) <= 0) {
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, the only difference would be if the user set the memory to < 1 MB.
That is ridiculous enough to ignore as a valid use case :-)
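
For context, a minimal sketch of the difference being discussed, assuming Spark's `JavaUtils` helpers from `org.apache.spark.network.util`: `byteStringAsMb` truncates to whole MiB, so a sub-megabyte value parses to 0 and trips the `<= 0` check, whereas `byteStringAsBytes` would let it through.

```
import org.apache.spark.network.util.JavaUtils;

public class MemoryParseDemo {
  public static void main(String[] args) {
    // "512k" is 524288 bytes, but truncates to 0 whole MiB.
    System.out.println(JavaUtils.byteStringAsBytes("512k")); // 524288 -> old check passes
    System.out.println(JavaUtils.byteStringAsMb("512k"));    // 0      -> new check rejects
    // At or above 1 MiB the two parsers agree on validity.
    System.out.println(JavaUtils.byteStringAsMb("500m"));    // 500
  }
}
```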

```
static String addDefaultMSuffixIfNeeded(String memoryString) {
  if (memoryString.chars().allMatch(Character::isDigit)) {
    System.err.println("Memory setting without explicit unit (" +
        memoryString + ") is taken to be in MB by default! For details check SPARK-32293.");
    return memoryString + "m"; // append the default "m" suffix (completion inferred)
  }
  return memoryString;
}
```

Given that we are documenting that 'm' is the suffix used when none is specified, do we need this message on stderr?

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Feb 12, 2021
@github-actions github-actions bot closed this Feb 13, 2021