[SPARK-32293] Fix inconsistency between Spark memory configs and JVM option #29090
Test build #125787 has finished for PR 29090 at commit

retest this please

Test build #125801 has finished for PR 29090 at commit

Test build #125826 has finished for PR 29090 at commit
      }
      if (executorMemory != null
    -     && Try(JavaUtils.byteStringAsBytes(executorMemory)).getOrElse(-1L) <= 0) {
    +     && Try(JavaUtils.byteStringAsMb(executorMemory)).getOrElse(-1L) <= 0) {
Since this changes executorMemory, do we need to update line 248 for driverMemory as well?
BTW, do we need to change this line? It seems to only ensure that the value is positive. Just for my understanding, could you give me an example that gives a different result before and after this PR?
You are right, there is no difference in behaviour, but reading the code and seeing byteStringAsBytes called with these configs gives the false impression that they are in bytes. I think it is worth changing them to byteStringAsMb.
In theory, the only difference would be if the user set the memory to less than 1 MB.
This is ridiculous enough to ignore as a valid use case :-)
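To make the sub-1 MB case concrete, here is a minimal sketch of the two parsing modes. It is a simplified stand-in, not Spark's `JavaUtils`: the class `ByteStringSketch` and its methods are hypothetical, and it assumes that a value without a suffix takes the parser's default unit and that conversion to the target unit uses integer division.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified stand-in for a byte-string parser (NOT Spark's JavaUtils).
final class ByteStringSketch {
    // A run of digits followed by an optional lower-case unit suffix.
    private static final Pattern SIZE = Pattern.compile("([0-9]+)([a-z]+)?");

    // Binary multipliers, expressed in bytes.
    private static final Map<String, Long> MULTIPLIER = Map.of(
        "b", 1L,
        "k", 1L << 10,
        "m", 1L << 20,
        "g", 1L << 30,
        "t", 1L << 40);

    // Parses str, falling back to defaultUnit when no suffix is given,
    // and returns the size in targetUnit using integer division.
    static long parse(String str, String defaultUnit, String targetUnit) {
        Matcher m = SIZE.matcher(str.trim().toLowerCase(Locale.ROOT));
        if (!m.matches()) {
            throw new NumberFormatException("Invalid size: " + str);
        }
        long value = Long.parseLong(m.group(1));
        String unit = m.group(2) != null ? m.group(2) : defaultUnit;
        Long multiplier = MULTIPLIER.get(unit);
        if (multiplier == null) {
            throw new NumberFormatException("Invalid unit in: " + str);
        }
        return Math.multiplyExact(value, multiplier) / MULTIPLIER.get(targetUnit);
    }

    // No suffix means bytes; result in bytes.
    static long asBytes(String s) { return parse(s, "b", "b"); }

    // No suffix means MiB; result in MiB.
    static long asMb(String s) { return parse(s, "m", "m"); }

    public static void main(String[] args) {
        for (String s : new String[] {"2g", "512k", "1"}) {
            System.out.println(s + " -> bytes=" + asBytes(s) + ", MiB=" + asMb(s));
        }
    }
}
```

Under these assumptions `512k` is positive when read as bytes but rounds down to 0 when read as MiB, so a `<= 0` check would reject it only in the second mode. Values of 1 MiB or more pass both checks.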
dongjoon-hyun
left a comment
Some people may consider the proposed changes as a kind of breaking change. Could you explicitly elaborate more about those cases which previously worked but will fail after this PR?
cc @gatorsmile
…llation in PIP test

### What changes were proposed in this pull request?

Currently the Jenkins PIP packaging test fails intermittently as below:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- #29099 (comment) (amp-jenkins-worker-04)
- #29090 (comment) (amp-jenkins-worker-03)

It seems that the previous editable-mode installation affects other PRs. This PR simply works around the problem by removing the symbolic link left by the previous editable installation. To my knowledge, this is a common workaround.

### Why are the changes needed?

To recover the Jenkins build.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Jenkins build will test it out.

Closes #29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun Sure it can happen. I have updated the PR description see the section As I see k8s is not affected as
As an alternative, we could just update the documentation to emphasize the importance of specifying the suffix when a memory config is set.
Test build #125885 has finished for PR 29090 at commit

test this please

btw i'm testing R upgrade and the k8s integration tests

Test build #126267 has finished for PR 29090 at commit
This looks reasonable to me, my only concern is if someone has a script that's been using the default behaviour of

Thanks @holdenk for looking into this. What about logging a warning when no unit is given? Like: This way, in case of a problem, we provide an indication of the root cause. The exception in these cases will be thrown by failed memory allocations.

That sounds good to me.
tgravescs
left a comment
I agree with others that it could be considered breaking change.
Can we make sure to put in release notes for changes in behavior and I assume we are only pulling this into 3.1.0 and not 3.0.1, correct?
We can log a message, but most of the time, if it doesn't fail, people won't notice. I would generally expect that if people were specifying it in bytes, with a value large enough to be useful and work, then it would fail anyway because it would be too large once we add in "m".
      <td>
        Amount of memory to use per executor process, in the same format as JVM memory strings with
    -   a size unit suffix ("k", "m", "g" or "t") (e.g. <code>512m</code>, <code>2g</code>).
    +   a size unit suffix ("k", "m", "g" or "t") (e.g. <code>512m</code>, <code>2g</code>) using
It seems we are a bit inconsistent across the documentation as well (pyspark.memory, memoryOverhead). Other memory settings just say "MiB unless otherwise specified" but don't mention the suffix options. I wonder if we should make them all consistent. Note that one of the YARN configs says: "Use lower-case suffixes, e.g. k, m, g, t, and p, for kibi-, mebi-, gibi-, tebi-, and pebibytes, respectively." But again, it doesn't say that m is the default.
      String mem = firstNonEmpty(memKey != null ? System.getenv(memKey) : null, DEFAULT_MEM);
    - cmd.add("-Xmx" + mem);
    + cmd.add("-Xmx" + addDefaultMSuffixIfNeeded(mem));
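The body of `addDefaultMSuffixIfNeeded` is not shown in this excerpt. A minimal sketch of what such a helper might do, assuming it only needs to append `m` when the value is made up entirely of digits (the class name `XmxSuffixSketch` and this exact rule are assumptions, not the PR's implementation):

```java
// Hypothetical sketch; the real helper in the PR may differ.
final class XmxSuffixSketch {
    // Appends "m" when the value contains only digits, so that the JVM
    // (which reads a bare -Xmx number as bytes) sees a MiB value instead.
    static String addDefaultMSuffixIfNeeded(String mem) {
        String trimmed = mem.trim();
        boolean digitsOnly =
            !trimmed.isEmpty() && trimmed.chars().allMatch(Character::isDigit);
        return digitsOnly ? trimmed + "m" : trimmed;
    }

    public static void main(String[] args) {
        for (String mem : new String[] {"1024", "2g", "512m"}) {
            System.out.println("-Xmx" + addDefaultMSuffixIfNeeded(mem));
        }
    }
}
```

Values that already carry a suffix pass through unchanged, so only the unit-less case changes behaviour.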
We should also update the standalone docs for --memory (SPARK_WORKER_MEMORY and SPARK_DAEMON_MEMORY) to say that the default unit is m, and ideally make them consistent with the docs above.
Note we should test those as well if you haven't already
Thanks. I really focused on the Xmx and Xms settings, but now I see there is another error at
    val executorMemory = conf.getSizeAsBytes(config.EXECUTOR_MEMORY.key)
Sorry, I just read @holdenk's comment:
I thought the default is bytes in most cases. Was there somewhere we are using k? If not, I'm less worried, as I mentioned above, because I think if someone specifies the size in bytes and we add in "m", most of the time it is probably going to fail as too large.
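A quick worked example of why a byte-sized value would most likely fail once it is read as MiB. The figure of 1 GiB written in bytes is chosen purely for illustration, and the class name `ByteValueAsMiB` is hypothetical:

```java
// Illustrative arithmetic only; the input value is an assumption for the example.
final class ByteValueAsMiB {
    static final long BYTES_PER_MIB = 1L << 20;

    // Converts a MiB count to bytes, failing loudly on overflow.
    static long mibToBytes(long mib) {
        return Math.multiplyExact(mib, BYTES_PER_MIB);
    }

    public static void main(String[] args) {
        long oneGiBInBytes = 1L << 30;                    // 1073741824, meant as bytes
        long reinterpreted = mibToBytes(oneGiBInBytes);   // same digits read as MiB
        // 2^30 MiB = 2^50 bytes = 1024 TiB
        System.out.println(reinterpreted / (1L << 40) + " TiB");
    }
}
```

A request of that size cannot be satisfied on realistic hardware, which supports the expectation that such configurations fail rather than silently change behaviour.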
Regarding the logging, I see a problem: this is the place where we put together the command string that will be executed to start Spark. There is no logger available here, so I can only use stderr.
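A stderr-only warning could look roughly like the sketch below. The class `UnitWarningSketch`, the method name, and the message wording are all hypothetical; the only assumption taken from the discussion is that no logger is available at this point and that a value without a unit is treated as MiB.

```java
import java.util.Optional;

// Hypothetical sketch of a stderr warning for unit-less memory values.
final class UnitWarningSketch {
    // Returns a warning message when the value ends in a digit (no unit suffix).
    static Optional<String> missingUnitWarning(String key, String value) {
        String v = value.trim();
        if (!v.isEmpty() && Character.isDigit(v.charAt(v.length() - 1))) {
            return Optional.of("WARNING: " + key + "=" + v
                + " has no unit suffix; it is treated as " + v + "m (MiB).");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // No logging framework here, so the message goes straight to stderr.
        missingUnitWarning("SPARK_DAEMON_MEMORY", "1024").ifPresent(System.err::println);
        missingUnitWarning("SPARK_DAEMON_MEMORY", "1g").ifPresent(System.err::println);
    }
}
```

Returning the message instead of printing it directly keeps the check testable and leaves the choice of output stream to the caller.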
To avoid any misunderstandings: there is still some work and testing to be done on this, but I will be away from GitHub for the next 2-3 weeks, so please do not expect progress on this PR.

Test build #126984 has finished for PR 29090 at commit
…llation in PIP test ### What changes were proposed in this pull request? Currently the Jenkins PIP packaging test fails as below intermediately: ``` Installing dist into virtual env Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0) Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB) Installing collected packages: py4j, pyspark Found existing installation: py4j 0.10.9 Uninstalling py4j-0.10.9: Successfully uninstalled py4j-0.10.9 Found existing installation: pyspark 3.1.0.dev0 Exception: Traceback (most recent call last): File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main status = self.run(options, args) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run use_user_site=options.use_user_site, File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs auto_confirm=True File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall uninstalled_pathset = UninstallPathSet.from_dist(dist) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist '(at %s)' % (link_pointer, dist.project_name, dist.location) AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed ``` - apache#29099 (comment) (amp-jenkins-worker-04) - apache#29090 (comment) (amp-jenkins-worker-03) Seems like the previous installation of editable mode affects other PRs. This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge. ### Why are the changes needed? To recover the Jenkins build. 
### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Jenkins build will test it out. Closes apache#29102 from HyukjinKwon/SPARK-32303. Lead-authored-by: HyukjinKwon <[email protected]> Co-authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…llation in PIP test ### What changes were proposed in this pull request? Currently the Jenkins PIP packaging test fails as below intermediately: ``` Installing dist into virtual env Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0) Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB) Installing collected packages: py4j, pyspark Found existing installation: py4j 0.10.9 Uninstalling py4j-0.10.9: Successfully uninstalled py4j-0.10.9 Found existing installation: pyspark 3.1.0.dev0 Exception: Traceback (most recent call last): File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main status = self.run(options, args) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run use_user_site=options.use_user_site, File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs auto_confirm=True File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall uninstalled_pathset = UninstallPathSet.from_dist(dist) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist '(at %s)' % (link_pointer, dist.project_name, dist.location) AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed ``` - apache#29099 (comment) (amp-jenkins-worker-04) - apache#29090 (comment) (amp-jenkins-worker-03) Seems like the previous installation of editable mode affects other PRs. This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge. ### Why are the changes needed? To recover the Jenkins build. 
### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Jenkins build will test it out. Closes apache#29102 from HyukjinKwon/SPARK-32303. Lead-authored-by: HyukjinKwon <[email protected]> Co-authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…llation in PIP test Currently the Jenkins PIP packaging test fails as below intermediately: ``` Installing dist into virtual env Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0) Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB) Installing collected packages: py4j, pyspark Found existing installation: py4j 0.10.9 Uninstalling py4j-0.10.9: Successfully uninstalled py4j-0.10.9 Found existing installation: pyspark 3.1.0.dev0 Exception: Traceback (most recent call last): File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main status = self.run(options, args) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run use_user_site=options.use_user_site, File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs auto_confirm=True File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall uninstalled_pathset = UninstallPathSet.from_dist(dist) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist '(at %s)' % (link_pointer, dist.project_name, dist.location) AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed ``` - apache#29099 (comment) (amp-jenkins-worker-04) - apache#29090 (comment) (amp-jenkins-worker-03) Seems like the previous installation of editable mode affects other PRs. This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge. To recover the Jenkins build. No, dev-only. Jenkins build will test it out. Closes apache#29102 from HyukjinKwon/SPARK-32303. 
Lead-authored-by: HyukjinKwon <[email protected]> Co-authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…llation in PIP test ### What changes were proposed in this pull request? Currently the Jenkins PIP packaging test fails as below intermediately: ``` Installing dist into virtual env Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0) Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB) Installing collected packages: py4j, pyspark Found existing installation: py4j 0.10.9 Uninstalling py4j-0.10.9: Successfully uninstalled py4j-0.10.9 Found existing installation: pyspark 3.1.0.dev0 Exception: Traceback (most recent call last): File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main status = self.run(options, args) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run use_user_site=options.use_user_site, File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs auto_confirm=True File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall uninstalled_pathset = UninstallPathSet.from_dist(dist) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist '(at %s)' % (link_pointer, dist.project_name, dist.location) AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed ``` - apache#29099 (comment) (amp-jenkins-worker-04) - apache#29090 (comment) (amp-jenkins-worker-03) Seems like the previous installation of editable mode affects other PRs. This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge. ### Why are the changes needed? To recover the Jenkins build. 
### Does this PR introduce _any_ user-facing change? No, dev-only. ### How was this patch tested? Jenkins build will test it out. Closes apache#29102 from HyukjinKwon/SPARK-32303. Lead-authored-by: HyukjinKwon <[email protected]> Co-authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…llation in PIP test Currently the Jenkins PIP packaging test fails as below intermediately: ``` Installing dist into virtual env Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0) Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB) Installing collected packages: py4j, pyspark Found existing installation: py4j 0.10.9 Uninstalling py4j-0.10.9: Successfully uninstalled py4j-0.10.9 Found existing installation: pyspark 3.1.0.dev0 Exception: Traceback (most recent call last): File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main status = self.run(options, args) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run use_user_site=options.use_user_site, File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs auto_confirm=True File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall uninstalled_pathset = UninstallPathSet.from_dist(dist) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist '(at %s)' % (link_pointer, dist.project_name, dist.location) AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed ``` - apache#29099 (comment) (amp-jenkins-worker-04) - apache#29090 (comment) (amp-jenkins-worker-03) Seems like the previous installation of editable mode affects other PRs. This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge. To recover the Jenkins build. No, dev-only. Jenkins build will test it out. Closes apache#29102 from HyukjinKwon/SPARK-32303. 
Lead-authored-by: HyukjinKwon <[email protected]> Co-authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…llation in PIP test Currently the Jenkins PIP packaging test fails as below intermediately: ``` Installing dist into virtual env Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0) Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB) Installing collected packages: py4j, pyspark Found existing installation: py4j 0.10.9 Uninstalling py4j-0.10.9: Successfully uninstalled py4j-0.10.9 Found existing installation: pyspark 3.1.0.dev0 Exception: Traceback (most recent call last): File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main status = self.run(options, args) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run use_user_site=options.use_user_site, File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs auto_confirm=True File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall uninstalled_pathset = UninstallPathSet.from_dist(dist) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist '(at %s)' % (link_pointer, dist.project_name, dist.location) AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed ``` - apache#29099 (comment) (amp-jenkins-worker-04) - apache#29090 (comment) (amp-jenkins-worker-03) Seems like the previous installation of editable mode affects other PRs. This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge. To recover the Jenkins build. No, dev-only. Jenkins build will test it out. Closes apache#29102 from HyukjinKwon/SPARK-32303. 
Lead-authored-by: HyukjinKwon <[email protected]> Co-authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…llation in PIP test Currently the Jenkins PIP packaging test fails as below intermediately: ``` Installing dist into virtual env Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0) Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB) Installing collected packages: py4j, pyspark Found existing installation: py4j 0.10.9 Uninstalling py4j-0.10.9: Successfully uninstalled py4j-0.10.9 Found existing installation: pyspark 3.1.0.dev0 Exception: Traceback (most recent call last): File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main status = self.run(options, args) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run use_user_site=options.use_user_site, File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs auto_confirm=True File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall uninstalled_pathset = UninstallPathSet.from_dist(dist) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist '(at %s)' % (link_pointer, dist.project_name, dist.location) AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed ``` - apache#29099 (comment) (amp-jenkins-worker-04) - apache#29090 (comment) (amp-jenkins-worker-03) Seems like the previous installation of editable mode affects other PRs. This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge. To recover the Jenkins build. No, dev-only. Jenkins build will test it out. Closes apache#29102 from HyukjinKwon/SPARK-32303. 
Lead-authored-by: HyukjinKwon <[email protected]> Co-authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…llation in PIP test Currently the Jenkins PIP packaging test fails as below intermediately: ``` Installing dist into virtual env Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0) Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB) Installing collected packages: py4j, pyspark Found existing installation: py4j 0.10.9 Uninstalling py4j-0.10.9: Successfully uninstalled py4j-0.10.9 Found existing installation: pyspark 3.1.0.dev0 Exception: Traceback (most recent call last): File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main status = self.run(options, args) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run use_user_site=options.use_user_site, File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs auto_confirm=True File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall uninstalled_pathset = UninstallPathSet.from_dist(dist) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist '(at %s)' % (link_pointer, dist.project_name, dist.location) AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed ``` - apache#29099 (comment) (amp-jenkins-worker-04) - apache#29090 (comment) (amp-jenkins-worker-03) Seems like the previous installation of editable mode affects other PRs. This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge. To recover the Jenkins build. No, dev-only. Jenkins build will test it out. Closes apache#29102 from HyukjinKwon/SPARK-32303. 
Lead-authored-by: HyukjinKwon <[email protected]> Co-authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…llation in PIP test Currently the Jenkins PIP packaging test fails as below intermediately: ``` Installing dist into virtual env Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0) Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB) Installing collected packages: py4j, pyspark Found existing installation: py4j 0.10.9 Uninstalling py4j-0.10.9: Successfully uninstalled py4j-0.10.9 Found existing installation: pyspark 3.1.0.dev0 Exception: Traceback (most recent call last): File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main status = self.run(options, args) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run use_user_site=options.use_user_site, File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs auto_confirm=True File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall uninstalled_pathset = UninstallPathSet.from_dist(dist) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist '(at %s)' % (link_pointer, dist.project_name, dist.location) AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed ``` - apache#29099 (comment) (amp-jenkins-worker-04) - apache#29090 (comment) (amp-jenkins-worker-03) Seems like the previous installation of editable mode affects other PRs. This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge. To recover the Jenkins build. No, dev-only. Jenkins build will test it out. Closes apache#29102 from HyukjinKwon/SPARK-32303. 
Lead-authored-by: HyukjinKwon <[email protected]> Co-authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…llation in PIP test Currently the Jenkins PIP packaging test fails as below intermediately: ``` Installing dist into virtual env Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0) Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB) Installing collected packages: py4j, pyspark Found existing installation: py4j 0.10.9 Uninstalling py4j-0.10.9: Successfully uninstalled py4j-0.10.9 Found existing installation: pyspark 3.1.0.dev0 Exception: Traceback (most recent call last): File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main status = self.run(options, args) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run use_user_site=options.use_user_site, File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs auto_confirm=True File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall uninstalled_pathset = UninstallPathSet.from_dist(dist) File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist '(at %s)' % (link_pointer, dist.project_name, dist.location) AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed ``` - apache#29099 (comment) (amp-jenkins-worker-04) - apache#29090 (comment) (amp-jenkins-worker-03) Seems like the previous installation of editable mode affects other PRs. This PR simply works around by removing the symbolic link from the previous editable installation. This is a common workaround up to my knowledge. To recover the Jenkins build. No, dev-only. Jenkins build will test it out. Closes apache#29102 from HyukjinKwon/SPARK-32303. 
Lead-authored-by: HyukjinKwon <[email protected]> Co-authored-by: Hyukjin Kwon <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
…llation in PIP test

### What changes were proposed in this pull request?

Currently the Jenkins PIP packaging test fails intermittently, as below:

```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```

- apache#29099 (comment) (amp-jenkins-worker-04)
- apache#29090 (comment) (amp-jenkins-worker-03)

It seems that a previous installation in editable mode affects other PRs. This PR simply works around the problem by removing the symbolic link left by the previous editable installation. To my knowledge, this is a common workaround.

### Why are the changes needed?

To recover the Jenkins build.

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Jenkins build will test it out.

Closes apache#29102 from HyukjinKwon/SPARK-32303.

Lead-authored-by: HyukjinKwon <[email protected]>
Co-authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
mridulm
left a comment
Thanks for fixing this @attilapiros !
  }
  if (executorMemory != null
-     && Try(JavaUtils.byteStringAsBytes(executorMemory)).getOrElse(-1L) <= 0) {
+     && Try(JavaUtils.byteStringAsMb(executorMemory)).getOrElse(-1L) <= 0) {
In theory, the only difference would be if the user set the memory to < 1 MB.
This is ridiculous enough to ignore as a valid use case :-)
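To make the sub-megabyte edge case concrete, here is a minimal sketch of the two conversions. `ByteStringSketch` and its methods are hypothetical stand-ins written for this illustration, not Spark's `JavaUtils`, and the sketch assumes the MiB conversion truncates toward zero.

```java
import java.util.Locale;

/** Hypothetical stand-in for a byte-string parser; NOT Spark's JavaUtils. */
final class ByteStringSketch {
    private static final long MIB = 1L << 20;

    /** Parses "<digits>[k|m|g|t]"; a bare number is counted in defaultUnitBytes units. */
    static long toBytes(String s, long defaultUnitBytes) {
        String v = s.trim().toLowerCase(Locale.ROOT);
        char last = v.charAt(v.length() - 1);
        if (Character.isDigit(last)) {
            return Long.parseLong(v) * defaultUnitBytes;
        }
        long number = Long.parseLong(v.substring(0, v.length() - 1));
        switch (last) {
            case 'k': return number << 10;
            case 'm': return number << 20;
            case 'g': return number << 30;
            case 't': return number << 40;
            default: throw new NumberFormatException("Unknown unit in: " + s);
        }
    }

    /** Bare numbers are bytes. */
    static long asBytes(String s) {
        return toBytes(s, 1L);
    }

    /** Bare numbers are MiB; the result is truncated to whole MiB (assumption). */
    static long asMb(String s) {
        return toBytes(s, MIB) / MIB;
    }

    public static void main(String[] args) {
        for (String value : new String[] {"512k", "1024", "2g"}) {
            System.out.println(value + ": asBytes=" + asBytes(value) + ", asMb=" + asMb(value));
        }
    }
}
```

Under these assumptions only a value such as `512k` separates the two checks: it is positive in bytes but truncates to 0 MiB, so a `<= 0` test would reject it only after the switch to the MiB variant.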
static String addDefaultMSuffixIfNeeded(String memoryString) {
  if (memoryString.chars().allMatch(Character::isDigit)) {
    System.err.println("Memory setting without explicit unit (" +
      memoryString + ") is taken to be in MB by default! For details check SPARK-32293.");
Given that we are documenting that 'm' is the suffix we use if none is specified, do we need this message on stderr?
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
Fix the inconsistency between the Spark memory configs and the JVM options by appending the default unit "m" before setting the -Xmx/-Xms JVM options when no suffix is provided.
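The idea can be sketched as follows. The method name is taken from the diff snippet quoted in the review; the class name, the return statements, and the empty-string guard are assumptions made for this illustration, and the stderr warning discussed in the review is left out.

```java
/** Illustrative sketch only; the PR's actual method body is not shown in full. */
final class DefaultUnitSketch {
    /**
     * Appends "m" when the value consists of digits only, so that the JVM
     * reads it as mebibytes. Values that already carry a unit pass through.
     */
    static String addDefaultMSuffixIfNeeded(String memoryString) {
        // The isEmpty() guard is an addition of this sketch: allMatch on an
        // empty stream is true, which would otherwise turn "" into "m".
        if (!memoryString.isEmpty() && memoryString.chars().allMatch(Character::isDigit)) {
            return memoryString + "m";
        }
        return memoryString;
    }

    public static void main(String[] args) {
        System.out.println("-Xmx" + addDefaultMSuffixIfNeeded("1024")); // -Xmx1024m
        System.out.println("-Xmx" + addDefaultMSuffixIfNeeded("2g"));   // -Xmx2g
    }
}
```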
Why are the changes needed?
Spark's maximum memory can be configured in several ways:
The memory can be configured separately for the executors and for the driver. All of these follow the format of the JVM memory options in that they use the very same size unit suffixes ("k", "m", "g" or "t"), but there is an inconsistency regarding the default unit. When no suffix is given, the amount is passed as is to the JVM (to the -Xmx and -Xms options), where bytes are the default unit; for this, please see the example here:
The Spark memory configs, however, use MiB as the default unit.
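A small sketch of the mismatch, assuming only what is stated above (Spark reads a suffix-less value as MiB, the JVM reads it as bytes). The class and method names are hypothetical and exist only for this illustration.

```java
/** Hypothetical helper showing how one suffix-less value is read two ways. */
final class DefaultUnitMismatch {
    static final long MIB = 1024L * 1024L;

    /** Bytes Spark intends for a suffix-less value (default unit: MiB). */
    static long sparkBytes(String digits) {
        return Long.parseLong(digits) * MIB;
    }

    /** Bytes the JVM assumes for -Xmx<digits> (default unit: bytes). */
    static long jvmBytes(String digits) {
        return Long.parseLong(digits);
    }

    public static void main(String[] args) {
        String value = "1024";
        System.out.println("Spark intends: " + sparkBytes(value) + " bytes");
        System.out.println("JVM assumes:   " + jvmBytes(value) + " bytes");
    }
}
```

For the value `1024`, Spark intends 1073741824 bytes (1 GiB) while `-Xmx1024` asks the JVM for 1024 bytes.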
Does this PR introduce any user-facing change?
Yes. Before this PR, when no suffix was given, the `Xmx` and `Xms` JVM options could have been set to 1/1048576 (1/1024²) of the intended amount because of the conversion difference between the units (bytes and megabytes). This could happen only in client mode, so the following are affected:
- `spark-shell`, `spark-sql`, `pyspark`, `sparkR` and `beeline`, as the result of the changes done in `SparkClassCommandBuilder`
- `spark-submit` (only in client mode), as the result of the changes done in `SparkSubmitCommandBuilder`

The fact that the limit to start an application is 471859200 bytes (a number that is already long and inconvenient to calculate with) decreases the number of affected cases.
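The arithmetic behind that limit can be checked with a short sketch. The constant is the value quoted above; the class name and the simplified "heap must be at least the limit" comparison are assumptions of this illustration, not Spark's actual check.

```java
/** Illustrative arithmetic only; the comparison is a simplification. */
final class StartupLimitSketch {
    static final long MIB = 1024L * 1024L;

    /** The limit quoted in the PR description, in bytes. */
    static final long LIMIT_BYTES = 471_859_200L;

    /** Simplified assumption: an application starts only with at least LIMIT_BYTES of heap. */
    static boolean passesLimit(long heapBytes) {
        return heapBytes >= LIMIT_BYTES;
    }

    public static void main(String[] args) {
        // 471859200 bytes is exactly 450 MiB.
        System.out.println(LIMIT_BYTES / MIB + " MiB");
        // A suffix-less "1024" handed to the JVM as bytes is far below the limit.
        System.out.println(passesLimit(1024L));
        // Only a suffix-less value of at least 471859200 would have let the application start.
        System.out.println(passesLimit(471_859_200L));
    }
}
```

This suggests why few users were affected: a suffix-less value had to be at least nine digits long before the application would start at all.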
How was this patch tested?
With unit tests
See `SparkSubmitCommandBuilderSuite`.
With manual testing
By executing a Spark example.
Without my change and without an explicit unit suffix:
You can see from the error above that it is not Spark's memory check that is triggered, as it is in the counter-test (still without my change, with a suffix given but with too little memory set):
And after this PR, running the first case (without an explicit unit suffix):
And even `jps` shows the correct value: