[SPARK-20627][PYSPARK] Drop the hadoop distribution name from the Python version #17885
holdenk wants to merge 2 commits into apache:master
I'll target this for master, branch-2.2, and branch-2.1.
Test build #76535 has finished for PR 17885 at commit
PYSPARK_VERSION=`echo "$SPARK_VERSION+$NAME" | sed -r "s/-/./" | sed -r "s/SNAPSHOT/dev0/"`
# Write out the VERSION to PySpark version info we rewrite the - into a . and SNAPSHOT
# to dev0 to be closer to PEP440.
PYSPARK_VERSION=`echo "$SPARK_VERSION" | sed -r "s/-/./" | sed -r "s/SNAPSHOT/dev0/"`
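To make the effect of the two `sed` calls concrete, here is a minimal Python sketch that mirrors the pipeline. The function name `pyspark_version` and the sample values `2.2.0-SNAPSHOT` and `hadoop2.7` are assumptions chosen for illustration; they are not taken from the build script.

```python
def pyspark_version(spark_version, name=None):
    """Mirror the sed pipeline: first '-' becomes '.', first 'SNAPSHOT' becomes 'dev0'.

    Passing `name` imitates the old behavior, which appended "+$NAME"
    as a local version label before the substitutions ran.
    """
    raw = spark_version if name is None else f"{spark_version}+{name}"
    # sed's s/// without the g flag replaces only the first match,
    # so each replace() here is limited to one occurrence.
    return raw.replace("-", ".", 1).replace("SNAPSHOT", "dev0", 1)


# Old behavior (hypothetical inputs): the local version label is kept.
print(pyspark_version("2.2.0-SNAPSHOT", "hadoop2.7"))  # 2.2.0.dev0+hadoop2.7
# New behavior: only the Spark version is used.
print(pyspark_version("2.2.0-SNAPSHOT"))  # 2.2.0.dev0
```

The only difference between the two lines in the diff is whether `+$NAME` is appended before the substitutions run.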
This also affects the pyspark-*.tgz artifact name. It seems like this means the same file name will be used for different flavors of the release. If they're identical anyway it's just redundant, but are they? I don't know this part well so might be misunderstanding what this would do.
So we currently only package Python for one Hadoop version. If we start doing multiple Hadoop versions for Python we can figure out how to handle that again.
If there are no other comments I'm going to merge this tomorrow.
Are you referring to https://www.python.org/dev/peps/pep-0440/ ?
Could you post the changes you made in the PR description and explain why it resolves PEP-0440? It might help more people understand the impacts of this PR by reading the PR description. Thanks!
Updated with more explanation of what we changed in the PR description.
…hon version

## What changes were proposed in this pull request?

Drop the Hadoop distribution name from the Python version (PEP440 - https://www.python.org/dev/peps/pep-0440/). We've been using the local version string to disambiguate between different Hadoop versions packaged with PySpark, but PEP0440 states that local versions should not be used when publishing upstream. Since we no longer make PySpark pip packages for different Hadoop versions, we can simply drop the Hadoop information. If at a later point we need to start publishing different Hadoop versions, we can look at making different packages or similar.

## How was this patch tested?

Ran `make-distribution` locally

Author: Holden Karau <holden@us.ibm.com>

Closes #17885 from holdenk/SPARK-20627-remove-pip-local-version-string.

(cherry picked from commit 1b85bcd)
Signed-off-by: Holden Karau <holden@us.ibm.com>
Merged to master, branch-2.2, and branch-2.1.
Could you post the original section about local versions? It sounds like PEP0440 does not encourage it. Below is what I found:
What changes were proposed in this pull request?
Drop the Hadoop distribution name from the Python version (PEP440 - https://www.python.org/dev/peps/pep-0440/). We've been using the local version string to disambiguate between different Hadoop versions packaged with PySpark, but PEP0440 states that local versions should not be used when publishing upstream. Since we no longer make PySpark pip packages for different Hadoop versions, we can simply drop the Hadoop information. If at a later point we need to start publishing different Hadoop versions, we can look at making different packages or similar.
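As a rough illustration of what "local version" means here: in PEP 440 the local version label is the part after a `+` in the version string. The sketch below splits a version on that separator. The helper names `split_local_version` and `has_local_version` and the sample version strings are hypothetical, and this is not a full PEP 440 parser or validator.

```python
def split_local_version(version):
    """Split a version string into (public part, local label or None).

    PEP 440 separates the local version label from the public
    version with a single '+'.
    """
    public, sep, local = version.partition("+")
    return public, (local if sep else None)


def has_local_version(version):
    """Return True if the version string carries a local version label."""
    return split_local_version(version)[1] is not None


# Hypothetical version strings before and after this change.
print(split_local_version("2.2.0.dev0+hadoop2.7"))  # ('2.2.0.dev0', 'hadoop2.7')
print(has_local_version("2.2.0.dev0"))  # False
```

A check like this could, under these assumptions, flag a version string that still carries a local label before it is published.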
How was this patch tested?
Ran `make-distribution` locally