
Commit d6ddfdf

[SPARK-19955][PYSPARK] Jenkins Python Conda based test.
## What changes were proposed in this pull request?

Allow Jenkins Python tests to use the installed conda to test Python 2.7 support & test pip installability.

## How was this patch tested?

Updated shell scripts, ran tests locally with installed conda, ran tests in Jenkins.

Author: Holden Karau <[email protected]>

Closes #17355 from holdenk/SPARK-19955-support-python-tests-with-conda.
1 parent c622a87 commit d6ddfdf

3 files changed, 47 additions and 28 deletions


dev/run-pip-tests

Lines changed: 42 additions & 24 deletions
@@ -35,32 +35,37 @@ function delete_virtualenv() {
 }
 trap delete_virtualenv EXIT
 
+PYTHON_EXECS=()
 # Some systems don't have pip or virtualenv - in those cases our tests won't work.
-if ! hash virtualenv 2>/dev/null; then
-  echo "Missing virtualenv skipping pip installability tests."
+if hash virtualenv 2>/dev/null && [ ! -n "$USE_CONDA" ]; then
+  echo "virtualenv installed - using. Note if this is a conda virtual env you may wish to set USE_CONDA"
+  # Figure out which Python execs we should test pip installation with
+  if hash python2 2>/dev/null; then
+    # We do this since we are testing with virtualenv and the default virtual env python
+    # is in /usr/bin/python
+    PYTHON_EXECS+=('python2')
+  elif hash python 2>/dev/null; then
+    # If python2 isn't installed fallback to python if available
+    PYTHON_EXECS+=('python')
+  fi
+  if hash python3 2>/dev/null; then
+    PYTHON_EXECS+=('python3')
+  fi
+elif hash conda 2>/dev/null; then
+  echo "Using conda virtual enviroments"
+  PYTHON_EXECS=('3.5')
+  USE_CONDA=1
+else
+  echo "Missing virtualenv & conda, skipping pip installability tests"
   exit 0
 fi
 if ! hash pip 2>/dev/null; then
   echo "Missing pip, skipping pip installability tests."
   exit 0
 fi
 
-# Figure out which Python execs we should test pip installation with
-PYTHON_EXECS=()
-if hash python2 2>/dev/null; then
-  # We do this since we are testing with virtualenv and the default virtual env python
-  # is in /usr/bin/python
-  PYTHON_EXECS+=('python2')
-elif hash python 2>/dev/null; then
-  # If python2 isn't installed fallback to python if available
-  PYTHON_EXECS+=('python')
-fi
-if hash python3 2>/dev/null; then
-  PYTHON_EXECS+=('python3')
-fi
-
 # Determine which version of PySpark we are building for archive name
-PYSPARK_VERSION=$(python -c "exec(open('python/pyspark/version.py').read());print __version__")
+PYSPARK_VERSION=$(python3 -c "exec(open('python/pyspark/version.py').read());print(__version__)")
 PYSPARK_DIST="$FWDIR/python/dist/pyspark-$PYSPARK_VERSION.tar.gz"
 # The pip install options we use for all the pip commands
 PIP_OPTIONS="--upgrade --no-cache-dir --force-reinstall "
@@ -75,18 +80,24 @@ for python in "${PYTHON_EXECS[@]}"; do
     echo "Using $VIRTUALENV_BASE for virtualenv"
     VIRTUALENV_PATH="$VIRTUALENV_BASE"/$python
     rm -rf "$VIRTUALENV_PATH"
-    mkdir -p "$VIRTUALENV_PATH"
-    virtualenv --python=$python "$VIRTUALENV_PATH"
-    source "$VIRTUALENV_PATH"/bin/activate
-    # Upgrade pip & friends
-    pip install --upgrade pip pypandoc wheel
-    pip install numpy # Needed so we can verify mllib imports
+    if [ -n "$USE_CONDA" ]; then
+      conda create -y -p "$VIRTUALENV_PATH" python=$python numpy pandas pip setuptools
+      source activate "$VIRTUALENV_PATH"
+    else
+      mkdir -p "$VIRTUALENV_PATH"
+      virtualenv --python=$python "$VIRTUALENV_PATH"
+      source "$VIRTUALENV_PATH"/bin/activate
+    fi
+    # Upgrade pip & friends if using virutal env
+    if [ ! -n "USE_CONDA" ]; then
+      pip install --upgrade pip pypandoc wheel numpy
+    fi
 
     echo "Creating pip installable source dist"
     cd "$FWDIR"/python
     # Delete the egg info file if it exists, this can cache the setup file.
     rm -rf pyspark.egg-info || echo "No existing egg info file, skipping deletion"
-    $python setup.py sdist
+    python setup.py sdist
 
 
     echo "Installing dist into virtual env"
@@ -112,6 +123,13 @@ for python in "${PYTHON_EXECS[@]}"; do
 
     cd "$FWDIR"
 
+    # conda / virtualenv enviroments need to be deactivated differently
+    if [ -n "$USE_CONDA" ]; then
+      source deactivate
+    else
+      deactivate
+    fi
+
   done
 done
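
The branch order that this change adds to `dev/run-pip-tests` (virtualenv first unless `USE_CONDA` is set, then conda, otherwise skip) can be restated outside the shell. The following is a minimal Python sketch of that decision and is not part of the commit: `select_test_environments` and its parameters are hypothetical names, the set of available commands is passed in rather than probed with `hash`, and the script's separate `pip` check is omitted.

```python
def select_test_environments(available, use_conda=False):
    """Mirror the branch order of the environment selection in dev/run-pip-tests.

    available: set of command names assumed to be on PATH (hypothetical input;
               the shell script probes each one with `hash`).
    use_conda: stands in for a non-empty USE_CONDA environment variable.
    Returns (python_execs, use_conda), or None when the tests would be skipped.
    """
    if "virtualenv" in available and not use_conda:
        python_execs = []
        # Prefer python2 and fall back to plain python if it is missing.
        if "python2" in available:
            python_execs.append("python2")
        elif "python" in available:
            python_execs.append("python")
        if "python3" in available:
            python_execs.append("python3")
        return python_execs, False
    if "conda" in available:
        # In the conda branch the entry is a Python version, not an executable name.
        return ["3.5"], True
    # Neither virtualenv nor conda is usable: the script exits 0 and skips the tests.
    return None


if __name__ == "__main__":
    print(select_test_environments({"virtualenv", "python", "python3", "conda"}))
    print(select_test_environments({"virtualenv", "conda"}, use_conda=True))
    print(select_test_environments({"python3"}))
```

In the conda branch the list holds a version string because the loop later passes it to `conda create ... python=$python`, whereas the virtualenv branch passes it to `virtualenv --python=$python`.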

dev/run-tests-jenkins

Lines changed: 2 additions & 1 deletion
@@ -22,7 +22,8 @@
 # Environment variables are populated by the code here:
 #+ https://github.com/jenkinsci/ghprb-plugin/blob/master/src/main/java/org/jenkinsci/plugins/ghprb/GhprbTrigger.java#L139
 
-FWDIR="$(cd "`dirname $0`"/..; pwd)"
+FWDIR="$( cd "$( dirname "$0" )/.." && pwd )"
 cd "$FWDIR"
 
+export PATH=/home/anaconda/bin:$PATH
 exec python -u ./dev/run-tests-jenkins.py "$@"

python/run-tests.py

Lines changed: 3 additions & 3 deletions
@@ -111,9 +111,9 @@ def run_individual_python_test(test_name, pyspark_python):
 
 
 def get_default_python_executables():
-    python_execs = [x for x in ["python2.6", "python3.4", "pypy"] if which(x)]
-    if "python2.6" not in python_execs:
-        LOGGER.warning("Not testing against `python2.6` because it could not be found; falling"
+    python_execs = [x for x in ["python2.7", "python3.4", "pypy"] if which(x)]
+    if "python2.7" not in python_execs:
+        LOGGER.warning("Not testing against `python2.7` because it could not be found; falling"
                        " back to `python` instead")
         python_execs.insert(0, "python")
     return python_execs
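
The fallback in `get_default_python_executables` can be exercised in isolation. This is a minimal sketch under stated assumptions, not the code in `python/run-tests.py`: it uses the standard library's `shutil.which` in place of the script's own `which` helper, and it adds a hypothetical `lookup` parameter so the behavior can be checked without the interpreters being installed.

```python
import logging
import shutil

LOGGER = logging.getLogger(__name__)


def default_python_executables(lookup=shutil.which):
    """Return the interpreters to test, preferring python2.7.

    lookup: callable mapping a command name to a path or None
            (hypothetical parameter; defaults to shutil.which).
    """
    candidates = ["python2.7", "python3.4", "pypy"]
    python_execs = [x for x in candidates if lookup(x)]
    if "python2.7" not in python_execs:
        LOGGER.warning("Not testing against `python2.7` because it could not be found; "
                       "falling back to `python` instead")
        # Put the generic interpreter first so it takes python2.7's place.
        python_execs.insert(0, "python")
    return python_execs


if __name__ == "__main__":
    # Pretend that only python3.4 is installed.
    only_py34 = lambda name: "/usr/bin/" + name if name == "python3.4" else None
    print(default_python_executables(only_py34))
```

Note that the fallback inserts `python` without checking that it exists, which matches the diff above.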
