- Added --proxy option in order to set a proxy to access Python package repositories.
- Added plugin-env section to the configuration file in order to set environment variables for the plugin download process.
- Added --plugin-env option (and its associated environment variable SPARPY_PLUGIN_ENVVARS) in order to set environment variables for the plugin download process. It may be necessary in some cases when using conda environments.
- Added environment variable SPARPY_CONFIG for option --config.
- Added environment variable SPARPY_DEBUG for option --debug.
- Fix isparpy.
- Fix ignoring all packages when the exclude-packages list is empty.
- Fix Python package regex.
- Fix download script.
- Added --exclude-python-packages option in order to exclude Python packages.
- Better parsing of plugin names.
- Added --exclude-packages option in order to exclude Spark packages.
- Fix isparpy entry point. Allow the --class parameter.
- Allow setting constraint files.
- Don't set default values for master and deploy_mode.
- Fix sparpy-submit entry point.
- Fix --property-file option.
- Fix --class option.
- Allow using environment variables for most options.
- Support setting pip options as configuration using --conf sparpy.config-key=value in order to allow using sparpy-submit in EMR-on-EKS images (see the sketch after this list).
- Allow --class in order to be able to use sparpy-submit in EMR-on-EKS images.
- Allow --property-file in order to be able to use sparpy-submit in EMR-on-EKS images.
- Added --pre option in order to allow pre-release packages.
- Added --env option in order to set environment variables for the Spark process.
- Added spark-env config section in order to set environment variables for the Spark process.
- Write pip output when it fails.
- Fixed problems with interactive sparpy.
- Fixed the no-self option in the config file.
- Allow plugins that don't use click. They must be callable with a single argument of type Sequence[str], which receives the command-line arguments.
- Added --version option in order to print the sparpy version.
- Fixed error when a plugin requires a package which is already installed but whose version does not satisfy the requirement.
- Sparpy no longer prints an error traceback when a subprocess fails.
- Enable --force-download option.
- Added --find-links option in order to use a directory as a package repository.
- Added --no-index option in order to avoid using external package repositories.
- Added --queue option in order to set the YARN queue.
- Ensure the driver's Python executable is the same Python as sparpy's.
- Added new entry point sparpy-download just to download packages to a specific directory.
- Added new entry point isparpy in order to start an interactive session.
- Force the PySpark Python executable to be the same as sparpy's.
- Fix unrecognized options.
- Fix default configuration file names.
- Added configuration file option.
- Added --debug option.
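For illustration, several of the options above can be combined on the command line. This is a hedged sketch: the proxy URL, queue, package names, and environment values are placeholders, and the sparpy.* keys passed through --conf are assumed to mirror the [plugins] configuration keys described below.

$ sparpy \
    --proxy http://proxy.example.com:3128 \
    --plugin "mypackage>=0.1" \
    --plugin-env MY_ENV_VAR=value \
    --queue my_queue \
    my_plugin_command

$ sparpy-submit --conf sparpy.no-index=true --conf sparpy.find-links=/local/packages my_job.py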
In your package's setup.py, an entry point should be configured for Sparpy:
setup(
    name='yourpackage',
    ...
    entry_points={
        ...
        'sparpy.cli_plugins': [
            'my_command_1=yourpackage.module:command_1',
            'my_command_2=yourpackage.module:command_2',
        ]
    }
)
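As a reference, here is a minimal sketch of what such a plugin module might look like; the names and the click option are illustrative. A plugin may be a click command or, as noted in the changelog, a plain callable taking a single Sequence[str] of arguments:

# yourpackage/module.py (illustrative sketch)
from typing import Sequence

import click


@click.command()
@click.option('--myparam', type=int, default=0)
def command_1(myparam):
    # click-based plugin: invoked by sparpy like a regular click command
    click.echo(f'my_command_1 called with myparam={myparam}')


def command_2(args: Sequence[str]):
    # non-click plugin: receives the remaining command-line arguments
    print(f'my_command_2 called with arguments: {args}')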
Note
Avoid using PySpark as a requirement in order to not download the package from PyPI.
It must be installed on a Spark edge node.
$ pip install sparpy[base]
Using default Spark submit parameters:
$ sparpy --plugin "mypackage>=0.1" my_plugin_command --myparam 1
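Overriding Spark submit parameters explicitly (a sketch: --master and --deploy-mode are assumed to mirror the [spark] configuration keys shown below, and all values are placeholders):

$ sparpy --master yarn --deploy-mode cluster \
    --queue my_queue \
    --env MY_ENV_VAR=value \
    --conf spark.executor.memory=4g \
    --plugin "mypackage>=0.1" my_plugin_command --myparam 1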
sparpy and sparpy-submit accept the --config parameter, which allows setting a configuration file. If it is not set, they will try to use the configuration file $HOME/.sparpyrc; if that does not exist, they will try /etc/sparpy.conf.
Format:
[spark]
master=yarn
deploy-mode=client
queue=my_queue
spark-executable=/path/to/my-spark-submit
conf=
    spark.conf.1=value1
    spark.conf.2=value2
packages=
    maven:package_1:0.1.1
    maven:package_2:0.6.1
repositories=
    https://my-maven-repository-1.com/mvn
    https://my-maven-repository-2.com/mvn
reqs_paths=
    /path/to/dir/with/python/packages_1
    /path/to/dir/with/python/packages_2

[spark-env]
MY_ENV_VAR=value

[plugins]
extra-index-urls=
    https://my-pypi-repository-1.com/simple
    https://my-pypi-repository-2.com/simple
cache-dir=/path/to/cache/dir
plugins=
    my-package1
    my-package2==0.1.2
requirements-files=
    /path/to/requirement-1.txt
    /path/to/requirement-2.txt
find-links=
    /path/to/directory/with/packages_1
    /path/to/directory/with/packages_2
download-dir-prefix=my_prefix_
no-index=false
no-self=false
force-download=true

[plugin-env]
MY_ENV_VAR=value

[interactive]
pyspark-executable=/path/to/pyspark
python-interactive-driver=/path/to/interactive/driver
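With the [interactive] section configured, an interactive session can then be started with the isparpy entry point (a sketch: isparpy is assumed to accept the same plugin options as sparpy):

$ isparpy --plugin "mypackage>=0.1"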