Katapult is a Python package that allows you to run any script on a cloud service (for now AWS only).
- Easily run scripts on AWS by writing a simple configuration file
- Handles Python and Julia scripts, or any command
- Handles PyPI, Conda/Mamba, apt-get and Julia environments
- Concurrent instance support
- Handles disconnections from instances, including stopped or terminated instances
- Handles interruption of Katapult, with state recovery
- Runs locally or on a remote instance, with 'watcher' functionality
Important Note |
---|
Katapult helps you easily create instances on AWS so that you can focus on your scripts. It is important to realize that it can and will likely generate extra costs. If you want to minimize those costs, activate the eco mode in the configuration or make sure you monitor the resources created by Katapult. Those include:
|
To use the Python AWS client (Boto3), you need an existing AWS account and a computer set up for AWS.
- Go to the AWS Signup page and create an account
- Download the AWS CLI
- In the AWS web console, create a user with administrator privilege
- In the AWS web console, under the IAM section, click on the new user and create an access key under the "Security Credentials" tab. Make sure "Console Password" is enabled as well
- In the terminal, use the AWS CLI to set up your configuration:
aws configure
See https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html
- To run in remote mode, you may also need to add the following permissions to your user:
- iam:PassRole
- iam:CreateRole
- ec2:AssociateIamInstanceProfile
- ec2:ReplaceIamInstanceProfileAssociation
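The four permissions listed above could be grouped into a single IAM policy document. The following is only a minimal sketch: the function name, the `"Resource": "*"` scope and the choice of one inline policy are illustrative assumptions, not something Katapult prescribes.

```python
import json

# Actions listed above for running the maestro in remote mode
REMOTE_MODE_ACTIONS = [
    "iam:PassRole",
    "iam:CreateRole",
    "ec2:AssociateIamInstanceProfile",
    "ec2:ReplaceIamInstanceProfileAssociation",
]


def build_remote_mode_policy(actions=REMOTE_MODE_ACTIONS):
    """Return an IAM policy document (as a dict) allowing the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                # assumption: all resources; narrow this down for real use
                "Resource": "*",
            }
        ],
    }


# JSON text that can be pasted into the policy editor of the AWS web console
policy_json = json.dumps(build_remote_mode_policy(), indent=2)
print(policy_json)
```

The printed JSON can be attached to the user or to the 'katapult-users' group, depending on how you organise permissions.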
- Go to the AWS Signup page and create an account
- In the AWS web console, create a user with administrator privilege
- In the AWS web console, under the IAM section, click on the new user and make sure you create an access key under the tab "Security Credentials". Make sure "Console Password" is Enabled as well
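- Add your new user credentials manually: the region and output format go in the AWS config file (typically ~/.aws/config), the access keys in the credentials file (typically ~/.aws/credentials)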
[default]
region = eu-west-3
output = json
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
- In the AWS web console, in the IAM service, create a group 'katapult-users' with AmazonEC2FullAccess and IAMFullAccess permissions
- In the AWS web console, in the IAM service, create a user USERNAME attached to the 'katapult-users' group. Copy the access key info!
- Add your new user profile manually, in the AWS config file (first block below) and in the credentials file (second block below)
[default]
region = eu-west-3
output = json
[profile katapult]
region = eu-west-3
output = json
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
[katapult]
aws_access_key_id = YOUR_PROFILE_ACCESS_KEY_ID
aws_secret_access_key = YOUR_PROFILE_SECRET_ACCESS_KEY
- Add 'profile' : 'katapult' (the name of the profile you just created) to the configuration
config = {
################################################################################
# GLOBALS
################################################################################
'project' : 'test' , # this will be concatenated with the instance hashes (if not None)
'profile' : 'katapult' ,
...
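Before pointing Katapult at a profile, it can help to check that the two AWS files agree. The sketch below uses only the standard library; `profile_is_complete` is a hypothetical helper (not part of Katapult or Boto3), and the sample texts mirror the placeholders shown above.

```python
import configparser


def profile_is_complete(config_text, credentials_text, profile):
    """Check that `profile` has a region and both access-key fields."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    credentials = configparser.ConfigParser()
    credentials.read_string(credentials_text)

    # The AWS config file names non-default profiles "profile NAME",
    # while the credentials file uses the bare NAME.
    config_section = "default" if profile == "default" else f"profile {profile}"
    if not config.has_option(config_section, "region"):
        return False
    return all(
        credentials.has_option(profile, key)
        for key in ("aws_access_key_id", "aws_secret_access_key")
    )


SAMPLE_CONFIG = """
[default]
region = eu-west-3
output = json

[profile katapult]
region = eu-west-3
output = json
"""

SAMPLE_CREDENTIALS = """
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

[katapult]
aws_access_key_id = YOUR_PROFILE_ACCESS_KEY_ID
aws_secret_access_key = YOUR_PROFILE_SECRET_ACCESS_KEY
"""

print(profile_is_complete(SAMPLE_CONFIG, SAMPLE_CREDENTIALS, "katapult"))
```

To check your real files, read their contents with `pathlib.Path(...).read_text()` and pass them in place of the sample strings.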
python3 -m venv .venv
source ./.venv/bin/activate
python3 -m ensurepip --default-pip
python3 -m pip install -r requirements.txt
curl -sSL https://install.python-poetry.org | python3.8 -
poetry install
C:\> python3 -m venv .venv
C:\> .venv\Scripts\activate.bat
C:\> python -m pip install -r requirements.txt
C:\> (Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -
C:\> poetry install
# copy the example file
cp examples/config.example.py config.py
#
# EDIT THE FILE
#
# to run with pip
python3 -m katapult.demo config
# to run with pip with reset (maestro and instances)
python3 -m katapult.demo config reset
# to run with poetry
poetry run demo config
# to run with poetry with reset (maestro and the instances)
poetry run demo config reset
# to run script flow test
poetry install -E scriptflow
cd examples/scriptflow/simple
# [!] EDIT THE PROFILE_NAME in sflow.py [!]
scriptflow run sleepit
config = {
################################################################################
# GLOBALS
################################################################################
'project' : 'test' , # this will be concatenated with the instance hashes (if not None)
'profile' : None , # if you want to use a specific profile (user/region), specify its name here
'dev' : False , # When True, this will ensure the same instance and dev environment are being used (while working on building up the project)
'debug' : 1 , # debug level (0...3)
'maestro' : 'local' , # where the 'maestro' resides: 'local' | 'remote' (micro instance)
'auto_stop' : True , # will automatically stop the instances and the maestro, once the jobs are done
'provider' : 'aws' , # the provider name ('aws' | 'azure' | ...)
'job_assign' : None , # algorithm used for job assignation / task scheduling ('random' | 'multi_knapsack')
'recover' : True , # if True, Katapult will always save the state and try to recover this state on the next execution
'print_deploy' : False , # if True, this will cause the deploy stage to print more (and lock)
'mutualize_uploads' : True , # adjusts the directory structure of the uploads ... (False = per job or True = global/mutualized)
################################################################################
# INSTANCES / HARDWARE
################################################################################
'instances' : [
{
'region' : None , # can be None or has to be valid. Overrides AWS user region configuration.
'cloud_id' : None , # can be None, or even wrong/non-existing - then the default one is used
'img_id' : 'ami-077fd75cd229c811b' , # OS image: has to be valid and available for the profile (user/region)
'img_username' : 'ubuntu' , # the SSH user for the image
'type' : 't2.micro' , # proprietary size spec (has to be valid)
'cpus' : None , # number of CPU cores
'gpu' : None , # the proprietary type of the GPU
'disk_size' : None , # the disk size of this instance type (in GB)
'disk_type' : None , # the proprietary disk type of this instance type: 'standard', 'io1', 'io2', 'st1', etc
'eco' : True , # eco == True >> SPOT e.g.
'eco_life' : None , # lifecycle of the machine in ECO mode (datetime.timedelta object) (can be None with eco = True)
'max_bid' : None , # max bid ($/hour) (can be None with eco = True)
'number' : 1 , # multiplicity: the number of instance(s) to create
'explode' : True # multiplicity: can this instance type be distributed across multiple instances, to split CPUs
}
] ,
################################################################################
# ENVIRONMENTS / SOFTWARE
################################################################################
'environments' : [
{
'name' : None , # name of the environment - should be unique if not 'None'. 'None' only when len(environments)==1
# env_conda + env_pypi : mamba is used to setup the env (pip dependencies included)
# env_conda (only) : mamba is used to setup the env
# env_pypi (only) : venv + pip is used to setup the env
'command' : 'examples/install_julia.sh' , # None, or a string: path to a bash file to execute when deploying
'env_aptget' : [ "openssh-client"] , # None, an array of libraries/binaries for apt-get
'env_conda' : "examples/environment.yml", # None, an array of libraries, a path to environment.yml file, or a path to the root of a conda environment
'env_conda_channels' : None , # None, an array of channels. If None (or absent), defaults and conda-forge will be used
'env_pypi' : "examples/requirements.txt" , # None, an array of libraries, a path to requirements.txt file, or a path to the root of a venv environment
'env_julia' : [ "Wavelets" ] , # None, a string or an array of Julia packages to install (requires julia)
}
] ,
################################################################################
# JOBS / SCRIPTS
################################################################################
'jobs' : [
{
'env_name' : None , # the environment to use (can be 'None' if solely one environment is provided above)
'cpus_req' : None , # the CPU(s) requirements for the process (can be None)
'run_script' : 'examples/run_remote.py 1 10',# the script to run (Python (.py) or Julia (.jl) for now) (prioritised vs 'run_command')
'run_command' : None , # the command to run
'upload_files' : [ "uploaded.txt"] , # any file to upload (array or string) - will be put in the same directory
'input_files' : 'input.dat' , # the input file name (used by the script)
'output_files' : 'output.dat' , # the output file name (used by the script)
'repeat' : 2 , # the number of times this job is repeated
} ,
{
'env_name' : None , # the environment to use (can be 'None' if solely one environment is provided above)
'cpus_req' : None , # the CPU(s) requirements for the process (can be None)
'run_script' : 'examples/run_remote.py 2 12',# the script to run (Python (.py) or Julia (.jl) for now) (prioritised vs 'run_command')
'run_command' : None , # the command to run
'upload_files' : [ "uploaded.txt"] , # any file to upload (array or string) - will be put in the same directory
'input_files' : 'input.dat' , # the input file name (used by the script)
'output_files' : 'output.dat' , # the output file name (used by the script)
}
]
}
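The comments in the configuration above state two naming rules: an environment name may be `None` only when there is a single environment, and a job's `env_name` may be `None` under the same condition. A minimal sketch of a pre-flight check for those rules follows; `check_environments` is a hypothetical helper, not a Katapult function, and Katapult may well perform its own validation.

```python
def check_environments(config):
    """Return a list of problems found in the environment / job naming."""
    problems = []
    envs = config.get("environments", [])
    names = [env.get("name") for env in envs]

    # 'None' is allowed as a name only when len(environments) == 1
    if None in names and len(envs) != 1:
        problems.append("an environment name may be None only with a single environment")

    # named environments should be unique
    named = [name for name in names if name is not None]
    if len(named) != len(set(named)):
        problems.append("environment names must be unique")

    for index, job in enumerate(config.get("jobs", [])):
        env_name = job.get("env_name")
        if env_name is None:
            if len(envs) != 1:
                problems.append(f"job {index}: env_name may be None only with a single environment")
        elif env_name not in named:
            problems.append(f"job {index}: unknown environment {env_name!r}")
    return problems


good = {
    "environments": [{"name": None}],
    "jobs": [{"env_name": None}],
}
bad = {
    "environments": [{"name": "a"}, {"name": "a"}],
    "jobs": [{"env_name": None}, {"env_name": "b"}],
}
print(check_environments(good))
print(check_environments(bad))
```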
config = {
'debug' : 1 , # debug level (0...3)
'maestro' : 'local' , # where the 'maestro' resides: 'local' | 'remote' (nano instance) | 'lambda'
'provider' : 'aws' , # the provider name ('aws' | 'azure' | ...)
'instances' : [
{
'type' : 't2.micro' , # proprietary size spec (has to be valid)
}
] ,
'environments' : [
{
'name' : None , # name of the environment - should be unique if not 'None'. 'None' only when len(environments)==1
'env_conda' : "examples/environment.yml", # None, an array of libraries, a path to environment.yml file, or a path to the root of a conda environment
'env_julia' : ["Wavelets"] , # None, a string or an array of Julia packages to install (requires julia)
}
] ,
'jobs' : [
{
'env_name' : None , # the environment to use (can be 'None' if solely one environment is provided above)
'cpus_req' : None , # the CPU(s) requirements for the process (can be None)
'run_script' : 'examples/run_remote.py 1 10',# the script to run (Python (.py) or Julia (.jl) for now) (prioritised vs 'run_command')
'upload_files' : [ "uploaded.txt"] , # any file to upload (array or string) - will be put in the same directory
'input_files' : 'input.dat' , # the input file name (used by the script)
'output_files' : 'output.dat' , # the output file name (used by the script)
} ,
{
'env_name' : None , # the environment to use (can be 'None' if solely one environment is provided above)
'cpus_req' : None , # the CPU(s) requirements for the process (can be None)
'run_script' : 'examples/run_remote.py 2 12',# the script to run (Python (.py) or Julia (.jl) for now) (prioritised vs 'run_command')
'upload_files' : [ "uploaded.txt"] , # any file to upload (array or string) - will be put in the same directory
'input_files' : 'input.dat' , # the input file name (used by the script)
'output_files' : 'output.dat' , # the output file name (used by the script)
}
]
}
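Since eco mode is the suggested way to limit costs, here is a sketch of an instance entry that sets the eco-related keys from the full example, together with a rough cost ceiling. The values are made up for illustration, and `max_eco_cost` is a hypothetical helper: it assumes `eco_life` bounds the running time of each instance and it ignores storage and data-transfer charges.

```python
from datetime import timedelta

# Illustrative values only; the keys come from the full configuration example
eco_instance = {
    'type'     : 't2.micro',
    'eco'      : True,
    'eco_life' : timedelta(hours=2),  # lifecycle of the machine in eco mode
    'max_bid'  : 0.05,                # max bid, in $/hour
    'number'   : 1,
}


def max_eco_cost(instance):
    """Rough upper bound (in $) on compute spend, or None if it cannot be computed."""
    if not instance.get('eco'):
        return None
    eco_life = instance.get('eco_life')
    max_bid = instance.get('max_bid')
    if eco_life is None or max_bid is None:
        return None
    hours = eco_life.total_seconds() / 3600
    return hours * max_bid * instance.get('number', 1)


print(max_eco_cost(eco_instance))
```

An entry like `eco_instance` would go in the 'instances' list of the configuration. The estimate is no substitute for monitoring the resources actually created in your AWS account.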
class KatapultLightProvider(ABC):
class KatapultFatProvider(ABC):
def debug(self,level,*args,**kwargs):
# start the provider: creates the instances
# if reset = True, Katapult forces a process cleanup as well as more re-uploads
def start(self,reset):
# deploy all materials (environments, files, scripts etc.)
def deploy(self):
# run the jobs
# returns a KatapultRunSession
def run(self,wait=False):
# wait for the processes to reach a state
def wait(self,job_state,run_session=None):
# get the states of the processes
def get_jobs_states(self,run_session=None):
# print a summary of processes
def print_jobs_summary(self,run_session=None,instance=None):
# print the aborted logs, if any
def print_aborted_logs(self,run_session=None,instance=None):
# fetch results data
def fetch_results(self,out_directory=None,run_session=None):
# wait for the watcher process to be completely done (useful for demo)
def finalize(self):
# wakeup = start + assign + deploy + run + watch
def wakeup(self):
@abstractmethod
def get_region(self):
@abstractmethod
def get_recommended_cpus(self,inst_cfg):
@abstractmethod
def create_instance_objects(self,config,for_maestro):
@abstractmethod
def find_instance(self,config):
@abstractmethod
def start_instance(self,instance):
@abstractmethod
def stop_instance(self,instance):
@abstractmethod
def terminate_instance(self,instance):
@abstractmethod
def update_instance_info(self,instance):
# GLOBAL methods
def get_client(provider='aws',maestro='local'):
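The abstract methods listed above are the hooks a new cloud provider would have to implement. To illustrate the pattern without depending on Katapult itself, the sketch below defines a stand-in base class with a subset of those method names and a toy in-memory implementation. `InstanceLifecycle`, `InMemoryProvider` and `state_of` are hypothetical names; the real base classes may require more methods and different return values.

```python
from abc import ABC, abstractmethod


class InstanceLifecycle(ABC):
    """Stand-in for a subset of the abstract methods listed above."""

    @abstractmethod
    def get_region(self):
        ...

    @abstractmethod
    def start_instance(self, instance):
        ...

    @abstractmethod
    def stop_instance(self, instance):
        ...

    @abstractmethod
    def terminate_instance(self, instance):
        ...


class InMemoryProvider(InstanceLifecycle):
    """Toy provider that tracks instance states in a dict instead of calling a cloud API."""

    def __init__(self, region):
        self._region = region
        self._states = {}

    def get_region(self):
        return self._region

    def start_instance(self, instance):
        self._states[instance] = "running"

    def stop_instance(self, instance):
        self._states[instance] = "stopped"

    def terminate_instance(self, instance):
        self._states.pop(instance, None)

    def state_of(self, instance):
        # helper for this sketch only
        return self._states.get(instance, "absent")


provider = InMemoryProvider("eu-west-3")
provider.start_instance("i-demo")
provider.stop_instance("i-demo")
print(provider.get_region(), provider.state_of("i-demo"))
```

Because the hooks are declared with `@abstractmethod`, Python refuses to instantiate a subclass that forgets one of them, which surfaces an incomplete provider early.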
Note: this demo works the same way, whether Katapult runs locally or remotely
from katapult import provider as katapult
from katapult.core import KatapultProcessState
import asyncio
import sys

async def main(config_name):
    # load config (config_name is the name of the module holding the 'config' dict, e.g. 'config')
    config = __import__(config_name).config
    # create provider: this loads the config
    provider = katapult.get_client(config)
    # start the provider: this attempts to create the instances
    await provider.start()
    # deploy the necessary stuff onto the instances
    await provider.deploy()
    # run the jobs and get active processes objects back
    run_session = await provider.run()
    # wait for the active processes to be done or aborted:
    await provider.wait(KatapultProcessState.DONE|KatapultProcessState.ABORTED)
    # you can get the state of all jobs this way:
    await provider.get_jobs_states()
    # or get the state for a specific run session:
    await provider.get_jobs_states(run_session)
    # you can print processes summary with:
    await provider.print_jobs_summary()
    # get the results file locally
    await provider.fetch_results('./tmp')

# the provider methods are awaited, so they have to run inside an event loop
asyncio.run(main(sys.argv[1]))
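The `wait` call above combines two states with `|`, which suggests the states behave like bit flags. The sketch below shows how such a combined target can be matched, using a stand-in enum; `ProcessState`, its `RUNNING` member and `all_reached` are assumptions for illustration and may differ from the real `KatapultProcessState`.

```python
from enum import Flag, auto


class ProcessState(Flag):
    """Stand-in for KatapultProcessState (only DONE and ABORTED appear in the demo)."""
    RUNNING = auto()
    DONE = auto()
    ABORTED = auto()


def all_reached(states, target):
    """True when every state in `states` matches at least one flag in `target`."""
    return all(bool(state & target) for state in states)


target = ProcessState.DONE | ProcessState.ABORTED
print(all_reached([ProcessState.DONE, ProcessState.ABORTED], target))  # True
print(all_reached([ProcessState.DONE, ProcessState.RUNNING], target))  # False
```

With this reading, waiting on `DONE|ABORTED` returns once no process is still in progress, whether it succeeded or not.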
Note: the commands below work the same way, whether Katapult runs locally or remotely
# init the client with global params and add instances, envs and jobs (if any)
poetry run cli init config.py
# add more jobs
poetry run cli cfg_add_jobs config_jobs.py
# add more stuff
poetry run cli cfg_add_config config_more.py
# deploy the material onto the instances
poetry run cli deploy
# run the jobs
poetry run cli run
# wait for the jobs to be done
poetry run cli wait
# get the results
poetry run cli fetch_results
# shutdown the daemon
poetry run cli shutdown
# init the client with global params and add instances, envs and jobs (if any)
python3 -m katapult.cli init config.py
# add more jobs
python3 -m katapult.cli cfg_add_jobs config_jobs.py
# add more stuff
python3 -m katapult.cli cfg_add_config config_more.py
# deploy the material onto the instances
python3 -m katapult.cli deploy
# run the jobs
python3 -m katapult.cli run
# wait for the jobs to be done
python3 -m katapult.cli wait
# get the results
python3 -m katapult.cli fetch_results
# shutdown the daemon
python3 -m katapult.cli shutdown
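The CLI steps above can also be driven from a Python script. The sketch below only builds the command lines, in the order shown above, for either the poetry or the pip variant; `cli_commands` is a hypothetical helper, and the optional `cfg_add_jobs` / `cfg_add_config` steps are left out.

```python
import shlex

# Steps that follow 'init', in the order used above
STEPS = ["deploy", "run", "wait", "fetch_results", "shutdown"]


def cli_commands(config_path, use_poetry=True):
    """Return the CLI invocations (as argument lists) for a full run."""
    if use_poetry:
        prefix = ["poetry", "run", "cli"]
    else:
        prefix = ["python3", "-m", "katapult.cli"]
    commands = [prefix + ["init", config_path]]
    commands += [prefix + [step] for step in STEPS]
    return commands


for command in cli_commands("config.py", use_poetry=False):
    print(shlex.join(command))
```

Each argument list can be passed to `subprocess.run(command, check=True)` so that the sequence stops at the first failing step.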
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.