- Install Python 3 (3.7.5) if necessary. On macOS, Homebrew is a convenient way to install it.
- Create a new Python virtual environment in the 'actor-critic-with-emphatic-weightings' directory (named 've' in this case):
$ python3 -m venv ve
- Activate the virtual environment:
$ source ve/bin/activate
- Install the required Python package dependencies:
(ve)$ pip install -r requirements.txt
- Change to the project directory and activate the virtual environment (if necessary):
$ cd $SCRATCH/actor-critic-with-emphatic-weightings/
$ source ve/bin/activate
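To check whether activation is still necessary, the interpreter itself can report whether it is running inside a virtual environment. This is a small sketch that relies only on the standard `sys` module; the helper name `in_virtualenv` is made up for illustration and is not part of this repository.

```python
import sys

def in_virtualenv():
    """Return True if the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print("virtual environment active:", in_virtualenv())
```

If this prints False, run the `source ve/bin/activate` command above before continuing.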
- Read the help output of each script to decide which arguments to use for your experiment:
(ve)$ python generate_experience.py --help
(ve)$ python sweep.py --help
(ve)$ python run_ace.py --help
(ve)$ python evaluate_policies.py --help
- Generate the data to use to train the agents:
(ve)$ python generate_experience.py
Generating the data once, ahead of time, is more efficient than regenerating it for each agent. This is possible because the agents learn off-policy, so they can all train on the same experience.
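The idea can be sketched on a toy problem. Everything below is a hypothetical illustration (a 5-state ring, a uniform behaviour policy, state-independent target policies, and the names `generate_transitions` and `importance_ratio`); it is not the code in generate_experience.py. Transitions are sampled once from the behaviour policy, and each agent then reweights the same transitions by the importance sampling ratio between its own target policy and the behaviour policy.

```python
import random

# Hypothetical fixed behaviour policy over two actions (not the repository's).
BEHAVIOUR_PROBS = [0.5, 0.5]

def generate_transitions(num_steps, seed=0):
    """Sample (state, action, reward, next_state) tuples once, up front."""
    rng = random.Random(seed)
    transitions = []
    state = 0
    for _ in range(num_steps):
        action = rng.choices([0, 1], weights=BEHAVIOUR_PROBS)[0]
        # Toy 5-state ring: action 1 moves forward, action 0 moves back.
        next_state = (state + (1 if action == 1 else -1)) % 5
        reward = 1.0 if next_state == 0 else 0.0
        transitions.append((state, action, reward, next_state))
        state = next_state
    return transitions

def importance_ratio(target_probs, action):
    """rho = pi(a) / mu(a); corrects for data gathered under mu."""
    return target_probs[action] / BEHAVIOUR_PROBS[action]

# One dataset, generated a single time...
data = generate_transitions(num_steps=1000)

# ...reused by several agents with different (hypothetical) target policies.
target_policies = {"agent_a": [0.9, 0.1], "agent_b": [0.2, 0.8]}
weighted_reward = {}
for name, probs in target_policies.items():
    # Importance-weighted average of the one-step reward under each policy.
    weighted_reward[name] = sum(
        importance_ratio(probs, a) * r for (_, a, r, _) in data
    ) / len(data)
```

Both agents read the same `data` list, so the cost of interacting with the environment is paid only once regardless of how many agents are trained.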
- Run the sweep.py script to generate the bash scripts that SLURM will run:
(ve)$ python sweep.py
The script prints a rough estimate of how long the job might take and of how many nodes are needed to finish it within the time given by the "--num_hours" argument. If you are happy to request that many nodes, type "y" and press Enter, and the script will generate one bash script per node.
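As a rough illustration of how such a node estimate could be formed, the sketch below divides the total compute by what one node can deliver in the allotted time. This is an assumption about the general approach, not the calculation sweep.py actually performs; the function `estimate_nodes` and every parameter except the number of hours are hypothetical, and each run is assumed to occupy one core.

```python
import math

def estimate_nodes(num_runs, minutes_per_run, cores_per_node, num_hours):
    """Rough number of nodes needed to finish num_runs within num_hours.

    Hypothetical helper: assumes one core per run and perfect packing,
    so the result is a lower bound rather than a guarantee.
    """
    total_core_minutes = num_runs * minutes_per_run
    core_minutes_per_node = cores_per_node * num_hours * 60
    return max(1, math.ceil(total_core_minutes / core_minutes_per_node))

# Example: 10,000 runs of about 3 minutes each, 40 cores per node, 12 hours.
nodes = estimate_nodes(
    num_runs=10_000, minutes_per_run=3, cores_per_node=40, num_hours=12
)
print(nodes)  # 30,000 core-minutes / 28,800 per node, rounded up to 2
```

Raising the number of hours lowers the node count, which is the trade-off the confirmation prompt asks you to accept.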
- Schedule the generated script(s) to run via SLURM:
(ve)$ sbatch mountain-car/sweep0.sh
- Evaluate the resulting policies:
(ve)$ python evaluate_policies.py
- Use a Jupyter notebook to explore the data and generate plots of performance.
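A typical first step in such a notebook is to average performance across runs before plotting. The sketch below assumes the results have already been loaded as one list of values per run; that layout, the function name `mean_and_stderr`, and the numbers are made up for illustration and do not reflect the actual output format of evaluate_policies.py.

```python
import statistics

def mean_and_stderr(curves):
    """Average learning curves across runs, one value per evaluation point.

    curves: list of equal-length lists, one per run (hypothetical layout).
    Returns (means, standard_errors).
    """
    num_runs = len(curves)
    means, stderrs = [], []
    for values in zip(*curves):  # values from all runs at the same point
        means.append(statistics.mean(values))
        stderrs.append(statistics.stdev(values) / num_runs ** 0.5)
    return means, stderrs

# Made-up values for three runs at three evaluation points.
example_curves = [
    [-200.0, -150.0, -120.0],
    [-200.0, -170.0, -110.0],
    [-200.0, -160.0, -130.0],
]
means, stderrs = mean_and_stderr(example_curves)
```

The resulting `means` and `stderrs` can then be passed to whichever plotting library the notebook uses, for example as a line with a shaded error band.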