The "hello world" of reinforcement learning in my opinion. So why make this? I want to be able to provide more advanced environments for machine learning agents, and Unity offers great flexibility for that. This is more of an example of how to create a Unity environment from scratch and interfact with it directly using Python.
This uses a custom double DQN: one local network and one target network. The agent learns from the raw pixels on the screen.
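For context, here's a minimal PyTorch sketch of the double-DQN target computation this setup implies (the local network picks the next action, the target network evaluates it). Names like `q_local` and `q_target` are illustrative, not the actual classes in this repo:

```python
import torch

# Minimal double-DQN target sketch (illustrative names, not this repo's code).
# rewards, dones: shape (batch, 1); next_states: whatever the networks accept.
def double_dqn_targets(q_local, q_target, rewards, next_states, dones, gamma=0.99):
    with torch.no_grad():
        # The local network chooses the best next action...
        best_actions = q_local(next_states).argmax(dim=1, keepdim=True)
        # ...and the target network evaluates that choice, reducing overestimation bias.
        next_q = q_target(next_states).gather(1, best_actions)
    # Standard TD target; terminal transitions get no bootstrapped value.
    return rewards + gamma * next_q * (1 - dones)
```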
This is split into two sections. One is dedicated to installing PyTorch, while the other installs the remaining dependencies.
You can run this through Unity or with a prebuilt binary. If you want the prebuilt game, you can grab the folder from Dropbox. Make sure you get `Build` if you are running on Windows and `Build.app` if you are running on Mac. If you're running on Mac and experience an issue, try making the file executable if it's not: `cd Build.app/Contents/MacOS && chmod +x GridWorld`.
Be sure to run either `Build` or `Build.app` from under the GridWorld folder, i.e. `/path/to/GridWorld/Build` or `/path/to/GridWorld/Build.app`.
I'm not making this part of `requirements.txt` because there might be different ways you want to install PyTorch. Take a look at https://pytorch.org/.
If you want this to run with CUDA, you also need to install the proper CUDA dependencies; PyTorch does not ship with CUDA. A list of CUDA toolkits can be found here. Make sure your toolkit version matches what you download for PyTorch.
But what about TensorFlow? No.
In order to install everything else, just run `pip install -r requirements.txt`.
The config file specifies the behavior of the game within Unity along with behavior within Python.
These settings control attributes related to the DQN.
loss - The loss function to use. Defaults to `mse_loss`. Can be any `_loss` function found here.
optimizer - The optimizer to use. Defaults to `Adam`. Currently only supports the optimizer's default hyperparameters. Must be found here.
device - Device to run the DQN on. This will attempt to run on `cuda:0` if available and fall back to `cpu` otherwise.
load_from_checkpoint - If you want to start the DQN from a `checkpoint.tar` file, this is where you can specify the path to it.
eps_start - Starting epsilon value; this controls exploration. Defaults to `1.0` and must be within `(0.0, 1.0]`.
eps_end - Ending epsilon value. Defaults to `0.01` and must be within `(0.0, 1.0]`.
eps_decay - Decay rate of epsilon. Defaults to `0.99` and must be within `(0.0, 1.0)`.
tau - Controls the amount of soft update between the target and local networks (see the sketch after this list). Defaults to `1e-3` and must be larger than `0.0`.
gamma - Discount factor for the target network. Defaults to `0.99` and must be within `(0.0, 1.0)`.
soft_update_every_n_episodes - The frequency at which to perform a soft update of the local network into the target network. Defaults to `4`.
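To make these knobs concrete, here's a rough sketch of how settings like `device`, `loss`, the epsilon schedule, and `tau` typically map onto PyTorch; the variable and function names below are mine, not this repo's:

```python
import torch
import torch.nn.functional as F

# "device": prefer cuda:0 when available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# "loss": defaults to mse_loss; any torch.nn.functional *_loss would slot in here.
loss_fn = F.mse_loss
# "optimizer": Adam with its default hyperparameters, e.g.
# optimizer = torch.optim.Adam(local_network.parameters())

# "eps_start" / "eps_end" / "eps_decay": geometric decay clamped at eps_end.
def next_epsilon(eps, eps_end=0.01, eps_decay=0.99):
    return max(eps_end, eps * eps_decay)

# "tau": blend a small fraction of the local weights into the target network.
def soft_update(local_network, target_network, tau=1e-3):
    for t_param, l_param in zip(target_network.parameters(), local_network.parameters()):
        t_param.data.copy_(tau * l_param.data + (1.0 - tau) * t_param.data)
```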
These settings control attributes related to experience replay, which is used within the DQN.
memory_size - Number of prior experiences to keep track of. Defaults to `10,000`.
batch_size - Number of prior experiences to sample and train on at a time (see the sketch below).
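A minimal replay-buffer sketch using these two settings (the class and defaults here are placeholders, not the repo's actual implementation):

```python
import random
from collections import deque, namedtuple

Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    def __init__(self, memory_size=10_000, batch_size=64):  # 64 is a placeholder default
        self.memory = deque(maxlen=memory_size)  # oldest experiences fall off the end
        self.batch_size = batch_size

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        # Uniform random sampling breaks up the temporal correlation between steps.
        return random.sample(self.memory, self.batch_size)
```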
These settings control saving and tracking stats.
save_checkpoint_every_n_episodes - The frequency at which to save checkpoints of the agent. Defaults to `None`, i.e. will not save.
NOTE: Checkpoints are around 10MB in size. Keep this in mind when deciding how frequently to save.
sliding_window_average - The window size over which to track statistics (see the sketch after this list). Defaults to `100`.
save_stats_every_n_episodes - Frequency to save stats at. Defaults to `None`, i.e. will not save.
save_on_shutdown - Whether or not to save the agent on shutdown. Defaults to `True`. Helpful if you kill an agent with `Ctrl + C` but want a recent snapshot, or if your computer crashes.
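As a rough illustration of the sliding window, a `deque` with a fixed maximum length does the trick (the names here are mine, not the repo's):

```python
from collections import deque

scores_window = deque(maxlen=100)  # sliding_window_average: keep only the last 100 episode scores

def record_episode(score):
    scores_window.append(score)
    # The running statistic is the mean over the most recent episodes only.
    return sum(scores_window) / len(scores_window)
```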
These settings control attributes of the actual Unity game.
num_targets - Number of targets (goals) in the game. Defaults to `2`.
num_fires - Number of fires in the game. Defaults to `4`.
allow_light_source - Whether or not to allow a light source in the game. This ultimately adds some reflection to the game. Defaults to `True`.
step_reward - The reward the agent receives each time step. Defaults to `-0.1`.
target_reward - The reward the agent receives when touching a target. Defaults to `1.0`.
fire_reward - The reward the agent receives when touching fire. Defaults to `-1.0`.
Command line arguments are used in conjunction with the config. I wanted to split them into a config section that controls most of the hyperparameters, and a command line section which just does a little additional setup.
`-c /path/to/file.config` or `--config /path/to/file.config` can be used to specify the config file to load. This will load `default_settings.config` if not specified.
`--train` can be added to specify you want to enter training mode with an agent. This will load any configs necessary and can also be used with a checkpoint.
Want to test your agent? Easy. You can add `--test /path/to/checkpoint.tar`, which will load an agent.
NOTE: This will also load the epsilon value associated with that checkpoint. This is to ensure that agents can still behave appropriately with some uncertainty in the environment.
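I haven't spelled out the exact contents of a checkpoint here, but conceptually loading one looks something like the sketch below; the dictionary keys are hypothetical and may not match what this project actually stores:

```python
import torch

checkpoint = torch.load("/path/to/checkpoint.tar", map_location="cpu")
# Hypothetical keys -- restore the weights and the saved epsilon so the agent
# keeps the same amount of exploration noise it had when the checkpoint was taken.
# local_network.load_state_dict(checkpoint["local_state_dict"])
# epsilon = checkpoint["epsilon"]
```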
This is used in conjunction with the checkpoint settings in the config. If `save_checkpoint_every_n_episodes` is not set in the config, then adding this argument won't do anything. If it is set, specify `--save-checkpoint /path/to/folder` with the folder you would like to save checkpoints to.
NOTE: The specified directory must not exist. It will be created.
There may be a need to load from a checkpoint if your computer crashes or you wish to continue training from a certain point in time. For this you can specify `--load-checkpoint /path/to/checkpoint.tar`.
This is only needed if you have specified a frequency to save stats. You can give a new location with `--save-stats /path/to/stats_to_create.csv`.
There may be a time when you want to tweak the behavior of the game without rebuilding. That's easy to do if you run the game within Unity, by specifying `--run-in-unity`. Have the Unity environment open, and you will be prompted to press play within Unity once Python has created the environment.
It's possible that you might want to run the game within Unity and speed it up or slow it down. You can specify `--time-scale <float>` to change this; by default it's set to `1.0`. This changes Unity's `Time.timeScale`, so use caution, as setting it too high may have consequences on the physics engine.
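If, as I'd assume, the Python side talks to Unity through the `mlagents_envs` package, the time scale is typically pushed over an engine-configuration side channel, roughly like this (an assumption about this repo's internals, not a copy of its code):

```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel

# Side channel used to push engine settings (including the time scale) to Unity.
channel = EngineConfigurationChannel()
# file_name=None makes mlagents_envs wait for you to press play in the Editor,
# which matches the --run-in-unity behavior described above.
env = UnityEnvironment(file_name=None, side_channels=[channel])
channel.set_configuration_parameters(time_scale=5.0)  # e.g. run the simulation at 5x speed
```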
`grid_world.py --train` will begin training with the default config file.
`grid_world.py --test "C:\users\chris\documents\checkpoints\checkpoint_20000.tar"` will test the agent loaded from that checkpoint with the default config file.
`grid_world.py --train -c "C:\users\chris\documents\custom_setting.config" --load-checkpoint "C:\users\chris\documents\checkpoints\checkpoint_20000.tar"` will begin training from the specified checkpoint using the custom config file.
`grid_world.py --train -c "C:\users\chris\documents\custom_setting.config" --save-checkpoint "C:\users\chris\documents\new_checkpoints" --save-stats "C:\users\chris\documents\custom_setting.csv"` will begin training with a custom config file, saving checkpoints to a folder which will be created, and saving stats to a file which will also be created.
`grid_world.py --train --run-in-unity` will begin training with all default parameters and will run within Unity.
`grid_world.py --test "C:\users\chris\documents\checkpoints\checkpoint_20000.tar" --run-in-unity` will begin testing a saved checkpoint within Unity.
`grid_world.py --train --run-in-unity --time-scale 5` will begin training with all default parameters and run within Unity at 5x speed.