
GridWorld

The "hello world" of reinforcement learning in my opinion. So why make this? I want to be able to provide more advanced environments for machine learning agents, and Unity offers great flexibility for that. This is more of an example of how to create a Unity environment from scratch and interfact with it directly using Python.

This uses a custom double DQN: a local network and a target network. The agent learns from raw pixels on the screen.
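For reference, the core double-DQN update looks roughly like this in PyTorch. This is a minimal sketch, not the exact code from this repo; the network and tensor names are placeholders:

```python
import torch
import torch.nn.functional as F

def double_dqn_loss(local_net, target_net, states, actions, rewards,
                    next_states, dones, gamma=0.99):
    # states/next_states: [B, ...] pixel tensors; actions: [B, 1] long;
    # rewards/dones: [B, 1] float tensors (placeholder shapes).
    # The local network picks the best next action...
    next_actions = local_net(next_states).argmax(dim=1, keepdim=True)
    # ...but the target network evaluates it. This decoupling is what
    # reduces the overestimation bias of a vanilla DQN.
    next_q = target_net(next_states).gather(1, next_actions).detach()
    targets = rewards + gamma * next_q * (1 - dones)
    expected = local_net(states).gather(1, actions)
    return F.mse_loss(expected, targets)
```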

Installation

This is split into two sections. One is dedicated to installing PyTorch, while the other installs the remaining dependencies.

You can run this through Unity or with a prebuilt binary. If you want the prebuilt game, you can grab the folder from Dropbox. Make sure you get Build if you are running on Windows and Build.app if you are running on Mac. If you're on Mac and run into an issue, try making the file executable if it isn't: cd Build.app/Contents/MacOS && chmod +x GridWorld.

Be sure to place either Build or Build.app under the GridWorld folder, i.e. /path/to/GridWorld/Build or /path/to/GridWorld/Build.app.

Installing PyTorch

I'm not making this part of requirements.txt because there might be different ways you want to install PyTorch. Take a look at https://pytorch.org/.
If you want this to run with CUDA, you also need to install the proper CUDA dependencies. PyTorch does not ship with CUDA. A list of CUDA toolkits can be found here. Make sure your toolkit version matches what you download for PyTorch.
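Once installed, you can sanity-check the CUDA setup from Python:

```python
import torch

# Prints True if PyTorch can see a CUDA device; the agent falls back
# to the CPU otherwise (see the device config option below).
print(torch.cuda.is_available())
print(torch.version.cuda)  # CUDA version PyTorch was built against
```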

But what about TensorFlow? No.

Remaining Installation

In order to install everything else, just run pip install -r requirements.txt.

Understanding Config

The config file specifies the behavior of the game within Unity along with behavior within Python.

DQN

Specific to controlling attributes related to the DQN.

loss - The loss function to use. Defaults to mse_loss. Can be any _loss function found here.
optimizer - The optimizer to use. Defaults to Adam. Currently only supports the optimizer's default hyperparameters. Must be found here.
device - Device to run the DQN on. This will attempt to run it on cuda:0 if available and use cpu otherwise.
load_from_checkpoint - If you want to start the DQN from a checkpoint.tar file, this is where you can specify the path to it.
eps_start - Starting epsilon value. This controls exploration. Defaults to 1.0 and must be within (0.0, 1.0].
eps_end - Ending epsilon value. Defaults to 0.01 and must be within (0.0, 1.0].
eps_decay - Decay rate of epsilon. Defaults to 0.99 and must be within (0.0, 1.0).
tau - Controls the amount of soft update between the target and local network. Defaults to 1e-3 and must be larger than 0.0.
gamma - Discount factor for the target network. Defaults to 0.99 and must be within (0.0, 1.0).
soft_update_every_n_episodes - The frequency at which to perform the soft update between the networks. Defaults to 4. See the sketch below for how tau and eps_decay come into play.
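As a rough illustration of how tau, eps_decay, and soft_update_every_n_episodes interact (a hedged sketch with placeholder names, not necessarily how this repo implements it):

```python
def soft_update(local_net, target_net, tau=1e-3):
    # Blend a small fraction (tau) of the local weights into the target
    # network; with tau=1e-3 the target trails the local network slowly,
    # which keeps the learning targets stable.
    for target_p, local_p in zip(target_net.parameters(),
                                 local_net.parameters()):
        target_p.data.copy_(tau * local_p.data + (1.0 - tau) * target_p.data)

def decay_epsilon(eps, eps_end=0.01, eps_decay=0.99):
    # Called once per episode: epsilon shrinks geometrically from
    # eps_start toward eps_end, reducing exploration over time.
    return max(eps_end, eps * eps_decay)
```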

Experience Replay

Specific to controlling attributes related to experience replay. Experience replay is used within the DQN.

memory_size - Number of prior experiences to keep track of. Defaults to 10,000.
batch_size - Number of prior experiences to sample per training batch. A replay buffer sketch follows below.
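A replay buffer along these lines is standard for DQNs. This is a minimal sketch assuming uniform sampling; the batch_size default shown (64) is a placeholder, since the config doesn't document one:

```python
import random
from collections import deque, namedtuple

Experience = namedtuple('Experience',
                        ['state', 'action', 'reward', 'next_state', 'done'])

class ReplayBuffer:
    def __init__(self, memory_size=10_000, batch_size=64):
        # Oldest experiences are evicted once memory_size is exceeded.
        self.memory = deque(maxlen=memory_size)
        self.batch_size = batch_size

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive frames the agent sees.
        return random.sample(self.memory, self.batch_size)
```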

Stats

Specific to controlling attributes related to saving and tracking stats.

save_checkpoint_every_n_episodes - The frequency to save checkpoints of the agent at. Defaults to None, i.e. will not save.
NOTE: Checkpoints are around 10MB in size. Keep this in mind when deciding how frequently to save.
sliding_window_average - The window size over which to track statistics. Defaults to 100.
save_stats_every_n_episodes - Frequency to save stats at. Defaults to None, i.e. will not save.
save_on_shutdown - Whether or not to save the agent on shutdown. Defaults to True. Helpful if you kill an agent with Ctrl + C but want a recent snapshot, or if your computer crashes.

Game

Specific to controlling attributes related to the actual Unity game.

num_targets - Number of targets (goals) in the game. Defaults to 2.
num_fires - Number of fires in the game. Defaults to 4.
allow_light_source - Whether or not to allow a light source in the game. This ultimately adds some reflection to the game. Defaults to True.
step_reward - The reward the agent receives each time step. Defaults to -0.1.
target_reward - The reward the agent receives when touching the target. Defaults to 1.0.
fire_reward - The reward the agent receives when touching fire. Defaults to -1.0.
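Taken together, the documented options and their defaults look roughly like this. It's shown as a Python dict purely for reference: the actual on-disk syntax is whatever default_settings.config uses, the section names here just mirror the headings above, and batch_size has no documented default:

```python
config = {
    'DQN': {
        'loss': 'mse_loss',
        'optimizer': 'Adam',
        'device': 'cuda:0',          # falls back to 'cpu' if unavailable
        'load_from_checkpoint': None,
        'eps_start': 1.0,
        'eps_end': 0.01,
        'eps_decay': 0.99,
        'tau': 1e-3,
        'gamma': 0.99,
        'soft_update_every_n_episodes': 4,
    },
    'ExperienceReplay': {
        'memory_size': 10_000,
        'batch_size': 64,            # placeholder; no documented default
    },
    'Stats': {
        'save_checkpoint_every_n_episodes': None,
        'sliding_window_average': 100,
        'save_stats_every_n_episodes': None,
        'save_on_shutdown': True,
    },
    'Game': {
        'num_targets': 2,
        'num_fires': 4,
        'allow_light_source': True,
        'step_reward': -0.1,
        'target_reward': 1.0,
        'fire_reward': -1.0,
    },
}
```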

Command Line Arguments

Command line arguments are used in conjunction with the config. I wanted to split them into a config section that controls more of the hyperparameters, and a command line section which just does a little additional setup.

Specifying Config

-c /path/to/file.config or --config /path/to/file.config can be used to specify the config file to load. This will load default_settings.config if not specified.

Training

--train can be added to specify that you want to enter training mode with an agent. This will load any configs necessary and can also be used with a checkpoint.

Testing

Want to test your agent? Easy. You can add --test /path/to/checkpoint.tar which will load an agent.
NOTE: This will also load the epsilon value associated with that checkpoint. This is to ensure that agents can still behave appropriately with some uncertainty in the environment.

Saving Checkpoints

This is used in conjunction with config stats. If it's not set in the config, then adding this argument won't do anything.
If it is set, specify --save-checkpoint /path/to/folder for the folder you would like to save checkpoints to.
NOTE: The specified directory must not exist. It will be created.

Loading Checkpoints

There may be a need to load from a checkpoint if your computer crashes or you wish to continue training from a certain point in time. For this you can specify --load-checkpoint /path/to/checkpoint.tar.

Saving Stats

This is only needed if you have specified a frequency to save stats. You can give a new location with --save-stats /path/to/stats_to_create.csv.

Running Within Unity

There may be a time when you want to tweak the behavior of the game without rebuilding. That's easy to do if you run the game within Unity. By specifying --run-in-unity you can do just that. Have the Unity environment open, and you will be prompted to press play within Unity once Python has created the environment.

Changing Speed of the Environment

It's possible that you might want to run the game within Unity and speed it up or slow it down. You can specify --time-scale <float> to change this. By default it's set to 1.0. This changes Unity's Time.timeScale, so use caution as setting it too high may have consequences for the physics engine.

Examples

grid_world.py --train will begin training with the default config file.
grid_world.py --test "C:\users\chris\documents\checkpoints\checkpoint_20000.tar" will test the agent loaded from that checkpoint with the default config file.
grid_world.py --train -c "C:\users\chris\documents\custom_setting.config" --load-checkpoint "C:\users\chris\documents\checkpoints\checkpoint_20000.tar" will begin training from the specified checkpoint using the custom config file.
grid_world.py --train -c "C:\users\chris\documents\custom_setting.config" --save-checkpoint "C:\users\chris\documents\new_checkpoints" --save-stats "C:\users\chris\documents\custom_setting.csv" will begin training with a custom config file, saving checkpoints to a folder which will be created, and saving stats under a file which will also be created.
grid_world.py --train --run-in-unity will begin training with all default parameters and will run within Unity.
grid_world.py --test "C:\users\chris\documents\checkpoints\checkpoint_20000.tar" --run-in-unity will begin testing a saved checkpoint within Unity.
grid_world.py --train --run-in-unity --time-scale 5 will begin training with all default parameters and run within Unity at 5x speed.
