Commit c3baba0 — Robertboy18/MinAtar-Faster
(This repository has been archived by the owner on Sep 4, 2024. It is now read-only.)

final commit of all changes and new code as well as agents

Robertboy18 committed Aug 31, 2022
1 parent fb6e987 · commit c3baba0
Showing 53 changed files with 11,704 additions and 2 deletions.

README.md (190 additions, 2 deletions)
MinAtar is a testbed for AI agents which implements miniaturized versions of several Atari 2600 games.
<img src="img/space_invaders.gif" width="200" />
</p>

## Standard Quick Start
To use MinAtar, you need Python 3 installed and an up-to-date version of pip. To run the included `DQN` and `AC_lambda` examples, you also need `PyTorch`. To install MinAtar, follow the steps below:

1. Clone the repo:
```bash
git clone https://github.com/Robertboy18/MinAtar-Faster.git
```
If you prefer running MinAtar in a virtualenv, you can do the following before step 2:
```bash
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
```
2. Install MinAtar:
```bash
pip install .
pip install -r requirements.txt
```
If you have any issues with automatic dependency installation, you can instead install the necessary dependencies manually and run
```bash
pip install . --no-deps
```

Use the arrow keys to move and space bar to fire. Also, press q to quit and r to reset.

Also included in the examples directory are example implementations of DQN (dqn.py) and online actor-critic with eligibility traces (AC_lambda.py).

## Using the Optimized Code and Agents

To run your first experiment:
```
python3 main.py --agent-json config/agent/SAC.json --env-json config/environment/AcrobotContinuous-v1.json --index 0
```

## Usage
The file main.py trains an agent for a specified number of runs, based on an environment configuration file and an agent configuration file found in config/environment/ and config/agent/, respectively. The data is saved in the results directory, with a name derived from the environment and agent names.

For more information on how to use the main.py program, see the `--help` option:
```
Usage: main.py [OPTIONS]
Given agent and environment configuration files, run the experiment defined
by the configuration files
Options:
--env-json TEXT Path to the environment json configuration file
[required]
--agent-json TEXT Path to the agent json configuration file [required]
--index INTEGER The index of the hyperparameter to run
-m, --monitor Whether or not to render the scene as the agent trains.
-a, --after INTEGER How many timesteps (training) should pass before
rendering the scene
--save-dir TEXT Which directory to save the results file in
--help Show this message and exit.
```

Example:
```
./main.py --env-json config/environment/MountainCarContinuous-v0.json --agent-json config/agent/linearAC.json --index 0 --monitor --after 1000
```
will run the experiment using the linear-Gaussian actor-critic agent on the
mountain car environment. The experiment runs serially in a single process,
the scene is rendered after 1000 timesteps of training, and only the
hyperparameter setting with index 0 is run.

## Hyperparameter Settings
The hyperparameter settings are laid out in the agent configuration files.
The files are laid out such that each setting is a list of values, and the
total number of hyperparameter settings is the product of the lengths of each
of these lists. For example, if the agent config file looks like:
```
{
    "agent_name": "linearAC",
    "parameters":
    {
        "decay": [0.5],
        "critic_lr": [0.005, 0.1, 0.3],
        "actor_lr": [0.005, 0.1, 0.3],
        "avg_reward_lr": [0.1, 0.3, 0.5, 0.9],
        "scaled": [true],
        "clip_stddev": [1000]
    }
}
```
then, there are `1 x 3 x 3 x 4 x 1 x 1 = 36` different hyperparameter
settings. Each hyperparameter setting is given a specific index. For example,
hyperparameter setting index `1` would have the following hyperparameters:
```
{
    "agent_name": "linearAC",
    "parameters":
    {
        "decay": 0.5,
        "critic_lr": 0.005,
        "actor_lr": 0.005,
        "avg_reward_lr": 0.1,
        "scaled": true,
        "clip_stddev": 1000
    }
}
```
The hyperparameter setting indices are implemented `mod x`, where `x` is the
total number of hyperparameter settings (in the example above, `36`). So the
hyperparameter settings with indices `1, 37, 73, ...` all refer to the same
hyperparameter setting, since `1 = 37 = 73 = ... (mod 36)`. The difference is
that consecutive indices use different seeds: every run with hyperparameter
setting `1` uses the same seed, while hyperparameter setting `37` uses the
same hyperparameters as setting `1` but a different seed, and that seed is
likewise the same on every run with setting `37`. This is what Martha and her
students have done with their Actor-Expert implementation, and I find that it
works nicely for hyperparameter sweeps.
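
To make the indexing concrete, here is a small illustrative sketch of the index-to-(setting, seed) mapping described above. It is not the repository's actual code: the function name, the iteration order over parameter combinations, and the exact index offset are assumptions, so the setting produced for a given index may differ from what main.py selects.
```python
from itertools import product

# Illustrative sketch only -- the repository's own utilities may enumerate
# combinations in a different order or with a different index offset.
agent_config = {
    "agent_name": "linearAC",
    "parameters": {
        "decay": [0.5],
        "critic_lr": [0.005, 0.1, 0.3],
        "actor_lr": [0.005, 0.1, 0.3],
        "avg_reward_lr": [0.1, 0.3, 0.5, 0.9],
        "scaled": [True],
        "clip_stddev": [1000],
    },
}

def setting_from_index(config, index):
    names = list(config["parameters"])
    combos = list(product(*config["parameters"].values()))  # 1*3*3*4*1*1 = 36
    num_settings = len(combos)
    setting = dict(zip(names, combos[index % num_settings]))
    seed = index // num_settings  # indices 1, 37, 73, ... share a setting but get seeds 0, 1, 2, ...
    return setting, seed

setting_a, seed_a = setting_from_index(agent_config, 1)
setting_b, seed_b = setting_from_index(agent_config, 37)
assert setting_a == setting_b and seed_a != seed_b  # same setting, different seed
```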


## Saved Data
Each experiment saves all the data as a Python dictionary. The dictionary is
designed so that we store all information about the experiment, including all
agent hyperparameters and environment settings so that the experiment is
exactly reproducible.

If the data dictionary is called `data`, then the main data for the experiment
is stored in `data["experiment_data"]`, which is a dictionary mapping from
hyperparameter settings indices to agent parameters and experiment runs.
`data["experiment_data"][i]["agent_params"]` is a dictionary storing the
agent's hyperparameters (hyperparameter settings index `i`) for the experiment.
`data["experiment_data"][i]["runs"]` is a list storing the runs for the
`i-th` hyperparameter setting. Each element of the list is a dictionary, giving
all the information for that run and hyperparameter setting. For example,
`data["experiment_data"][i]["runs"][j]` will give all the information on
the `j-th` run of hyperparameter settings `i`.

Below is a tree diagram of the data structure:
```
data
├─── "experiment"
│    ├─── "environment": environment configuration file
│    └─── "agent": agent configuration file
└─── "experiment_data": dictionary of hyperparameter setting *index* to runs
     ├─── "agent_params": the hyperparameter settings
     └─── "runs": a list containing all the runs for this hyperparameter setting (each run is a dictionary of elements)
          └─── index i: information on the ith run
               ├─── "run_number": the run number
               ├─── "random_seed": the random seed used for the run
               ├─── "total_timesteps": the total number of timesteps in the run
               ├─── "eval_interval_timesteps": the interval of timesteps to pass before running offline evaluation
               ├─── "episodes_per_eval": the number of episodes run at each offline evaluation
               ├─── "eval_episode_rewards": list of the returns (np.array) from each evaluation episode; if there are 10 episodes per eval,
               │    then this will be a list of np.arrays where each np.array has 10 elements (one per eval episode)
               ├─── "eval_episode_steps": the number of timesteps per evaluation episode, with the same form as "eval_episode_rewards"
               ├─── "timesteps_at_eval": the number of training steps that passed at each evaluation. For example, if there were 10
               │    offline evaluations, then this will be a list of 10 integers, each stating how many training steps passed before each
               │    evaluation.
               ├─── "train_episode_rewards": the return seen for each training episode
               ├─── "train_episode_steps": the number of timesteps passed for each training episode
               ├─── "train_time": the total amount of training time in seconds
               ├─── "eval_time": the total amount of evaluation time in seconds
               └─── "total_train_episodes": the total number of training episodes for the run
```

For example, here is `data["experiment_data"][i]["runs"][j]` for a mock run
of the Linear-Gaussian Actor-Critic agent on MountainCarContinuous-v0:
```
{'random_seed': 0,
'total_timesteps': 1000,
'eval_interval_timesteps': 500,
'episodes_per_eval': 10,
'eval_episode_rewards': array([[-200., -200., -200., -200., -200., -200., -200., -200., -200., -200.],
                               [-200., -200., -200., -200., -200., -200., -200., -200., -200., -200.]]),
'eval_episode_steps': array([[200, 200, 200, 200, 200, 200, 200, 200, 200, 200],
                             [200, 200, 200, 200, 200, 200, 200, 200, 200, 200]]),
'timesteps_at_eval': array([ 0, 600]),
'train_episode_steps': array([200, 200, 200, 200, 200]),
'train_episode_rewards': array([-200., -200., -200., -200., -200.]),
'train_time': 0.12098526954650879,
'eval_time': 0.044415950775146484,
'total_train_episodes': 5,
...}
```
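
As a quick illustration of how this structure can be consumed, the sketch below loads a results file and prints the mean offline return at each evaluation for every run of one hyperparameter setting. The filename and the use of pickle are assumptions; adjust them to match how your results were actually saved.
```python
import pickle

import numpy as np

# Assumed filename and serialization format -- check the results/ directory
# for the actual name produced by your run.
with open("results/MountainCarContinuous-v0_linearAC_data.pkl", "rb") as f:
    data = pickle.load(f)

hyper_index = 0
for run in data["experiment_data"][hyper_index]["runs"]:
    # One row per offline evaluation, one column per evaluation episode.
    rewards = np.asarray(run["eval_episode_rewards"])
    print(run["random_seed"], run["timesteps_at_eval"], rewards.mean(axis=1))
```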

## Configuration Files
Each configuration file is a JSON file with a few properties. Templates for
these files are provided in each configuration directory.

### Environment Configuration File
```
{
    "env_name": "environment filename without .json, all files refer to this as env_name",
    "total_timesteps": "int - total timesteps for the entire run",
    "steps_per_episode": "int - max number of steps per episode",
    "eval_interval_timesteps": "int - interval of timesteps at which offline evaluation should be done",
    "eval_episodes": "int - the number of offline episodes per evaluation",
    "gamma": "float - the discount factor"
}
```

### Agent Configuration File
The agent configuration file is more general; the template is below. Since the
included agents already have configuration files, there is usually no need to
add new ones: it suffices to alter the existing configuration files. Note that
each agent has very different configurations and hyperparameters, so the
config files differ substantially from agent to agent.
```
{
    "agent_name": "filename without .json, all code refers to this as agent_name",
    "parameters":
    {
        "parameter name": "list of values"
    }
}
```

## OpenAI Gym Wrapper
MinAtar now includes an OpenAI Gym plugin using the Gym plugin system. If a sufficiently recent version of OpenAI Gym is installed (`pip install gym==0.21.0` is known to work), this plugin should be automatically available after installing MinAtar as normal, and a Gym environment can then be constructed with `gym.make`.
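
As an illustration of constructing a MinAtar Gym environment, the snippet below is a sketch only: the environment ID shown is an assumption, so check the names registered by your installed MinAtar version.
```python
import gym

# Assumed environment ID -- verify the names registered by the MinAtar plugin.
env = gym.make("MinAtar/Breakout-v0")

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())  # gym 0.21 step API
```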
agent/Random.py (new file, 83 additions)
#!/usr/bin/env python3

# Adapted from https://github.com/pranz24/pytorch-soft-actor-critic

# Import modules
import torch
import numpy as np
from agent.baseAgent import BaseAgent


class Random(BaseAgent):
    """
    Random implements a random policy.
    """
    def __init__(self, action_space, seed):
        super().__init__()
        self.batch = False

        self.action_dims = len(action_space.high)
        self.action_low = action_space.low
        self.action_high = action_space.high

        # Set the seed for all random number generators, this includes
        # everything used by PyTorch, including setting the initial weights
        # of networks. PyTorch prefers seeds with many non-zero binary units
        self.torch_rng = torch.manual_seed(seed)
        self.rng = np.random.default_rng(seed)

        self.policy = torch.distributions.Uniform(
            torch.Tensor(action_space.low), torch.Tensor(action_space.high))

    def sample_action(self, _):
        """
        Samples an action from the agent

        Parameters
        ----------
        _ : np.array
            The state feature vector

        Returns
        -------
        array_like of float
            The action to take
        """
        action = self.policy.sample()

        return action.detach().cpu().numpy()

    def sample_action_(self, _, size):
        """
        sample_action_ is like sample_action, except the rng for
        action selection in the environment is not affected by running
        this function.
        """
        return self.rng.uniform(self.action_low, self.action_high,
                                size=(size, self.action_dims))

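    # The random policy has no parameters to learn or serialize, so the
    # remaining agent-interface methods below are no-ops.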
    def update(self, _, _1, _2, _3, _4):
        pass

    def reset(self):
        """
        Resets the agent between episodes
        """
        pass

    def eval(self):
        pass

    def train(self):
        pass

    # Save model parameters
    def save_model(self, _, _1="", _2=None, _3=None):
        pass

    # Load model parameters
    def load_model(self, _, _1):
        pass

    def get_parameters(self):
        pass
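
Not part of the committed file, but as a rough usage sketch (assuming the repository root is on the Python path and a Gym version with the old `reset()`/`step()` API, such as gym 0.21):
```python
import gym

from agent.Random import Random

env = gym.make("MountainCarContinuous-v0")
agent = Random(env.action_space, seed=0)

state = env.reset()
action = agent.sample_action(state)           # single action from the torch Uniform policy
batch = agent.sample_action_(state, size=32)  # (32, action_dims) array from the numpy RNG
print(action, batch.shape)
```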