
Commit a1f9884

fixed errors in tests
p-christ committed May 3, 2019
1 parent e03cf71 commit a1f9884
Showing 86 changed files with 138 additions and 139 deletions.
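Most of the 86 files change only because the package directories were lower-cased (Agents → agents, Utilities → utilities, Environments → environments, and so on) and the import statements were updated to match. Below is a minimal sketch of how such a rename could be scripted; it is a hypothetical helper, not the author's actual workflow, and the RENAMES mapping is inferred from the paths visible in this diff.

```python
# Hypothetical helper, not part of this commit: lower-case selected package
# directories and rewrite the matching import prefixes in every .py file.
import re
from pathlib import Path

# Assumed old -> new directory names, inferred from the paths in this diff.
RENAMES = {
    "Agents": "agents",
    "DQN_Agents": "DQN_agents",
    "Actor_Critic_Agents": "actor_critic_agents",
    "Policy_Gradient_Agents": "policy_gradient_agents",
    "Hierarchical_Agents": "hierarchical_agents",
    "Utilities": "utilities",
    "Data_Structures": "data_structures",
    "Environments": "environments",
    "Results": "results",
    "Tests": "tests",
}

def rename_directories(repo_root: Path) -> None:
    """Rename directories deepest-first so parent renames do not break child paths."""
    dirs = sorted((d for d in repo_root.rglob("*") if d.is_dir()),
                  key=lambda d: len(d.parts), reverse=True)
    for d in dirs:
        if d.name in RENAMES:
            # On case-insensitive filesystems a two-step rename may be needed.
            d.rename(d.with_name(RENAMES[d.name]))

def rewrite_imports(repo_root: Path) -> None:
    """Rewrite the old package names inside `import` and `from ... import` lines."""
    pattern = re.compile(r"^(\s*(?:from|import)\s+)(\S+)", re.MULTILINE)

    def fix(match: re.Match) -> str:
        parts = [RENAMES.get(part, part) for part in match.group(2).split(".")]
        return match.group(1) + ".".join(parts)

    for py_file in repo_root.rglob("*.py"):
        py_file.write_text(pattern.sub(fix, py_file.read_text()))

if __name__ == "__main__":
    root = Path(".")
    rename_directories(root)
    rewrite_imports(root)
```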
16 changes: 8 additions & 8 deletions .travis.yml
@@ -11,15 +11,15 @@ install:

script:
- export PYTHONPATH="$PYTHONPATH:$PWD"
- export PYTHONPATH=""$PYTHONPATH:$PWD/Agents""
- export PYTHONPATH=""$PYTHONPATH:$PWD/Agents/DQN_Agents""
- export PYTHONPATH=""$PYTHONPATH:$PWD/Agents/Hierarchical_Agents""
- export PYTHONPATH=""$PYTHONPATH:$PWD/Agents/Actor_Critic_Agents""
- export PYTHONPATH=""$PYTHONPATH:$PWD/Agents/Policy_Gradient_Agents""
- export PYTHONPATH=""$PYTHONPATH:$PWD/Utilities/Data_Structures""
- export PYTHONPATH=""$PYTHONPATH:$PWD/Environments""
- export PYTHONPATH=""$PYTHONPATH:$PWD/agents""
- export PYTHONPATH=""$PYTHONPATH:$PWD/agents/DQN_agents""
- export PYTHONPATH=""$PYTHONPATH:$PWD/agents/hierarchical_agents""
- export PYTHONPATH=""$PYTHONPATH:$PWD/agents/actor_critic_agents""
- export PYTHONPATH=""$PYTHONPATH:$PWD/agents/policy_gradient_agents""
- export PYTHONPATH=""$PYTHONPATH:$PWD/utilities/data_structures""
- export PYTHONPATH=""$PYTHONPATH:$PWD/environments""
- export PYTHONPATH=""$PYTHONPATH:$PWD/*""
- pytest Tests/*.py
- pytest tests/*.py
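The exports above put each package directory on PYTHONPATH so the tests can import modules both by package path and by bare module name. A hypothetical alternative, not part of this commit, would be a tests/conftest.py that extends sys.path once with the same directories:

```python
# Hypothetical tests/conftest.py (not in this commit): extend sys.path with the
# same directories the .travis.yml exports add to PYTHONPATH.
import sys
from pathlib import Path

REPO_ROOT = Path(__file__).resolve().parents[1]
PACKAGE_DIRS = [
    "", "agents", "agents/DQN_agents", "agents/hierarchical_agents",
    "agents/actor_critic_agents", "agents/policy_gradient_agents",
    "utilities/data_structures", "environments",
]

for rel in PACKAGE_DIRS:
    path = str(REPO_ROOT / rel) if rel else str(REPO_ROOT)
    if path not in sys.path:
        sys.path.insert(0, path)
```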



8 changes: 4 additions & 4 deletions README.md
@@ -5,7 +5,7 @@



![RL](Utilities/RL_image.jpeg) ![PyTorch](Utilities/PyTorch-logo-2.jpg)
![RL](utilities/RL_image.jpeg) ![PyTorch](utilities/PyTorch-logo-2.jpg)

This repository contains PyTorch implementations of deep reinforcement learning algorithms and environments.

@@ -46,7 +46,7 @@ Below shows various RL algorithms successfully learning discrete action game [Ca
with 3 random seeds is shown with the shaded area representing plus and minus 1 standard deviation. Hyperparameters
used can be found in files `Results/Cart_Pole.py` and `Results/Mountain_Car.py`.

![Cart Pole and Mountain Car Results](Results/Data_and_Graphs/CartPole_and_MountainCar_Graph.png)
![Cart Pole and Mountain Car Results](data_and_graphs/CartPole_and_MountainCar_Graph.png)


#### 2. Hindsight Experience Replay (HER) Experiments
@@ -57,7 +57,7 @@ and [Multi-Goal Reinforcement Learning 2018](https://arxiv.org/abs/1802.09464). The results replicate
the papers and show how adding HER can allow an agent to solve problems that it otherwise would not be able to solve at all. Note that the same hyperparameters were used within each pair of agents and so the only difference
between them was whether hindsight was used or not.

![HER Experiment Results](Results/Data_and_Graphs/HER_Experiments.png)
![HER Experiment Results](data_and_graphs/HER_Experiments.png)

#### 3. Hierarchical Reinforcement Learning Experiments

@@ -73,7 +73,7 @@ The results on the right show the performance of DDQN and algorithm Stochastic N
the implementation of SNN-HRL uses 2 DDQN algorithms within it. Note that the first 300 episodes of training
for SNN-HRL were used for pre-training which is why there is no reward for those episodes.

![Long Corridor and Four Rooms](Results/Data_and_Graphs/Four_Rooms_and_Long_Corridor.png)
![Long Corridor and Four Rooms](data_and_graphs/Four_Rooms_and_Long_Corridor.png)


### Usage ###
6 changes: 0 additions & 6 deletions Agents/Base_Agent.py → agents/Base_Agent.py
@@ -202,12 +202,6 @@ def enough_experiences_to_learn_from(self):
"""Boolean indicated whether there are enough experiences in the memory buffer to learn from"""
return len(self.memory) > self.hyperparameters["batch_size"]

def pick_and_conduct_action(self):
"""Picks and conducts an action"""
raise ValueError("CHANGE ME")
self.action = self.actor_pick_action()
self.conduct_action()

def save_experience(self, memory=None, experience=None):
"""Saves the recent experience to the memory buffer"""
if memory is None: memory = self.memory
2 changes: 1 addition & 1 deletion Agents/DQN_Agents/DDQN.py → agents/DQN_agents/DDQN.py
@@ -1,4 +1,4 @@
from Agents.DQN_Agents.DQN_With_Fixed_Q_Targets import DQN_With_Fixed_Q_Targets
from agents.DQN_agents.DQN_With_Fixed_Q_Targets import DQN_With_Fixed_Q_Targets

class DDQN(DQN_With_Fixed_Q_Targets):
"""A double DQN agent"""
@@ -1,7 +1,7 @@
import torch
import torch.nn.functional as F
from Agents.DQN_Agents.DDQN import DDQN
from Utilities.Data_Structures.Prioritised_Replay_Buffer import Prioritised_Replay_Buffer
from agents.DQN_agents.DDQN import DDQN
from utilities.data_structures.Prioritised_Replay_Buffer import Prioritised_Replay_Buffer

class DDQN_With_Prioritised_Experience_Replay(DDQN):
"""A DQN agent with prioritised experience replay"""
4 changes: 2 additions & 2 deletions Agents/DQN_Agents/DQN.py → agents/DQN_agents/DQN.py
@@ -3,8 +3,8 @@
import torch.optim as optim
import torch.nn.functional as F
import numpy as np
from Agents.Base_Agent import Base_Agent
from Utilities.Data_Structures.Replay_Buffer import Replay_Buffer
from agents.Base_Agent import Base_Agent
from utilities.data_structures.Replay_Buffer import Replay_Buffer

class DQN(Base_Agent):
"""A deep Q learning agent"""
4 changes: 2 additions & 2 deletions Agents/DQN_Agents/DQN_HER.py → agents/DQN_agents/DQN_HER.py
@@ -1,5 +1,5 @@
from Agents.DQN_Agents.DQN import DQN
from Agents.HER_Base import HER_Base
from agents.DQN_agents.DQN import DQN
from agents.HER_Base import HER_Base

class DQN_HER(HER_Base, DQN):
"""DQN algorithm with hindsight experience replay"""
@@ -1,5 +1,5 @@
import copy
from Agents.DQN_Agents.DQN import DQN
from agents.DQN_agents.DQN import DQN

class DQN_With_Fixed_Q_Targets(DQN):
"""A DQN agent that uses an older version of the q_network as the target network"""
6 changes: 3 additions & 3 deletions Agents/HER_Base.py → agents/HER_Base.py
@@ -1,7 +1,7 @@
import torch
import numpy as np
from Utilities.Data_Structures.Replay_Buffer import Replay_Buffer
from Utilities.Utility_Functions import abstract
from utilities.data_structures.Replay_Buffer import Replay_Buffer
from utilities.Utility_Functions import abstract

@abstract
class HER_Base(object):
@@ -81,7 +81,7 @@ def save_alternative_experience(self):

def sample_from_HER_and_Ordinary_Buffer(self):
"""Samples from the ordinary replay buffer and HER replay buffer according to a proportion specified in config"""
states, actions, rewards, next_states, dones = self.memory.produce_action_and_action_info(self.ordinary_buffer_batch_size)
states, actions, rewards, next_states, dones = self.memory.sample(self.ordinary_buffer_batch_size)
HER_states, HER_actions, HER_rewards, HER_next_states, HER_dones = self.HER_memory.sample(self.HER_buffer_batch_size)

states = torch.cat((states, HER_states))
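The hunk above switches the ordinary buffer to the renamed sample method. A minimal sketch of the mixed-batch pattern, assuming both buffers return tensors in the same (states, actions, rewards, next_states, dones) order:

```python
# Draw from the ordinary and HER buffers and concatenate into one training batch.
import torch

def sample_mixed_batch(memory, her_memory, ordinary_batch_size, her_batch_size):
    states, actions, rewards, next_states, dones = memory.sample(ordinary_batch_size)
    h_states, h_actions, h_rewards, h_next_states, h_dones = her_memory.sample(her_batch_size)
    return (torch.cat((states, h_states)),
            torch.cat((actions, h_actions)),
            torch.cat((rewards, h_rewards)),
            torch.cat((next_states, h_next_states)),
            torch.cat((dones, h_dones)))
```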
File renamed without changes.
File renamed without changes.
@@ -7,7 +7,7 @@
from torch.multiprocessing import Queue
from torch.optim import Adam
from Base_Agent import Base_Agent
from Utilities.Utility_Functions import create_actor_distribution, SharedAdam
from utilities.Utility_Functions import create_actor_distribution, SharedAdam

class A3C(Base_Agent):
"""Actor critic A3C algorithm from deepmind paper https://arxiv.org/pdf/1602.01783.pdf"""
@@ -42,11 +42,10 @@ def run_n_episodes(self):
results_queue, copy.deepcopy(self.actor_critic), gradient_updates_queue)
worker.start()
processes.append(worker)

self.print_results(episode_number, results_queue)
for worker in processes:
worker.join()
optimizer_worker.kill()
optimizer_worker.kill()

time_taken = time.time() - start
return self.game_full_episode_scores, self.rolling_results, time_taken
@@ -110,7 +109,7 @@ def set_seeds(self, worker_num):

def run(self):
"""Starts the worker"""
for _ in range(self.episodes_to_run):
for ep_ix in range(self.episodes_to_run):
with self.optimizer_lock:
Base_Agent.copy_model_over(self.shared_model, self.local_model)
epsilon_exploration = self.calculate_new_exploration()
@@ -131,6 +130,7 @@ def run(self):
self.episode_log_action_probabilities.append(action_log_prob)
self.critic_outputs.append(critic_outputs)
state = next_state

total_loss = self.calculate_total_loss()
self.put_gradients_in_queue(total_loss)
self.episode_number += 1
@@ -159,8 +159,8 @@ def pick_action_and_get_critic_values(self, policy, state, epsilon_exploration=N
actor_output = model_output[:, list(range(self.action_size))] #we only use first set of columns to decide action, last column is state-value
critic_output = model_output[:, -1]
action_distribution = create_actor_distribution(self.action_types, actor_output, self.action_size)
action = action_distribution.produce_action_and_action_info().cpu().numpy()
if self.action_types == "CONTINUOUS": action += self.noise.produce_action_and_action_info()
action = action_distribution.sample().cpu().numpy()
if self.action_types == "CONTINUOUS": action += self.noise.sample()
if self.action_types == "DISCRETE":
if random.random() <= epsilon_exploration:
action = random.randint(0, self.action_size - 1)
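The hunk above renames the distribution call to the standard sample(). A sketch of the split-and-sample pattern for the discrete case, assuming the network emits action logits followed by a single state-value column:

```python
# Split a combined actor-critic output, sample an action from the policy head
# and keep its log-probability and the critic value for the loss.
import torch
from torch.distributions import Categorical

def split_and_sample(model_output: torch.Tensor, action_size: int):
    actor_logits = model_output[:, :action_size]   # policy head
    state_values = model_output[:, -1]             # critic head
    distribution = Categorical(logits=actor_logits)
    action = distribution.sample()
    return action, distribution.log_prob(action), state_values
```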
File renamed without changes.
@@ -5,7 +5,7 @@
from torch import optim
from Base_Agent import Base_Agent
from Replay_Buffer import Replay_Buffer
from Utilities.OU_Noise import OU_Noise
from utilities.OU_Noise import OU_Noise

class DDPG(Base_Agent):
"""A DDPG Agent"""
@@ -1,5 +1,5 @@
from Agents.Actor_Critic_Agents.DDPG import DDPG
from Agents.HER_Base import HER_Base
from agents.actor_critic_agents.DDPG import DDPG
from agents.HER_Base import HER_Base

class DDPG_HER(HER_Base, DDPG):
"""DDPG algorithm with hindsight experience replay"""
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -4,8 +4,8 @@
import numpy as np
import torch
from gym import Wrapper, spaces
from Agents.Base_Agent import Base_Agent
from Agents.Policy_Gradient_Agents.PPO import PPO
from agents.Base_Agent import Base_Agent
from agents.policy_gradient_agents.PPO import PPO
from DDQN import DDQN


@@ -141,7 +141,7 @@ def step(self, action):
cumulative_reward = 0
for _ in range(self.timesteps_before_changing_skill):
with torch.no_grad():
skill_action = self.skills_agent.actor_pick_action(np.array([next_state[0], action]))
skill_action = self.skills_agent.pick_action(np.array([next_state[0], action]))
next_state, reward, done, _ = self.env.step(skill_action)
cumulative_reward += reward
if done: break
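The loop above executes one skill for a fixed number of low-level timesteps via the skills agent's pick_action. A generic sketch of that macro-action loop; the names used here are illustrative assumptions rather than the repository's API:

```python
# Run a chosen skill for k low-level steps and accumulate the reward it earns.
import numpy as np

def run_skill(env, skills_agent, state, skill, k):
    cumulative_reward, done = 0.0, False
    for _ in range(k):
        low_level_action = skills_agent.pick_action(np.array([state[0], skill]))
        state, reward, done, _ = env.step(low_level_action)
        cumulative_reward += reward
        if done:
            break
    return state, cumulative_reward, done
```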
File renamed without changes.
@@ -3,9 +3,9 @@
import torch
import numpy as np
from torch import optim
from Agents.Base_Agent import Base_Agent
from Utilities.Parallel_Experience_Generator import Parallel_Experience_Generator
from Utilities.Utility_Functions import normalise_rewards, create_actor_distribution
from agents.Base_Agent import Base_Agent
from utilities.Parallel_Experience_Generator import Parallel_Experience_Generator
from utilities.Utility_Functions import normalise_rewards, create_actor_distribution

class PPO(Base_Agent):
"""Proximal Policy Optimization agent"""
@@ -67,7 +67,6 @@ def calculate_all_discounted_returns(self):
def calculate_all_ratio_of_policy_probabilities(self):
"""For each action calculates the ratio of the probability that the new policy would have picked the action vs.
the probability the old policy would have picked it. This will then be used to inform the loss"""

all_states = [state for states in self.many_episode_states for state in states]
all_actions = [[action] if self.action_types == "DISCRETE" else action for actions in self.many_episode_actions for action in actions ]
all_states = torch.stack([torch.Tensor(states).float().to(self.device) for states in all_states])
@@ -83,11 +82,8 @@ def calculate_all_ratio_of_policy_probabilities(self):
def calculate_log_probability_of_actions(self, policy, states, actions):
"""Calculates the log probability of an action occuring given a policy and starting state"""
policy_output = policy.forward(states).to(self.device)
print("ACTION TYPES ", self.action_types)
policy_distribution = create_actor_distribution(self.action_types, policy_output, self.action_size)
actions_tensor = actions
print("actions tensor shape ", actions_tensor.shape)
policy_distribution_log_prob = policy_distribution.log_prob(actions_tensor)
policy_distribution_log_prob = policy_distribution.log_prob(actions)
return policy_distribution_log_prob

def calculate_loss(self, all_ratio_of_policy_probabilities, all_discounted_returns):
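The PPO hunk above removes debug prints from calculate_log_probability_of_actions. A minimal sketch of that computation for the discrete case, assuming the policy network returns action logits:

```python
# Log-probability of already-taken actions under a (discrete-action) policy.
import torch
from torch.distributions import Categorical

def log_probability_of_actions(policy, states: torch.Tensor, actions: torch.Tensor):
    logits = policy(states)                    # assumed to return action logits
    distribution = Categorical(logits=logits)
    return distribution.log_prob(actions.squeeze(-1))
```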
@@ -2,7 +2,7 @@
import torch
import torch.optim as optim
from torch.distributions import Categorical
from Agents.Base_Agent import Base_Agent
from agents.Base_Agent import Base_Agent

class REINFORCE(Base_Agent):
agent_name = "REINFORCE"
File renamed without changes.
@@ -8,7 +8,7 @@
class Bit_Flipping_Environment(gym.Env):
environment_name = "Bit Flipping Game"

def __init__(self, environment_dimension=20):
def __init__(self, environment_dimension=20, deterministic=False):

self.action_space = spaces.Discrete(environment_dimension)
self.observation_space = spaces.Dict(dict(
@@ -26,13 +26,19 @@ def __init__(self, environment_dimension=20):
self.reward_for_achieving_goal = self.environment_dimension
self.step_reward_for_not_achieving_goal = -1

self.deterministic = deterministic

def seed(self, seed=None):
self.np_random, seed = seeding.np_random(seed)
return [seed]

def reset(self):
self.desired_goal = self.randomly_pick_state_or_goal()
self.state = self.randomly_pick_state_or_goal()
if not self.deterministic:
self.desired_goal = self.randomly_pick_state_or_goal()
self.state = self.randomly_pick_state_or_goal()
else:
self.desired_goal = [0 for _ in range(self.environment_dimension)]
self.state = [1 for _ in range(self.environment_dimension)]
self.state.extend(self.desired_goal)
self.achieved_goal = self.state[:self.environment_dimension]
self.step_count = 0
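The new deterministic flag makes reset start from a fixed state/goal pair, which keeps tests reproducible. A short usage sketch; the import path mirrors the one used in results/Bit_Flipping.py later in this diff:

```python
# Construct the environment deterministically for reproducible test runs.
from Bit_Flipping_Environment import Bit_Flipping_Environment

env = Bit_Flipping_Environment(environment_dimension=4, deterministic=True)
observation = env.reset()  # always the all-ones state with an all-zeros goal
```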
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
8 changes: 4 additions & 4 deletions Results/Bit_Flipping.py → results/Bit_Flipping.py
@@ -1,9 +1,9 @@
from gym.wrappers import FlattenDictWrapper
from Agents.DQN_Agents.DQN_HER import DQN_HER
from agents.DQN_agents.DQN_HER import DQN_HER
from Bit_Flipping_Environment import Bit_Flipping_Environment
from Agents.Trainer import Trainer
from Utilities.Data_Structures.Config import Config
from Agents.DQN_Agents.DQN import DQN
from agents.Trainer import Trainer
from utilities.data_structures.Config import Config
from agents.DQN_agents.DQN import DQN

config = Config()
config.seed = 1
18 changes: 9 additions & 9 deletions Results/Cart_Pole.py → results/Cart_Pole.py
@@ -1,14 +1,14 @@
import gym

from A2C import A2C
from Agents.Actor_Critic_Agents.A3C import A3C
from Agents.Policy_Gradient_Agents.PPO import PPO
from Agents.Trainer import Trainer
from Utilities.Data_Structures.Config import Config
from Agents.DQN_Agents.DDQN import DDQN
from Agents.DQN_Agents.DDQN_With_Prioritised_Experience_Replay import DDQN_With_Prioritised_Experience_Replay
from Agents.DQN_Agents.DQN import DQN
from Agents.DQN_Agents.DQN_With_Fixed_Q_Targets import DQN_With_Fixed_Q_Targets
from agents.actor_critic_agents.A3C import A3C
from agents.policy_gradient_agents.PPO import PPO
from agents.Trainer import Trainer
from utilities.data_structures.Config import Config
from agents.DQN_agents.DDQN import DDQN
from agents.DQN_agents.DDQN_With_Prioritised_Experience_Replay import DDQN_With_Prioritised_Experience_Replay
from agents.DQN_agents.DQN import DQN
from agents.DQN_agents.DQN_With_Fixed_Q_Targets import DQN_With_Fixed_Q_Targets

config = Config()
config.seed = 1
@@ -88,7 +88,7 @@
}

if __name__ == "__main__":
AGENTS = [DQN, DQN_With_Fixed_Q_Targets, DDQN_With_Prioritised_Experience_Replay, DDQN, PPO, A2C, A3C]
AGENTS = [A2C, A3C, DQN, DQN_With_Fixed_Q_Targets, DDQN_With_Prioritised_Experience_Replay, DDQN, PPO, ]
trainer = Trainer(config, AGENTS)
trainer.run_games_for_agents()

8 changes: 4 additions & 4 deletions Results/Fetch_Reach.py → results/Fetch_Reach.py
@@ -1,9 +1,9 @@
import gym

from Actor_Critic_Agents.DDPG import DDPG
from Agents.Actor_Critic_Agents.DDPG_HER import DDPG_HER
from Data_Structures.Config import Config
from Agents.Trainer import Trainer
from actor_critic_agents.DDPG import DDPG
from agents.actor_critic_agents.DDPG_HER import DDPG_HER
from data_structures.Config import Config
from agents.Trainer import Trainer


config = Config()
12 changes: 6 additions & 6 deletions Results/Four_Rooms.py → results/Four_Rooms.py
@@ -1,11 +1,11 @@
from A3C import A3C
from Agents.DQN_Agents.DQN_HER import DQN_HER
from agents.DQN_agents.DQN_HER import DQN_HER
from DDQN import DDQN
from Environments.Four_Rooms_Environment import Four_Rooms_Environment
from Hierarchical_Agents.SNN_HRL import SNN_HRL
from Agents.Trainer import Trainer
from Utilities.Data_Structures.Config import Config
from Agents.DQN_Agents.DQN import DQN
from environments.Four_Rooms_Environment import Four_Rooms_Environment
from hierarchical_agents.SNN_HRL import SNN_HRL
from agents.Trainer import Trainer
from utilities.data_structures.Config import Config
from agents.DQN_agents.DQN import DQN

config = Config()
config.seed = 1
8 changes: 4 additions & 4 deletions Results/Hopper.py → results/Hopper.py
@@ -1,10 +1,10 @@
import gym
from Agents.Policy_Gradient_Agents.PPO import PPO
from Agents.Actor_Critic_Agents.DDPG import DDPG
from agents.policy_gradient_agents.PPO import PPO
from agents.actor_critic_agents.DDPG import DDPG
from SAC import SAC
from TD3 import TD3
from Agents.Trainer import Trainer
from Utilities.Data_Structures.Config import Config
from agents.Trainer import Trainer
from utilities.data_structures.Config import Config


config = Config()