AAMAS 2019 submission code
- Observation selection via random sampling for simplification
- Flexible atari action space
- Added function for centroid dumping
- Added .bin experiment dumps
- Added further commenting matching the paper's terminology
giuse committed Mar 1, 2019
1 parent 4032874 commit 144e65a
Showing 9 changed files with 156 additions and 111 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -7,3 +7,4 @@
sources.txt
/stats/
*.pyc
/atari_*.bin
6 changes: 4 additions & 2 deletions Gemfile
@@ -4,11 +4,13 @@ source "https://rubygems.org"

# git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }

gem 'machine_learning_workbench', '>=0.8'
# Remember to switch the comment in the following lines before committing :P
gem 'machine_learning_workbench', path: "/home/giuse/machine_learning_workbench/"
# gem 'machine_learning_workbench', '>=0.8'

gem 'pycall'
gem 'parallel'
gem 'rmagick' # well we're experimenting, we need to see what we're doing :)
gem 'rmagick' # well we're experimenting, we need to know what we're doing :)

gem 'pry-nav'
gem 'pry-rescue'
14 changes: 9 additions & 5 deletions Gemfile.lock
@@ -1,3 +1,11 @@
PATH
remote: /home/giuse/machine_learning_workbench
specs:
machine_learning_workbench (0.8.0)
numo-linalg (~> 0.1)
numo-narray (~> 0.9)
parallel (~> 1.12)

GEM
remote: https://rubygems.org/
specs:
@@ -6,10 +14,6 @@ GEM
coderay (1.1.2)
debug_inspector (0.0.3)
interception (0.5)
machine_learning_workbench (0.8.0)
numo-linalg (~> 0.1)
numo-narray (~> 0.9)
parallel (~> 1.12)
memory_profiler (0.9.10)
method_source (0.8.2)
numo-linalg (0.1.2)
@@ -36,7 +40,7 @@ PLATFORMS
ruby

DEPENDENCIES
machine_learning_workbench (>= 0.8)
machine_learning_workbench!
memory_profiler
parallel
pry-nav
12 changes: 9 additions & 3 deletions README.md
@@ -1,10 +1,16 @@
# Deep Neuroevolution experiments

This project collects a set of neuroevolution experiments with/towards deep networks.
This project collects a set of neuroevolution experiments with/towards deep networks for reinforcement learning control problems using an unsupervised learning feature extractor.

## *Playing Atari with Six Neurons*

The experiments for this paper are based on [this code](https://github.com/giuse/DNE/releases/tag/aamas2019).
The algorithms themselves are coded in the [`machine_learning_workbench` library](https://github.com/giuse/machine_learning_workbench), specifically using [version 0.8.0](https://github.com/giuse/machine_learning_workbench/releases/tag/0.8.0).


## Installation

First make sure the OpenAI Gym is pip-installed, [instructions here](https://github.com/openai/gym).
First make sure the OpenAI Gym is pip-installed on Python 3, [instructions here](https://github.com/openai/gym).
You will also need the [GVGAI_GYM](https://github.com/rubenrtorrado/GVGAI_GYM) to access GVGAI environments.

Clone this repository, then execute:
@@ -29,7 +35,7 @@ Please feel free to contribute to this list (see `Contributing` above).

- **UL-ELR** stands for Unsupervised Learning plus Evolutionary Reinforcement Learning, from the paper _"Intrinsically Motivated Neuroevolution for Vision-Based Reinforcement Learning" (ICDL2011)_. Check [here](https://exascale.info/members/giuseppe-cuccu/) for citation reference and pdf.
- **BD-NES** stands for Block Diagonal Natural Evolution Strategy, from the homonymous paper _"Block Diagonal Natural Evolution Strategies" (PPSN2012)_. Check [here](https://exascale.info/members/giuseppe-cuccu/) for citation reference and pdf.
- **RNES** stands for Radial Natural Evolution Strategy, from the paper _"Novelty-Based Restarts for Evolution Strategies" (CEC2011)_. Check [here](https://exascale.info/members/
- **RNES** stands for Radial Natural Evolution Strategy, from the paper _"Novelty-Based Restarts for Evolution Strategies" (CEC2011)_. Check [here](https://exascale.info/members/giuseppe-cuccu/) for citation reference and pdf.
- **Online VQ** stands for Online Vector Quantization, from the paper _"Intrinsically Motivated Neuroevolution for Vision-Based Reinforcement Learning" (ICDL2011)_. Check [here](https://exascale.info/members/giuseppe-cuccu/) for citation reference and pdf.
- The **OpenAI Gym** is described [here](https://gym.openai.com/) and available on [this repo](https://github.com/openai/gym/)
- **PyCall.rb** is available on [this repo](https://github.com/mrkn/pycall.rb/).
98 changes: 55 additions & 43 deletions atari_ulerl_experiment.rb
@@ -28,7 +28,7 @@ def initialize config
@preproc = compr_opts.delete :preproc
@ntrials_per_ind = config[:run].delete :ntrials_per_ind
@compr = ObservationCompressor.new **compr_opts
# overload ninputs for network
# default ninputs for network
config[:net][:ninputs] ||= compr.code_size
puts "Loading Atari OpenAI Gym environment" # if debug
super config
@@ -78,34 +78,45 @@ def fitness_one genotype, env: single_env, render: false, nsteps: max_nsteps, ag
# require 'pry'; binding.pry unless observation == env.reset_obs # => check passed, add to tests
env.render if render
tot_reward = 0
# set of observations with highest novelty, representative of the ability of the individual
# to obtain novel observations from the environment => hence reaching novel env states
represent_obs = []
# # set of observations with highest novelty, representative of the ability of the individual
# # to obtain novel observations from the environment => hence reaching novel env states
# represent_obs = []

puts "IGNORING `nobs_per_ind=#{nobs_per_ind}` (random sampling obs)" if nobs_per_ind
represent_obs = observation
nobs = 1

puts " Running (max_nsteps: #{max_nsteps})" if debug
runtime = nsteps.times do |i|
code = compr.encode observation
# print code.to_a
selected_action = action_for code
novelty = compr.novelty observation, code
obs_lst, rew, done, info_lst = env.execute selected_action, skip_frames: skip_frames
# puts "#{obs_lst}, #{rew}, #{done}, #{info_lst}" if debug
observation = OBS_AGGR[aggr_type].call obs_lst
tot_reward += rew

# The same observation represents the state both for action selection and for individual novelty
# OPT: most obs will most likely have lower novelty, so place it first
# TODO: I could add here a check if obs is already in represent_obs; in fact
# though the probability is low (sequential markovian fully-observable env)
represent_obs.unshift [observation, novelty]
represent_obs.sort_by! &:last
represent_obs.shift if represent_obs.size > nobs_per_ind
## NOTE: SWAP COMMENTS ON THE FOLLOWING to switch to novelty-based obs selection
# # The same observation represents the state both for action selection and for individual novelty
# # OPT: most obs will most likely have lower novelty, so place it first
# # TODO: I could add here a check if obs is already in represent_obs; in fact
# # though the probability is low (sequential markovian fully-observable env)
# novelty = compr.novelty observation, code
# represent_obs.unshift [observation, novelty]
# represent_obs.sort_by! &:last
# represent_obs.shift if represent_obs.size > nobs_per_ind

# Random sampling for representative obs
nobs += 1
represent_obs = observation if rand < 1.0/nobs

# Image selection by random sampling

env.render if render
break i if done
end
# compr.train_set << represent_obs.first
represent_obs.each { |obs, _nov| compr.train_set << obs }
compr.train_set << represent_obs
# for novelty:
# represent_obs.each { |obs, _nov| compr.train_set << obs }
puts "=> Done! fitness: #{tot_reward}" if debug
# print tot_reward, ' ' # if debug
print "#{tot_reward}(#{runtime}) "
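The replacement rule above (`represent_obs = observation if rand < 1.0/nobs`) is single-item reservoir sampling: after `n` observations, each one has had probability `1/n` of being the kept sample, so the retained training observation is drawn uniformly from the episode without storing it. A standalone sketch of the technique (the `reservoir_sample` helper is illustrative, not part of this repo):

```ruby
# Single-item reservoir sampling: keep exactly one element of a stream,
# uniformly at random, without buffering the stream.
def reservoir_sample stream, rng: Random.new
  kept = nil
  stream.each_with_index do |item, i|
    n = i + 1
    # replace the kept item with probability 1/n
    kept = item if rng.rand < 1.0 / n
  end
  kept
end
```

By induction, after seeing `n` items every item survives with probability `1/n`, which is exactly why `nobs` must be incremented before the `rand < 1.0/nobs` test.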
@@ -119,29 +130,26 @@ def fitness_one genotype, env: single_env, render: false, nsteps: max_nsteps, ag
# @return [lambda] function that evaluates the fitness of a list of genotype
# @note returned function has param genotypes [Array<gtype>] list of genotypes, return [Array<Numeric>] list of fitnesses for each genotype
def gen_fit_fn type, ntrials: ntrials_per_ind
if type.nil? || type == :parallel
nprocs = Parallel.processor_count - 1 # it's actually faster this way
puts "Running in parallel on #{nprocs} processes"
-> (genotypes) do
print "Fits: "
fits, parall_infos = Parallel.map(0...genotypes.shape.first,
in_processes: nprocs, isolation: true) do |i|
# env = parall_envs[Parallel.worker_number]
env = parall_envs[i] # leveraging dynamic env allocation
# fit = fitness_one genotypes[i, true], env: env
fits = ntrials.times.map { fitness_one genotypes[i, true], env: env }
fit = fits.to_na.mean
print "[m#{fit}] "
[fit, compr.parall_info]
end.transpose
puts # newline here because I'm done `print`ing all ind fits
puts "Exporting training images"
parall_infos.each &compr.method(:add_from_parall_info)
puts "Training optimizer"
fits.to_na
end
else
super
return super unless type.nil? || type == :parallel
nprocs = Parallel.processor_count - 1 # it's actually faster this way
puts "Running in parallel on #{nprocs} processes"
-> (genotypes) do
print "Fits: "
fits, parall_infos = Parallel.map(0...genotypes.shape.first,
in_processes: nprocs, isolation: true) do |i|
# env = parall_envs[Parallel.worker_number]
env = parall_envs[i] # leveraging dynamic env allocation
# fit = fitness_one genotypes[i, true], env: env
fits = ntrials.times.map { fitness_one genotypes[i, true], env: env }
fit = fits.to_na.mean
print "[m#{fit}] "
[fit, compr.parall_info]
end.transpose
puts # newline here because I'm done `print`ing all ind fits
puts "Exporting training images"
parall_infos.each &compr.method(:add_from_parall_info)
puts "Training optimizer"
fits.to_na
end
end

@@ -156,6 +164,7 @@ def action_for code
nans = output.isnan
# this is a pretty reliable bug indicator
raise "\n\n\tNaN network output!!\n\n" if nans.any?
# action = output[0...6].max_index # limit to 6 actions
action = output.max_index
end
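`action_for` reduces to an argmax over the network's output vector, with a NaN guard as a cheap bug detector. A plain-Ruby equivalent of `output.max_index` on an Array instead of an NArray (sketch; `argmax_action` is an invented name):

```ruby
# Index of the largest network output = the selected discrete action.
def argmax_action outputs
  # NaN anywhere in the output is a reliable sign of a broken network
  raise "NaN network output!" if outputs.any? { |v| v.respond_to?(:nan?) && v.nan? }
  outputs.each_with_index.max_by { |val, _idx| val }.last
end
```

The commented-out `output[0...6].max_index` line in the diff restricts the argmax to the first 6 outputs, the minimal Atari action set, which is what the "flexible atari action space" commit bullet refers to.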

@@ -168,7 +177,7 @@ def update_opt
nw = diff * net.struct[1]

new_mu_val = 0 # value for the new means (0)
new_var_val = 0.0001 # value for the new variances (diagonal of covariance) (1)
new_var_val = 0.0001 # value for the new variances (diagonal of covariance) (<<1)
new_cov_val = 0 # value for the other covariances (outside diagonal) (0)

old = case opt_type
@@ -203,7 +212,7 @@ def update_opt
puts " lrate: #{opt.lrate}"

# FIXME: I need to run these before I can use automatic popsize again!
# update popsize in bdnes and its blocks
# => update popsize in bdnes and its blocks before using it again
# if opt.kind_of? BDNES or something
# opt.instance_variable_set :popsize, blocks.map(&:popsize).max
# opt.blocks.each { |xnes| xnes.instance_variable_set :@popsize, opt.popsize }
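`update_opt` grows the search distribution when the compressor adds centroids and the network gains `nw` new weights: new means start at 0, new diagonal variances at a small 0.0001, and new off-diagonal covariances at 0, as the `new_*_val` constants above state. A hypothetical plain-Ruby sketch of that expansion on nested arrays (the workbench does this on Numo/NArray structures; `grow_distribution` is an invented helper):

```ruby
# Grow a (mu, covariance) search distribution by nw new dimensions.
def grow_distribution mu, cov, nw, new_var_val: 0.0001
  new_mu = mu + Array.new(nw, 0.0)                  # new means = 0
  dim = new_mu.size
  new_cov = Array.new(dim) do |i|
    Array.new(dim) do |j|
      if i < mu.size && j < mu.size then cov[i][j]  # keep old entries
      elsif i == j then new_var_val                 # new variances (diagonal)
      else 0.0                                      # new covariances = 0
      end
    end
  end
  [new_mu, new_cov]
end
```

Starting the new variances well below 1 keeps the fresh dimensions from dominating sampling before the optimizer has any gradient signal for them.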
@@ -218,7 +227,7 @@
def run ngens: max_ngens
@curr_ninputs = compr.code_size
ngens.times do |i|
$ngen = i # allows for conditional debugger calls
$ngen = i # allows for conditional debugger calls `binding.pry if $ngen == n`
puts Time.now
puts "# Gen #{i+1}/#{ngens}"
# it just makes more sense to run first, even though at first gen the trainset is empty
@@ -228,14 +237,16 @@ def run ngens: max_ngens
update_opt # if I have more centroids, I should update opt

opt.train
# Note: data analysis is done by extracting statistics from logs using regexes.
# Just `puts` anything you'd like to track, and save log to file
puts "Best fit so far: #{opt.best.first} -- " \
"Fit mean: #{opt.last_fits.mean} -- " \
"Fit stddev: #{opt.last_fits.stddev}\n" \
"Mu mean: #{opt.mu.mean} -- " \
"Mu stddev: #{opt.mu.stddev} -- " \
"Conv: #{opt.convergence}"

break if termination_criteria&.call(opt)
break if termination_criteria&.call(opt) # uhm currently unused
end
end

@@ -263,7 +274,8 @@ def load fname=Dir["dumps/atari_*.bin"].sort.last
opt.instance_variable_set :@mu, hsh[:mu]
opt.instance_variable_set :@sigma, hsh[:sigma]
compr.instance_variable_set :@centrs, hsh[:centrs]
# what else needs to be done in order to be able to run `#show_ind`?
# Uhm haven't used that yet...
# what else needs to be done in order to be able to run `#show_ind` again?
puts "Experiment data loaded from `#{fname}`"
true
end
13 changes: 6 additions & 7 deletions atari_wrapper.rb
@@ -4,7 +4,8 @@ module DNE
# Convenience wrapper for the Atari OpenAI Gym environments
class AtariWrapper

attr_reader :gym_env, :reset_obs, :reset_obs_py, :act_type, :act_size, :obs_size, :skip_type, :downsample, :preproc, :row_div, :col_div
attr_reader :gym_env, :reset_obs, :reset_obs_py, :act_type, :act_size,
:obs_size, :skip_type, :downsample, :preproc, :row_div, :col_div

extend Forwardable
def_delegator :@gym_env, :render
@@ -16,20 +17,20 @@ def initialize gym_env, downsample: nil, skip_type: nil, preproc: nil
@gym_env = gym_env
@reset_obs = reset
act_type, act_size = gym_env.action_space.to_s.match(/(.*)\((\d*)\)/).captures
raise "Not Atari act space" unless act_size == '6' && act_type == 'Discrete'
raise "Not Atari act space" unless act_size.to_i.between?(6,18) && act_type == 'Discrete'
@act_type = act_type.downcase.to_sym
@act_size = Integer(act_size)
@obs_size = @reset_obs.size
end

# Converts pyimg into NArray, applying optional pre-processing and resampling
def to_img pyimg
# subtract `reset_obs` to clear background
# subtract `reset_obs` to clear background => imgs as "what changes"
pyimg -= reset_obs_py if preproc == :subtr_bg
# resample to target resolution
pyimg = pyimg[(0..-1).step(downsample[0]),
(0..-1).step(downsample[1])] if downsample
# average color channels, flatten, convert to NArray
# average color channels, flatten to 1d array, convert to NArray
NImage[*pyimg.mean(2).ravel.tolist.to_a]
end
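`to_img` above does three things: optional background subtraction, resampling by keeping every `row_div`-th row and `col_div`-th column, and averaging the color channels into one gray value per pixel. The same idea on a nested Ruby array (rows × cols × channels), sketched without Numo or PyCall (`to_gray_vector` is an invented name):

```ruby
# Downsample an image by striding, then average color channels to gray,
# returning a flat 1-D vector of pixel values.
def to_gray_vector img, row_div: 1, col_div: 1
  img.each_slice(row_div).map(&:first).flat_map do |row|
    row.each_slice(col_div).map(&:first).map do |px|
      px.sum / px.size.to_f   # mean over the color channels
    end
  end
end
```

`each_slice(n).map(&:first)` keeps one element per group of `n`, mirroring the `(0..-1).step(downsample[i])` strided indexing in `to_img`.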

@@ -45,13 +46,11 @@ def gym_to_rb gym_ans
[to_img(obs), rew, done, info]
end

SKIP_TYPE = GymExperiment::SKIP_TYPE

# executes an action, then frameskips; returns aggregation
def execute act, skip_frames: 0, type: skip_type
act_ans = gym_env.step(act)
skip_ans = skip_frames.times.map do
SKIP_TYPE[type].call(act, gym_env)
GymExperiment::SKIP_TYPE[type].call(act, gym_env)
end
all_ans = skip_ans.unshift(act_ans)
obs_lst, rew_lst, done_lst, info_lst = all_ans.map(&method(:gym_to_rb)).transpose
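`execute` steps once with the chosen action, repeats it for `skip_frames` extra frames via the configured `SKIP_TYPE` lambda, and returns per-step results for aggregation. A minimal sketch of the same control flow against any object responding to `step`, with rewards summed (one common aggregation); `execute_with_skip` and `FakeEnv` in the usage are invented for illustration:

```ruby
# Step once with `act`, then repeat it `skip_frames` times; aggregate.
def execute_with_skip env, act, skip_frames: 0
  answers = [env.step(act)]
  skip_frames.times do
    break if answers.last[2]          # stop skipping once the episode is done
    answers << env.step(act)
  end
  obs_lst, rew_lst, done_lst = answers.transpose
  [obs_lst.last, rew_lst.sum, done_lst.last]
end
```

Frame skipping trades temporal resolution for speed: the network picks an action every `skip_frames + 1` frames, which both cuts compute and matches Atari's coarse control timescale.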
Expand Down