AAMAS 2019 submission code
- Observation selection via random sampling for simplification
- Flexible atari action space
- Added function for centroid dumping
- Added .bin experiment dumps
- Added further commenting matching the paper's terminology
giuse committed Mar 1, 2019
1 parent 4032874 commit 144e65a
Showing 9 changed files with 156 additions and 111 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -7,3 +7,4 @@
sources.txt
/stats/
*.pyc
/atari_*.bin
6 changes: 4 additions & 2 deletions Gemfile
@@ -4,11 +4,13 @@ source "https://rubygems.org"

# git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }

gem 'machine_learning_workbench', '>=0.8'
# Remember to switch the comment in the following lines before committing :P
gem 'machine_learning_workbench', path: "/home/giuse/machine_learning_workbench/"
# gem 'machine_learning_workbench', '>=0.8'

gem 'pycall'
gem 'parallel'
gem 'rmagick' # well we're experimenting, we need to see what we're doing :)
gem 'rmagick' # well we're experimenting, we need to know what we're doing :)

gem 'pry-nav'
gem 'pry-rescue'
14 changes: 9 additions & 5 deletions Gemfile.lock
@@ -1,3 +1,11 @@
PATH
remote: /home/giuse/machine_learning_workbench
specs:
machine_learning_workbench (0.8.0)
numo-linalg (~> 0.1)
numo-narray (~> 0.9)
parallel (~> 1.12)

GEM
remote: https://rubygems.org/
specs:
@@ -6,10 +14,6 @@ GEM
coderay (1.1.2)
debug_inspector (0.0.3)
interception (0.5)
machine_learning_workbench (0.8.0)
numo-linalg (~> 0.1)
numo-narray (~> 0.9)
parallel (~> 1.12)
memory_profiler (0.9.10)
method_source (0.8.2)
numo-linalg (0.1.2)
@@ -36,7 +40,7 @@ PLATFORMS
ruby

DEPENDENCIES
machine_learning_workbench (>= 0.8)
machine_learning_workbench!
memory_profiler
parallel
pry-nav
12 changes: 9 additions & 3 deletions README.md
@@ -1,10 +1,16 @@
# Deep Neuroevolution experiments

This project collects a set of neuroevolution experiments with/towards deep networks.
This project collects a set of neuroevolution experiments with/towards deep networks for reinforcement learning control problems using an unsupervised learning feature extractor.

## *Playing Atari with Six Neurons*

The experiments for this paper are based on [this code](https://github.com/giuse/DNE/releases/tag/aamas2019).
The algorithms themselves are coded in the [`machine_learning_workbench` library](https://github.com/giuse/machine_learning_workbench), specifically using [version 0.8.0](https://github.com/giuse/machine_learning_workbench/releases/tag/0.8.0).


## Installation

First make sure the OpenAI Gym is pip-installed, [instructions here](https://github.com/openai/gym).
First make sure the OpenAI Gym is pip-installed on Python 3, [instructions here](https://github.com/openai/gym).
You will also need the [GVGAI_GYM](https://github.com/rubenrtorrado/GVGAI_GYM) to access GVGAI environments.

Clone this repository, then execute:
@@ -29,7 +35,7 @@ Please feel free to contribute to this list (see `Contributing` above).

- **UL-ELR** stands for Unsupervised Learning plus Evolutionary Reinforcement Learning, from the paper _"Intrinsically Motivated Neuroevolution for Vision-Based Reinforcement Learning" (ICDL2011)_. Check [here](https://exascale.info/members/giuseppe-cuccu/) for citation reference and pdf.
- **BD-NES** stands for Block Diagonal Natural Evolution Strategy, from the homonymous paper _"Block Diagonal Natural Evolution Strategies" (PPSN2012)_. Check [here](https://exascale.info/members/giuseppe-cuccu/) for citation reference and pdf.
- **RNES** stands for Radial Natural Evolution Strategy, from the paper _"Novelty-Based Restarts for Evolution Strategies" (CEC2011)_. Check [here](https://exascale.info/members/
- **RNES** stands for Radial Natural Evolution Strategy, from the paper _"Novelty-Based Restarts for Evolution Strategies" (CEC2011)_. Check [here](https://exascale.info/members/giuseppe-cuccu/) for citation reference and pdf.
- **Online VQ** stands for Online Vector Quantization, from the paper _"Intrinsically Motivated Neuroevolution for Vision-Based Reinforcement Learning" (ICDL2011)_. Check [here](https://exascale.info/members/giuseppe-cuccu/) for citation reference and pdf.
- The **OpenAI Gym** is described [here](https://gym.openai.com/) and available on [this repo](https://github.com/openai/gym/)
- **PyCall.rb** is available on [this repo](https://github.com/mrkn/pycall.rb/).
98 changes: 55 additions & 43 deletions atari_ulerl_experiment.rb
@@ -28,7 +28,7 @@ def initialize config
@preproc = compr_opts.delete :preproc
@ntrials_per_ind = config[:run].delete :ntrials_per_ind
@compr = ObservationCompressor.new **compr_opts
# overload ninputs for network
# default ninputs for network
config[:net][:ninputs] ||= compr.code_size
puts "Loading Atari OpenAI Gym environment" # if debug
super config
@@ -78,34 +78,45 @@ def fitness_one genotype, env: single_env, render: false, nsteps: max_nsteps, ag
# require 'pry'; binding.pry unless observation == env.reset_obs # => check passed, add to tests
env.render if render
tot_reward = 0
# set of observations with highest novelty, representative of the ability of the individual
# to obtain novel observations from the environment => hence reaching novel env states
represent_obs = []
# # set of observations with highest novelty, representative of the ability of the individual
# # to obtain novel observations from the environment => hence reaching novel env states
# represent_obs = []

puts "IGNORING `nobs_per_ind=#{nobs_per_ind}` (random sampling obs)" if nobs_per_ind
represent_obs = observation
nobs = 1

puts " Running (max_nsteps: #{max_nsteps})" if debug
runtime = nsteps.times do |i|
code = compr.encode observation
# print code.to_a
selected_action = action_for code
novelty = compr.novelty observation, code
obs_lst, rew, done, info_lst = env.execute selected_action, skip_frames: skip_frames
# puts "#{obs_lst}, #{rew}, #{done}, #{info_lst}" if debug
observation = OBS_AGGR[aggr_type].call obs_lst
tot_reward += rew

# The same observation represents the state both for action selection and for individual novelty
# OPT: most obs will most likely have lower novelty, so place it first
# TODO: I could add here a check if obs is already in represent_obs; in fact
# though the probability is low (sequential markovian fully-observable env)
represent_obs.unshift [observation, novelty]
represent_obs.sort_by! &:last
represent_obs.shift if represent_obs.size > nobs_per_ind
## NOTE: SWAP COMMENTS ON THE FOLLOWING to switch to novelty-based obs selection
# # The same observation represents the state both for action selection and for individual novelty
# # OPT: most obs will most likely have lower novelty, so place it first
# # TODO: I could add here a check if obs is already in represent_obs; in fact
# # though the probability is low (sequential markovian fully-observable env)
# novelty = compr.novelty observation, code
# represent_obs.unshift [observation, novelty]
# represent_obs.sort_by! &:last
# represent_obs.shift if represent_obs.size > nobs_per_ind

# Random sampling for representative obs
nobs += 1
represent_obs = observation if rand < 1.0/nobs

# Image selection by random sampling

env.render if render
break i if done
end
# compr.train_set << represent_obs.first
represent_obs.each { |obs, _nov| compr.train_set << obs }
compr.train_set << represent_obs
# for novelty:
# represent_obs.each { |obs, _nov| compr.train_set << obs }
puts "=> Done! fitness: #{tot_reward}" if debug
# print tot_reward, ' ' # if debug
print "#{tot_reward}(#{runtime}) "
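The replacement rule above (`represent_obs = observation if rand < 1.0/nobs`) is single-item reservoir sampling: after `n` observations, each one has had probability `1/n` of being the kept sample, so the retained training observation is drawn uniformly from the episode without storing it. A standalone sketch of the technique (the `reservoir_sample` helper is illustrative, not part of this repo):

```ruby
# Single-item reservoir sampling: keep exactly one element of a stream,
# uniformly at random, without buffering the stream.
def reservoir_sample stream, rng: Random.new
  kept = nil
  stream.each_with_index do |item, i|
    n = i + 1
    # replace the kept item with probability 1/n
    kept = item if rng.rand < 1.0 / n
  end
  kept
end
```

By induction, after seeing `n` items every item survives with probability `1/n`, which is exactly why `nobs` must be incremented before the `rand < 1.0/nobs` test.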
@@ -119,29 +130,26 @@ def fitness_one genotype, env: single_env, render: false, nsteps: max_nsteps, ag
# @return [lambda] function that evaluates the fitness of a list of genotype
# @note returned function has param genotypes [Array<gtype>] list of genotypes, return [Array<Numeric>] list of fitnesses for each genotype
def gen_fit_fn type, ntrials: ntrials_per_ind
if type.nil? || type == :parallel
nprocs = Parallel.processor_count - 1 # it's actually faster this way
puts "Running in parallel on #{nprocs} processes"
-> (genotypes) do
print "Fits: "
fits, parall_infos = Parallel.map(0...genotypes.shape.first,
in_processes: nprocs, isolation: true) do |i|
# env = parall_envs[Parallel.worker_number]
env = parall_envs[i] # leveraging dynamic env allocation
# fit = fitness_one genotypes[i, true], env: env
fits = ntrials.times.map { fitness_one genotypes[i, true], env: env }
fit = fits.to_na.mean
print "[m#{fit}] "
[fit, compr.parall_info]
end.transpose
puts # newline here because I'm done `print`ing all ind fits
puts "Exporting training images"
parall_infos.each &compr.method(:add_from_parall_info)
puts "Training optimizer"
fits.to_na
end
else
super
return super unless type.nil? || type == :parallel
nprocs = Parallel.processor_count - 1 # it's actually faster this way
puts "Running in parallel on #{nprocs} processes"
-> (genotypes) do
print "Fits: "
fits, parall_infos = Parallel.map(0...genotypes.shape.first,
in_processes: nprocs, isolation: true) do |i|
# env = parall_envs[Parallel.worker_number]
env = parall_envs[i] # leveraging dynamic env allocation
# fit = fitness_one genotypes[i, true], env: env
fits = ntrials.times.map { fitness_one genotypes[i, true], env: env }
fit = fits.to_na.mean
print "[m#{fit}] "
[fit, compr.parall_info]
end.transpose
puts # newline here because I'm done `print`ing all ind fits
puts "Exporting training images"
parall_infos.each &compr.method(:add_from_parall_info)
puts "Training optimizer"
fits.to_na
end
end

@@ -156,6 +164,7 @@ def action_for code
nans = output.isnan
# this is a pretty reliable bug indicator
raise "\n\n\tNaN network output!!\n\n" if nans.any?
# action = output[0...6].max_index # limit to 6 actions
action = output.max_index
end
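`action_for` reduces to an argmax over the network's output vector, with a NaN guard as a cheap bug detector. A plain-Ruby equivalent of `output.max_index` on an Array instead of an NArray (sketch; `argmax_action` is an invented name):

```ruby
# Index of the largest network output = the selected discrete action.
def argmax_action outputs
  # NaN anywhere in the output is a reliable sign of a broken network
  raise "NaN network output!" if outputs.any? { |v| v.respond_to?(:nan?) && v.nan? }
  outputs.each_with_index.max_by { |val, _idx| val }.last
end
```

The commented-out `output[0...6].max_index` line in the diff restricts the argmax to the first 6 outputs, the minimal Atari action set, which is what the "flexible atari action space" commit bullet refers to.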

@@ -168,7 +177,7 @@ def update_opt
nw = diff * net.struct[1]

new_mu_val = 0 # value for the new means (0)
new_var_val = 0.0001 # value for the new variances (diagonal of covariance) (1)
new_var_val = 0.0001 # value for the new variances (diagonal of covariance) (<<1)
new_cov_val = 0 # value for the other covariances (outside diagonal) (0)

old = case opt_type
@@ -203,7 +212,7 @@ def update_opt
puts " lrate: #{opt.lrate}"

# FIXME: I need to run these before I can use automatic popsize again!
# update popsize in bdnes and its blocks
# => update popsize in bdnes and its blocks before using it again
# if opt.kind_of? BDNES or something
# opt.instance_variable_set :popsize, blocks.map(&:popsize).max
# opt.blocks.each { |xnes| xnes.instance_variable_set :@popsize, opt.popsize }
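`update_opt` grows the search distribution when the compressor adds centroids and the network gains `nw` new weights: new means start at 0, new diagonal variances at a small 0.0001, and new off-diagonal covariances at 0, as the `new_*_val` constants above state. A hypothetical plain-Ruby sketch of that expansion on nested arrays (the workbench does this on Numo/NArray structures; `grow_distribution` is an invented helper):

```ruby
# Grow a (mu, covariance) search distribution by nw new dimensions.
def grow_distribution mu, cov, nw, new_var_val: 0.0001
  new_mu = mu + Array.new(nw, 0.0)                  # new means = 0
  dim = new_mu.size
  new_cov = Array.new(dim) do |i|
    Array.new(dim) do |j|
      if i < mu.size && j < mu.size then cov[i][j]  # keep old entries
      elsif i == j then new_var_val                 # new variances (diagonal)
      else 0.0                                      # new covariances = 0
      end
    end
  end
  [new_mu, new_cov]
end
```

Starting the new variances well below 1 keeps the fresh dimensions from dominating sampling before the optimizer has any gradient signal for them.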
@@ -218,7 +227,7 @@
def run ngens: max_ngens
@curr_ninputs = compr.code_size
ngens.times do |i|
$ngen = i # allows for conditional debugger calls
$ngen = i # allows for conditional debugger calls `binding.pry if $ngen == n`
puts Time.now
puts "# Gen #{i+1}/#{ngens}"
# it just makes more sense to run first, even though at first gen the trainset is empty
@@ -228,14 +237,16 @@ def run ngens: max_ngens
update_opt # if I have more centroids, I should update opt

opt.train
# Note: data analysis is done by extracting statistics from logs using regexes.
# Just `puts` anything you'd like to track, and save log to file
puts "Best fit so far: #{opt.best.first} -- " \
"Fit mean: #{opt.last_fits.mean} -- " \
"Fit stddev: #{opt.last_fits.stddev}\n" \
"Mu mean: #{opt.mu.mean} -- " \
"Mu stddev: #{opt.mu.stddev} -- " \
"Conv: #{opt.convergence}"

break if termination_criteria&.call(opt)
break if termination_criteria&.call(opt) # uhm currently unused
end
end

@@ -263,7 +274,8 @@ def load fname=Dir["dumps/atari_*.bin"].sort.last
opt.instance_variable_set :@mu, hsh[:mu]
opt.instance_variable_set :@sigma, hsh[:sigma]
compr.instance_variable_set :@centrs, hsh[:centrs]
# what else needs to be done in order to be able to run `#show_ind`?
# Uhm haven't used that yet...
# what else needs to be done in order to be able to run `#show_ind` again?
puts "Experiment data loaded from `#{fname}`"
true
end
13 changes: 6 additions & 7 deletions atari_wrapper.rb
@@ -4,7 +4,8 @@ module DNE
# Convenience wrapper for the Atari OpenAI Gym environments
class AtariWrapper

attr_reader :gym_env, :reset_obs, :reset_obs_py, :act_type, :act_size, :obs_size, :skip_type, :downsample, :preproc, :row_div, :col_div
attr_reader :gym_env, :reset_obs, :reset_obs_py, :act_type, :act_size,
:obs_size, :skip_type, :downsample, :preproc, :row_div, :col_div

extend Forwardable
def_delegator :@gym_env, :render
@@ -16,20 +17,20 @@ def initialize gym_env, downsample: nil, skip_type: nil, preproc: nil
@gym_env = gym_env
@reset_obs = reset
act_type, act_size = gym_env.action_space.to_s.match(/(.*)\((\d*)\)/).captures
raise "Not Atari act space" unless act_size == '6' && act_type == 'Discrete'
raise "Not Atari act space" unless act_size.to_i.between?(6,18) && act_type == 'Discrete'
@act_type = act_type.downcase.to_sym
@act_size = Integer(act_size)
@obs_size = @reset_obs.size
end

# Converts pyimg into NArray, applying optional pre-processing and resampling
def to_img pyimg
# subtract `reset_obs` to clear background
# subtract `reset_obs` to clear background => imgs as "what changes"
pyimg -= reset_obs_py if preproc == :subtr_bg
# resample to target resolution
pyimg = pyimg[(0..-1).step(downsample[0]),
(0..-1).step(downsample[1])] if downsample
# average color channels, flatten, convert to NArray
# average color channels, flatten to 1d array, convert to NArray
NImage[*pyimg.mean(2).ravel.tolist.to_a]
end
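`to_img` above does three things: optional background subtraction, resampling by keeping every `row_div`-th row and `col_div`-th column, and averaging the color channels into one gray value per pixel. The same idea on a nested Ruby array (rows × cols × channels), sketched without Numo or PyCall (`to_gray_vector` is an invented name):

```ruby
# Downsample an image by striding, then average color channels to gray,
# returning a flat 1-D vector of pixel values.
def to_gray_vector img, row_div: 1, col_div: 1
  img.each_slice(row_div).map(&:first).flat_map do |row|
    row.each_slice(col_div).map(&:first).map do |px|
      px.sum / px.size.to_f   # mean over the color channels
    end
  end
end
```

`each_slice(n).map(&:first)` keeps one element per group of `n`, mirroring the `(0..-1).step(downsample[i])` strided indexing in `to_img`.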

@@ -45,13 +46,11 @@ def gym_to_rb gym_ans
[to_img(obs), rew, done, info]
end

SKIP_TYPE = GymExperiment::SKIP_TYPE

# executes an action, then frameskips; returns aggregation
def execute act, skip_frames: 0, type: skip_type
act_ans = gym_env.step(act)
skip_ans = skip_frames.times.map do
SKIP_TYPE[type].call(act, gym_env)
GymExperiment::SKIP_TYPE[type].call(act, gym_env)
end
all_ans = skip_ans.unshift(act_ans)
obs_lst, rew_lst, done_lst, info_lst = all_ans.map(&method(:gym_to_rb)).transpose
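`execute` steps once with the chosen action, repeats it for `skip_frames` extra frames via the configured `SKIP_TYPE` lambda, and returns per-step results for aggregation. A minimal sketch of the same control flow against any object responding to `step`, with rewards summed (one common aggregation); `execute_with_skip` and `FakeEnv` in the usage are invented for illustration:

```ruby
# Step once with `act`, then repeat it `skip_frames` times; aggregate.
def execute_with_skip env, act, skip_frames: 0
  answers = [env.step(act)]
  skip_frames.times do
    break if answers.last[2]          # stop skipping once the episode is done
    answers << env.step(act)
  end
  obs_lst, rew_lst, done_lst = answers.transpose
  [obs_lst.last, rew_lst.sum, done_lst.last]
end
```

Frame skipping trades temporal resolution for speed: the network picks an action every `skip_frames + 1` frames, which both cuts compute and matches Atari's coarse control timescale.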
Expand Down