Releases: kengz/openai_lab

Fix Numerical Errors; Improve PER

27 Apr 02:47

Improvements/Bug Fixes

Misc

PR: #131

  • fix overflow error in np.exp of SoftmaxPolicy, BoltzmannPolicy by casting to float64 instead of float32
  • improve overall np.isfinite asserts
  • remove index after reset in *analysis.csv
  • remove unused specs
  • reorganize and expand test specs
  • guard continuous action value range in continuous policies
  • fix analytics param variable sourcing
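
The float32 overflow behind the first fix above can be reproduced in isolation. The following is a minimal sketch, not the Lab's actual policy code: `softmax_probs` is a hypothetical helper name, and the finiteness asserts only illustrate the kind of `np.isfinite` guard mentioned in the list.

```python
import numpy as np


def softmax_probs(q_values):
    """Hypothetical helper: softmax over Q values, computed in float64."""
    # float32 exp overflows to inf for inputs above roughly 88.7;
    # float64 stays finite up to roughly 709
    q = np.asarray(q_values, dtype=np.float64)
    exp_q = np.exp(q)
    assert np.isfinite(exp_q).all(), 'np.exp overflowed even in float64'
    probs = exp_q / exp_q.sum()
    assert np.isfinite(probs).all()
    return probs


# A Q value of 100 overflows in float32 but not in float64
with np.errstate(over='ignore'):
    overflowed = np.exp(np.float32(100.0))
probs = softmax_probs([100.0, 99.0])
```

Casting widens the safe input range but does not remove the limit, which is why the assert is kept after the cast.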

DDPG

PR: #131

  • add EpsilonGreedyNoisePolicy

PER

PR: #131

  • add memory.update(errors) throughout all agents
  • add shape assert for Q values and errors throughout
  • default max_mem_len to max_timestep * max_epis / 3 if not specified
  • add the missing abs for the initial reward
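
The max_mem_len default in the list above amounts to one line of arithmetic. A minimal sketch, assuming the result is truncated to an integer; `default_max_mem_len` is a hypothetical function name, not the Lab's API:

```python
def default_max_mem_len(max_timestep, max_epis, max_mem_len=None):
    """Hypothetical helper: pick the memory length when the spec omits it."""
    if max_mem_len is not None:
        # an explicit value in the spec always wins
        return max_mem_len
    # assumption: truncate to int so the result can be used as a length
    return int(max_timestep * max_epis / 3)


auto_len = default_max_mem_len(200, 300)
explicit_len = default_max_mem_len(200, 300, max_mem_len=500)
```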

ActorCritic, DDPG

19 Apr 04:12

New Algorithms

ActorCritic

PR: #118

  • add ActorCritic agent
  • add its policies, Discrete: ArgmaxPolicy, SoftmaxPolicy; Continuous: BoundedPolicy, GaussianPolicy
  • add basic specs; solves CartPole-v0 and CartPole-v1, other environments not yet solved

DDPG

PR: #118

  • add DDPG agent with custom tensorflow ops
  • add its policies (only Continuous now): NoNoisePolicy, LinearNoisePolicy, GaussianWhiteNoisePolicy, OUNoisePolicy
  • add basic specs, solve Pendulum-v0

Improvements/Bug Fixes

PR: #118

  • use logger.warn instead of raise error when component locks are violated
  • fix #114, #115: matplotlib backend setting issue. A single trial now live-plots and renders
  • mute DoubleDQN as it breaks; instead revert to the single-model recompile from DQN

Component locks; virtualenv/conda installation support

15 Apr 21:53

Component Locks

PR: #120

We have a lot of components, and not all of them are compatible with one another. When scheduling experiments and designing specs, it is hard to keep all of them in check. This release adds component locks that automatically check all specs on import, using the locks specified in rl/spec/component_locks.json. The design follows the minimum description length principle. When adding new components, be sure to update this file.

  • add double-network component lock
  • add discrete-action component lock; assume continuous agent can handle discrete action spaces as a generalization
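
To illustrate the idea of a lock check, here is a minimal sketch. The lock layout, the `LOCKS` dictionary, and `check_locks` are all hypothetical; the real schema of rl/spec/component_locks.json may differ. The assumed rule is that a spec using any component named in a lock must use only components named in that lock.

```python
# Hypothetical lock layout; the real rl/spec/component_locks.json may differ
LOCKS = {
    'double_network': {
        'Agent': ['DoubleDQN'],
        'Policy': ['DoubleDQNEpsilonGreedyPolicy'],
    },
}


def check_locks(spec, locks=LOCKS):
    """Return the names of the locks that a spec violates."""
    violated = []
    for lock_name, lock in locks.items():
        # for each locked component type, does the spec use a locked name?
        matches = [spec.get(key) in allowed for key, allowed in lock.items()]
        # a lock is violated when only some of its components are present
        if any(matches) and not all(matches):
            violated.append(lock_name)
    return violated


ok_spec = {'Agent': 'DoubleDQN', 'Policy': 'DoubleDQNEpsilonGreedyPolicy'}
bad_spec = {'Agent': 'DoubleDQN', 'Policy': 'EpsilonGreedyPolicy'}
unrelated_spec = {'Agent': 'DQN', 'Policy': 'EpsilonGreedyPolicy'}
```

Under this assumed rule, `check_locks(bad_spec)` reports the double-network lock, while the other two specs pass.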

Improved Installation

PR: #121
Solves: #113, #114, #115

  • fix broken gym installation. See gym PR 558
  • lay out installation steps in doc, use binaries for server setup
  • introduce version lock for dependencies with requirements.txt, environment.yml
  • support installation by system python, virtualenv, conda, integrate into Grunt
  • add quickstart_dqn for example quickstart in doc

Bug Fixes

DoubleDQN

PR: #119

  • restore missing recompile_model call to the second model in DoubleDQN.

Fix Boltzmann, refactor RENDER

05 Apr 12:36

Bug Fixes

BoltzmannPolicy

PR: #109

  • fix state reshape with dimension > 1 using np.expand_dims
  • guard underflow by doing np.clip before np.exp
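
Both fixes above can be sketched in a few lines. This is an illustration under stated assumptions, not the Lab's BoltzmannPolicy: `boltzmann_probs`, the temperature `tau`, and the clip bound of 500 are hypothetical choices.

```python
import numpy as np


def boltzmann_probs(q_values, tau=1.0, clip_val=500.0):
    """Hypothetical helper: Boltzmann action probabilities at temperature tau."""
    scaled = np.asarray(q_values, dtype=np.float64) / tau
    # bound the exponent so np.exp stays finite and nonzero in float64
    scaled = np.clip(scaled, -clip_val, clip_val)
    exp_q = np.exp(scaled)
    return exp_q / exp_q.sum()


# A state with more than one dimension gets a leading batch axis of size 1
state = np.zeros((4, 4))
batched = np.expand_dims(state, axis=0)

# Without the clip, exp(1000) would be inf and the probabilities nan
probs = boltzmann_probs([1000.0, 0.0])
```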

Misc

  • rename class from DoubleDQNPolicy to DoubleDQNEpsilonGreedyPolicy for clarity
  • refactor useless RENDER key from rl/spec/problems.json into rl/experiment.py

Fix PER breakage

04 Apr 12:41

Bug Fixes

PER

PR: #108

  • fix PER breakage when error = reward is negative by adding a bump min_priority = abs(10 * SOLVED_MEAN_REWARD)
  • add a positive min_priority for all problems since they may have negative rewards. We cannot use error = abs(reward) because the priority calculation is sign-sensitive
  • add assert guard to ensure priority is not nan
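
The breakage and the fix can be sketched as follows. The priority formula `(error + min_priority) ** alpha`, the exponent `alpha=0.6`, and the example value of `SOLVED_MEAN_REWARD` are assumptions made for illustration; only the `min_priority` expression comes from the notes above.

```python
import numpy as np

SOLVED_MEAN_REWARD = 195.0  # example value, assumed for illustration
min_priority = abs(10 * SOLVED_MEAN_REWARD)


def get_priority(error, min_priority, alpha=0.6):
    """Hypothetical sketch: priority as (error + min_priority) ** alpha."""
    priority = np.power(error + min_priority, alpha)
    # guard: a negative base with a fractional exponent yields nan
    assert not np.isnan(priority), 'priority is nan; min_priority too small?'
    return priority


# Without the bump, a negative error produces nan
with np.errstate(invalid='ignore'):
    unbumped = np.power(-200.0, 0.6)

bumped = get_priority(-200.0, min_priority)
```

The bump keeps the base positive while preserving the ordering of errors, which taking the absolute value of the reward would not.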

First stable release

02 Apr 22:17
Compare
Choose a tag to compare

First stable release of OpenAI Lab

PR: #106

  • stable and generalized RL components design
  • implement discrete agents: DQN, double-DQN, SARSA, PER
  • run dozens of experiments as Lab tests. Solve numerous discrete environments on Fitness Matrix, with PR submissions. Mainly CartPole-v0, CartPole-v1, Acrobot-v1, LunarLander-v2
  • complete documentation page
  • complete analytics framework and generalized fitness_score as evaluation metrics
  • stable system design after many iterations
  • ready for more implementations and new research