# Releases · kengz/openai_lab
## Fix Numerical Errors; Improve PER
### Improvements/Bug Fixes

**Misc** PR: #131
- fix overflow error in `np.exp` of `SoftmaxPolicy`, `BoltzmannPolicy` by casting to `float64` instead of `float32` (see the sketch after this list)
- improve overall `np.isfinite` asserts
- remove index after reset in `*analysis.csv`
- remove unused specs
- reorganize and expand test specs
- guard continuous action value range in continuous policies
- fix analytics param variable sourcing
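A minimal sketch of the overflow guard, assuming the policies compute softmax probabilities over Q values; `softmax_probs` and its stability shift are illustrative, not the lab's exact code:

```python
import numpy as np

def softmax_probs(q_values, tau=1.0):
    '''Cast to float64 before np.exp so large Q values do not overflow
    as readily as in float32, then shift by the max so the largest
    exponent is exp(0) = 1.'''
    q = np.asarray(q_values, dtype=np.float64) / tau
    q = q - np.max(q)  # numerical stability shift
    exps = np.exp(q)
    probs = exps / np.sum(exps)
    assert np.all(np.isfinite(probs)), 'softmax produced non-finite probs'
    return probs
```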
**DDPG** PR: #131
- add `EpsilonGreedyNoisePolicy`
**PER** PR: #131
- add `memory.update(errors)` throughout all agents (see the sketch after this list)
- add shape asserts for Q values and errors throughout
- auto-set `max_mem_len` as `max_timestep * max_epis / 3` if not specified
- add the missing `abs` for the initial reward
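A minimal sketch of how an agent can feed TD errors back into prioritized replay after each training step. Only `memory.update(errors)` comes from the note above; the sampling call, `compute_q_targets`, and the batch layout are hypothetical:

```python
import numpy as np

def train_step(agent, memory, batch_size=32):
    '''One PER step: sample, learn, then refresh the priorities of
    the sampled transitions with their fresh TD errors.'''
    batch = memory.sample(batch_size)            # hypothetical sampling API
    q_targets = agent.compute_q_targets(batch)   # hypothetical agent method
    q_preds = agent.model.predict(batch['states'])
    errors = np.abs(q_targets - q_preds).max(axis=-1)
    assert errors.shape == (batch_size,)         # shape guard from the fix
    agent.model.train_on_batch(batch['states'], q_targets)
    memory.update(errors)  # re-prioritize the sampled transitions
```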
## ActorCritic, DDPG
### New Algorithms

**ActorCritic** PR: #118
- add `ActorCritic` agent
- add its policies. Discrete: `ArgmaxPolicy`, `SoftmaxPolicy`; Continuous: `BoundedPolicy`, `GaussianPolicy` (see the sketch after this list)
- add basic specs; solves `CartPole-v0`, `CartPole-v1`, yet to solve the others
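A minimal sketch of a Gaussian policy for continuous actions, which also illustrates the action-range guard from the fixes above; the class shape and parameter names are assumptions, not the lab's implementation:

```python
import numpy as np

class GaussianPolicy:
    '''Sample actions from N(mu, sigma) and clip them into the
    environment's valid action range so out-of-bound values never
    reach the environment.'''
    def __init__(self, action_low, action_high, sigma=1.0):
        self.action_low = action_low
        self.action_high = action_high
        self.sigma = sigma

    def select_action(self, mu):
        action = np.random.normal(loc=mu, scale=self.sigma)
        return np.clip(action, self.action_low, self.action_high)
```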
**DDPG** PR: #118
- add `DDPG` agent with custom tensorflow ops
- add its policies (Continuous only for now): `NoNoisePolicy`, `LinearNoisePolicy`, `GaussianWhiteNoisePolicy`, `OUNoisePolicy` (see the sketch after this list)
- add basic specs; solves `Pendulum-v0`
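A minimal sketch of Ornstein-Uhlenbeck exploration noise, the process behind a policy like `OUNoisePolicy`; the default parameters here are the commonly used ones, not necessarily the lab's:

```python
import numpy as np

class OUNoise:
    '''Ornstein-Uhlenbeck process: temporally correlated noise,
    well suited to physical control tasks like Pendulum-v0.'''
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.3):
        self.mu = mu * np.ones(dim)
        self.theta = theta  # mean-reversion rate
        self.sigma = sigma  # noise scale
        self.state = self.mu.copy()

    def sample(self):
        dx = self.theta * (self.mu - self.state) \
            + self.sigma * np.random.randn(*self.state.shape)
        self.state = self.state + dx
        return self.state
```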
### Improvements/Bug Fixes

PR: #118
## Component locks; virtualenv/conda installation support
### Component Locks

PR: #120

We have many components, and not all of them are compatible with one another. When scheduling experiments and designing specs, it is hard to keep them all in check. This release adds component locks, which automatically check all specs at import time against the locks specified in `rl/spec/component_locks.json` (a sketch of such a check follows below). The design uses the minimum description length principle. When adding new components, be sure to update this file.

- add double-network component lock
- add discrete-action component lock; assume a continuous agent can handle discrete action spaces as a generalization
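A minimal sketch of an import-time lock check. The JSON schema assumed here (a head component mapping to the components allowed with it) and the function name are illustrative only, not the actual shape of `rl/spec/component_locks.json`:

```python
import json

def check_component_locks(spec, lock_path='rl/spec/component_locks.json'):
    '''Fail fast if a spec pairs incompatible components.
    Assumed schema: {lock_name: {"head": key, "includes":
    {head_value: {component_key: [allowed values]}}}}.'''
    with open(lock_path) as f:
        locks = json.load(f)
    for name, lock in locks.items():
        head_value = spec.get(lock['head'])
        if head_value in lock['includes']:
            for key, allowed in lock['includes'][head_value].items():
                assert spec.get(key) in allowed, (
                    'component lock "{}" violated: {}={} not compatible '
                    'with {}={}'.format(name, key, spec.get(key),
                                        lock['head'], head_value))
```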
### Improved Installation

PR: #121. Solves: #113, #114, #115

- fix broken gym installation; see gym PR 558
- lay out installation steps in the doc; use binaries for server setup
- introduce version locks for dependencies with `requirements.txt`, `environment.yml`
- support installation via system `python`, `virtualenv`, or `conda`, integrated into Grunt
- add `quickstart_dqn` as an example quickstart in the doc
### Bug Fixes

**DoubleDQN** PR: #119
- restore the missing `recompile_model` call to the second model in DoubleDQN (see the sketch after this list)
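A minimal sketch of the fix's shape, assuming two Keras networks whose optimizer settings must stay in sync; everything except the idea of recompiling both models is an assumption:

```python
from keras.optimizers import Adam

def recompile_models(model, model_2, lr):
    '''When hyperparameters such as the learning rate change, both
    networks must be recompiled; the bug left model_2 with stale
    optimizer settings. (Illustrative, not the lab's exact code.)'''
    model.compile(loss='mse', optimizer=Adam(lr=lr))
    model_2.compile(loss='mse', optimizer=Adam(lr=lr))  # the restored call
    return model, model_2
```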
## Fix Boltzmann, refactor RENDER
### Bug Fixes

**BoltzmannPolicy** PR: #109
- fix state reshape for dimension `> 1` using `np.expand_dims`
- guard underflow by applying `np.clip` before `np.exp` (see the sketch after this list)
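A minimal sketch combining both fixes, assuming the policy exponentiates temperature-scaled Q values; the clip bounds and helper name are illustrative:

```python
import numpy as np

def boltzmann_probs(state, q_func, tau=1.0):
    '''Reshape a state of dimension > 1 into a batch of one, then
    clip the scaled Q values before np.exp to keep exponents in a
    safe range.'''
    batch_state = np.expand_dims(state, axis=0)  # (dims,) -> (1, dims)
    q_values = q_func(batch_state)[0]
    scaled = np.clip(q_values / tau, -50.0, 50.0)  # illustrative bounds
    exps = np.exp(scaled)
    return exps / np.sum(exps)
```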
### Misc

- rename class `DoubleDQNPolicy` to `DoubleDQNEpsilonGreedyPolicy` for clarity
- refactor the redundant `RENDER` key out of `rl/spec/problems.json` and into `rl/experiment.py`
## Fix PER breakage
### Bug Fixes

**PER** PR: #108
- fix PER breakage on negative `error = reward` by adding a bump `min_priority = abs(10 * SOLVED_MEAN_REWARD)`
- add a positive `min_priority` for all problems, since they may have negative rewards; we cannot use `error = abs(reward)` because the sign matters for priority calculation
- add an assert guard to ensure `priority` is not `nan` (see the sketch after this list)
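A minimal sketch of a priority computation with the `min_priority` bump; the exponent `alpha` follows the standard PER formula and is an assumption here:

```python
import numpy as np

def compute_priority(error, min_priority, alpha=0.6):
    '''Shift the raw (possibly negative) error by a positive
    min_priority so the base stays positive; a negative base under a
    fractional alpha would yield nan, hence the assert.'''
    priority = (error + min_priority) ** alpha
    assert not np.isnan(priority), 'priority must not be nan'
    return priority
```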
## First stable release

First stable release of OpenAI Lab. PR: #106
- stable and generalized RL components design
- implement discrete agents: `DQN`, `double-DQN`, `SARSA`, `PER`
- run dozens of experiments as Lab tests; solve numerous discrete environments on the Fitness Matrix, with PR submissions, mainly `CartPole-v0`, `CartPole-v1`, `Acrobot-v1`, `LunarLander-v2`
- complete documentation page
- complete analytics framework, with a generalized `fitness_score` as the evaluation metric
- stable system design after many iterations
- ready for more implementations and new research