
v0.7.0

Released by @muupan on 28 Jun 09:47

Important enhancements

  • New agents: Rainbow (#374) and TD3 (#453).
  • PPO now supports recurrent models via a stateless recurrent model interface (#431).
  • Batch DDPG training (#416) and asynchronous time-based evaluation (#420) are now supported.

Important bugfixes

  • Fixed a bug where some examples passed the same random seed to every env via env.seed (a seeding sketch follows this list).
  • Fixed a bug that broke batch training with n-step returns and/or recurrent models.
  • Fixed examples/ale/train_dqn_ale.py, which used LinearDecayEpsilonGreedy even when NoisyNet was enabled.
  • Fixed examples/ale/train_dqn_ale.py, which ignored the value specified by --noisy-net-sigma.
  • Fixed chainerrl.links.to_factorized_noisy, which did not work correctly with chainerrl.links.Sequence.
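For illustration, a minimal sketch of the per-env seeding pattern the first fix restores. The `make_env` helper, the env name, and `base_seed` are hypothetical, not ChainerRL API:

```python
import gym

# Sketch (assumed names): give each env in a batch a distinct seed derived
# from its index, instead of every env sharing the same seed.
def make_env(idx, base_seed=0):
    env = gym.make('CartPole-v0')
    env.seed(base_seed + idx)  # distinct seed per env
    return env

envs = [make_env(idx) for idx in range(4)]
```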

Important destructive changes

  • chainerrl.experiments.train_agent_async now requires eval_n_steps (number of timesteps per evaluation phase) and eval_n_episodes (number of episodes per evaluation phase) to be specified explicitly; exactly one of them must be None (a call sketch follows this list).
  • examples/ale/dqn_phi.py is removed.
  • chainerrl.initializers.LeCunNormal is removed. Use chainer.initializers.LeCunNormal instead.
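A minimal sketch of a call under the new train_agent_async requirement, assuming `agent` and `make_env` are defined elsewhere; the numeric values and the remaining keyword arguments are illustrative:

```python
from chainerrl import experiments

# Exactly one of eval_n_steps / eval_n_episodes is a number; the other must
# be None. Here evaluation is episode-based: 10 episodes per phase.
experiments.train_agent_async(
    outdir='results',
    processes=8,
    make_env=make_env,    # placeholder env factory
    agent=agent,          # placeholder agent
    steps=10 ** 6,
    eval_interval=10 ** 5,
    eval_n_steps=None,    # time-based evaluation disabled
    eval_n_episodes=10,   # episode-based evaluation enabled
)
```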

All updates

Enhancement

  • Rainbow (#374)
  • Make copy_param support scalar parameters (#410)
  • Enables batch DDPG agents to be trained. (#416)
  • Enables asynchronous time-based evaluations of agents. (#420)
  • Removes obsolete dqn_phi file (#424)
  • Add Branched and use it to simplify train_ppo_batch_gym.py (#427; a usage sketch follows this list)
  • Remove LeCunNormal since Chainer has it from v3 (#428)
  • Precompute log probability in PPO (#430)
  • Recurrent PPO with a stateless recurrent model interface (#431)
  • Replace Variable.data with Variable.array (again) (#434)
  • Make IQN work with tuple observations (#435)
  • Add VectorStackFrame to reduce memory usage in train_dqn_batch_ale.py (#443)
  • DDPG example that reproduces the TD3 paper (#452)
  • TD3 agent (#453)
  • Update requirements.txt and setup.py for gym (#461)
  • Support gym>=0.12.2 by no longer using underscore methods in gym wrappers (#462)
  • Add warning about numpy 1.16.0 (#476)
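To illustrate the new Branched link from #427, a hedged sketch of two heads sharing one input; the layer sizes and head roles are arbitrary placeholders:

```python
import numpy as np
import chainer.links as L
from chainerrl.links import Branched

# Branched calls each child link on the same input and returns the results
# as a tuple, which keeps shared-input heads in a single callable.
heads = Branched(
    L.Linear(64, 4),  # e.g. a policy head
    L.Linear(64, 1),  # e.g. a value head
)
x = np.zeros((1, 64), dtype=np.float32)
policy_out, value_out = heads(x)
```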

Documentation

  • Link to abstract pages on ArXiv (#409)
  • Fixes typo (#412)
  • Fixes file path in grasping example README (#422)
  • Add links to references (#425)
  • Fixes minor grammar mistake in A3C ALE example (#432)
  • Add explanation of examples/atari (#437)
  • Link to chainer/chainer, not pfnet/chainer (#439)
  • Link to chainer/chainer(rl), not pfnet/chainer(rl) (#440)
  • Fixes and adds docstring for FCStateQFunctionWithDiscreteAction (#441)
  • Fixes a typo in train_agent_batch Documentation. (#444)
  • Adds Rainbow to main README (#447)
  • Fixes Docstring in IQN (#451)
  • Improves Rainbow README (#458)
  • Adds missing doc for eval_performance (#459)
  • Adds IQN Results to readme (#469)
  • Adds IQN to the documentation. (#470)
  • Adds reference to mujoco folder in the examples README (#474)
  • Fixes incorrect comment. (#490)

Examples

  • Rainbow (#374)
  • Create an IQN example aimed at reproducing the original paper and its evaluation protocol. (#408)
  • Benchmarks DQN example (#414)
  • Enables batch DDPG agents to be trained. (#416)
  • Fixes scores for Demon Attack (#418)
  • Set observation_space of kuka env correctly (#421)
  • Fixes error in setting explorer in DQN ALE example. (#423)
  • Add Branched and use it to simplify train_ppo_batch_gym.py (#427)
  • A3C Example for reproducing paper results. (#433)
  • PPO example that reproduces the "Deep Reinforcement Learning that Matters" paper (#448)
  • DDPG example that reproduces the TD3 paper (#452)
  • TD3 agent (#453)
  • Apply noisy_net_sigma parameter (#465; an invocation sketch follows this list)
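For instance, with #465 the DQN ALE example now honors the sigma flag; a hedged invocation sketch (other arguments such as game selection are omitted, and 0.5 is an arbitrary value):

```
# Assumed invocation: --noisy-net-sigma enables NoisyNet in this example and
# sets the sigma scale that is actually applied after #465.
python examples/ale/train_dqn_ale.py --noisy-net-sigma 0.5
```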

Testing

  • Use Python 3.6 in Travis CI (#411)
  • Increase tolerance of TestGaussianDistribution.test_entropy since sometimes it failed (#438)
  • Make FrameStack follow original spaces (#445)
  • Split test_examples.sh (#472)
  • Fix Travis error (#492)
  • Use Python 3.6 for ipynb (#493)

Bugfixes

  • Bugfix (#360, thanks @corochann!)
  • Fixes error in setting explorer in DQN ALE example. (#423)
  • Make sure the agent sees when episodes end (#429)
  • Pass env_id to replay buffer methods to correctly support batch training (#442)
  • Add VectorStackFrame to reduce memory usage in train_dqn_batch_ale.py (#443)
  • Fix a bug of unintentionally using same process indices (#455)
  • Make cv2 dependency optional (#456)
  • Fix ScaledFloatFrame.observation_space (#460)
  • Apply noisy_net_sigma parameter (#465)
  • Match EpisodicReplayBuffer.sample with ReplayBuffer.sample (#485)
  • Make to_factorized_noisy work with sequential links (#489; a usage sketch follows this list)
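To illustrate the last fix, a hedged sketch of applying to_factorized_noisy to a Sequence model; the layer sizes and the sigma_scale value are arbitrary placeholders:

```python
import chainer.functions as F
import chainer.links as L
from chainerrl.links import Sequence, to_factorized_noisy

# After #489, to_factorized_noisy also handles Sequence models, replacing
# each Linear layer in place with a factorized-noisy variant.
q_func = Sequence(
    L.Linear(4, 64),
    F.relu,
    L.Linear(64, 2),
)
to_factorized_noisy(q_func, sigma_scale=0.5)
```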