
Check that the code implementation is accurate and reasonable #34

Open
2 of 10 tasks
StepNeverStop opened this issue Jan 6, 2021 · 2 comments

@StepNeverStop
Owner

StepNeverStop commented Jan 6, 2021

  • check and fix C51 [deaab73]
  • check qrdqn [deaab73]
  • check iqn
  • check and fix Rainbow
  • check on-policy buffer sampling
  • check function discounted_sum (see the sketch after this list)
  • check function calculate_td_error
  • check whether training works well with visual input
  • fix TRPO where step_size is sometimes NaN
  • check vdn and qmix
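The `discounted_sum` item refers to the standard discounted-return recursion; the sketch below shows what such a helper is generally expected to compute. The name and signature here are illustrative, not necessarily the repository's actual function.

```python
import numpy as np

def discounted_sum(rewards, gamma, dones, bootstrap_value=0.0):
    """Backward recursion G_t = r_t + gamma * (1 - done_t) * G_{t+1},
    bootstrapping from `bootstrap_value` at the end of the segment."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * (1.0 - dones[t]) * running
        returns[t] = running
    return returns

# Example: discounted_sum([1., 1., 1.], gamma=0.99, dones=[0, 0, 1]) -> [2.9701, 1.99, 1.]
```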
@StepNeverStop StepNeverStop self-assigned this Jan 6, 2021
@StepNeverStop StepNeverStop added the optimization Better performance or solution label Jan 6, 2021
StepNeverStop added a commit that referenced this issue Jan 7, 2021
StepNeverStop added a commit that referenced this issue Jul 4, 2021
1. fixed n-step replay buffer
2. reconstructed representation net
3. removed 'use_stack'
4. implemented multi-agent algorithms with shared parameters
5. optimized agent network
@StepNeverStop
Owner Author

StepNeverStop commented Jul 6, 2021

  • Check the choice of computation dimension (dim/axis) throughout the code and set it to -1 wherever possible (see the example below).
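An illustration of why preferring `dim=-1` is robust: reducing over the last axis keeps working when extra leading (batch/time) dimensions appear. This snippet is an example, not code from the repository.

```python
import torch

q = torch.randn(4, 8, 3)            # (batch, time, actions); shapes are illustrative

# A hard-coded positive dim breaks once an extra leading axis is added;
# dim=-1 always reduces over the action axis.
greedy = q.argmax(dim=-1)            # (4, 8)
value = q.max(dim=-1).values         # (4, 8)
probs = torch.softmax(q, dim=-1)     # normalized over actions
```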

StepNeverStop added a commit that referenced this issue Jul 26, 2021
1. fixed MLP
2. fixed the gradient-flow problem when sharing the representation network between actor and critic, which resolved the error "one of the variables needed for gradient computation has been modified by an inplace operation" (see the sketch below)
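A minimal sketch of the usual shape of that in-place error and its fix, using hypothetical module names rather than the repository's classes: when actor and critic share an encoder, stepping the optimizer between the two backward passes modifies shared weights in place while the second graph still needs them; accumulating both losses and stepping once avoids it.

```python
import torch
import torch.nn as nn

# Hypothetical modules, for illustration only.
encoder = nn.Linear(8, 16)
actor = nn.Linear(16, 2)
critic = nn.Linear(16, 1)
opt = torch.optim.Adam(
    [*encoder.parameters(), *actor.parameters(), *critic.parameters()], lr=1e-3)

obs = torch.randn(32, 8)
feat = encoder(obs)                       # shared representation used by both heads
actor_loss = actor(feat).pow(2).mean()    # placeholder losses
critic_loss = critic(feat).pow(2).mean()

# Safe pattern: backward once over both losses, then a single optimizer step,
# instead of backward/step for one head followed by backward for the other.
opt.zero_grad()
(actor_loss + critic_loss).backward()
opt.step()
```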
StepNeverStop added a commit that referenced this issue Jul 28, 2021
1. fixed bugs in `iqn`, `c51`, `rainbow`, `dddqn`, `maxsqn`, `sql`, `bootstrappeddqn`, `averaged_dqn`
StepNeverStop added a commit that referenced this issue Jul 28, 2021
1. moved `logger2file` from agent class to main loop
2. updated folder `gym_env_list`
3. fixed bugs in `*.yaml`
4. added class property `n_copys` instead of using `env._n_copys`
5. updated README
StepNeverStop added a commit that referenced this issue Jul 29, 2021
1. added `test.yaml` for quickly verifying RLs
2. changed folder name from `algos` to `algorithms` for better readability
3. removed the single-agent recorder; all algorithms (sarl & marl) now use `SimpleMovingAverageRecoder`
4. removed `GymVectorizedType` in `common/specs.py`
5. removed `common/train/*` and implemented a unified training interface in `rls/train`
6. reconstructed `make_env` function in `rls/envs/make_env`
7. optimized function `load_config`
8. moved `off_policy_buffer.yaml` to `rls/configs/buffer`
9. removed configurations like `eval_while_train`, `add_noise2buffer` etc.
10. optimized environments' configuration files
11. optimized environment wrappers and implemented unified env interface for `gym` and `unity`, see `env_base.py`
12. updated dockerfiles
13. updated README
StepNeverStop added a commit that referenced this issue Jul 29, 2021
…ng. (#34, #25)

1. updated `setup.py`
2. removed redundant packages
3. fixed bugs in unity wrapper
4. fixed bugs in agent models that occurred in continuous-action training tasks
5. fixed bugs in class `MLP`
StepNeverStop added a commit that referenced this issue Jul 29, 2021
1. optimized `iTensor_oNumpy`
2. renamed `train_time_step` to `rnn_time_steps`, `burn_in_time_step` to `burn_in_time_steps`
3. optimized `on_policy_buffer.py`
4. optimized `EpisodeExperienceReplay`
5. fixed off-policy rnn training
6. optimized & fixed `to_numpy` and `to_tensor`
7. reimplemented `call` and invoked it from `__call__` (see the sketch below)
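A sketch of the `call`-inside-`__call__` pattern mentioned in item 7, with illustrative names; the repository's actual conversion goes through the `iTensor_oNumpy` decorator (later renamed `iton`), not this code.

```python
import torch

class Policy:
    def __call__(self, obs):
        # Convert numpy input to a tensor, delegate to `call`,
        # and hand numpy back to the environment loop.
        obs_t = torch.as_tensor(obs, dtype=torch.float32)
        with torch.no_grad():
            action_t = self.call(obs_t)
        return action_t.cpu().numpy()

    def call(self, obs):
        # Subclasses implement the actual forward/action-selection logic here.
        raise NotImplementedError
```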
StepNeverStop added a commit that referenced this issue Jul 30, 2021
StepNeverStop added a commit that referenced this issue Jul 30, 2021
1. fixed bugs in maddpg and vdn
2. implemented `VDNMixer` (see the sketch below)
3. optimized the parameter synchronizing function
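VDN's mixer is the additive decomposition Q_tot = sum_i Q_i; a minimal sketch of that idea (not the repository's actual `VDNMixer` class):

```python
import torch.nn as nn

class VDNMixer(nn.Module):
    """Value-Decomposition Network mixer: joint value is the sum of per-agent values."""

    def forward(self, agent_qs):
        # agent_qs: (batch, n_agents) chosen-action Q-values, one per agent.
        return agent_qs.sum(dim=-1, keepdim=True)   # (batch, 1) joint Q_tot
```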
StepNeverStop added a commit that referenced this issue Aug 26, 2021
StepNeverStop added a commit that referenced this issue Aug 27, 2021
StepNeverStop added a commit that referenced this issue Aug 28, 2021
1. updated README
2. added `__repr__` in class `TargetTwin`
3. fixed bugs in marl algorithms
StepNeverStop added a commit that referenced this issue Aug 28, 2021
1. added `_has_global_state` in the pettingzoo env wrapper and marl policies
StepNeverStop added a commit that referenced this issue Aug 28, 2021
1. removed redundant function
2. optimized `q_target_func`
StepNeverStop added a commit that referenced this issue Aug 28, 2021
1. optimized `vdn`
StepNeverStop added a commit that referenced this issue Aug 29, 2021
1. updated README
2. optimized representation model
StepNeverStop added a commit that referenced this issue Aug 30, 2021
StepNeverStop added a commit that referenced this issue Aug 30, 2021
1. fixed a bug in the pettingzoo wrapper that did not scale continuous actions from [-1, 1] to [low, high] (see the sketch after this list)
2. fixed bugs in `sac`, `sac_v`, `tac`, `maxsqn`
3. implemented `masac`
4. fixed bugs in `squash_action`
5. implemented PER in marl
6. added several env configuration files for the pettingzoo platform
7. updated README
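The scaling fix corresponds to the standard affine map from the squashed [-1, 1] range to the environment's action box; a small illustration with hypothetical names, not the wrapper's actual code:

```python
import numpy as np

def scale_action(action, low, high):
    """Map an action in [-1, 1] to the environment's [low, high] box."""
    action = np.clip(action, -1.0, 1.0)
    return low + (action + 1.0) * 0.5 * (high - low)

# Example: scale_action(0.0, low=np.array([0.0]), high=np.array([10.0])) -> array([5.0])
```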
@StepNeverStop
Owner Author

StepNeverStop commented Aug 31, 2021

  • Corrected the iterative update of RNN hidden states when using an exploration policy abf6b0a
  • Implemented updating the policy at a configurable interval of policy-environment interactions abf6b0a

StepNeverStop added a commit that referenced this issue Aug 31, 2021
1. fixed rnn hidden states iteration
2. renamed `n_time_step` to `chunk_length`
3. added `train_interval` to both sarl and marl off-policy algorithms to control the training frequency relative to data collection
4. added `n_step_value` to calculate the n-step return (see the sketch after this list)
5. updated README
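A minimal sketch of the n-step return that `n_step_value` controls, assuming the usual truncation at terminal transitions; the helper name and signature are illustrative:

```python
def n_step_return(rewards, gamma, bootstrap_value, dones):
    """R = r_0 + gamma*r_1 + ... + gamma^(n-1)*r_{n-1} + gamma^n * V(s_n),
    cut short if a terminal transition occurs within the n steps."""
    ret, discount = 0.0, 1.0
    for r, d in zip(rewards, dones):
        ret += discount * r
        discount *= gamma
        if d:                      # do not bootstrap past a terminal state
            return ret
    return ret + discount * bootstrap_value
```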
@StepNeverStop StepNeverStop pinned this issue Aug 31, 2021
StepNeverStop added a commit that referenced this issue Sep 3, 2021
1. renamed `iTensor_oNumpy` to `iton`
2. optimized `auto_format.py`
3. added general params `oplr_params` for initializing optimizers
StepNeverStop added a commit that referenced this issue Sep 4, 2021
*. redefined version as v0.0.1
1. removed package `supersuit`
2. implemented class `MPIEnv`
3. implemented class `VECEnv`
4. optimized env wrappers, implemented `render` method for `gyms` environments
5. reconstructed some of the returns of `env.step` from `obs` into `obs_fa` and `obs_fs` (see the sketch after this list)
  - `obs_fa` is the observation the agent/policy uses to choose the next action. At the boundary between episode i and i+1, `obs_fa` is $observation_{i+1}^{0}$; otherwise it is the same as `obs_fs`, which is $observation_{i}^{t}$.
  - `obs_fs` is the observation stored in the buffer. At the boundary between episode i and i+1, `obs_fs` is $observation_{i}^{T}$; otherwise it is the same as `obs_fa`.
6. optimized `rssm`-related code based on the `obs_fs` described above
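A sketch of the `obs_fa` / `obs_fs` split at an episode boundary, with hypothetical names and structure rather than the repository's actual `env.step` return:

```python
def split_boundary_obs(next_obs, terminal, reset_obs):
    """At a step ending episode i: store the terminal observation of episode i
    (obs_fs) in the buffer, but act on the first observation of episode i+1
    (obs_fa). At all other steps the two are identical."""
    obs_fs = next_obs                                # observation_i^T at the boundary
    obs_fa = reset_obs if terminal else next_obs     # observation_{i+1}^0 at the boundary
    return obs_fa, obs_fs
```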
StepNeverStop added a commit that referenced this issue Sep 4, 2021
1. optimized on-policy algorithms
2. renamed `cell_state` to `rnncs`
3. renamed `next_cell_state` to `rnncs_`
4. fixed bugs when storing the first experience into replay buffer
5. optimized algorithm code format.
6. fixed bugs in `c51` and `qrdqn`