
A universal experience representation format, combined with a replay buffer mechanism to optimize the data flow #31

Open
StepNeverStop opened this issue Jan 2, 2021 · 1 comment

@StepNeverStop (Owner)

No description provided.

StepNeverStop self-assigned this Jan 2, 2021
StepNeverStop added the enhancement (New feature or request) label Jan 2, 2021
StepNeverStop added a commit that referenced this issue Jan 2, 2021
- rename `indexs.py` to `specs.py`
- add three namedtuples: `ModelObservations`, `Experience`, and `SingleModelInformation` (see the sketch below)
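A minimal sketch of what these namedtuple specs might look like; the field names below are illustrative assumptions, not the repository's actual definitions:

```python
from collections import namedtuple

# Illustrative field layouts -- the real definitions live in `specs.py`.
ModelObservations = namedtuple('ModelObservations', ['vector', 'visual'])
Experience = namedtuple('Experience', ['obs', 'action', 'reward', 'obs_', 'done'])

exp = Experience(
    obs=ModelObservations(vector=[0.1, 0.2], visual=None),
    action=1,
    reward=0.5,
    obs_=ModelObservations(vector=[0.2, 0.3], visual=None),
    done=False,
)
```

Representing a transition as a namedtuple keeps field access explicit (`exp.reward` rather than a positional index) while remaining a plain tuple for storage.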
StepNeverStop added a commit that referenced this issue Jan 3, 2021
- add `BatchExperiences`, `ModelObservations`, `NamedTupleStaticClass` in `specs.py` (see the sketch below)
- fix bugs in `dpg` and `ClippedNormalActionNoise`
- refactor ExperienceReplay
- remove a redundant function
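`NamedTupleStaticClass` suggests a set of static helpers for manipulating experience namedtuples. A hedged sketch of one plausible helper, field-wise batching, using a stand-in `Step` tuple (both the names and the behavior here are assumptions):

```python
from collections import namedtuple
import numpy as np

Step = namedtuple('Step', ['obs', 'reward'])  # stand-in for experience fields

class NamedTupleStaticClass:
    """Illustrative static helpers for working with experience namedtuples."""

    @staticmethod
    def pack(nts):
        """Stack a list of same-type namedtuples field-wise into one batch."""
        cls = type(nts[0])
        return cls(*(np.stack(fields) for fields in zip(*nts)))

batch = NamedTupleStaticClass.pack([Step(obs=[0.1, 0.2], reward=1.0),
                                    Step(obs=[0.3, 0.4], reward=0.0)])
# batch.obs has shape (2, 2); batch.reward has shape (2,)
```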
@StepNeverStop (Owner, Author)

  • Adapt to gym
  • Verify that LSTM training starts correctly
  • Fix the on-policy algorithms

StepNeverStop added a commit that referenced this issue Jan 4, 2021
- rename `memories` to `BATCH` for better readability
- implement a new type of `DataBuffer` that works with the `BatchExperiences` namedtuple (see the sketch below)
- fix PPO
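A rough sketch of how a `DataBuffer` storing whole experience namedtuples might look; the capacity, field names, and uniform sampling strategy are assumptions, not the repository's implementation:

```python
import random
from collections import deque, namedtuple

BatchExperiences = namedtuple('BatchExperiences',
                              ['obs', 'action', 'reward', 'obs_', 'done'])

class DataBuffer:
    """Sketch of a buffer that stores and samples whole experience tuples."""

    def __init__(self, capacity=10000):
        self._storage = deque(maxlen=capacity)

    def add(self, exp):
        self._storage.append(exp)

    def sample(self, batch_size):
        batch = random.sample(list(self._storage), batch_size)
        # Regroup N per-step tuples into one field-wise BatchExperiences.
        return BatchExperiences(*zip(*batch))
```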
StepNeverStop added a commit that referenced this issue Jan 4, 2021
- fix `pg`, `a2c`, `aoc`, `ppoc`, `trpo`
StepNeverStop added a commit that referenced this issue Jan 4, 2021
StepNeverStop added a commit that referenced this issue Jan 4, 2021 (#32)

- remove a redundant calculation
- add a length-equality check so on-policy algorithms only store valid data (see the sketch below)
- optimize `trpo`
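The length-equality check presumably guards against ragged rollouts before they enter the on-policy buffer. A minimal sketch under that assumption (the function name `all_lengths_equal` is hypothetical):

```python
def all_lengths_equal(*sequences):
    """True when every per-field rollout sequence has the same number of steps."""
    return len({len(seq) for seq in sequences}) <= 1

# Before writing a rollout into the on-policy buffer:
obs, actions, rewards = [1, 2, 3], [0, 1, 0], [0.0, 1.0, 0.5]
assert all_lengths_equal(obs, actions, rewards)
```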
StepNeverStop added a commit that referenced this issue Jan 5, 2021
- support multi-vector and multi-visual input
- optimize the `gym` and `unity` wrappers
- fix `ActorCriticValueCts`
- tag 2.0.0
- add `ObsSpec`
- refactor `SingleAgentEnvArgs` and `MultiAgentEnvArgs`
- remove `self.s_dim`, use `self.concat_vector_dim` instead (see the sketch after this list)
- temporarily disable vector input normalization
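An `ObsSpec` covering several vector and visual inputs could look like the following sketch; the field names and the `concat_vector_dim` property (replacing the old single `s_dim`) are assumptions inferred from this commit message:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ObsSpec:
    """Hypothetical shape spec for multi-vector and multi-visual inputs."""
    vector_dims: List[int] = field(default_factory=list)
    visual_dims: List[Tuple[int, int, int]] = field(default_factory=list)

    @property
    def concat_vector_dim(self) -> int:
        # Total width once all vector inputs are concatenated.
        return sum(self.vector_dims)

spec = ObsSpec(vector_dims=[8, 4], visual_dims=[(84, 84, 3)])
assert spec.concat_vector_dim == 12
```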
StepNeverStop reopened this Jan 5, 2021
StepNeverStop added the optimization (Better performance or solution) label Jan 6, 2021
StepNeverStop added a commit that referenced this issue Jul 2, 2021
…training. (#41, #25, #31)

1. renamed the variable `is_lg_batch_size` to `can_sample` (see the sketch after this list)
2. optimized the unity wrapper
3. optimized the multi-agent replay buffers
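The rename reads as a clarity fix: the old name described a condition (presumably "is larger than batch size") while the new one describes a capability. A minimal sketch, with illustrative class and attribute names:

```python
class OffPolicyBuffer:
    """Fragment showing the renamed readiness check (names are illustrative)."""

    def __init__(self, batch_size=128):
        self.batch_size = batch_size
        self._storage = []

    @property
    def can_sample(self):
        # Formerly `is_lg_batch_size`: ready once one full batch has accumulated.
        return len(self._storage) >= self.batch_size
```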
StepNeverStop added a commit that referenced this issue Jul 4, 2021
1. fixed the n-step replay buffer (see the sketch after this list)
2. reconstructed the representation net
3. removed `use_stack`
4. implemented multi-agent algorithms with shared parameters
5. optimized the agent network
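An n-step buffer typically collapses a sliding window of one-step transitions into a single transition whose return is discounted over up to n steps and truncated at episode termination. A sketch of that collapsing step (not the repository's actual implementation):

```python
def collapse_nstep(window, gamma=0.99):
    """Collapse a window of (obs, action, reward, obs_, done) one-step
    transitions into a single n-step transition, truncating the return
    when an episode terminates inside the window."""
    obs0, act0 = window[0][0], window[0][1]
    ret, discount = 0.0, 1.0
    for _, _, reward, obs_, done in window:
        ret += discount * reward
        discount *= gamma
        if done:
            break
    # obs_ and done are taken from the last step actually accumulated.
    return obs0, act0, ret, obs_, done
```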
StepNeverStop added a commit that referenced this issue Jul 29, 2021
1. added `test.yaml` for quickly verifying RLs
2. changed the folder name from `algos` to `algorithms` for better readability
3. removed the single-agent recorder; all algorithms (SARL & MARL) now use `SimpleMovingAverageRecoder`
4. removed `GymVectorizedType` from `common/specs.py`
5. removed `common/train/*` and implemented a unified training interface in `rls/train`
6. reconstructed the `make_env` function in `rls/envs/make_env`
7. optimized the `load_config` function
8. moved `off_policy_buffer.yaml` to `rls/configs/buffer`
9. removed configurations such as `eval_while_train`, `add_noise2buffer`, etc.
10. optimized the environments' configuration files
11. optimized the environment wrappers and implemented a unified env interface for `gym` and `unity`; see `env_base.py` (sketched after this list)
12. updated dockerfiles
13. updated README
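A unified env interface like the one item 11 describes might be an abstract base class that both the `gym` and `unity` wrappers implement. A minimal sketch, assuming the method set below (the actual `env_base.py` may differ):

```python
from abc import ABC, abstractmethod

class EnvBase(ABC):
    """Sketch of a common interface for gym and unity wrappers."""

    @abstractmethod
    def reset(self):
        """Start a new episode and return the initial observation(s)."""

    @abstractmethod
    def step(self, actions):
        """Apply actions and return the step results."""

    @abstractmethod
    def render(self, mode='human'):
        """Visualize the environment."""

    @abstractmethod
    def close(self):
        """Release environment resources."""
```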
StepNeverStop added a commit that referenced this issue Sep 3, 2021
use `Once` to ensure the buffer is built only once (sketched below).
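A `Once` gate in this sense might be a callable that returns `True` exactly one time, so buffer construction guarded by it runs only on the first pass. A minimal sketch under that assumption:

```python
class Once:
    """Callable gate that returns True exactly once, then False forever."""

    def __init__(self):
        self._done = False

    def __call__(self):
        if self._done:
            return False
        self._done = True
        return True

build_buffer = Once()
if build_buffer():
    pass  # construct the replay buffer here -- this branch runs only once
```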
StepNeverStop added a commit that referenced this issue Sep 4, 2021
*. redefined the version as v0.0.1
1. removed the `supersuit` package
2. implemented the `MPIEnv` class
3. implemented the `VECEnv` class
4. optimized the env wrappers; implemented a `render` method for the `gyms` environment
5. reconstructed some of the returns of `env.step`, splitting `obs` into `obs_fa` and `obs_fs` (see the sketch after this list):
  - `obs_fa` is the observation the agent/policy uses to choose an action. At the crossing point of episodes $i$ and $i+1$, `obs_fa` represents $observation_{i+1}^{0}$; otherwise it is identical to `obs_fs`, which represents $observation_{i}^{t}$.
  - `obs_fs` is the observation stored in the buffer. At the crossing point of episodes $i$ and $i+1$, `obs_fs` represents $observation_{i}^{T}$; otherwise it is identical to `obs_fa`.
6. optimized the `rssm`-related code based on `obs_fs` as described above
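A sketch of how `obs_fa` and `obs_fs` could be split at an episode boundary, assuming an auto-resetting environment that also exposes the terminal observation (the helper name and argument layout here are hypothetical):

```python
def split_obs(obs_next, done, terminal_obs):
    """Return (obs_fa, obs_fs) for one step of an auto-resetting env.

    obs_fa feeds the policy; obs_fs goes into the buffer. They only
    differ at the boundary between episode i and episode i+1.
    """
    if done:
        # obs_fa = observation_{i+1}^{0}; obs_fs = observation_{i}^{T}
        return obs_next, terminal_obs
    # Mid-episode: both are observation_{i}^{t}
    return obs_next, obs_next
```

Splitting the two roles keeps the buffer causally consistent (episode i's transitions end with episode i's last observation) while the policy still acts on the fresh observation of episode i+1.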