
Value function related (值函数相关) #28

Open
2 of 4 tasks
StepNeverStop opened this issue Dec 28, 2020 · 0 comments
Labels
enhancement New feature or request

Comments

StepNeverStop (Owner) commented Dec 28, 2020

  • General n-step value function computation
  • TD($\lambda$)
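
For reference, the two quantities in this list are usually defined as follows. This is the standard textbook form, not something stated in the issue; the indexing convention (with $r_t$ as the reward received after acting in $s_t$) is an assumption.

$$
G_t^{n} = \sum_{k=0}^{n-1} \gamma^{k} r_{t+k} + \gamma^{n} V(s_{t+n})
$$

$$
G_t^{\lambda} = (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{n}
$$

The $\lambda$-return also satisfies a backward recursion, which is typically how it is computed over a finite trajectory:

$$
G_t^{\lambda} = r_t + \gamma \left[ (1-\lambda)\, V(s_{t+1}) + \lambda\, G_{t+1}^{\lambda} \right]
$$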

Using PPO as an example, implement several trace computation methods:

  • Retrace
  • V-Trace
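
The issue does not say how these traces would be implemented. As a rough illustration only, here is a minimal NumPy sketch of both targets over a single trajectory, following the usual published recursions. The function names, argument names, and array layout are hypothetical, and `ratios` is assumed to hold the per-step importance ratios $\pi(a_t|s_t)/\mu(a_t|s_t)$.

```python
import numpy as np


def vtrace_targets(rewards, values, ratios, dones,
                   gamma=0.99, lam=1.0, rho_bar=1.0, c_bar=1.0):
    """V-trace state-value targets (hypothetical helper, not the repo's API).

    rewards, ratios, dones: length T.
    values: length T + 1; values[T] is the bootstrap value V(s_T).
    """
    rewards, values, ratios = map(np.asarray, (rewards, values, ratios))
    T = len(rewards)
    rhos = np.minimum(rho_bar, ratios)      # truncated weights for the TD error
    cs = lam * np.minimum(c_bar, ratios)    # truncated trace coefficients
    targets = np.zeros(T)
    carry = 0.0                             # holds v_{t+1} - V(s_{t+1})
    for t in reversed(range(T)):
        not_done = 1.0 - float(dones[t])
        delta = rhos[t] * (rewards[t] + gamma * not_done * values[t + 1] - values[t])
        carry = delta + gamma * not_done * cs[t] * carry
        targets[t] = values[t] + carry
    return targets


def retrace_targets(rewards, q_taken, next_expected_q, ratios, dones,
                    gamma=0.99, lam=1.0):
    """Retrace action-value targets (hypothetical helper, not the repo's API).

    q_taken[t]         : Q(s_t, a_t) for the action actually taken.
    next_expected_q[t] : E_{a ~ pi} Q(s_{t+1}, a).
    """
    rewards, q_taken, next_expected_q, ratios = map(
        np.asarray, (rewards, q_taken, next_expected_q, ratios))
    T = len(rewards)
    cs = lam * np.minimum(1.0, ratios)      # Retrace coefficients c_t
    targets = np.zeros(T)
    carry = 0.0                             # holds c_{t+1} * (target_{t+1} - Q_{t+1})
    for t in reversed(range(T)):
        not_done = 1.0 - float(dones[t])
        targets[t] = rewards[t] + gamma * not_done * (next_expected_q[t] + carry)
        carry = cs[t] * (targets[t] - q_taken[t])
    return targets
```

When every ratio is 1 and `lam=1`, both functions reduce to a bootstrapped multi-step return, which gives a quick sanity check. When every ratio is 0, V-trace leaves the value estimates unchanged and Retrace falls back to a one-step target.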
@StepNeverStop StepNeverStop self-assigned this Dec 28, 2020
@StepNeverStop StepNeverStop added the enhancement New feature or request label Dec 28, 2020
@StepNeverStop StepNeverStop changed the title from "Traces" to "值函数相关" (Value function related) Aug 27, 2021
StepNeverStop added a commit that referenced this issue Aug 30, 2021
…rn`. (#28, #45)

1. implemented function `n_step_return` to calculate $G_{t}^{n}$
2. implemented function `td_lambda_return` to calculate $TD(\lambda)$
3. renamed `no_save` to `is_save` and changed the related command
4. removed the `--prefill-steps`, `--info`, and `--save-frequency` command-line options; users can specify these parameters in configuration files
5. updated README
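
The commit message names `n_step_return` and `td_lambda_return` but does not show their signatures. The following is a minimal NumPy sketch of what such functions might compute over one trajectory. The argument names, the array layout (`values` has one more entry than `rewards`, so the last entry can be used for bootstrapping), and the `dones` handling are assumptions for illustration, not the repository's actual API.

```python
import numpy as np


def n_step_return(rewards, values, dones, gamma=0.99, n=3):
    """n-step return G_t^n for every t (illustrative signature only).

    rewards, dones: length T.
    values: length T + 1; values[T] is the bootstrap value V(s_T).
    Near the end of the trajectory the horizon is truncated at T.
    """
    T = len(rewards)
    returns = np.zeros(T)
    for t in range(T):
        g, discount, end = 0.0, 1.0, min(t + n, T)
        terminated = False
        for k in range(t, end):
            g += discount * rewards[k]
            discount *= gamma
            if dones[k]:            # episode ended: stop and do not bootstrap
                terminated = True
                break
        if not terminated:
            g += discount * values[end]
        returns[t] = g
    return returns


def td_lambda_return(rewards, values, dones, gamma=0.99, lam=0.95):
    """Truncated lambda-return via the backward recursion (illustrative only).

    G_t = r_t + gamma * [(1 - lam) * V(s_{t+1}) + lam * G_{t+1}],
    starting from G_T = V(s_T).
    """
    T = len(rewards)
    returns = np.zeros(T)
    next_return = values[T]
    for t in reversed(range(T)):
        not_done = 1.0 - float(dones[t])
        next_return = rewards[t] + gamma * not_done * (
            (1.0 - lam) * values[t + 1] + lam * next_return)
        returns[t] = next_return
    return returns
```

Under these assumptions, `lam=0` reproduces the one-step target (the same as `n=1`), and `lam=1` reproduces the full bootstrapped return (the same as `n=T`), so the two functions can be cross-checked against each other.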
StepNeverStop added a commit that referenced this issue Aug 31, 2021
1. fixed RNN hidden state iteration
2. renamed `n_time_step` to `chunk_length`
3. added `train_interval` to both sarl and marl off-policy algorithms to control how often training runs relative to data collection
4. added `n_step_value` to calculate the n-step return
5. updated README