Value-function related #28

StepNeverStop · 2020-12-28T08:27:08Z

General n-step value function computation
TD($\lambda$)
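
For reference, here is a minimal sketch of the two targets above, in plain Python. It is not the repository's implementation: the function names `n_step_returns` and `lambda_returns` are hypothetical, `values` is assumed to hold one extra bootstrap entry $V(s_T)$, and the segment is assumed to contain no episode termination. It uses $G_t^n = \sum_{k=0}^{n-1} \gamma^k r_{t+k} + \gamma^n V(s_{t+n})$ (truncated at the end of the segment) and the recursion $G_t^\lambda = r_t + \gamma[(1-\lambda) V(s_{t+1}) + \lambda G_{t+1}^\lambda]$.

```python
def n_step_returns(rewards, values, gamma, n):
    """n-step bootstrapped return for every step of one trajectory segment.

    Assumptions (illustrative only): len(values) == len(rewards) + 1, and no
    terminal state occurs inside the segment.
    """
    T = len(rewards)
    returns = []
    for t in range(T):
        end = min(t + n, T)  # truncate the look-ahead at the segment end
        g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
        g += gamma ** (end - t) * values[end]  # bootstrap from V(s_end)
        returns.append(g)
    return returns


def lambda_returns(rewards, values, gamma, lam):
    """Lambda-return computed with a single backward pass."""
    T = len(rewards)
    returns = [0.0] * T
    g = values[T]  # G_T is initialised with the bootstrap value V(s_T)
    for t in reversed(range(T)):
        g = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * g)
        returns[t] = g
    return returns
```

Under these assumptions, `lam=0` should reproduce the one-step TD target and `lam=1` the full bootstrapped return, which gives a quick sanity check against `n_step_returns`.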

Using PPO as an example, implement several trace computation methods:

Retrace
V-Trace
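
As a rough illustration of what the V-Trace item could look like, the sketch below computes V-trace value targets with a backward recursion, $v_t - V(s_t) = \delta_t + \gamma c_t (v_{t+1} - V(s_{t+1}))$, where $\delta_t = \rho_t (r_t + \gamma V(s_{t+1}) - V(s_t))$, $\rho_t = \min(\bar\rho, \pi/\mu)$ and $c_t = \min(\bar c, \pi/\mu)$. The function name `vtrace_targets` and its parameters are hypothetical and not taken from this repository; `ratios` is assumed to hold the importance ratios $\pi(a_t|s_t)/\mu(a_t|s_t)$ (the same ratio PPO already computes), `values` has one extra bootstrap entry, and terminations inside the segment are ignored.

```python
def vtrace_targets(rewards, values, ratios, gamma, rho_bar=1.0, c_bar=1.0):
    """V-trace value targets for one trajectory segment (illustrative sketch).

    Assumptions: len(values) == len(rewards) + 1, len(ratios) == len(rewards),
    and no terminal state occurs inside the segment.
    """
    T = len(rewards)
    targets = [0.0] * T
    acc = 0.0  # running value of v_{t+1} - V(s_{t+1}); zero past the segment end
    for t in reversed(range(T)):
        rho = min(rho_bar, ratios[t])  # clipped ratio weighting the TD error
        c = min(c_bar, ratios[t])      # clipped ratio acting as the trace cutter
        delta = rho * (rewards[t] + gamma * values[t + 1] - values[t])
        acc = delta + gamma * c * acc
        targets[t] = values[t] + acc
    return targets
```

With all ratios equal to 1 (the on-policy case) the targets reduce to the full bootstrapped return; with all ratios equal to 0 the targets collapse to the current value estimates.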

…rn`. (#28, #45)
1. Implemented function `n_step_return` to calculate $G_{t}^{n}$.
2. Implemented function `td_lambda_return` to calculate $TD(\lambda)$.
3. Renamed `no_save` to `is_save` and changed the related command.
4. Removed `--prefill-steps`, `--info`, and `--save-frequency` from the command line; users can specify those parameters in configuration files.
5. Updated README.

1. Fixed RNN hidden-state iteration.
2. Renamed `n_time_step` to `chunk_length`.
3. Added `train_interval` to both sarl and marl off-policy algorithms to control the training frequency relative to data collection.
4. Added `n_step_value` to calculate the n-step return.
5. Updated README.

StepNeverStop self-assigned this Dec 28, 2020

StepNeverStop added the enhancement New feature or request label Dec 28, 2020

StepNeverStop changed the title ~~Traces~~ Value-function related Aug 27, 2021
