…rn`. (#28, #45)
1. implemented function `n_step_return` to calculate $G_{t}^{n}$
2. implemented function `td_lambda_return` to calculate the $TD(\lambda)$ return
3. renamed `no_save` to `is_save` and updated the related command
4. removed `--prefill-steps`, `--info`, and `--save-frequency` from the command line; users can now specify those parameters in configuration files
5. updated README
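
As a rough illustration of the two quantities named in items 1 and 2, here is a minimal sketch in plain Python. The function names, signatures, and list-based inputs below are assumptions made for this example and are not the repository's actual `n_step_return` / `td_lambda_return` implementations. It assumes a single trajectory with no episode boundary inside it, `rewards` of length $T$, and `values` of length $T+1$ whose last entry is the bootstrap value (0 if the final state is terminal). The n-step return is $G_{t}^{n} = \sum_{k=0}^{n-1} \gamma^{k} r_{t+k} + \gamma^{n} V(s_{t+n})$, truncated at the end of the trajectory, and the $\lambda$-return is computed with the backward recursion $G_{t}^{\lambda} = r_{t} + \gamma\left[(1-\lambda) V(s_{t+1}) + \lambda G_{t+1}^{\lambda}\right]$.

```python
def n_step_return_sketch(rewards, values, t, n, gamma):
    """n-step return G_t^n for one trajectory (hypothetical signature).

    rewards: list of length T.
    values:  list of length T + 1; values[T] is the bootstrap value.
    """
    T = len(rewards)
    end = min(t + n, T)  # truncate the lookahead at the end of the trajectory
    g = 0.0
    for k in range(t, end):
        g += gamma ** (k - t) * rewards[k]
    # bootstrap from the value of the state reached after the summed rewards
    g += gamma ** (end - t) * values[end]
    return g


def td_lambda_return_sketch(rewards, values, gamma, lam):
    """Lambda-returns for every step, computed backward in time."""
    T = len(rewards)
    returns = [0.0] * T
    next_return = values[T]  # G_T^lambda is just the bootstrap value
    for t in reversed(range(T)):
        next_return = rewards[t] + gamma * (
            (1.0 - lam) * values[t + 1] + lam * next_return
        )
        returns[t] = next_return
    return returns
```

With `lam=0` the recursion reduces to the one-step return, and with `lam=1` it reduces to the full bootstrapped return over the rest of the trajectory, which gives two easy sanity checks against `n_step_return_sketch`.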
1. fixed iteration over RNN hidden states
2. renamed `n_time_step` to `chunk_length`
2. added `train_interval` to both SARL and MARL off-policy algorithms to control how often training runs relative to data collection
3. added `n_step_value` to calculate n-step return
4. updated README
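
To make the role of a setting like `train_interval` concrete, the sketch below shows one common way such a parameter is used in an off-policy loop: one training update after every `train_interval` collected environment steps. This is an assumption about the intended semantics, not the repository's code; `run_loop`, `collect_step`, and `train_step` are hypothetical names introduced only for this example.

```python
def run_loop(total_steps, train_interval, collect_step, train_step):
    """Collect data every step; train once per `train_interval` steps.

    collect_step and train_step are caller-supplied callables
    (hypothetical placeholders for environment interaction and a
    gradient update).
    """
    if train_interval < 1:
        raise ValueError("train_interval must be a positive integer")
    n_updates = 0
    for step in range(1, total_steps + 1):
        collect_step()  # e.g. act in the environment, store the transition
        if step % train_interval == 0:
            train_step()  # e.g. sample a batch from the buffer and update
            n_updates += 1
    return n_updates
```

A larger `train_interval` under this interpretation means more collected transitions per update, which trades sample reuse for wall-clock speed.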
Taking PPO as an example, implement several trace computation methods:
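
The specific methods are not listed here. As one illustration of a trace-style estimator commonly paired with PPO, the following is a minimal sketch of generalized advantage estimation (GAE). It is offered under stated assumptions rather than as the method this issue had in mind: the name `gae_advantages_sketch` and its signature are hypothetical, `values` has one more entry than `rewards` (the last being the bootstrap value), and `dones[t]` marks that the episode ended at step `t`.

```python
def gae_advantages_sketch(rewards, values, dones, gamma, lam):
    """GAE advantages for one rollout (hypothetical signature).

    delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
    A_t     = delta_t + gamma * lam * (1 - done_t) * A_{t+1}
    """
    T = len(rewards)
    advantages = [0.0] * T
    running = 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - float(dones[t])  # cut the trace at episode ends
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    return advantages
```

Setting `lam=0` gives the one-step TD error as the advantage, while `lam=1` gives the bootstrapped Monte Carlo return minus the value baseline; intermediate values interpolate between the two.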