
Policy util rework #314

Merged: 3 commits into v4-dev from policy-rework, Apr 29, 2019
Conversation

@kengz (Owner) commented Apr 29, 2019

Policy util rework for training speedup

This PR reworks the way pdparam and action_pd are computed to produce a near-5x speedup (200 FPS to 1000 FPS for A2C). The key insight is that forward and backward passes are the most costly operations.

  • enforce no_grad during agent action to speed up the forward pass
  • compute gradients in batch during training, and minimize forward passes to a single pass with no wastage
  • improve the overall pdparam -> action_pd -> log_probs/entropy logic (see the sketch after this list)
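A minimal PyTorch sketch of this pattern, using illustrative names (`net`, `act`, `train_step`) rather than SLM-Lab's actual API: acting wraps the forward pass in `no_grad`, and training recomputes the distribution once for the whole batch before a single backward pass.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

# Illustrative policy net: state (dim 4) -> pdparam (logits for 2 actions)
net = nn.Linear(4, 2)

def act(state):
    # Acting needs no gradients: no_grad skips autograd bookkeeping,
    # making the forward pass cheaper.
    with torch.no_grad():
        pdparam = net(state)
        action_pd = Categorical(logits=pdparam)
        return action_pd.sample()

def train_step(states, actions, advantages):
    # Training does one batched forward pass over the whole batch and one
    # backward pass, instead of per-step passes.
    pdparam = net(states)                    # (batch, action_dim)
    action_pd = Categorical(logits=pdparam)
    log_probs = action_pd.log_prob(actions)  # (batch,)
    loss = -(log_probs * advantages).mean() - 0.01 * action_pd.entropy().mean()
    loss.backward()
    return loss
```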

Remove variable tracking

As a result of the above, the related variables no longer need to be tracked via the body. This allows us to significantly simplify the API.

  • remove body attributes: action_tensor, action_pd, entropies, log_probs, mean_log_prob
  • remove body methods: action_pd_update, epi_reset, flush, space_fix_stats
  • propagate these removals throughout the codebase (the before/after pattern is sketched below)
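A hedged sketch of what the simplification looks like in practice (variable names are illustrative, not SLM-Lab's actual API): instead of appending per-step values to body attributes, everything is recomputed in one batched pass from the stored experiences.

```python
import torch
from torch.distributions import Categorical

net = torch.nn.Linear(4, 2)
batch = {'states': torch.rand(32, 4), 'actions': torch.randint(2, (32,))}

# Before: per-step variables were accumulated on the body during acting:
#   body.log_probs.append(action_pd.log_prob(action))
#   body.entropies.append(action_pd.entropy())

# After: nothing is tracked; log_probs/entropies are recomputed in batch
# from the stored experiences at training time.
pdparam = net(batch['states'])
action_pd = Categorical(logits=pdparam)
log_probs = action_pd.log_prob(batch['actions'])
entropies = action_pd.entropy()
```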

Generalization from A2C

Following #310, REINFORCE, PPO, and SIL are updated and generalized to work on vector environments like A2C.
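The batched-distribution logic extends naturally to vector environments, since pdparam simply gains a leading env dimension. A sketch with assumed shapes (names are illustrative, not SLM-Lab's actual variables):

```python
import torch
from torch.distributions import Categorical

# e.g. 8 parallel envs, 4-dim states, 2 discrete actions
num_envs, state_dim, action_dim = 8, 4, 2
net = torch.nn.Linear(state_dim, action_dim)

states = torch.rand(num_envs, state_dim)   # one state per env
with torch.no_grad():
    pdparam = net(states)                  # (num_envs, action_dim) logits
    action_pd = Categorical(logits=pdparam)
    actions = action_pd.sample()           # (num_envs,), one action per env
```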

@kengz kengz merged commit 9ee3dc0 into v4-dev Apr 29, 2019
@kengz kengz deleted the policy-rework branch April 29, 2019 06:14