
[Performance] Accelerate TD lambda return estimate #1148

Closed
Blonck opened this issue May 12, 2023 · 0 comments · Fixed by #1158
Labels: enhancement (New feature or request)
Blonck commented May 12, 2023

Description

The idea is to improve the performance of vec_td_lambda_return_estimate in the case where lambda and gamma are scalars (or tensors with a single unique value).
The vectorized version of TD(lambda) works by building a cumulative product of the gamma decay

gamma_cumprod = [1, gamma, gamma ** 2, ...]

and applying conv1d to it which results in

[r00 + gamma r01 + gamma ** 2 r02 + ...,
 r01 + gamma r02 + gamma ** 2 r03 + ...,
 ...,
 r0n
]
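As a minimal, non-vectorized sketch (plain Python, hypothetical helper name), the discounted sums above can be reproduced by correlating the reward sequence with the gamma_cumprod kernel at every offset, which is what the conv1d pass computes per time step:

```python
def discounted_returns(rewards, gamma):
    """Correlate rewards with [1, gamma, gamma ** 2, ...] at every offset,
    mirroring the conv1d over the gamma cumprod kernel."""
    n = len(rewards)
    kernel = [gamma ** k for k in range(n)]  # gamma_cumprod
    return [
        sum(kernel[k] * rewards[t + k] for k in range(n - t))
        for t in range(n)
    ]
```

For rewards [1.0, 1.0, 1.0] and gamma 0.5 this yields [1.75, 1.5, 1.0], i.e. each entry is r_t + gamma * r_{t+1} + gamma ** 2 * r_{t+2} + ..., matching the bracketed expansion above.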

In principle, the same idea used in #1142 should be applicable.
Given consecutive trajectories in the form of:

reward = [r00, r01, r02, r03, r10, r11]
done = [False, False, False, True, False, False]

the traces are split into:

r_transformed = [
    [r00, r01, r02, r03],
    [r10, r11, 0, 0]
]
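A hedged sketch of that splitting step in plain Python (the helper name is hypothetical; the real implementation would operate on tensors):

```python
def split_and_pad(reward, done):
    """Split a flat reward sequence at done flags and right-pad the
    resulting trajectories with zeros to a common length."""
    rows, start = [], 0
    for i, d in enumerate(done):
        if d:
            rows.append(reward[start:i + 1])
            start = i + 1
    if start < len(reward):  # trailing, unterminated trajectory
        rows.append(reward[start:])
    width = max(len(r) for r in rows)
    return [r + [0.0] * (width - len(r)) for r in rows]
```

With the example above, reward [r00, r01, r02, r03, r10, r11] and a done flag after r03 produce two rows, the second padded with two zeros.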

The filter is then applied to each row of r_transformed, computing the TD(lambda) return for each trajectory independently.

Finally, vec_td_lambda_return_estimate should dispatch to this fast path when applicable:

if isinstance(gamma, torch.Tensor) and gamma.numel() > 1:
    # use existing code
elif done.any():
    # use new version
else:
    # use existing code
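For checking any fast path against ground truth, a straightforward backward-recursion TD(lambda) reference can be useful (plain Python with scalar gamma/lam; the function name and signature are assumptions for illustration, not the torchrl API):

```python
def td_lambda_reference(reward, next_value, gamma, lam):
    """Backward recursion
    G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}),
    bootstrapped from the final next-state value."""
    G = next_value[-1]
    out = [0.0] * len(reward)
    for t in reversed(range(len(reward))):
        G = reward[t] + gamma * ((1 - lam) * next_value[t] + lam * G)
        out[t] = G
    return out
```

With lam = 1 this reduces to the discounted Monte Carlo return; with lam = 0 it reduces to the one-step TD target r_t + gamma * V(s_{t+1}).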