
[Performance] Accelerate TD lambda return estimate #1148

Closed
Blonck opened this issue May 12, 2023 · 0 comments · Fixed by #1158
Labels: enhancement (New feature or request)
Blonck commented May 12, 2023

Description

The idea is to improve the performance of vec_td_lambda_return_estimate in the case where lambda and gamma are scalars (or tensors with a single unique value).
The vectorized version of TD(lambda) works by building a cumulative product of the gamma decay

gamma_cumprod = [1, gamma, gamma ** 2, ...]

and applying conv1d to it which results in

[r00 + gamma r01 + gamma ** 2 r02 + ...,
 r01 + gamma r02 + gamma ** 2 r03 + ...,
 ...,
 r0n
]
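As a minimal, non-vectorized sketch (plain Python, hypothetical helper name), the discounted sums above can be reproduced by correlating the reward sequence with the gamma_cumprod kernel at every offset, which is what the conv1d pass computes per time step:

```python
def discounted_returns(rewards, gamma):
    """Correlate rewards with [1, gamma, gamma ** 2, ...] at every offset,
    mirroring the conv1d over the gamma cumprod kernel."""
    n = len(rewards)
    kernel = [gamma ** k for k in range(n)]  # gamma_cumprod
    return [
        sum(kernel[k] * rewards[t + k] for k in range(n - t))
        for t in range(n)
    ]
```

For rewards [1.0, 1.0, 1.0] and gamma 0.5 this yields [1.75, 1.5, 1.0], i.e. each entry is r_t + gamma * r_{t+1} + gamma ** 2 * r_{t+2} + ..., matching the bracketed expansion above.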

In principle, the same idea used in #1142 should be applicable.
Given consecutive trajectories in the form of:

reward = [r00, r01, r02, r03, r10, r11]
done = [False, False, False, True, False, False]

the traces are split into:

r_transformed = [
    [r00, r01, r02, r03],
    [r10, r11, 0, 0]
]
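A hedged sketch of that splitting step in plain Python (the helper name is hypothetical; the real implementation would operate on tensors):

```python
def split_and_pad(reward, done):
    """Split a flat reward sequence at done flags and right-pad the
    resulting trajectories with zeros to a common length."""
    rows, start = [], 0
    for i, d in enumerate(done):
        if d:
            rows.append(reward[start:i + 1])
            start = i + 1
    if start < len(reward):  # trailing, unterminated trajectory
        rows.append(reward[start:])
    width = max(len(r) for r in rows)
    return [r + [0.0] * (width - len(r)) for r in rows]
```

With the example above, reward [r00, r01, r02, r03, r10, r11] and a done flag after r03 produce two rows, the second padded with two zeros.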

The filter is then applied to each row of r_transformed, computing the TD(lambda) return for each trajectory independently.

Finally, vec_td_lambda_return_estimate should dispatch to this fast path when applicable:

if isinstance(gamma, torch.Tensor) and gamma.numel() > 1:
    # use existing code
elif done.any():
    # use new version
else:
    # use existing code
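For checking any fast path against ground truth, a straightforward backward-recursion TD(lambda) reference can be useful (plain Python with scalar gamma/lam; the function name and signature are assumptions for illustration, not the torchrl API):

```python
def td_lambda_reference(reward, next_value, gamma, lam):
    """Backward recursion
    G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}),
    bootstrapped from the final next-state value."""
    G = next_value[-1]
    out = [0.0] * len(reward)
    for t in reversed(range(len(reward))):
        G = reward[t] + gamma * ((1 - lam) * next_value[t] + lam * G)
        out[t] = G
    return out
```

With lam = 1 this reduces to the discounted Monte Carlo return; with lam = 0 it reduces to the one-step TD target r_t + gamma * V(s_{t+1}).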