
[Performance] Accelerate GAE #1142

Merged

18 commits merged into pytorch:main on May 10, 2023
Conversation

@Blonck (Contributor) commented May 9, 2023

Description

An optimized vectorized version of the generalized advantage estimate is used when gamma and lambda are scalars.

Motivation and Context

When handling consecutive trajectories of the form

```
reward = [r00, r01, r02, r03, r10, r11]
done = [False, False, False, True, False, False]
```

`vec_generalized_advantage_estimate` needs to build a giant gamma tensor of size `[B, T, T]`, holding a decayed gamma sequence that suits each trajectory. It therefore has to allocate a big `[B, T, T]` tensor and perform a heavy matrix multiplication. When gamma and lambda are scalars, this can be optimized by building a single tensor of the form

```
r_transformed = [[r00, r01, r02, r03],
                 [r10, r11,   0,   0]]
```

and applying the gamma filter `[r00 + gamma * r01 + gamma ** 2 * r02 + ..., r01 + gamma * r02 + gamma ** 2 * r03 + ...]` to compute the GAE.
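
A minimal sketch of the trick, with hypothetical helper names (`split_and_pad`, `discounted_filter`); the actual torchrl implementation is fully vectorized and also folds in the lambda term:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def split_and_pad(values: torch.Tensor, done: torch.Tensor) -> torch.Tensor:
    # Split the flat sequence at trajectory ends (done=True marks the last
    # step of a trajectory), then right-pad each trajectory with zeros.
    ends = torch.where(done)[0] + 1
    bounds = torch.cat([ends, torch.tensor([len(values)])]).unique()
    lengths = torch.diff(torch.cat([torch.tensor([0]), bounds])).tolist()
    return pad_sequence(list(torch.split(values, lengths)), batch_first=True)

def discounted_filter(x: torch.Tensor, gamma: float) -> torch.Tensor:
    # Reference (loop-based) version of the gamma filter:
    # out[t] = x[t] + gamma * out[t + 1]. The zero padding contributes
    # nothing, so shorter trajectories remain correct.
    out = torch.zeros_like(x)
    acc = torch.zeros(x.shape[0])
    for t in range(x.shape[1] - 1, -1, -1):
        acc = x[:, t] + gamma * acc
        out[:, t] = acc
    return out

reward = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
done = torch.tensor([False, False, False, True, False, False])
r_transformed = split_and_pad(reward, done)
# tensor([[1., 2., 3., 4.],
#         [5., 6., 0., 0.]])
advantage_like = discounted_filter(r_transformed, gamma=0.99)
```

The Python loop above only defines what the filter computes; the PR applies it in vectorized form.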

Closes #1052

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

  • [x] Bug fix (non-breaking change which fixes an issue)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • [x] I have read the CONTRIBUTION guide (required)
  • [ ] My change requires a change to the documentation.
  • [x] I have updated the tests accordingly (required for a bug fix or a new feature).
  • [ ] I have updated the documentation accordingly.

vmoens and others added 8 commits April 6, 2023 15:56
* move helper methods to util
* reuse existing helper methods
* remove wip file
@facebook-github-bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) May 9, 2023
@vmoens added the performance label (performance issue or suggestion for improvement) May 9, 2023
@vmoens changed the title from Accelerate GAE to [Performance] Accelerate GAE May 9, 2023
Blonck added 7 commits May 9, 2023 13:50
If gamma and lmbda are scalars, `fast_vec_gae` should always be faster than
`vec_generalized_advantage_estimate` once the time dimension T is large enough.
In case there is only one split, `_inv_pad_sequence` can skip its calculation.
```
gamma = torch.full(size, gamma)
lmbda = 0.95

benchmark(
```
Contributor:

Why not use `benchmark.pedantic` to get some extra options?
I'm open to using plain `benchmark` if you think it's a better fit.

Contributor Author:

My thinking was that automatic calibration is better here, since I don't need the fine-grained control of `benchmark.pedantic`. The pytest-benchmark docs say, roughly: "don't use pedantic if you don't need it."
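
For context on the two styles being discussed, a sketch (the inputs and the stand-in function are placeholders, not the PR's actual benchmark code):

```python
import torch

reward = torch.randn(8, 512, 1)  # placeholder input

def fast_vec_gae_stub(r):
    # Stand-in for the function under test.
    return r.cumsum(-2)

def test_gae_calibrated(benchmark):
    # Plain benchmark: pytest-benchmark calibrates rounds/iterations itself.
    benchmark(fast_vec_gae_stub, reward)

def test_gae_pedantic(benchmark):
    # benchmark.pedantic pins rounds/iterations for fine-grained control.
    benchmark.pedantic(fast_vec_gae_stub, args=(reward,), rounds=50, iterations=10)
```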

Comment on lines 149 to 153:

```
if reward.ndim > 2:
    done = done.transpose(-2, -1)
    reward = reward.transpose(-2, -1)
    state_value = state_value.transpose(-2, -1)
    next_state_value = next_state_value.transpose(-2, -1)
```
Contributor:

What happens if the reward has 2 dimensions? Don't we want to swap them?

Contributor Author:

You are right, that does not make sense. In particular, `vec_generalized_advantage_estimate` contains the following line:

```
*batch_size, time_steps, lastdim = not_done.shape
```

which ensures that reward and the other tensors have at least 3 dimensions, so the checks are never executed.

I will remove the checks, although I need to remember why I introduced them in the first place.
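
For reference, a minimal illustration of what that starred unpacking does (the shape here is arbitrary):

```python
import torch

not_done = torch.ones(4, 100, 1)  # [*B, T, 1]
*batch_size, time_steps, lastdim = not_done.shape
print(batch_size, time_steps, lastdim)  # [4] 100 1
```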

@vmoens (Contributor) left a comment:

LGTM thanks so much for this contribution!


@vmoens merged commit 9402664 into pytorch:main May 10, 2023