JAX TD3 prototype #225
Conversation
Thanks for the PR. Looks great!
cleanrl/td3_continuous_action_jax.py
Outdated
# TODO: maybe generate a lot of random keys right at the beginning;
# also check https://jax.readthedocs.io/en/latest/jax.random.html
key, noise_key = jax.random.split(key, 2)
# Clipped Gaussian noise for target-policy smoothing
# (arguments assumed per the standard TD3 formulation)
clipped_noise = jnp.clip(
    jax.random.normal(noise_key, actions.shape) * args.policy_noise,
    -args.noise_clip,
    args.noise_clip,
)
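For reference, a minimal sketch of what the TODO's "generate a lot of random keys right in the beginning" idea could look like; the seed, key count, and loop structure are assumptions for illustration, not the PR's code:

```python
import jax

key = jax.random.PRNGKey(0)  # seed assumed
# Split once into one subkey per training step, instead of splitting every step.
noise_keys = jax.random.split(key, num=10_000)  # num = total training steps (assumed)

# Inside the training loop, step i would then simply index into the array:
# noise = jax.random.normal(noise_keys[i], actions.shape)
```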
`clipped_noise` is not actually used. Also, maybe generating it with numpy is a little bit faster? With `jnp` we would probably need to jit the function. Would you mind doing a speed test like `%timeit`?
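A quick way to run that comparison; the shapes and hyperparameters below are assumed for illustration, not taken from the PR:

```python
import numpy as np
import jax
import jax.numpy as jnp

action_shape = (256, 6)              # assumed batch/action dims
policy_noise, noise_clip = 0.2, 0.5  # assumed TD3 defaults

def numpy_noise():
    # Plain numpy: eager, no compilation
    return np.clip(np.random.normal(0.0, policy_noise, size=action_shape), -noise_clip, noise_clip)

@jax.jit
def jax_noise(key):
    # JAX: needs explicit key handling; jit to avoid per-call dispatch overhead
    key, noise_key = jax.random.split(key)
    noise = jnp.clip(jax.random.normal(noise_key, action_shape) * policy_noise, -noise_clip, noise_clip)
    return noise, key

# In IPython:
#   %timeit numpy_noise()
#   key = jax.random.PRNGKey(0)
#   %timeit jax_noise(key)[0].block_until_ready()  # block so we time actual compute
```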
Added it and fixed the clipping; will test what's the most efficient way to generate noise soonish.
cleanrl/td3_continuous_action_jax.py
Outdated
(qf1_loss_value, qf1_a_values), grads1 = jax.value_and_grad(mse_loss, has_aux=True)(qf1_state.params, qf1)
(qf2_loss_value, qf2_a_values), grads2 = jax.value_and_grad(mse_loss, has_aux=True)(qf2_state.params, qf2)
qf1_state = qf1_state.apply_gradients(grads=grads1)
qf2_state = qf2_state.apply_gradients(grads=grads2)
We are doing two separate grad passes. Would it be faster to have them share the same optimizer, as done here?
cleanrl/cleanrl/ppo_continuous_action_envpool_jax.py
Lines 213 to 217 in 399f9a3
agent_params = AgentParams(
    actor_params,
    critic_params,
)
agent_optimizer_state = agent_optimizer.init(agent_params)
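A rough sketch of that suggestion applied to TD3's twin Q-networks; the container class, loss, learning rate, and the names `qf1`, `qf2`, `observations`, `actions`, and `target_q` are assumptions based on the surrounding code, not the final implementation:

```python
import flax
import jax
import optax

@flax.struct.dataclass
class QNetworksParams:
    qf1_params: flax.core.FrozenDict
    qf2_params: flax.core.FrozenDict

# Bundle both critics' params into one pytree so a single optimizer handles them.
q_params = QNetworksParams(qf1_state.params, qf2_state.params)
q_optimizer = optax.adam(learning_rate=3e-4)  # learning rate assumed
q_opt_state = q_optimizer.init(q_params)

def joint_mse_loss(params):
    # One loss over both critics, so a single value_and_grad pass covers them.
    qf1_a_values = qf1.apply(params.qf1_params, observations, actions).squeeze()
    qf2_a_values = qf2.apply(params.qf2_params, observations, actions).squeeze()
    return ((qf1_a_values - target_q) ** 2).mean() + ((qf2_a_values - target_q) ** 2).mean()

loss, grads = jax.value_and_grad(joint_mse_loss)(q_params)
updates, q_opt_state = q_optimizer.update(grads, q_opt_state)
q_params = optax.apply_updates(q_params, updates)
```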
Hey, I added some changes. Overall it looks pretty good, but I don't know why the experiments do not work yet...
Great addition.
Looking good on my side
@joaogui1 could you take a final look at https://cleanrl-git-fork-joaogui1-master-vwxyzjn.vercel.app/rl-algorithms/td3/#td3_continuous_action_jaxpy to see if there is anything missing?
LGTM!
Thanks so much for this contribution!
Description
Closes #218
Initial implementation, needs testing.
Types of changes
Checklist:
- I have ensured `pre-commit run --all-files` passes (required).
- I have updated the documentation and previewed the changes via `mkdocs serve`.

If you are adding new algorithms or your change could result in performance difference, you may need to (re-)run tracked experiments. See #137 as an example PR.
- I have tracked applicable experiments with the `--capture-video` flag toggled on (required).
- I have added documentation and previewed the changes via `mkdocs serve`.
- I have added the learning curves (in PNG format with `width=500` and `height=300`).