Contributing PPO + Transformer-XL #442

MarcoMeter · 2024-01-04T12:18:08Z

Hey @vwxyzjn
it's been quite a few extremely busy months, but I finally have the capacity to contribute a single-file implementation of PPO with Transformer-XL as episodic memory. The implementation would be based on my repo. For benchmarking, I would like to use Memory Gym (Code, Paper).

If you are interested, I'll get started soon.

vwxyzjn · 2024-01-09T15:12:30Z

Hey @MarcoMeter, this is pretty cool! Sorry it took me a while to get back to you. Do you want to make it a bit more self-contained, like creating a cleanrl/ppo_trxl/ppo_trxl.py? You can declare the dependencies in cleanrl/ppo_trxl/pyproject.toml and cleanrl/ppo_trxl/poetry.lock.

The main things we are looking for are succinct, understandable implementations, benchmarks, and docs (see how we documented https://docs.cleanrl.dev/rl-algorithms/dqn/#dqn_ataripy as an example).

MarcoMeter · 2024-01-22T14:14:24Z

Work is still in progress, so stay tuned ;)
https://github.com/MarcoMeter/episodic-transformer-memory-ppo/blob/cleanrl/train.py
I'll open a PR once ready.

MarcoMeter · 2024-04-02T14:51:40Z

@vwxyzjn
I finally resolved the issue. Only one linear layer was supposed to sit between the CNN and the transformer blocks. Quite surprising that this additional layer hampered performance so much.
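
For readers following along, here is a minimal sketch of the layout described above: CNN features pass through exactly one linear projection before reaching the transformer blocks. Everything in it is an assumption for illustration and is not taken from the actual implementation: the class name `EncoderSketch`, the 84x84 input size, the channel counts, and the embedding width are hypothetical, and PyTorch's `nn.TransformerEncoderLayer` is only a stand-in for Transformer-XL blocks (it has no episodic memory).

```python
import torch
import torch.nn as nn


class EncoderSketch(nn.Module):
    """Hypothetical layout: CNN -> one linear layer -> transformer blocks."""

    def __init__(self, in_channels: int = 3, embed_dim: int = 64, num_blocks: int = 2):
        super().__init__()
        # Illustrative CNN; kernel sizes and channel counts are assumptions.
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=8, stride=4),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened feature size from a dummy 84x84 input (assumed size).
        with torch.no_grad():
            n_flat = self.cnn(torch.zeros(1, in_channels, 84, 84)).shape[1]
        # The single linear layer between the CNN and the transformer blocks.
        self.proj = nn.Linear(n_flat, embed_dim)
        # Stand-in blocks: plain encoder layers, NOT Transformer-XL.
        self.blocks = nn.Sequential(
            *[
                nn.TransformerEncoderLayer(
                    d_model=embed_dim, nhead=4, dim_feedforward=128, batch_first=True
                )
                for _ in range(num_blocks)
            ]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        features = self.proj(self.cnn(obs))  # (batch, embed_dim)
        tokens = features.unsqueeze(1)       # (batch, 1, embed_dim): one token per step
        return self.blocks(tokens)           # (batch, 1, embed_dim)


model = EncoderSketch()
out = model(torch.zeros(2, 3, 84, 84))
print(out.shape)
```

The regression mentioned above would correspond to stacking a second `nn.Linear` after `self.proj`; the sketch keeps only the one projection.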

MarcoMeter closed this as completed Jan 22, 2024

vwxyzjn reopened this Aug 4, 2024
