`HalfCheetah-v4` Environment with `mlpack` TD3

In this project, we trained a reinforcement learning agent on the HalfCheetah-v4 environment using mlpack's implementation of Twin Delayed Deep Deterministic Policy Gradient (TD3). HalfCheetah is a continuous-control task in which a two-dimensional, cheetah-like robot must learn to run forward as fast as possible.

Environment

We used the GUI of the Gymnasium toolkit (the successor to OpenAI Gym) for training and testing our agent. The GUI is provided through a distributed infrastructure (TCP API), which lets us interact with the environment. You can find more details about this infrastructure here.

Video

The full video of the trained agent solving the environment can be found here.

Code

The code used to train the agent can be found here.

Hyperparameters

Here are the hyperparameters used for training the TD3 agent:

| Hyperparameter | Value |
| --- | --- |
| Training Steps | 150,000 |
| Step Size | 7e-4 |
| Target Network Sync Interval | 1 |
| Exploration Steps | 10,000 |
| Discount | 0.99 |
| Update Interval | 1 |
| Replay Buffer Size | 1,000,000 |
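
To make the table above easier to reason about, here is a minimal C++ sketch that collects the values in one place and shows how two of them are typically used. The struct `Td3Hyperparameters` and the helpers `LearningUpdates` and `ClippedDoubleQTarget` are hypothetical names introduced for illustration; they are not mlpack's API and not the code used for training. The update count assumes that no learning happens during the exploration steps and that one update runs every `updateInterval` environment steps afterwards, which may differ from what the training code actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical container for the values in the hyperparameter table.
// Field names are illustrative only and do not mirror mlpack's classes.
struct Td3Hyperparameters
{
  std::size_t trainingSteps = 150000;
  double stepSize = 7e-4;
  std::size_t targetNetworkSyncInterval = 1;
  std::size_t explorationSteps = 10000;
  double discount = 0.99;
  std::size_t updateInterval = 1;
  std::size_t replayBufferSize = 1000000;
};

// Rough number of learning updates over a run.
// Assumption: the agent does not learn during the exploration steps and
// performs one update every `updateInterval` environment steps after that.
std::size_t LearningUpdates(const Td3Hyperparameters& hp)
{
  assert(hp.updateInterval > 0);
  if (hp.trainingSteps <= hp.explorationSteps)
    return 0;
  return (hp.trainingSteps - hp.explorationSteps) / hp.updateInterval;
}

// TD3's clipped double-Q target for a single transition. The two arguments
// `targetQ1` and `targetQ2` are the target critics' estimates for the next
// state, already evaluated at the (noise-smoothed) target action. The smaller
// estimate is discounted and added to the reward; a terminal transition
// contributes the reward alone.
double ClippedDoubleQTarget(const Td3Hyperparameters& hp,
                            double reward,
                            double targetQ1,
                            double targetQ2,
                            bool terminal)
{
  if (terminal)
    return reward;
  return reward + hp.discount * std::min(targetQ1, targetQ2);
}
```

With the default values, `LearningUpdates(Td3Hyperparameters{})` gives 140,000 under the stated assumption. Taking the minimum of the two critic estimates is what the "Twin" in TD3 refers to: it counteracts the overestimation that a single critic tends to accumulate.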
