Welcome to the world of Reinforcement Learning (RL) in space! This repository explores RL through the Lunar Lander problem. Whether you're new to RL or a seasoned expert, this is your chance to see how different algorithms tackle the challenge of landing a spaceship safely on a designated pad. Explore DQN, D3QN, and Adaptive Gamma D3QN with real performance visualizations and results. Fasten your seatbelts and prepare for takeoff!
- 1 - DQN (Deep Q-Networks)
- 2 - D3QN (Dueling Double DQN)
- 3 - Adaptive Gamma D3QN
- Results and Visualizations
- Future Work
The Deep Q-Network (DQN) algorithm is one of the most fundamental techniques in RL, combining Q-learning with neural networks. In the Lunar Lander problem, the DQN agent learns how to land the spaceship by estimating the Q-value of each action it can take in every state.
- Q-Learning with Neural Networks: Q-values are approximated by a neural network rather than a lookup table.
- Replay Memory: Experiences are stored and used to train the agent in mini-batches.
- Epsilon-Greedy Exploration: The agent balances exploring new actions against exploiting actions it already believes are good (see the sketch after this list).
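To make these pieces concrete, here is a minimal sketch of a replay buffer and epsilon-greedy action selection in PyTorch. The class and function names, buffer capacity, and batch size are illustrative assumptions, not taken from this repository's code.

```python
import random
from collections import deque

import torch


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=100_000):  # capacity is an assumed hyperparameter
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size=64):
        # Uniformly sample a mini-batch of past experiences for a training step.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


def select_action(q_network, state, epsilon, n_actions=4):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(n_actions)  # explore: random action
    with torch.no_grad():
        q_values = q_network(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
    return int(q_values.argmax(dim=1).item())  # exploit: highest estimated Q-value
```

Lunar Lander has four discrete actions, hence the default `n_actions=4`.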
Dueling Double Deep Q-Network (D3QN) extends the classic DQN approach by combining two powerful techniques: Double Q-Learning to reduce overestimation bias and Dueling Network Architecture to separately estimate the state-value and action-advantage. Together, they make learning faster and more stable.
- Double Q-Learning: Reduces overestimation of Q-values by decoupling action selection from action evaluation.
- Dueling Architecture: Separates the estimation of state-value and action-advantage, improving learning efficiency.
- Target Networks and Replay Memory: These techniques ensure more stable learning by reducing correlation in the training data (see the sketch after this list).
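For illustration, here is a minimal sketch of the two key ideas: a dueling Q-network and the double Q-learning target. Layer sizes and function names are assumptions rather than this repository's actual code; the state dimension of 8 and the 4 discrete actions match the Lunar Lander environment.

```python
import torch
import torch.nn as nn


class DuelingQNetwork(nn.Module):
    """Dueling architecture: a shared trunk feeds separate value and advantage streams."""

    def __init__(self, state_dim=8, n_actions=4, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, state):
        h = self.trunk(state)
        value, adv = self.value(h), self.advantage(h)
        # Subtracting the mean advantage keeps Q = V + A identifiable.
        return value + adv - adv.mean(dim=1, keepdim=True)


def double_dqn_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    """Double Q-learning target: the online net selects the action, the target net evaluates it."""
    with torch.no_grad():
        best_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, best_action).squeeze(1)
    return reward + gamma * next_q * (1.0 - done)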
What if the agent could dynamically adjust its focus on long-term versus short-term rewards during training? That's what the Adaptive Gamma D3QN approach does! Inspired by the paper "How to Discount Deep Reinforcement Learning", this method adjusts the discount factor (γ) dynamically to improve learning stability.
- Dynamic Gamma Adjustment: Gradually increases the discount factor γ to focus more on long-term rewards as training progresses.
- Incremental Gamma Strategy:
γ_{k+1} = 1 - 0.98 * (1 - γ_k)
This helps stabilize learning and accelerate convergence.
- D3QN Architecture: Combines the benefits of Double DQN and Dueling Networks with dynamic gamma adjustments (see the sketch after this list).
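A minimal sketch of this gamma schedule follows. The update rule is the one shown above; the starting value of γ, the 0.99 cap, and the function name are illustrative assumptions.

```python
def update_gamma(gamma, shrink=0.98, gamma_max=0.99):
    """Incremental gamma strategy: gamma_{k+1} = 1 - shrink * (1 - gamma_k).

    Each call moves gamma closer to 1, so the agent gradually weights
    long-term rewards more heavily. gamma_max is an assumed safety cap.
    """
    return min(gamma_max, 1.0 - shrink * (1.0 - gamma))


# Example: starting short-sighted and becoming progressively more far-sighted.
gamma = 0.9  # assumed starting value
for step in range(5):
    gamma = update_gamma(gamma)
    print(f"update {step}: gamma = {gamma:.4f}")
```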
To better compare the performance of each algorithm, here are the results visualized through loss plots, reward plots, epsilon decay plots, and mean Q-value plots. These visualizations give you insight into how each algorithm learns over time.
| Metric | DQN | D3QN | Adaptive Gamma D3QN |
|---|---|---|---|
| Loss | Highly fluctuating, stabilizes towards the end | Smoother, fewer fluctuations | Most stable, gradual decline |
| Reward | Large fluctuations | More consistent, but some oscillations | Stable, higher rewards with faster convergence |
| Epsilon Decay | Slow decay over time | Faster decay | Similar to D3QN |
| Mean Q-Values | Steady rise, but slow | Faster rise | Steady and higher overall Q-values |
Training snapshots are shown for DQN at epochs 10, 1000, and 1637; for D3QN at epochs 10, 750, and 1500; and for Adaptive Gamma D3QN at epochs 10, 500, and 1000.
As we push the boundaries of RL in this space adventure, here are some future directions we'd love to explore:
- Prioritized Experience Replay: Focus on more significant experiences to boost learning efficiency.
- Noisy Networks: Introduce noise into network parameters to improve exploration.
- Actor-Critic Comparison: Explore how D3QN stacks up against actor-critic methods like A3C or PPO.
- Rainbow DQN: Combine all the best practices, including Double DQN, Dueling Networks, Noisy Nets, and Prioritized Replay, into one ultimate RL agent.
Feel free to explore the code, experiment with parameters, and share your results! The sky (or perhaps space) is the limit when it comes to Reinforcement Learning. Have fun coding, and may your models land softly!
Happy learning!