
Lunar Lander RL Practices 🚀


🌍 Welcome to the world of Reinforcement Learning (RL) in space! This repository explores RL through the Lunar Lander problem. Whether you're new to RL or a seasoned expert, this is your chance to see how different algorithms tackle the challenge of landing a spaceship 🚀 safely on a designated pad. Explore DQN, D3QN, and Adaptive Gamma D3QN with real performance visualizations and results. Fasten your seatbelt and prepare for takeoff! 🎯


Table of Contents

  • 1 - DQN (Deep Q-Networks)
  • 2 - D3QN (Dueling Double DQN)
  • 3 - Adaptive Gamma D3QN
  • Results and Visualizations
  • Future Work


1 - DQN (Deep Q-Networks)

The Deep Q-Network (DQN) algorithm is one of the most fundamental techniques in RL, combining Q-learning with neural networks. In the Lunar Lander problem, the DQN agent learns to land the spaceship by estimating the Q-value of each action it can take in every state.

Highlights:

  • 🧠 Q-Learning with Neural Networks: Q-values are approximated with a neural network instead of a lookup table.
  • 💾 Replay Memory: Experiences are stored and replayed in mini-batches to train the agent.
  • 🎲 Epsilon-Greedy Exploration: The agent balances exploring new actions with exploiting actions it already believes are rewarding.
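
For readers who want to see these pieces in code, here is a minimal sketch of the three highlights above, written with PyTorch. It is illustrative rather than the repository's exact implementation: the layer sizes, buffer capacity, and batch size are assumptions, while the state/action dimensions (8 and 4) match Gymnasium's LunarLander environment.

```python
# Minimal DQN building blocks (illustrative sketch, not the repo's exact code).
import random
from collections import deque

import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """MLP that maps a state to one Q-value per action."""

    def __init__(self, state_dim=8, action_dim=4, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state):
        return self.net(state)


class ReplayMemory:
    """Fixed-size buffer of (s, a, r, s', done) transitions, sampled in mini-batches."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size=64):
        return random.sample(self.buffer, batch_size)


def select_action(q_net, state, epsilon, action_dim=4):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(action_dim)
    with torch.no_grad():
        q_values = q_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
    return int(q_values.argmax(dim=1).item())
```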

Performance Visualization: DQN Performance


2 - D3QN (Dueling Double DQN)

Dueling Double Deep Q-Network (D3QN) extends the classic DQN approach by combining two powerful techniques: Double Q-Learning to reduce overestimation bias and Dueling Network Architecture to separately estimate the state-value and action-advantage. Together, they make learning faster and more stable.

Highlights:

  • 📉 Double Q-Learning: Reduces overestimation of Q-values by decoupling action selection from action evaluation.
  • 🏅 Dueling Architecture: Separates the estimation of state-value and action-advantage, improving learning efficiency.
  • 🎯 Target Networks and Replay Memory: These techniques ensure more stable learning by reducing correlation in the training data.
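
As a companion to the highlights above, the sketch below shows one way to wire up a dueling head and a Double Q-learning target in PyTorch. It is an illustrative sketch under assumed tensor shapes, not the repository's exact code.

```python
# Illustrative D3QN pieces: dueling head + Double Q-learning target (sketch).
import torch
import torch.nn as nn


class DuelingQNetwork(nn.Module):
    """Shared trunk with separate state-value and action-advantage heads."""

    def __init__(self, state_dim=8, action_dim=4, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, action_dim)  # A(s, a)

    def forward(self, state):
        h = self.trunk(state)
        v, a = self.value(h), self.advantage(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a); subtracting the mean keeps
        # the value/advantage decomposition identifiable.
        return v + a - a.mean(dim=1, keepdim=True)


def double_q_target(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double Q-learning target: the online net picks the next action,
    the target net evaluates it."""
    with torch.no_grad():
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    return rewards + gamma * (1.0 - dones) * next_q
```

In training, this target would be compared against the online network's Q-value for the stored action (e.g. with a smooth L1 loss), and the target network would be periodically synced from the online network.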

Performance Visualization: D3QN Performance


3 - Adaptive Gamma D3QN

What if the agent could dynamically adjust its focus on long-term versus short-term rewards during training? That's what the Adaptive Gamma D3QN approach does! Inspired by the paper "How to Discount Deep Reinforcement Learning", this method adjusts the discount factor (γ) dynamically to improve learning stability.

Highlights:

  • 🔄 Dynamic Gamma Adjustment: Gradually increases the discount factor γ so the agent focuses more on long-term rewards as training progresses.
  • 📈 Incremental Gamma Strategy: each update shrinks the gap between γ and 1 by a constant factor:

    γ_{k+1} = 1 - 0.98 * (1 - γ_k)

    This helps stabilize learning and accelerate convergence (see the sketch after this list).

  • 🧠 D3QN Architecture: Combines the benefits of Double DQN and Dueling Networks with dynamic gamma adjustments.
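
To make the schedule concrete, here is a small sketch of the incremental update above; the starting value of γ and the number of updates shown are assumptions for illustration only.

```python
# Incremental gamma schedule: gamma_{k+1} = 1 - 0.98 * (1 - gamma_k).
def next_gamma(gamma_k, rate=0.98):
    """Each update shrinks (1 - gamma) by the given rate, so gamma approaches 1."""
    return 1.0 - rate * (1.0 - gamma_k)


gamma = 0.9  # assumed initial discount factor
for step in range(5):
    gamma = next_gamma(gamma)
    print(f"step {step + 1}: gamma = {gamma:.4f}")
# gamma climbs smoothly toward 1, weighting long-term rewards more as training progresses
```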

Performance Visualization: Adaptive Gamma Performance


Results and Visualizations

To better compare the performance of each algorithm, here are the results visualized through loss plots, reward plots, epsilon decay plots, and mean Q-value plots. These visualizations give you insight into how each algorithm learns over time.

Comparison of DQN, D3QN, and Adaptive Gamma D3QN:

| Metric | DQN 📊 | D3QN 📈 | Adaptive Gamma D3QN 📉 |
|---|---|---|---|
| Loss | Highly fluctuating, stabilizes towards the end | Smoother, fewer fluctuations | Most stable, gradual decline |
| Reward | Large fluctuations | More consistent, but some oscillations | Stable, higher rewards with faster convergence |
| Epsilon decay | Slow decay over time | Faster decay | Similar to D3QN |
| Mean Q-values | Steady rise, but slow | Faster rise | Steady and higher overall Q-values |


DQN Results: performance snapshots at epochs 10, 1000, and 1637.

D3QN Results: performance snapshots at epochs 10, 750, and 1500.

Adaptive Gamma D3QN Results: performance snapshots at epochs 10, 500, and 1000.

Future Work 🚀

As we push the boundaries of RL in this space adventure, here are some future directions we'd love to explore:

  • 🌟 Prioritized Experience Replay: Focus on more significant experiences to boost learning efficiency.
  • 🎲 Noisy Networks: Introduce noise into network parameters to improve exploration.
  • ⚖️ Actor-Critic Comparison: Explore how D3QN stacks up against Actor-Critic methods like A3C or PPO.
  • 🌈 Rainbow DQN: Combine all the best practices in one, including Double DQN, Dueling Networks, Noisy Nets, and Prioritized Replay, for an ultimate RL agent.

Feel free to explore the code, experiment with parameters, and share your results! The sky (or perhaps space 🚀) is the limit when it comes to Reinforcement Learning. Have fun coding, and may your models land softly! 😄

Happy learning! 🚀