Releases: jamesliu/nanoPPO

v0.15

06 Nov 20:12

Release v0.15

New Features

  • Created actor/critic causal attention policy. (521f062)
  • Added version number and custom learning rate scheduler to PPO agent. Modified train_ppo_agent.py to use the new scheduler.
    • Added new version number to __init__.py.
    • Modified PPOAgent class in continuous_action_ppo.py to accept an optional lr_scheduler argument.
    • Added cosine_lr_scheduler.py to define custom learning rate scheduler. (36479a2)
  • Added checks for inf and NaN values in gradients and weights. (00abb4e)
  • Added a debug flag that is set when NaN is detected in model parameters. (dd21ff4)
  • Improved numerical stability to avoid a NaN policy loss and a -inf entropy loss.
    • Set torch.nn.utils.clip_grad_norm_ to use max_norm=0.7.
    • Sanitized log probabilities by replacing -inf values with large negative numbers. (af2c474)
  • Modified check for stop reward and num cumulative rewards before saving best weights. (af434c1)
  • Prepare to upgrade to version 0.15. (4f8f491)
  • Update train_ppo_agent.py to use avg_reward instead of metrics for training. (28f9ab3)
  • Added 'train_reward' to metrics dictionary in train_agent function. (273238b)
  • Added Cosine LR scheduler and updated PPOAgent to use it, with iterative learning rate adjustment. (52833d9)
  • Set placeholder optimizer in Cosine LR scheduler in PPOAgent. (7f805c9)
  • Update python-version in publish.yml to include only 3.10 and 3.11. (c1aa7df)
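
The cosine learning-rate schedule listed above can be illustrated with a minimal, framework-free sketch. The function name cosine_lr and its parameters (base_lr, min_lr, total_steps) are assumptions made for illustration; they are not taken from cosine_lr_scheduler.py, and the actual scheduler may differ (for example, by adding warmup).

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate (hypothetical helper, not nanoPPO's API)."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp the step so that values outside [0, total_steps] stay in range.
    progress = min(max(step, 0), total_steps) / total_steps
    # The cosine factor falls smoothly from 1.0 (start) to 0.0 (end).
    factor = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * factor


if __name__ == "__main__":
    # Illustrative values only; the release does not state its learning rates.
    for step in (0, 250, 500, 750, 1000):
        print(step, cosine_lr(step, total_steps=1000, base_lr=3e-4, min_lr=3e-5))
```

The rate starts at base_lr, decays slowly at first, fastest around the midpoint, and levels off at min_lr, which is why this shape is often chosen over a linear ramp.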
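
The stability fixes listed above (log-probability sanitizing and the inf/NaN checks) can be sketched in plain Python. In nanoPPO these operate on PyTorch tensors; the version below uses lists of floats only to show the idea. The names sanitize_log_probs and all_finite, and the floor value of -1e8, are assumptions: the release notes say only that -inf is replaced with "large negative numbers".

```python
import math

# Assumed stand-in for the "large negative number"; the real value is not stated.
LOG_PROB_FLOOR = -1e8


def sanitize_log_probs(log_probs, floor=LOG_PROB_FLOOR):
    """Return a copy of log_probs with every -inf entry replaced by floor."""
    return [floor if lp == float("-inf") else lp for lp in log_probs]


def all_finite(values):
    """True when no value is inf, -inf, or NaN."""
    return all(math.isfinite(v) for v in values)


if __name__ == "__main__":
    raw = [-0.7, float("-inf"), -2.3]
    clean = sanitize_log_probs(raw)
    print(all_finite(raw), all_finite(clean))
```

Keeping the values finite matters because a single -inf log probability turns the PPO probability ratio and the entropy term into inf or NaN, which then propagates into every parameter on the next optimizer step.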

nanoPPO v0.14 Release

07 Oct 17:25

Release Notes for v0.14

New Features:

  • Training Control: Added a stop_reward check to the train_agent() function. Training is now skipped when best_reward already exceeds stop_reward.
  • Gradient Clipping: Introduced gradient clipping by norm (clip grad norm) to prevent overly aggressive updates.
  • Device Configuration: Added device parameters to the Proximal Policy Optimization (PPO) agent so that it can be configured for either CPU or GPU. The agent can now run on a GPU for accelerated training.
  • Action Rescaling: Added initial support for rescaling actions; it is currently disabled because of stability concerns.
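
Gradient clipping by norm, as described in the list above, rescales all gradients by one common factor when their combined L2 norm exceeds a threshold. The sketch below reproduces that arithmetic without any framework; it mirrors the scaling rule used by torch.nn.utils.clip_grad_norm_, but the function name clip_by_global_norm, the list-of-lists gradient layout, and the max_norm value in the demo are assumptions for illustration.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale all gradients so their combined L2 norm is at most max_norm.

    grads is a list of per-parameter gradient vectors (lists of floats).
    Returns the clipped gradients and the norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # The scale never exceeds 1.0, so small gradients are left untouched.
    scale = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * scale for g in vec] for vec in grads]
    return clipped, total_norm


if __name__ == "__main__":
    grads = [[3.0, 0.0], [0.0, 4.0]]  # combined norm is 5.0
    clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
    print(norm_before, clipped)
```

Because every gradient is multiplied by the same factor, the update direction is preserved and only its length is limited.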

Enhancements:

  • Cumulative Reward: Implemented the calculation of a rolling average cumulative reward.
  • Code Quality: The entire source code and examples have been reformatted for consistency and readability using the black code formatter.
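
A rolling average of cumulative (per-episode) reward, as mentioned above, smooths the noisy reward signal before it is compared against thresholds. The sketch below is one minimal way to compute it; the class name RollingReward and the window size are assumptions, since the release notes do not say how many episodes nanoPPO averages over.

```python
from collections import deque


class RollingReward:
    """Mean of the most recent episode rewards (hypothetical helper)."""

    def __init__(self, window: int = 20):
        # A bounded deque drops the oldest reward once the window is full.
        self.rewards = deque(maxlen=window)

    def add(self, episode_reward: float) -> float:
        """Record one episode's cumulative reward and return the new average."""
        self.rewards.append(episode_reward)
        return sum(self.rewards) / len(self.rewards)


if __name__ == "__main__":
    tracker = RollingReward(window=3)
    for reward in (1.0, 2.0, 3.0, 4.0):
        print(tracker.add(reward))
```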

Miscellaneous:

  • Various updates related to release configurations and changelog updates.

Note: Always refer to the official documentation or repository for a more detailed breakdown of changes.

v0.13

19 Sep 22:06

nanoPPO v0.13 Release

We are excited to announce the initial release of nanoPPO, version 0.13! This release lays the foundation for reinforcement learning practitioners, providing a lightweight and efficient implementation of the Proximal Policy Optimization (PPO) algorithm.

Highlights:

  • PPO Implementation: In addition to the discrete action spaces supported since v0.1, v0.13 supports continuous action spaces, covering a wider range of applications.
  • Ease of Use: Simple API to get started with PPO training quickly.
  • Examples Included: Contains examples to help users understand how to train agents on various environments.
  • Custom Environments: Provides two environments, PointMass1D and PointMass2D, for easy testing of PPO agent training.
  • Test Suite: Initial test suite to ensure code quality and functionality.
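
To give a sense of what a small continuous-action test environment of this kind might look like, here is a hypothetical one-dimensional point mass. It is not the implementation of nanoPPO's PointMass1D: the class name PointMass1DSketch, the dynamics, the reward, and every parameter are assumptions chosen only to illustrate a continuous action space.

```python
from dataclasses import dataclass


@dataclass
class PointMass1DSketch:
    """Hypothetical 1-D point mass; NOT the dynamics of nanoPPO's PointMass1D."""

    dt: float = 0.1          # integration time step (assumed)
    max_force: float = 1.0   # actions are clipped to [-max_force, max_force]
    max_steps: int = 100     # episode length (assumed)
    position: float = 0.0
    velocity: float = 0.0
    steps: int = 0

    def reset(self, position: float = 1.0):
        """Start an episode at the given position with zero velocity."""
        self.position, self.velocity, self.steps = position, 0.0, 0
        return (self.position, self.velocity)

    def step(self, action: float):
        """Apply a continuous force and return (observation, reward, done)."""
        force = max(-self.max_force, min(self.max_force, action))
        self.velocity += force * self.dt
        self.position += self.velocity * self.dt
        self.steps += 1
        reward = -abs(self.position)  # closer to the origin is better
        done = self.steps >= self.max_steps
        return (self.position, self.velocity), reward, done


if __name__ == "__main__":
    env = PointMass1DSketch()
    print(env.reset(1.0))
    print(env.step(-5.0))  # the force is clipped to -1.0
```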

Installation:
You can install nanoPPO via PyPI:

pip install nanoPPO

Or clone the repository and install from source:

git clone https://github.com/jamesliu/nanoPPO.git
cd nanoPPO
pip install .

Support & Contribution:
We welcome feedback, issues, and contributions. Please refer to our contribution guidelines for more details.

Thank you for your interest in nanoPPO, and we look forward to hearing your feedback and seeing what you build with it!

v0.12

19 Sep 21:38

nanoPPO v0.12 Release

We are excited to announce the initial release of nanoPPO, version 0.12! This release lays the foundation for reinforcement learning practitioners, providing a lightweight and efficient implementation of the Proximal Policy Optimization (PPO) algorithm.

Highlights:

  • PPO Implementation: In addition to the discrete action spaces supported since v0.1, v0.12 supports continuous action spaces, covering a wider range of applications.
  • Ease of Use: Simple API to get started with PPO training quickly.
  • Examples Included: Contains examples to help users understand how to train agents on various environments.
  • Custom Environments: Provides two environments, PointMass1D and PointMass2D, for easy testing of PPO agent training.
  • Test Suite: Initial test suite to ensure code quality and functionality.

Installation:
You can install nanoPPO via PyPI:

pip install nanoPPO

Or clone the repository and install from source:

git clone https://github.com/jamesliu/nanoPPO.git
cd nanoPPO
pip install .

Support & Contribution:
We welcome feedback, issues, and contributions. Please refer to our contribution guidelines for more details.

Thank you for your interest in nanoPPO, and we look forward to hearing your feedback and seeing what you build with it!

nanoPPO v0.11 Release

21 Aug 06:47

We are excited to announce the initial release of nanoPPO, version 0.11! This release lays the foundation for reinforcement learning practitioners, providing a lightweight and efficient implementation of the Proximal Policy Optimization (PPO) algorithm.

Highlights:

  • PPO Implementation: In addition to the discrete action spaces supported since v0.1, v0.11 supports continuous action spaces, covering a wider range of applications.
  • Ease of Use: Simple API to get started with PPO training quickly.
  • Examples Included: Contains examples to help users understand how to train agents on various environments.
  • Custom Environments: Provides two environments, PointMass1D and PointMass2D, for easy testing of PPO agent training.
  • Test Suite: Initial test suite to ensure code quality and functionality.

Installation:
You can install nanoPPO via PyPI:

pip install nanoPPO

Or clone the repository and install from source:

git clone https://github.com/jamesliu/nanoPPO.git
cd nanoPPO
pip install .

Support & Contribution:
We welcome feedback, issues, and contributions. Please refer to our contribution guidelines for more details.

Thank you for your interest in nanoPPO, and we look forward to hearing your feedback and seeing what you build with it!