Releases: jamesliu/nanoPPO
v0.15
Release v0.15
New Features
- Created actor/critic causal attention policy. (521f062)
- Added a version number and a custom learning rate scheduler to the PPO agent, and modified train_ppo_agent.py to use the new scheduler (see the scheduler sketch after this list).
  - Added the new version number to __init__.py.
  - Modified the PPOAgent class in continuous_action_ppo.py to accept an optional lr_scheduler argument.
  - Added cosine_lr_scheduler.py to define the custom learning rate scheduler. (36479a2)
- Added gradient and weight Inf/NaN checks. (00abb4e)
- Added a debug flag for when NaN is detected in model parameters. (dd21ff4)
- Avoid a NaN policy loss and a -inf entropy loss to improve training stability (see the stability sketch after this list).
  - Set torch.nn.utils.clip_grad_norm_ to use max_norm=0.7.
  - Sanitized log probs by replacing -inf log probabilities with large negative numbers. (af2c474)
- Modified the check on the stop reward and the number of cumulative rewards before saving the best weights. (af434c1)
- Prepare to upgrade to version 0.15. (4f8f491)
- Updated train_ppo_agent.py to use avg_reward instead of metrics for training. (28f9ab3)
- Added 'train_reward' to the metrics dictionary in the train_agent function. (273238b)
- Added a cosine LR scheduler and updated PPOAgent to use it, with iterative learning rate adjustment. (52833d9)
- Set a placeholder optimizer in the cosine LR scheduler in PPOAgent. (7f805c9)
- Updated python-version in publish.yml to include only 3.10 and 3.11. (c1aa7df)
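Below is a minimal sketch of how the cosine learning rate scheduler and the optional lr_scheduler argument might fit together. The names cosine_lr_scheduler.py, PPOAgent, and lr_scheduler come from the notes above; the class name, constructor arguments, and wiring shown here are illustrative assumptions, not the exact implementation.

```python
# Hedged sketch: a cosine learning-rate schedule stepped once per PPO update.
# The class name and constructor arguments are assumptions for illustration.
import math
import torch


class CosineLRScheduler:
    """Decay the learning rate from max_lr to min_lr over total_iters steps."""

    def __init__(self, optimizer, max_lr, min_lr, total_iters):
        self.optimizer = optimizer  # may start as a placeholder and be replaced later
        self.max_lr = max_lr
        self.min_lr = min_lr
        self.total_iters = total_iters
        self.iter = 0

    def step(self):
        progress = min(self.iter / max(1, self.total_iters), 1.0)
        lr = self.min_lr + 0.5 * (self.max_lr - self.min_lr) * (1 + math.cos(math.pi * progress))
        for group in self.optimizer.param_groups:
            group["lr"] = lr
        self.iter += 1
        return lr


# Illustrative wiring: train_ppo_agent.py would build the scheduler and step it
# once per training iteration so the learning rate follows the cosine curve.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
scheduler = CosineLRScheduler(optimizer, max_lr=3e-4, min_lr=3e-5, total_iters=1000)
for _ in range(3):
    scheduler.step()
```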
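And a rough sketch of the stability measures: checking parameters and gradients for Inf/NaN, clipping gradients with max_norm=0.7, and replacing -inf log probabilities with a large negative number. Only clip_grad_norm_ with max_norm=0.7 is stated explicitly above; the helper names and the -1e8 floor are assumptions.

```python
# Hedged sketch of the v0.15 stability checks; helper names are hypothetical.
import torch


def check_finite(model, debug=False):
    """Return False if any parameter or gradient contains Inf/NaN."""
    for name, param in model.named_parameters():
        tensors = [param] if param.grad is None else [param, param.grad]
        for t in tensors:
            if not torch.isfinite(t).all():
                if debug:
                    print(f"non-finite values detected in {name}")
                return False
    return True


def sanitize_log_probs(log_probs, floor=-1e8):
    """Replace -inf log probabilities with a large (finite) negative number."""
    return torch.clamp(log_probs, min=floor)


# Illustrative use inside an update step: verify gradients are finite before
# clipping them and stepping the optimizer.
model = torch.nn.Linear(4, 2)
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
if check_finite(model, debug=True):
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.7)
```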
nanoPPO v0.14 Release
Release Notes for v0.14
New Features:
- Training Control: Added a stop_reward check within the train_agent() function. Training is now skipped once the best_reward exceeds the stop_reward (see the training-loop sketch after this list).
- Gradient Clipping: Introduced gradient clipping with clip_grad_norm to prevent overly aggressive updates.
- Device Configuration: Added a device parameter to the Proximal Policy Optimization (PPO) agent, supporting both CPU and GPU configurations so the agent can be run on a GPU for accelerated training.
- Action Rescaling: Added an initial feature to rescale actions, though it is currently disabled due to stability concerns.
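A rough sketch of how the stop_reward check and the device parameter might look in practice. train_agent, stop_reward, and best_reward are named in the notes above; the remaining arguments and the agent API (act, update) are illustrative assumptions.

```python
# Hedged sketch of the v0.14 training-control and device features.
# Only stop_reward / best_reward and the device parameter come from the
# release notes; the rest of the function body is illustrative.
import torch


def train_agent(agent, env, episodes, stop_reward=None, device="cpu"):
    best_reward = float("-inf")
    for episode in range(episodes):
        # Skip further training once the target reward has been reached.
        if stop_reward is not None and best_reward > stop_reward:
            break
        state, _ = env.reset()
        episode_reward, done = 0.0, False
        while not done:
            state_t = torch.as_tensor(state, dtype=torch.float32, device=device)
            action = agent.act(state_t)          # hypothetical agent API
            state, reward, done, truncated, _ = env.step(action)
            done = done or truncated
            episode_reward += reward
        agent.update()                            # hypothetical agent API
        best_reward = max(best_reward, episode_reward)
    return best_reward
```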
Enhancements:
- Cumulative Reward: Implemented the calculation of a rolling average cumulative reward (see the short sketch after this list).
- Code Quality: The entire source code and examples have been reformatted for consistency and readability using the black code formatter.
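A minimal sketch of a rolling average over episode rewards, assuming a fixed-size window; the window length of 100 is an assumption, not a value from the release notes.

```python
# Hedged sketch: rolling average of cumulative episode rewards.
from collections import deque

recent_rewards = deque(maxlen=100)  # window size is an assumption


def record_reward(episode_reward):
    """Track the latest episode reward and return the rolling average."""
    recent_rewards.append(episode_reward)
    return sum(recent_rewards) / len(recent_rewards)
```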
Miscellaneous:
- Various updates related to release configurations and changelog updates.
Note: Always refer to the official documentation or repository for a more detailed breakdown of changes.
v0.13
nanoPPO v0.13 Release
We are excited to announce the initial release of nanoPPO, version 0.13! This release lays the foundation for reinforcement learning practitioners, providing a lightweight and efficient implementation of the Proximal Policy Optimization (PPO) algorithm.
Highlights:
- PPO Implementation: In addition to the discrete action spaces supported in v0.1, v0.13 now supports continuous action spaces for a wide range of applications.
- Ease of Use: Simple API to get started with PPO training quickly (see the quick-start sketch after the installation instructions below).
- Examples Included: Contains examples to help users understand how to train agents in various environments.
- Custom Environments: Two custom environments, PointMass1D and PointMass2D, are included for easy testing of PPO agent training.
- Test Suite: Initial test suite to ensure code quality and functionality.
Installation:
You can install nanoPPO via PyPI:
pip install nanoPPO
Or clone the repository and install from source:
git clone https://github.com/jamesliu/nanoPPO.git
cd nanoPPO
pip install .
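As a quick-start illustration, here is a minimal training sketch under stated assumptions: PPOAgent, continuous_action_ppo.py, train_agent, and the PointMass1D environment are mentioned elsewhere in these notes, but the import paths, constructor arguments, and call signature below are guesses rather than the library's confirmed API; consult the repository examples for authoritative usage.

```python
# Hedged quick-start sketch; import paths and signatures are assumptions,
# not nanoPPO's confirmed API.
from nanoppo.continuous_action_ppo import PPOAgent      # assumed import path
from nanoppo.train_ppo_agent import train_agent          # assumed import path
from nanoppo.envs import PointMass1D                     # assumed import path

env = PointMass1D()
agent = PPOAgent(
    state_dim=env.observation_space.shape[0],   # assumed constructor arguments
    action_dim=env.action_space.shape[0],
)
# Train until the reward target is reached or the episode budget runs out.
train_agent(agent, env, episodes=500, stop_reward=90.0)
```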
Support & Contribution:
We welcome feedback, issues, and contributions. Please refer to our contribution guidelines for more details.
Thank you for your interest in nanoPPO, and we look forward to hearing your feedback and seeing what you build with it!
v0.12
nanoPPO v0.12 Release
We are excited to announce the initial release of nanoPPO, version 0.12! This release lays the foundation for reinforcement learning practitioners, providing a lightweight and efficient implementation of the Proximal Policy Optimization (PPO) algorithm.
Highlights:
- PPO Implementation: In addition to the discrete action spaces supported in v0.1, v0.12 now supports continuous action spaces for a wide range of applications.
- Ease of Use: Simple API to get started with PPO training quickly.
- Examples Included: Contains examples to help users understand how to train agents in various environments.
- Custom Environments: Two custom environments, PointMass1D and PointMass2D, are included for easy testing of PPO agent training.
- Test Suite: Initial test suite to ensure code quality and functionality.
Installation:
You can install nanoPPO via PyPI:
pip install nanoPPO
Or clone the repository and install from source:
git clone https://github.com/jamesliu/nanoPPO.git
cd nanoPPO
pip install .
Support & Contribution:
We welcome feedback, issues, and contributions. Please refer to our contribution guidelines for more details.
Thank you for your interest in nanoPPO, and we look forward to hearing your feedback and seeing what you build with it!
v0.11
nanoPPO v0.11 Release
We are excited to announce the initial release of nanoPPO, version 0.11! This release lays the foundation for reinforcement learning practitioners, providing a lightweight and efficient implementation of the Proximal Policy Optimization (PPO) algorithm.
Highlights:
- PPO Implementation: In addition to the discrete action spaces supported in v0.1, v0.11 now supports continuous action spaces for a wide range of applications.
- Ease of Use: Simple API to get started with PPO training quickly.
- Examples Included: Contains examples to help users understand how to train agents in various environments.
- Custom Environments: Two custom environments, PointMass1D and PointMass2D, are included for easy testing of PPO agent training.
- Test Suite: Initial test suite to ensure code quality and functionality.
Installation:
You can install nanoPPO via PyPI:
pip install nanoPPO
Or clone the repository and install from source:
git clone https://github.com/jamesliu/nanoPPO.git
cd nanoPPO
pip install .
Support & Contribution:
We welcome feedback, issues, and contributions. Please refer to our contribution guidelines for more details.
Thank you for your interest in nanoPPO, and we look forward to hearing your feedback and seeing what you build with it!