
Releases: CarperAI/trlx

v0.7.0: NeMo PPO, PEFT Migration, and Fixes

23 Jun 22:21

The v0.7.0 release includes several new features, bug fixes, and overall improvements to the codebase. Here are the key changes:

🐠 NeMo PPO and SFT support

This release introduces NeMo-backed PPO and SFT implementations, bringing improved system performance and scalability to large-scale training.

🦆 PEFT Migration

trlx now supports parameter-efficient tuning methods via the peft library, which we hope will provide greater access to RLHF training in low-resource settings.
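As a rough illustration of how this might be wired up (a minimal sketch, assuming the model config accepts a `peft_config` field after this migration; the LoRA hyperparameters are illustrative, not recommended values from this release):

```python
# A minimal sketch, assuming trlx's model config accepts a `peft_config`
# field after this migration; the LoRA hyperparameters below are
# illustrative, not recommended values from this release.
from peft import LoraConfig, TaskType

from trlx.data.default_configs import default_ppo_config

config = default_ppo_config()

# Attach a LoRA configuration from the peft library to the model config.
config.model.peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,  # decoder-only language modeling
    r=8,                           # low-rank adapter dimension
    lora_alpha=32,
    lora_dropout=0.0,
)
```

From there, training proceeds through trlx.train as usual.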

Fixes and more!

New Contributors

Full Changelog: v0.6.0...v0.7.0

v0.6.0: LLaMa (Alpaca), Benchmark Util, T5 ILQL, Tests

31 Mar 21:41

The v0.6.0 release includes several new features, bug fixes, and overall improvements to the codebase. Here are the key changes:

📏 Benchmarking and Improved Unit Tests

This release introduces a new benchmark util to more easily track regressions in our training pipeline, along with improved unit tests written with the help of the hypothesis package.
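For a flavor of what hypothesis enables, here is a minimal sketch (not taken from trlx's test suite) of a property-based test over a toy helper:

```python
# A minimal sketch (not from trlx's test suite) of the property-based style
# hypothesis enables: assert an invariant over generated inputs instead of
# hand-picked cases.
from hypothesis import given, strategies as st

def pad_to_length(tokens: list, length: int, pad_id: int = 0) -> list:
    """Toy helper: right-pad a token list to a fixed length."""
    return tokens + [pad_id] * (length - len(tokens))

@given(st.lists(st.integers(min_value=1, max_value=50_000), max_size=32))
def test_padding_preserves_tokens(tokens):
    padded = pad_to_length(tokens, 32)
    assert len(padded) == 32
    assert padded[: len(tokens)] == tokens
```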

🦙 LLaMa and Alpaca PPO/SFT Support

PPO support and examples for LLaMa are now available, and we've baked in an example for instruction fine-tuning models with the Alpaca dataset using our SFT trainer.
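A minimal sketch of that workflow, assuming `default_sft_config` matches the helper used in trlx's examples directory (the checkpoint name is a hypothetical stand-in):

```python
# A minimal sketch of the Alpaca-style SFT workflow, assuming
# `default_sft_config` matches the helper used in trlx's examples directory;
# the checkpoint name is a hypothetical stand-in.
import trlx
from trlx.data.default_configs import default_sft_config

config = default_sft_config()
config.model.model_path = "huggyllama/llama-7b"          # hypothetical checkpoint
config.tokenizer.tokenizer_path = "huggyllama/llama-7b"  # hypothetical checkpoint

# SFT trains directly on full prompt+completion strings.
samples = [
    "### Instruction:\nName three primary colors.\n\n"
    "### Response:\nRed, yellow, and blue.",
]

trainer = trlx.train(samples=samples, config=config)
```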

5️⃣ T5 ILQL Support

T5 models can now be fine-tuned with ILQL:

  • Support ILQL for T5 model, Fix PPO T5 for refactored code by @PhungVanDuy in #290
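As a rough sketch of what this enables (assuming `default_ilql_config` and the seq2seq `model_arch_type` flag follow the patterns in trlx's examples; the checkpoint, samples, and rewards are illustrative):

```python
# A minimal sketch of offline ILQL fine-tuning for an encoder-decoder model,
# assuming `default_ilql_config` and the `model_arch_type` field follow the
# patterns in trlx's examples; checkpoint, samples, and rewards are illustrative.
import trlx
from trlx.data.default_configs import default_ilql_config

config = default_ilql_config()
config.model.model_path = "google/flan-t5-small"  # illustrative checkpoint
config.tokenizer.tokenizer_path = "google/flan-t5-small"
config.model.model_arch_type = "seq2seq"  # assumption: flag for encoder-decoder models

# ILQL learns offline from (sample, reward) pairs rather than a live reward fn.
samples = [["Summarize: The cat sat on the mat all day.", "A cat lounged on a mat."]]
rewards = [1.0]

trainer = trlx.train(samples=samples, rewards=rewards, config=config)
```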

Fixes

What's Changed

  • Move to Python config classes instead of ymls by @cat-state in #306
  • Add intermediate checkpointing to accelerate trainers by @jon-tow in #349
  • Enable infinite dataloader for prompt_dataloader in PPO Trainer by @alexandremuzio in #358
  • [feat] Add optional dependency list by @reciprocated in #381
  • Add some synchronization to the db download in the simulacra example by @dakinggg in #406

New Contributors

Full Changelog: v0.5.0...v0.6.0

v0.5.0: Initial NeMo integration, HH example, and improved Hugging Face integration

22 Feb 23:50

Highlights

What's Changed

New Contributors

Full Changelog: v0.4...v0.5.0

v0.4

13 Jan 16:50
Pre-release

Summary of release notes:

Along with many improvements to experiment tracking, rollout logging, and configuration flexibility, new highlight features include:

  • Support for T5-based student models. Check out this example, where we show how to fine-tune a FLAN-T5 model on CNN/DailyMail for summarization.

  • Support for parameter-efficient tuning methods. Some of our preliminary results have shown LoRA to be a promising technique for scaling RLHF in low-resource settings, and we hope users get the chance to explore its potential. We've seen a ~30% reduction in memory usage and a ~20% reduction in wallclock time for the same performance (quick report here).

  • Out-of-the-box support for 8-bit Adam(W) optimizers via TimDettmers/bitsandbytes, leading to a 15% decrease in memory allocation in one of our baseline examples (related report); a brief sketch of the optimizer swap follows this list.
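```python
# A minimal sketch of the optimizer swap; the model here is a stand-in, and
# trlx itself selects the optimizer through its config rather than manual
# construction like this.
import bitsandbytes as bnb
import torch.nn as nn

model = nn.Linear(768, 768)  # stand-in for a language model
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-5)  # 8-bit optimizer states
```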

Other interesting examples are in the works, so stay tuned!

What's Changed

New Contributors

Full Changelog: v0.3...v0.4

Pre alpha v0.3

21 Nov 16:27
Pre-release

What's Changed

New Contributors

Full Changelog: v0.2...v0.3

Alpha v0.2

21 Oct 22:20
Pre-release

Complete revamp of our initial release.

New features:

  • Hydra models, 20x faster than vanilla PPO with minimal performance hits at large scale (a conceptual sketch follows this list).
  • Massively revamped API with significantly less boilerplate.
  • Save/load callbacks.
  • Greatly improved orchestrator.
  • Better-commented RL code, making it easier to understand what's going on.
  • Cool examples, including architext and simulacra.
  • Better extensibility and standardized styling.
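For intuition, hydra models avoid keeping a second full copy of the model for PPO's KL reference: the policy and the frozen reference share the lower layers and branch only at the top. The following is a conceptual sketch of that idea, not trlx's actual implementation:

```python
# A conceptual sketch of the hydra idea, not trlx's actual implementation:
# the trainable policy branch and the frozen reference branch share one
# frozen trunk, so PPO's KL reference pass reuses the trunk computation
# instead of running a second full copy of the model.
import torch
import torch.nn as nn

class HydraHeads(nn.Module):
    def __init__(self, trunk: nn.Module, policy_branch: nn.Module, ref_branch: nn.Module):
        super().__init__()
        self.trunk = trunk
        self.policy_branch = policy_branch
        self.ref_branch = ref_branch
        for p in self.trunk.parameters():
            p.requires_grad = False  # shared trunk stays frozen
        for p in self.ref_branch.parameters():
            p.requires_grad = False  # reference branch stays frozen

    def forward(self, x: torch.Tensor):
        shared = self.trunk(x)  # computed once, reused by both branches
        return self.policy_branch(shared), self.ref_branch(shared)

# Toy usage: only the policy branch receives gradient updates.
model = HydraHeads(nn.Linear(16, 16), nn.Linear(16, 8), nn.Linear(16, 8))
policy_logits, ref_logits = model(torch.randn(2, 16))
```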

Features coming soon:

  • Megatron support! We're already working on this.
  • More interesting examples that are relevant to production use cases of TRLX.
  • Better integration of W&B, including sweeps.
  • Evaluation and benchmarking.

:)

Autogenerated release notes below:

What's Changed

New Contributors

Full Changelog: https://github.com/CarperAI/trlx/commits/v0.2