Weekly patch release

@Borda released this 30 Aug 12:29
· 1498 commits to release/stable since this release

App

Changed

  • Change top folder (#18212)
  • Remove _handle_is_headless calls in app run loop (#18362)

Fixed

  • Refactored the path to root to prevent a circular import (#18357)

Fabric

Changed

  • On XLA, avoid setting the global rank before processes have been launched as this will initialize the PJRT computation client in the main process (#16966)

Fixed

  • Fixed model parameters getting shared between processes when running with strategy="ddp_spawn" and accelerator="cpu"; this necessarily increases memory usage, as parameters are now replicated for each process (#18238)
  • Removed false positive warning when using fabric.no_backward_sync with XLA strategies (#17761)
  • Fixed issue where Fabric would not initialize the global rank, world size, and rank-zero-only rank after initialization and before launch (#16966)
  • Fixed FSDP full-precision param_dtype training (16-mixed, bf16-mixed and 32-true configurations) to avoid FSDP assertion errors with PyTorch < 2.0 (#18278)
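
To give a feel for the memory impact of the ddp_spawn fix above (#18238), here is a back-of-the-envelope sketch. It is not part of Lightning: the helper name `replicated_param_bytes` and the model size are hypothetical, and the estimate covers parameters only, ignoring gradients, optimizer state, and any other per-process overhead.

```python
def replicated_param_bytes(num_params: int, bytes_per_param: int, num_processes: int) -> int:
    """Rough total parameter memory when every process holds its own copy."""
    return num_params * bytes_per_param * num_processes


# Hypothetical model: 10 million float32 parameters (4 bytes each).
per_copy = replicated_param_bytes(10_000_000, 4, 1)
# With 4 CPU processes, each process now keeps its own replica.
total = replicated_param_bytes(10_000_000, 4, 4)

print(f"one copy: {per_copy / 1e6:.0f} MB, four processes: {total / 1e6:.0f} MB")
```

Under these assumptions, parameter memory grows linearly with the number of spawned processes, which is the expected cost of no longer sharing parameters between them.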

PyTorch

Changed

  • On XLA, avoid setting the global rank before processes have been launched as this will initialize the PJRT computation client in the main process (#16966)
  • Fixed an inefficiency in the rich progress bar (#18369)

Fixed

  • Fixed FSDP full-precision param_dtype training (16-mixed and bf16-mixed configurations) to avoid FSDP assertion errors with PyTorch < 2.0 (#18278)
  • Fixed an issue that prevented the use of custom logger classes without an experiment property defined (#18093)
  • Fixed setting the tracking uri in MLFlowLogger for logging artifacts to the MLFlow server (#18395)
  • Fixed redundant iter() call to dataloader when checking dataloading configuration (#18415)
  • Fixed model parameters getting shared between processes when running with strategy="ddp_spawn" and accelerator="cpu"; this necessarily increases memory usage, as parameters are now replicated for each process (#18238)
  • Properly manage fetcher.done with dataloader_iter (#18376)

Contributors

@awaelchli, @Borda, @carmocca, @quintenroets, @rlizzo, @speediedan, @tchaton

If we missed someone because a commit email did not match a GitHub account, let us know :]