Releases · Lightning-AI/pytorch-lightning

24 Nov 16:55

Borda

1.0.8

0979e2c

standard weekly patch release

Detail changes

Added

Added casting to python types for numpy scalars when logging hparams (#4647)
Added warning when progress bar refresh rate is less than 20 on Google Colab to prevent crashing (#4654)
Added F1 class metric (#4656)

Changed

Consistently use step=trainer.global_step in LearningRateMonitor independently of logging_interval (#4376)
Metric states are no longer added to state_dict by default (#4685)
Renamed class metric Fbeta >> FBeta (#4656)
Model summary: add 1 decimal place (#4745)
Do not override PYTHONWARNINGS (#4700)

Fixed

Fixed checkpoint hparams dict casting when omegaconf is available (#4770)
Fixed incomplete progress bars when total batches not divisible by refresh rate (#4577)
Updated SSIM metric (#4566)(#4656)
Fixed batch_arg_name bug - add batch_arg_name to all calls to _adjust_batch_size (#4812)
Fixed torchtext data to GPU (#4785)
Fixed a crash bug in MLFlow logger (#4716)
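
The incomplete progress bar fix (#4577) comes down to remainder arithmetic. The sketch below is not Lightning's implementation; bar_positions is a hypothetical helper that simulates a bar redrawn only every refresh_rate batches, with and without a final flush.

```python
def bar_positions(total_batches, refresh_rate, flush_at_end):
    """Simulate the position a progress bar displays after each refresh.

    Hypothetical helper for illustration only; not Lightning's code.
    """
    shown = []
    for batch_idx in range(1, total_batches + 1):
        # The bar is redrawn only on multiples of the refresh rate.
        if batch_idx % refresh_rate == 0:
            shown.append(batch_idx)
    # Without a final flush, the leftover batches are never displayed.
    if flush_at_end and total_batches % refresh_rate != 0:
        shown.append(total_batches)
    return shown


print(bar_positions(10, 4, flush_at_end=False))  # [4, 8]
print(bar_positions(10, 4, flush_at_end=True))   # [4, 8, 10]
```

With 10 batches and a refresh rate of 4, the bar stops at 8 unless the remaining 2 batches are flushed at the end.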

Contributors

@awaelchli, @jonashaag, @jungwhank, @M-Salti, @moi90, @pgagarinov, @s-rog, @Samyak2, @SkafteNicki, @teddykoker, @ydcjeff

If we forgot someone due to not matching commit email with GitHub account, let us know :]

17 Nov 21:57

Borda

1.0.7

d9ba857

standard weekly patch release

Detail changes

Added

Added lambda closure to manual_optimizer_step (#4618)

Changed

Change Metrics persistent default mode to False (#4685)

Fixed

Prevent crash if sync_dist=True on CPU (#4626)
Fixed average pbar Metrics (#4534)
Fixed setup callback hook to correctly pass the LightningModule through (#4608)
Allowed decorating the model __init__ while saving hparams inside it (#4662)
Fixed split_idx set by LoggerConnector in on_trainer_init to Trainer (#4697)

Contributors

@ananthsub, @Borda, @SeanNaren, @SkafteNicki, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

11 Nov 13:16

Borda

1.0.6

777a046

standard weekly patch release

Detail changes

Added

Added metrics aggregation in Horovod and fixed early stopping (#3775)
Added manual_optimizer_step, which works with AMP Native and accumulate_grad_batches (#4485)
Added persistent(mode) method to metrics, to enable and disable metric states being added to state_dict (#4482)
Added congratulations at the end of our notebooks (#4555)

Changed

Changed fsspec to tuner (#4458)
Unify SLURM/TorchElastic under backend plugin (#4578, #4580, #4581, #4582, #4583)

Fixed

Fixed feature-lack in hpc_load (#4526)
Fixed metrics states being overridden in DDP mode (#4482)
Fixed lightning_getattr, lightning_hasattr not finding the correct attributes in datamodule (#4347)
Fixed automatic optimization AMP by manual_optimization_step (#4485)
Replace MisconfigurationException with warning in ModelCheckpoint Callback (#4560)
Fixed logged keys in mlflow logger (#4412)
Fixed is_picklable by catching AttributeError (#4508)
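
A picklability check of the kind fixed in #4508 can be sketched with the standard library alone. This is an illustrative version, not Lightning's actual is_picklable; which exception pickle raises for a given object depends on the object and the Python version, so the sketch catches several.

```python
import pickle


def is_picklable(obj):
    """Return True if obj can be serialized with pickle."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # Depending on the object and Python version, pickle may raise
        # any of these for objects it cannot serialize.
        return False


def make_local_function():
    """Return a function defined in a local scope, which pickle rejects."""
    def local_function():
        return 1
    return local_function


print(is_picklable({"lr": 0.01}))           # True
print(is_picklable(make_local_function()))  # False
```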

Contributors

@dscarmo, @jtamir, @kazhang, @maxjeblick, @rohitgr7, @SkafteNicki, @tarepan, @tchaton, @tgaddair, @williamFalcon

If we forgot someone due to not matching commit email with GitHub account, let us know :]

04 Nov 02:00

Borda

1.0.5

b3db197

standard weekly patch release

Detail changes

Added

Added PyTorch 1.7 Stable support (#3821)
Added timeout for tpu_device_exists to ensure process does not hang indefinitely (#4340)
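
One general way to keep a device probe from hanging is to wait for it with a timeout. The sketch below uses only the standard library; call_with_timeout and slow_probe are hypothetical names, and this is not necessarily how Lightning implements the timeout for tpu_device_exists.

```python
import concurrent.futures
import time


def call_with_timeout(fn, timeout, default=False):
    """Run fn() in a worker thread; return default if it takes too long."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        # The caller stops waiting; the worker thread itself is not killed.
        return default
    finally:
        # Do not block here waiting for a slow worker to finish.
        executor.shutdown(wait=False)


def slow_probe():
    """Hypothetical stand-in for a device check that responds too slowly."""
    time.sleep(0.3)
    return True


print(call_with_timeout(lambda: True, timeout=1.0))  # True
print(call_with_timeout(slow_probe, timeout=0.05))   # False
```

A thread-based timeout only stops the caller from waiting. A probe that can block forever would need a separate process that can be terminated.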

Changed

W&B log in sync with Trainer step (#4405)
Hook on_after_backward is called only when optimizer_step is being called (#4439)
Moved track_and_norm_grad into training loop and called only when optimizer_step is being called (#4439)
Changed type checker with explicit cast of ref_model object (#4457)

Deprecated

Deprecated passing ModelCheckpoint instance to checkpoint_callback Trainer argument (#4336)

Fixed

Disable saving checkpoints if not trained (#4372)
Fixed error using auto_select_gpus=True with gpus=-1 (#4209)
Disabled training when limit_train_batches=0 (#4371)
Fixed metrics so that they no longer store the computational graph for all seen data (#4313)
Fixed AMP unscale for on_after_backward (#4439)
Fixed TorchScript export when module includes Metrics (#4428)
Fixed CSV logger warning (#4419)
Fixed skip DDP parameter sync (#4301)

Contributors

@ananthsub, @awaelchli, @borisdayma, @carmocca, @justusschock, @lezwon, @rohitgr7, @SeanNaren, @SkafteNicki, @ssaru, @tchaton, @ydcjeff

If we forgot someone due to not matching commit email with GitHub account, let us know :]

27 Oct 22:15

williamFalcon

1.0.4

5d10a36

standard weekly patch release

Detail changes

Added

Added dirpath and filename parameter in ModelCheckpoint (#4213)
Added plugins docs and DDPPlugin to customize ddp across all accelerators (#4258)
Added strict option to the scheduler dictionary (#3586)
Added fsspec support for profilers (#4162)
Added autogenerated helptext to Trainer.add_argparse_args (#4344)
Added support for string values in Trainer's profiler parameter (#3656)

Changed

Improved error messages for invalid configure_optimizers returns (#3587)
Allow changing the logged step value in validation_step (#4130)
Allow setting replace_sampler_ddp=True with a distributed sampler already added (#4273)
Fixed sanitized parameters for WandbLogger.log_hyperparams (#4320)

Deprecated

Deprecated filepath in ModelCheckpoint (#4213)
Deprecated reorder parameter of the auc metric (#4237)
Deprecated bool values in Trainer's profiler parameter (#3656)
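
Moving from the deprecated filepath argument to dirpath and filename amounts to splitting one path into its directory and file-name template. The sketch below shows that split with the standard library; split_filepath is a hypothetical migration helper, not part of Lightning.

```python
import os


def split_filepath(filepath):
    """Split an old-style checkpoint filepath into (dirpath, filename).

    Hypothetical helper: the two returned values are what one would pass
    as dirpath and filename instead of the deprecated filepath.
    """
    dirpath, filename = os.path.split(filepath)
    return dirpath, filename


dirpath, filename = split_filepath("checkpoints/{epoch}-{val_loss:.2f}")
print(dirpath)   # checkpoints
print(filename)  # {epoch}-{val_loss:.2f}
```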

Fixed

Fixed setting device ids in DDP (#4297)
Fixed synchronization of best model path in ddp_accelerator (#4323)
Fixed WandbLogger not uploading checkpoint artifacts at the end of training (#4341)

Contributors

@ananthsub, @awaelchli, @carmocca, @ddrevicky, @louis-she, @mauvilsa, @rohitgr7, @SeanNaren, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

20 Oct 23:12

williamFalcon

1.0.3

e0e402d

standard weekly patch release

Detail changes

Added

Added persistent flag to Metric.add_state (#4195)

Changed

Used checkpoint_connector.hpc_save in SLURM (#4217)
Moved base req. to root (#4219)

Fixed

Fixed hparams assign in init (#4189)
Fixed overwrite check for model hooks (#4010)

Contributors

@Borda, @EspenHa, @teddykoker

If we forgot someone due to not matching commit email with GitHub account, let us know :]

15 Oct 14:59

williamFalcon

1.0.2

5c153c2

fixes a major logging bug for val in 1.0

Fixes the last major bugs for validation logging.
Also removes duplicate charts for metric / metric_loss.
Doing this minor release because correct validation metrics logging is critical.

Detail changes

Added

Added trace functionality to the function to_torchscript (#4142)

Changed

Called on_load_checkpoint before loading state_dict (#4057)

Removed

Removed duplicate metric vs step log for train loop (#4173)

Fixed

Fixed the self.log problem in validation_step() (#4169)
Fixed hparams saving - save the state when save_hyperparameters() is called [in __init__] (#4163)
Fixed runtime failure while exporting hparams to yaml (#4158)

Contributors

@Borda, @NumesSanguis, @rohitgr7, @williamFalcon

If we forgot someone due to not matching commit email with GitHub account, let us know :]

14 Oct 00:47

williamFalcon

1.0.1

bbbc111

minor jit fixes

Obligatory post 1.0 minor release. Main fix is to make Lightning module fully compatible with Jit (had some edge-cases we had not covered).

13 Oct 12:12

williamFalcon

1.0.0

09c2020

1.0.0 - General availability

Overview

...

Detail changes

Added

Added Explained Variance Metric + metric fix (#4013)
Added Metric <-> Lightning Module integration tests (#4008)
Added parsing OS env vars in Trainer (#4022)
Added classification metrics (#4043)
Updated explained variance metric (#4024)
Enabled plugins (#4041)
Enabled custom clusters (#4048)
Enabled passing in custom accelerators (#4050)
Added LightningModule.toggle_optimizer (#4058)
Added LightningModule.manual_backward (#4063)

Changed

Integrated metrics API with self.log (#3961)
Decoupled Apex (#4052, #4054, #4055, #4056, #4058, #4060, #4061, #4062, #4063, #4064, #4065)
Renamed all backends to Accelerator (#4066)
Enabled manual returns (#4089)

Removed

Removed output argument from *_batch_end hooks (#3965, #3966)
Removed output argument from *_epoch_end hooks (#3967)
Removed support for EvalResult and TrainResult (#3968)
Removed deprecated trainer flags: overfit_pct, log_save_interval, row_log_interval (#3969)
Removed deprecated early_stop_callback (#3982)
Removed deprecated model hooks (#3980)
Removed deprecated callbacks (#3979)
Removed trainer argument in LightningModule.backward (#4056)

Fixed

Fixed current_epoch property update to reflect true epoch number inside LightningDataModule, when reload_dataloaders_every_epoch=True. (#3974)
Fixed to print scaler value in progress bar (#4053)
Fixed mismatch between docstring and code regarding when on_load_checkpoint hook is called (#3996)

Contributors

@ananyahjha93, @Borda, @edenlightning, @hbredin, @rohitgr7, @SkafteNicki, @teddykoker, @williamFalcon

If we forgot someone due to not matching commit email with GitHub account, let us know :]

07 Oct 21:16

williamFalcon

0.10.0

b4051e7

Buffer release before 1.0

This release is a buffer in case 1.0 breaks compatibility for people who upgrade. 0.10.0 has all the bug fixes and features of 1.0 but is 100% backward compatible. The 1.0 release will follow within the next 24 hours.

Overview

The major changes are:

Results objects are deprecated (we hated them too haha)
This means dataflow and logging have been decoupled

To log:

def any_step(...):
    self.log('something', i_computed)

Separately, return whatever you want from methods:

def training_step(...):
    return loss

def training_step(...):
    return {'loss': loss, 'whatever': [1, 'want']}

Detail changes

Added

Added new Metrics API. (#3868, #3921)
Enable PyTorch 1.7 compatibility (#3541)
Added LightningModule.to_torchscript to support exporting as ScriptModule (#3258)
Added warning when dropping unpicklable hparams (#2874)
Added EMB similarity (#3349)
Added ModelCheckpoint.to_yaml method (#3048)
Allow ModelCheckpoint monitor to be None, meaning it will always save (#3630)
Disabled optimizers setup during testing (#3059)
Added support for datamodules to save and load checkpoints when training (#3563)
Added support for datamodule in learning rate finder (#3425)
Added gradient clip test for native AMP (#3754)
Added dist lib to enable syncing anything across devices (#3762)
Added broadcast to TPUBackend (#3814)
Added XLADeviceUtils class to check XLA device type (#3274)

Changed

Refactored accelerator backends:
- moved TPU xxx_step to backend (#3118)
- refactored DDP backend forward (#3119)
- refactored GPU backend __step (#3120)
- refactored Horovod backend (#3121, #3122)
- remove obscure forward call in eval + CPU backend ___step (#3123)
- reduced all simplified forward (#3126)
- added hook base method (#3127)
- refactor eval loop to use hooks - use test_mode for if so we can split later (#3129)
- moved ___step_end hooks (#3130)
- training forward refactor (#3134)
- training AMP scaling refactor (#3135)
- eval step scaling factor (#3136)
- add eval loop object to streamline eval loop (#3138)
- refactored dataloader process hook (#3139)
- refactored inner eval loop (#3141)
- final inner eval loop hooks (#3154)
- clean up hooks in run_evaluation (#3156)
- clean up data reset (#3161)
- expand eval loop out (#3165)
- moved hooks around in eval loop (#3195)
- remove _evaluate fx (#3197)
- Trainer.fit hook clean up (#3198)
- DDPs train hooks (#3203)
- refactor DDP backend (#3204, #3207, #3208, #3209, #3210)
- reduced accelerator selection (#3211)
- group prepare data hook (#3212)
- added data connector (#3285)
- modular is_overridden (#3290)
- adding Trainer.tune() (#3293)
- move run_pretrain_routine -> setup_training (#3294)
- move train outside of setup training (#3297)
- move prepare_data to data connector (#3307)
- moved accelerator router (#3309)
- train loop refactor - moving train loop to own object (#3310, #3312, #3313, #3314)
- duplicate data interface definition up into DataHooks class (#3344)
- inner train loop (#3359, #3361, #3362, #3363, #3365, #3366, #3367, #3368, #3369, #3370, #3371, #3372, #3373, #3374, #3375, #3376, #3385, #3388, #3397)
- all logging related calls in a connector (#3395)
- device parser (#3400, #3405)
- added model connector (#3407)
- moved eval loop logging to loggers (#3408)
- moved eval loop (#3412, #3408)
- trainer/separate argparse (#3421, #3428, #3432)
- move lr_finder (#3434)
- organize args (#3435, #3442, #3447, #3448, #3449, #3456)
- move specific accelerator code (#3457)
- group connectors (#3472)
- accelerator connector methods x/n (#3469, #3470, #3474)
- merge backends (#3476, #3477, #3478, #3480, #3482)
- apex plugin (#3502)
- precision plugins (#3504)
- Result - make monitor default to checkpoint_on to simplify (#3571)
- reference to the Trainer on the LightningDataModule (#3684)
- add .log to lightning module (#3686, #3699, #3701, #3704, #3715)
- enable tracking original metric when step and epoch are both true (#3685)
- deprecated results obj, added support for simpler comms (#3681)
- move backends back to individual files (#3712)
- fixes logging for eval steps (#3763)
- decoupled DDP, DDP spawn (#3733, #3766, #3767, #3774, #3802, #3806)
- remove weight loading hack for ddp_cpu (#3808)
- separate torchelastic from DDP (#3810)
- separate SLURM from DDP (#3809)
- decoupled DDP2 (#3816)
- bug fix with logging val epoch end + monitor (#3812)
- decoupled DDP, DDP spawn (#3733, #3817, #3819, #3927)
- callback system and init DDP (#3836)
- adding compute environments (#3837, #3842)
- epoch can now log independently (#3843)
- test selecting the correct backend. temp backends while slurm and TorchElastic are decoupled (#3848)
- fixed init_slurm_connection causing hostname errors (#3856)
- moves init apex from LM to apex connector (#3923)
- moves sync bn to each backend (#3925)
- moves configure ddp to each backend (#3924)
Deprecation warning (#3844)
Changed LearningRateLogger to LearningRateMonitor (#3251)
Used fsspec instead of gfile for all IO (#3320)
- Swapped torch.load for fsspec load in DDP spawn backend (#3787)
- Swapped torch.load for fsspec load in cloud_io loading (#3692)
- Added support for to_disk() to use remote filepaths with fsspec (#3930)
- Updated model_checkpoint's to_yaml to use fsspec open (#3801)
- Fixed fsspec being inconsistent when doing fs.ls (#3805)
Refactor GPUStatsMonitor to improve training speed (#3257)
Changed IoU score behavior for classes absent in target and pred (#3098)
Changed IoU remove_bg bool to ignore_index optional int (#3098)
Changed defaults of save_top_k and save_last to None in ModelCheckpoint (#3680)
row_log_interval and log_save_interval are now based on training loop's global_step instead of epoch-internal batch index (#3667)
Silenced some warnings and verified DDP refactors (#3483)
Cleaning up stale logger tests (#3490)
Allow ModelCheckpoint monitor to be None (#3633)
Enable None model checkpoint default (#3669)
Skipped best_model_path if checkpoint_callback is None (#2962)
Used raise .. from .. to explicitly chain exceptions (#3750)
Mocking loggers (#3596, #3617, #3851, #3859, #3884, #3853, #3910, #3889, #3926)
Write predictions in LightningModule instead of EvalResult (#3882)
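
The raise .. from .. change listed above (#3750) refers to Python's explicit exception chaining. A minimal, library-independent sketch follows; get_hparam is a hypothetical function used only to show the pattern.

```python
def get_hparam(hparams, name):
    """Look up a hyperparameter, re-raising lookup failures with context."""
    try:
        return hparams[name]
    except KeyError as err:
        # "from err" stores the original exception on __cause__, so the
        # traceback shows both errors and marks one as the direct cause.
        raise ValueError(f"Missing hyperparameter: {name!r}") from err


try:
    get_hparam({"lr": 0.01}, "batch_size")
except ValueError as exc:
    print(type(exc.__cause__).__name__)  # KeyError
```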

Deprecated

Deprecated TrainResult and EvalResult, use self.log and self.write from the LightningModule to log metrics and write predictions. training_step can now only return a scalar (for the loss) or a dictionary with anything you want. (#3681)
Deprecate early_stop_callback Trainer argument (#3845)
Rename Trainer arguments row_log_interval >> log_every_n_steps and log_save_interval >> flush_logs_every_n_steps (#3748)
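
A common way to handle renamed arguments such as these is to accept the old name, emit a deprecation warning, and forward the value under the new name. The sketch below illustrates that pattern with the names from the entry above; translate_kwargs and RENAMED are hypothetical, and this is not Lightning's actual deprecation code.

```python
import warnings

# Old argument name -> new argument name, taken from the entry above.
RENAMED = {
    "row_log_interval": "log_every_n_steps",
    "log_save_interval": "flush_logs_every_n_steps",
}


def translate_kwargs(kwargs):
    """Return a copy of kwargs with deprecated names mapped to new ones."""
    translated = {}
    for name, value in kwargs.items():
        if name in RENAMED:
            new_name = RENAMED[name]
            warnings.warn(
                f"{name} is deprecated, use {new_name} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            name = new_name
        translated[name] = value
    return translated


print(translate_kwargs({"row_log_interval": 50, "max_epochs": 3}))
# {'log_every_n_steps': 50, 'max_epochs': 3}
```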

Removed

Removed experimental Metric API (#3868, #3943, #3949, #3946), listed changes before final removal:
- Added EmbeddingSimilarity metric (#3349, #3358)
- Added hooks to metric module interface (#2528)
- Added error when AUROC metric is used for multiclass problems (#3350)
- Fixed ModelCheckpoint with save_top_k=-1 option not tracking the best models when a monitor metric is available (#3735)
- Fixed counter-intuitive error being thrown in Accuracy metric for zero target tensor (#3764)
- Fixed aggregation of metrics (#3517)
- Fixed Metric aggregation (#3321)
- Fixed RMSLE metric (#3188)
- Renamed reduction to class_reduction in classification metrics (#3322)
- Changed class_reduction similar to sklearn for classification metrics (#3322)
- Renaming of precision recall metric (#3308)

Fixed

Fixed on_train_batch_start hook to end epoch early (#3700)
Fixed num_sanity_val_steps is clipped to limit_val_batches (#2917)
Fixed ONNX model save on GPU (#3145)
Fixed GpuUsageLogger to work on different platforms (#3008)
Fixed auto-scale batch size not dumping auto_lr_find parameter (#3151)
Fixed batch_outputs with optimizer frequencies (#3229)
Fixed setting batch size in LightningModule.datamodule when using auto_scale_batch_size (#3266)
Fixed Horovod distributed backend compatibility with native AMP (#3404)
Fixed batch size auto scaling exceeding the size of the dataset (#3271)
Fixed getting experiment_id from MLFlow only once instead of each training loop (#3394)
Fixed overfit_batches which now correctly disables shuffling for the training loader. (#3501)
Fixed gradient norm tracking for row_log_interval > 1 (#3489)
Fixed ModelCheckpoint name formatting (#3164)
Fixed auto-scale batch size (#3151)
Fixed example implementation of AutoEncoder (#3190)
Fixed invalid paths when remote logging with TensorBoard (#3236)
Changed t() to transpose(), as XLA devices do not support .t() on 1-dim tensors (#3252)
Fixed (weights only) checkpoints loading without PL (#3287)
Fixed gather_all_tensors cross GPUs in DDP (#3319)
Fixed CometML save dir (#3419)
Fixed forward key metrics (#3467)
Fixed normalize mode at confusion matrix (replace NaNs with zeros) (#3465)
Fixed global step increment in training loop when training_epoch_end hook is used (#3673)
Fixed dataloader shuffling not getting turned off with overfit_batches > 0 and distributed_backend = "ddp" (#3534)
Fixed determinism in DDPSpawnBackend when using seed_everything in main process (#3335)
Fixed ModelCheckpoint period to actually save every period epochs (#363...
