Releases: Lightning-AI/pytorch-lightning

standard weekly patch release

24 Nov 16:55

Detail changes

Added

  • Added casting to python types for numpy scalars when logging hparams (#4647)
  • Added warning when progress bar refresh rate is less than 20 on Google Colab to prevent crashing (#4654)
  • Added F1 class metric (#4656)

Changed

  • Consistently use step=trainer.global_step in LearningRateMonitor independently of logging_interval (#4376)
  • Metric states are no longer added to state_dict by default (#4685)
  • Renamed class metric Fbeta >> FBeta (#4656)
  • Model summary: add 1 decimal place (#4745)
  • Do not override PYTHONWARNINGS (#4700)

Fixed

  • Fixed checkpoint hparams dict casting when omegaconf is available (#4770)
  • Fixed incomplete progress bars when total batches not divisible by refresh rate (#4577)
  • Updated SSIM metric (#4566, #4656)
  • Fixed batch_arg_name bug - added batch_arg_name to all calls to _adjust_batch_size (#4812)
  • Fixed torchtext data to GPU (#4785)
  • Fixed a crash bug in MLFlow logger (#4716)

Contributors

@awaelchli, @jonashaag, @jungwhank, @M-Salti, @moi90, @pgagarinov, @s-rog, @Samyak2, @SkafteNicki, @teddykoker, @ydcjeff

If we forgot someone due to not matching commit email with GitHub account, let us know :]

standard weekly patch release

17 Nov 21:57

Detail changes

Added

  • Added lambda closure to manual_optimizer_step (#4618)

Changed

  • Change Metrics persistent default mode to False (#4685)

Fixed

  • Prevented a crash if sync_dist=True on CPU (#4626); see the sketch after this list
  • Fixed average pbar Metrics (#4534)
  • Fixed setup callback hook to correctly pass the LightningModule through (#4608)
  • Allowed decorating model init with hparams saving inside (#4662)
  • Fixed split_idx set by LoggerConnector in on_trainer_init to Trainer (#4697)
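
A minimal sketch of where the sync_dist fix above applies, assuming a typical validation_step on a LightningModule; the metric name and loss computation are illustrative:

import torch.nn.functional as F

def validation_step(self, batch, batch_idx):
    x, y = batch
    loss = F.cross_entropy(self(x), y)
    # sync_dist=True reduces the logged value across processes; this patch keeps it from crashing on CPU-only runs
    self.log('val_loss', loss, sync_dist=True)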

Contributors

@ananthsub, @Borda, @SeanNaren, @SkafteNicki, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

standard weekly patch release

11 Nov 13:16

Detail changes

Added

  • Added metrics aggregation in Horovod and fixed early stopping (#3775)
  • Added manual_optimizer_step, which works with AMP Native and accumulated_grad_batches (#4485)
  • Added persistent(mode) method to metrics, to enable and disable metric states being added to state_dict (#4482); see the sketch after this list
  • Added congratulations at the end of our notebooks (#4555)
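
A rough sketch of the new persistence control, assuming the 1.0.x metrics API under pytorch_lightning.metrics; the SumMetric class below is illustrative:

import torch
from pytorch_lightning.metrics import Metric  # import path assumed for the 1.0.x metrics API

class SumMetric(Metric):
    def __init__(self):
        super().__init__()
        # metric state; whether it ends up in state_dict is controlled via persistence
        self.add_state('total', default=torch.tensor(0.0), dist_reduce_fx='sum')

    def update(self, value):
        self.total += value

    def compute(self):
        return self.total

metric = SumMetric()
metric.persistent(True)  # keep the metric states in state_dict (pass False to exclude them)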

Changed

Fixed

  • Fixed feature-lack in hpc_load (#4526)
  • Fixed metrics states being overridden in DDP mode (#4482)
  • Fixed lightning_getattr, lightning_hasattr not finding the correct attributes in datamodule (#4347)
  • Fixed automatic optimization with AMP via manual_optimization_step (#4485)
  • Replaced MisconfigurationException with a warning in the ModelCheckpoint callback (#4560)
  • Fixed logged keys in mlflow logger (#4412)
  • Fixed is_picklable by catching AttributeError (#4508)

Contributors

@dscarmo, @jtamir, @kazhang, @maxjeblick, @rohitgr7, @SkafteNicki, @tarepan, @tchaton, @tgaddair, @williamFalcon

If we forgot someone due to not matching commit email with GitHub account, let us know :]

standard weekly patch release

04 Nov 02:00

Detail changes

Added

  • Added PyTorch 1.7 Stable support (#3821)
  • Added timeout for tpu_device_exists to ensure process does not hang indefinitely (#4340)

Changed

  • W&B log in sync with Trainer step (#4405)
  • Hook on_after_backward is called only when optimizer_step is being called (#4439)
  • Moved track_and_norm_grad into training loop and called only when optimizer_step is being called (#4439)
  • Changed type checker with explicit cast of ref_model object (#4457)

Deprecated

  • Deprecated passing ModelCheckpoint instance to checkpoint_callback Trainer argument (#4336)
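
A hedged sketch of the migration this deprecation implies: pass the ModelCheckpoint instance through callbacks instead of checkpoint_callback; the monitored key is a placeholder:

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint(monitor='val_loss')

# deprecated: Trainer(checkpoint_callback=checkpoint)
trainer = Trainer(callbacks=[checkpoint])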

Fixed

  • Disable saving checkpoints if not trained (#4372)
  • Fixed error using auto_select_gpus=True with gpus=-1 (#4209)
  • Disabled training when limit_train_batches=0 (#4371)
  • Fixed that metrics do not store computational graph for all seen data (#4313)
  • Fixed AMP unscale for on_after_backward (#4439)
  • Fixed TorchScript export when module includes Metrics (#4428)
  • Fixed CSV logger warning (#4419)
  • Fixed skip DDP parameter sync (#4301)

Contributors

@ananthsub, @awaelchli, @borisdayma, @carmocca, @justusschock, @lezwon, @rohitgr7, @SeanNaren, @SkafteNicki, @ssaru, @tchaton, @ydcjeff

If we forgot someone due to not matching commit email with GitHub account, let us know :]

standard weekly patch release

27 Oct 22:15
5d10a36

Detail changes

Added

  • Added dirpath and filename parameters in ModelCheckpoint (#4213); see the sketch after this list
  • Added plugins docs and DDPPlugin to customize ddp across all accelerators (#4258)
  • Added strict option to the scheduler dictionary (#3586)
  • Added fsspec support for profilers (#4162)
  • Added autogenerated helptext to Trainer.add_argparse_args (#4344)
  • Added support for string values in Trainer's profiler parameter (#3656)
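
A short sketch combining two of the additions above: the new dirpath/filename arguments of ModelCheckpoint and a string value for the Trainer's profiler. The directory, filename template, and profiler name are placeholders:

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint(
    dirpath='checkpoints/',                 # replaces the deprecated filepath argument
    filename='{epoch:02d}-{val_loss:.2f}',  # formatting template assumed here for illustration
)
trainer = Trainer(callbacks=[checkpoint], profiler='simple')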

Changed

  • Improved error messages for invalid configure_optimizers returns (#3587)
  • Allow changing the logged step value in validation_step (#4130)
  • Allow setting replace_sampler_ddp=True with a distributed sampler already added (#4273)
  • Fixed sanitized parameters for WandbLogger.log_hyperparams (#4320)

Deprecated

  • Deprecated filepath in ModelCheckpoint (#4213)
  • Deprecated reorder parameter of the auc metric (#4237)
  • Deprecated bool values in Trainer's profiler parameter (#3656)

Fixed

  • Fixed setting device ids in DDP (#4297)
  • Fixed synchronization of best model path in ddp_accelerator (#4323)
  • Fixed WandbLogger not uploading checkpoint artifacts at the end of training (#4341)

Contributors

@ananthsub, @awaelchli, @carmocca, @ddrevicky, @louis-she, @mauvilsa, @rohitgr7, @SeanNaren, @tchaton

If we forgot someone due to not matching commit email with GitHub account, let us know :]

standard weekly patch release

20 Oct 23:12
e0e402d

Detail changes

Added

  • Added persistent flag to Metric.add_state (#4195)

Changed

  • Used checkpoint_connector.hpc_save in SLURM (#4217)
  • Moved base requirements to the project root (#4219)

Fixed

  • Fixed hparams assign in init (#4189)
  • Fixed overwrite check for model hooks (#4010)

Contributors

@Borda, @EspenHa, @teddykoker

If we forgot someone due to not matching commit email with GitHub account, let us know :]

fixes a major logging bug for val in 1.0

15 Oct 14:59
5c153c2

Fixes the last major bugs for validation logging.
Also removes duplicate charts for metric / metric_loss.
Doing this minor release because correct validation metrics logging is critical.

Detail changes

Added

  • Added trace functionality to the function to_torchscript (#4142)

Changed

  • Called on_load_checkpoint before loading state_dict (#4057)

Removed

  • Removed duplicate metric vs step log for train loop (#4173)

Fixed

  • Fixed the self.log problem in validation_step() (#4169)
  • Fixed hparams saving - save the state when save_hyperparameters() is called in __init__ (#4163); see the sketch after this list
  • Fixed runtime failure while exporting hparams to yaml (#4158)
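
A minimal sketch of the pattern the hparams fix above targets, calling save_hyperparameters() inside __init__ so the arguments are stored with the checkpoint; the module and its arguments are illustrative:

import torch
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self, hidden_dim=128, lr=1e-3):
        super().__init__()
        # captures hidden_dim and lr into self.hparams and persists them in checkpoints
        self.save_hyperparameters()
        self.layer = torch.nn.Linear(28 * 28, hidden_dim)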

Contributors

@Borda, @NumesSanguis, @rohitgr7, @williamFalcon

If we forgot someone due to not matching commit email with GitHub account, let us know :]

minor jit fixes

14 Oct 00:47
bbbc111

Obligatory post-1.0 minor release. The main fix makes the LightningModule fully compatible with JIT (there were some edge cases we had not covered).
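
A hedged sketch of the JIT export this release exercises, using the LightningModule.to_torchscript API added in 1.0; the model class and output path are placeholders:

import torch

model = MyLightningModule()        # any LightningModule
scripted = model.to_torchscript()  # scripts the module and returns a torch.jit.ScriptModule
torch.jit.save(scripted, 'model.pt')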

1.0.0 - General availability

13 Oct 12:12
09c2020

Overview

...

Detail changes

Added

  • Added Explained Variance Metric + metric fix (#4013)
  • Added Metric <-> Lightning Module integration tests (#4008)
  • Added parsing OS env vars in Trainer (#4022)
  • Added classification metrics (#4043)
  • Updated explained variance metric (#4024)
  • Enabled plugins (#4041)
  • Enabled custom clusters (#4048)
  • Enabled passing in custom accelerators (#4050)
  • Added LightningModule.toggle_optimizer (#4058)
  • Added LightningModule.manual_backward (#4063)

Changed

Removed

  • Removed output argument from *_batch_end hooks (#3965, #3966)
  • Removed output argument from *_epoch_end hooks (#3967)
  • Removed support for EvalResult and TrainResult (#3968)
  • Removed deprecated trainer flags: overfit_pct, log_save_interval, row_log_interval (#3969)
  • Removed deprecated early_stop_callback (#3982)
  • Removed deprecated model hooks (#3980)
  • Removed deprecated callbacks (#3979)
  • Removed trainer argument in LightningModule.backward (#4056)

Fixed

  • Fixed current_epoch property update to reflect true epoch number inside LightningDataModule, when reload_dataloaders_every_epoch=True. (#3974)
  • Fixed printing of the scaler value in the progress bar (#4053)
  • Fixed mismatch between docstring and code regarding when on_load_checkpoint hook is called (#3996)

Contributors

@ananyahjha93, @Borda, @edenlightning, @hbredin, @rohitgr7, @SkafteNicki, @teddykoker, @williamFalcon

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Buffer release before 1.0

07 Oct 21:16
b4051e7

This release is a buffer in case 1.0 breaks any compatibility for people who upgrade. 0.10.0 has all the bug fixes and features of 1.0 but is 100% backward compatible. The 1.0 release follows within the next 24 hours.

Overview

The major changes are:

  • Results objects are deprecated (we hated them too haha)
  • This means dataflow and logging have been decoupled

To log:

def any_step(...):
    self.log('something', i_computed)

Separately, return whatever you want from methods:

def training_step(...):
    return loss

or

def training_step(...):
    return {'loss': loss, 'whatever': [1, 'want']}
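
Putting the pieces above together, a minimal sketch of the decoupled API; the layer sizes, metric name, and optimizer are illustrative:

import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class LitClassifier(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(28 * 28, 10)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.layer(x.view(x.size(0), -1)), y)
        # logging is now a side effect, independent of what the step returns
        self.log('train_loss', loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)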

Detail changes

Added

  • Added new Metrics API (#3868, #3921)
  • Enable PyTorch 1.7 compatibility (#3541)
  • Added LightningModule.to_torchscript to support exporting as ScriptModule (#3258)
  • Added warning when dropping unpicklable hparams (#2874)
  • Added EMB similarity (#3349)
  • Added ModelCheckpoint.to_yaml method (#3048)
  • Allow ModelCheckpoint monitor to be None, meaning it will always save (#3630)
  • Disabled optimizers setup during testing (#3059)
  • Added support for datamodules to save and load checkpoints when training (#3563)
  • Added support for datamodule in learning rate finder (#3425)
  • Added gradient clip test for native AMP (#3754)
  • Added dist lib to enable syncing anything across devices (#3762)
  • Added broadcast to TPUBackend (#3814)
  • Added XLADeviceUtils class to check XLA device type (#3274)

Changed

  • Refactored accelerator backends:
    • moved TPU xxx_step to backend (#3118)
    • refactored DDP backend forward (#3119)
    • refactored GPU backend __step (#3120)
    • refactored Horovod backend (#3121, #3122)
    • remove obscure forward call in eval + CPU backend ___step (#3123)
    • reduced all simplified forward (#3126)
    • added hook base method (#3127)
    • refactor eval loop to use hooks - use test_mode so we can split it later (#3129)
    • moved ___step_end hooks (#3130)
    • training forward refactor (#3134)
    • training AMP scaling refactor (#3135)
    • eval step scaling factor (#3136)
    • add eval loop object to streamline eval loop (#3138)
    • refactored dataloader process hook (#3139)
    • refactored inner eval loop (#3141)
    • final inner eval loop hooks (#3154)
    • clean up hooks in run_evaluation (#3156)
    • clean up data reset (#3161)
    • expand eval loop out (#3165)
    • moved hooks around in eval loop (#3195)
    • remove _evaluate fx (#3197)
    • Trainer.fit hook clean up (#3198)
    • DDPs train hooks (#3203)
    • refactor DDP backend (#3204, #3207, #3208, #3209, #3210)
    • reduced accelerator selection (#3211)
    • group prepare data hook (#3212)
    • added data connector (#3285)
    • modular is_overridden (#3290)
    • adding Trainer.tune() (#3293)
    • move run_pretrain_routine -> setup_training (#3294)
    • move train outside of setup training (#3297)
    • move prepare_data to data connector (#3307)
    • moved accelerator router (#3309)
    • train loop refactor - moving train loop to own object (#3310, #3312, #3313, #3314)
    • duplicate data interface definition up into DataHooks class (#3344)
    • inner train loop (#3359, #3361, #3362, #3363, #3365, #3366, #3367, #3368, #3369, #3370, #3371, #3372, #3373, #3374, #3375, #3376, #3385, #3388, #3397)
    • all logging related calls in a connector (#3395)
    • device parser (#3400, #3405)
    • added model connector (#3407)
    • moved eval loop logging to loggers (#3408)
    • moved eval loop (#3412, #3408)
    • trainer/separate argparse (#3421, #3428, #3432)
    • move lr_finder (#3434)
    • organize args (#3435, #3442, #3447, #3448, #3449, #3456)
    • move specific accelerator code (#3457)
    • group connectors (#3472)
    • accelerator connector methods x/n (#3469, #3470, #3474)
    • merge backends (#3476, #3477, #3478, #3480, #3482)
    • apex plugin (#3502)
    • precision plugins (#3504)
    • Result - make monitor default to checkpoint_on to simplify (#3571)
    • reference to the Trainer on the LightningDataModule (#3684)
    • add .log to lightning module (#3686, #3699, #3701, #3704, #3715)
    • enable tracking original metric when step and epoch are both true (#3685)
    • deprecated results obj, added support for simpler comms (#3681)
    • move backends back to individual files (#3712)
    • fixes logging for eval steps (#3763)
    • decoupled DDP, DDP spawn (#3733, #3766, #3767, #3774, #3802, #3806)
    • remove weight loading hack for ddp_cpu (#3808)
    • separate torchelastic from DDP (#3810)
    • separate SLURM from DDP (#3809)
    • decoupled DDP2 (#3816)
    • bug fix with logging val epoch end + monitor (#3812)
    • decoupled DDP, DDP spawn (#3733, #3817, #3819, #3927)
    • callback system and init DDP (#3836)
    • adding compute environments (#3837, #3842)
    • epoch can now log independently (#3843)
    • test selecting the correct backend. temp backends while slurm and TorchElastic are decoupled (#3848)
    • fixed init_slurm_connection causing hostname errors (#3856)
    • moves init apex from LM to apex connector (#3923)
    • moves sync bn to each backend (#3925)
    • moves configure ddp to each backend (#3924)
  • Deprecation warning (#3844)
  • Changed LearningRateLogger to LearningRateMonitor (#3251)
  • Used fsspec instead of gfile for all IO (#3320)
    • Swapped torch.load for fsspec load in DDP spawn backend (#3787)
    • Swapped torch.load for fsspec load in cloud_io loading (#3692)
    • Added support for to_disk() to use remote filepaths with fsspec (#3930)
    • Updated model_checkpoint's to_yaml to use fsspec open (#3801)
    • Fixed fsspec inconsistency when doing fs.ls (#3805)
  • Refactor GPUStatsMonitor to improve training speed (#3257)
  • Changed IoU score behavior for classes absent in target and pred (#3098)
  • Changed IoU remove_bg bool to ignore_index optional int (#3098)
  • Changed defaults of save_top_k and save_last to None in ModelCheckpoint (#3680)
  • row_log_interval and log_save_interval are now based on training loop's global_step instead of epoch-internal batch index (#3667)
  • Silenced some warnings. Verified DDP refactors (#3483)
  • Cleaning up stale logger tests (#3490)
  • Allow ModelCheckpoint monitor to be None (#3633)
  • Enable None model checkpoint default (#3669)
  • Skipped best_model_path if checkpoint_callback is None (#2962)
  • Used raise .. from .. to explicitly chain exceptions (#3750)
  • Mocking loggers (#3596, #3617, #3851, #3859, #3884, #3853, #3910, #3889, #3926)
  • Write predictions in LightningModule instead of EvalResult (#3882)

Deprecated

  • Deprecated TrainResult and EvalResult, use self.log and self.write from the LightningModule to log metrics and write predictions. training_step can now only return a scalar (for the loss) or a dictionary with anything you want. (#3681)
  • Deprecate early_stop_callback Trainer argument (#3845)
  • Rename Trainer arguments row_log_interval >> log_every_n_steps and log_save_interval >> flush_logs_every_n_steps (#3748)

Removed

  • Removed experimental Metric API (#3868, #3943, #3949, #3946), listed changes before final removal:
    • Added EmbeddingSimilarity metric (#3349, #3358)
    • Added hooks to metric module interface (#2528)
    • Added error when AUROC metric is used for multiclass problems (#3350)
    • Fixed ModelCheckpoint with save_top_k=-1 option not tracking the best models when a monitor metric is available (#3735)
    • Fixed counter-intuitive error being thrown in Accuracy metric for zero target tensor (#3764)
    • Fixed aggregation of metrics (#3517)
    • Fixed Metric aggregation (#3321)
    • Fixed RMSLE metric (#3188)
    • Renamed reduction to class_reduction in classification metrics (#3322)
    • Changed class_reduction similar to sklearn for classification metrics (#3322)
    • Renaming of precision recall metric (#3308)

Fixed

  • Fixed on_train_batch_start hook to end epoch early (#3700)
  • Fixed num_sanity_val_steps is clipped to limit_val_batches (#2917)
  • Fixed ONNX model save on GPU (#3145)
  • Fixed GpuUsageLogger to work on different platforms (#3008)
  • Fixed auto-scale batch size not dumping auto_lr_find parameter (#3151)
  • Fixed batch_outputs with optimizer frequencies (#3229)
  • Fixed setting batch size in LightningModule.datamodule when using auto_scale_batch_size (#3266)
  • Fixed Horovod distributed backend compatibility with native AMP (#3404)
  • Fixed batch size auto scaling exceeding the size of the dataset (#3271)
  • Fixed getting experiment_id from MLFlow only once instead of each training loop (#3394)
  • Fixed overfit_batches which now correctly disables shuffling for the training loader. (#3501)
  • Fixed gradient norm tracking for row_log_interval > 1 (#3489)
  • Fixed ModelCheckpoint name formatting (#3164)
  • Fixed auto-scale batch size (#3151)
  • Fixed example implementation of AutoEncoder (#3190)
  • Fixed invalid paths when remote logging with TensorBoard (#3236)
  • Fixed: changed t() to transpose() as XLA devices do not support .t() on 1-dim tensors (#3252)
  • Fixed (weights only) checkpoints loading without PL (#3287)
  • Fixed gather_all_tensors cross GPUs in DDP (#3319)
  • Fixed CometML save dir (#3419)
  • Fixed forward key metrics (#3467)
  • Fixed normalize mode at confusion matrix (replace NaNs with zeros) (#3465)
  • Fixed global step increment in training loop when training_epoch_end hook is used (#3673)
  • Fixed dataloader shuffling not getting turned off with overfit_batches > 0 and distributed_backend = "ddp" (#3534)
  • Fixed determinism in DDPSpawnBackend when using seed_everything in main process (#3335)
  • Fixed ModelCheckpoint period to actually save every period epochs (#363...