Releases · Lightning-AI/pytorch-lightning
Stability and additional improvements
App
Added
- Added the ability to set up basic authentication for Lightning apps (#16105)
Changed
- The LoadBalancer now uses internal ip + port instead of the exposed URL (#16119)
- Added support for logging in different trainer stages with `DeviceStatsMonitor` (#16002)
- Changed `lightning_app.components.serve.gradio` to `lightning_app.components.serve.gradio_server` (#16201)
- Made cluster creation/deletion async by default (#16185)
Fixed
- Fixed not being able to run multiple lightning apps locally due to port collision (#15819)
- Avoid `relpath` bug on Windows (#16164)
- Avoid using the deprecated `LooseVersion` (#16162)
- Ported fixes to the autoscaler component (#16249)
- Fixed a bug where `lightning login` with env variables would not correctly save the credentials (#16339)
Fabric
Added
- Added `Fabric.launch()` to programmatically launch processes (e.g. in Jupyter notebook) (#14992; several of these APIs are combined in the sketch after this list)
- Added the option to launch Fabric scripts from the CLI, without the need to wrap the code into the `run` method (#14992)
- Added `Fabric.setup_module()` and `Fabric.setup_optimizers()` to support strategies that need to set up the model before an optimizer can be created (#15185)
- Added support for Fully Sharded Data Parallel (FSDP) training in Lightning Lite (#14967)
- Added `lightning_fabric.accelerators.find_usable_cuda_devices` utility function (#16147)
- Added basic support for LightningModules (#16048)
- Added support for managing callbacks via `Fabric(callbacks=...)` and emitting events through `Fabric.call()` (#16074)
- Added Logger support (#16121):
  - Added `Fabric(loggers=...)` to support different Logger frameworks in Fabric
  - Added `Fabric.log` for logging scalars using multiple loggers
  - Added `Fabric.log_dict` for logging a dictionary of multiple metrics at once
  - Added `Fabric.loggers` and `Fabric.logger` attributes to access the individual logger instances
  - Added support for calling `self.log` and `self.log_dict` in a LightningModule when using Fabric
  - Added access to `self.logger` and `self.loggers` in a LightningModule when using Fabric
- Added `lightning_fabric.loggers.TensorBoardLogger` (#16121)
- Added `lightning_fabric.loggers.CSVLogger` (#16346)
- Added support for a consistent `.zero_grad(set_to_none=...)` on the wrapped optimizer regardless of which strategy is used (#16275)
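The additions above compose roughly as follows. A minimal sketch, assuming a toy model and random data; the `PrintCallback` class and its `on_batch_end` event name are chosen for illustration and are not part of the release:

```python
import torch
from lightning_fabric import Fabric
from lightning_fabric.loggers import TensorBoardLogger


class PrintCallback:
    # callbacks are plain objects; Fabric.call() dispatches by method name
    def on_batch_end(self, loss):
        print(f"loss={loss.item():.4f}")


def train(fabric):
    # setup_module()/setup_optimizers() let strategies (e.g. FSDP) wrap the
    # model before the optimizer is created
    model = fabric.setup_module(torch.nn.Linear(32, 2))
    optimizer = fabric.setup_optimizers(torch.optim.SGD(model.parameters(), lr=0.1))
    for step in range(10):
        optimizer.zero_grad(set_to_none=True)  # consistent across strategies (#16275)
        loss = model(torch.randn(8, 32)).sum()
        fabric.backward(loss)
        optimizer.step()
        fabric.log("train/loss", loss)          # one scalar, sent to all loggers
        fabric.log_dict({"train/step": step})   # several metrics at once
        fabric.call("on_batch_end", loss=loss)  # emit the event to all callbacks


fabric = Fabric(
    accelerator="cpu",
    devices=2,
    callbacks=[PrintCallback()],
    loggers=[TensorBoardLogger(root_dir="logs")],
)
fabric.launch(train)  # programmatically launch processes, e.g. from a notebook
```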
Changed
- Renamed the class `LightningLite` to `Fabric` (#15932, #15938)
- The `Fabric.run()` method is no longer abstract (#14992)
- The `XLAStrategy` now inherits from `ParallelStrategy` instead of `DDPSpawnStrategy` (#15838)
- Merged the implementation of `DDPSpawnStrategy` into `DDPStrategy` and removed `DDPSpawnStrategy` (#14952)
- The dataloader wrapper returned from `.setup_dataloaders()` now calls `.set_epoch()` on the distributed sampler if one is used (#16101; see the sketch after this list)
- Renamed `Strategy.reduce` to `Strategy.all_reduce` in all strategies (#16370)
- When using multiple devices, the strategy now defaults to "ddp" instead of "ddp_spawn" when none is set (#16388)
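Continuing inside the `train` function of the sketch above, the `.set_epoch()` change means per-epoch shuffling with a `DistributedSampler` now works without manual bookkeeping; the dataset here is a placeholder:

```python
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(64, 32))  # placeholder data
dataloader = fabric.setup_dataloaders(DataLoader(dataset, shuffle=True))

for epoch in range(3):
    # previously needed by hand: dataloader.sampler.set_epoch(epoch)
    for (batch,) in dataloader:
        ...
```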
Removed
- Removed support for FairScale's sharded training (`strategy='ddp_sharded'|'ddp_sharded_spawn'`). Use Fully-Sharded Data Parallel instead (`strategy='fsdp'`) (#16329)
Fixed
- Restored sampling parity between PyTorch and Fabric dataloaders when using the `DistributedSampler` (#16101)
- Fixed an issue where the error message wouldn't tell the user the real value that was passed through the CLI (#16334)
PyTorch
Added
- Added support for native logging of `MetricCollection` with enabled compute groups (#15580)
- Added support for custom artifact names in `pl.loggers.WandbLogger` (#16173)
- Added support for DDP with `LRFinder` (#15304)
- Added utilities to migrate checkpoints from one Lightning version to another (#15237)
- Added support to upgrade all checkpoints in a folder using the `pl.utilities.upgrade_checkpoint` script (#15333)
- Added an axes argument `ax` to `.lr_find().plot()` to enable writing to a user-defined axes in a matplotlib figure (#15652)
- Added `log_model` parameter to `MLFlowLogger` (#9187)
- Added a check to validate that wrapped FSDP models are used while initializing optimizers (#15301)
- Added a warning when `self.log(..., logger=True)` is called without a configured logger (#15814)
- Added support for colossalai 0.1.11 (#15888)
- Added `LightningCLI` support for optimizers and learning rate schedulers via callable type dependency injection (#15869; see the sketch after this list)
- Added support for activation checkpointing for the `DDPFullyShardedNativeStrategy` strategy (#15826)
- Added the option to set `DDPFullyShardedNativeStrategy(cpu_offload=True|False)` via bool instead of needing to pass a configuration object (#15832)
- Added an info message for Ampere CUDA GPU users to enable tf32 matmul precision (#16037)
- Added support for returning optimizer-like classes in `LightningModule.configure_optimizers` (#16189)
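The callable-type dependency injection for `LightningCLI` looks roughly like the sketch below. A plain `Callable` annotation and a `ToyModel` placeholder are used here as assumptions; the exact helper types shipped with the release may differ:

```python
from typing import Callable, Iterable

import torch
from pytorch_lightning import LightningModule
from pytorch_lightning.cli import LightningCLI


class ToyModel(LightningModule):
    def __init__(
        self,
        optimizer: Callable[[Iterable], torch.optim.Optimizer] = torch.optim.Adam,
    ):
        super().__init__()
        self.optimizer = optimizer
        self.layer = torch.nn.Linear(32, 2)

    def configure_optimizers(self):
        # the CLI injects a partially-configured callable, e.g. from
        # `--model.optimizer=torch.optim.SGD --model.optimizer.init_args.lr=0.1`
        return self.optimizer(self.parameters())


if __name__ == "__main__":
    LightningCLI(ToyModel)
```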
Changed
- Switched from `tensorboard` to `tensorboardx` in `TensorBoardLogger` (#15728)
- From now on, Lightning Trainer and `LightningModule.load_from_checkpoint` automatically upgrade the loaded checkpoint if it was produced in an old version of Lightning (#15237)
- `Trainer.{validate,test,predict}(ckpt_path=...)` no longer restores the `Trainer.global_step` and `trainer.current_epoch` values from the checkpoints; from now on, only `Trainer.fit` will restore them (#15532)
- The `ModelCheckpoint.save_on_train_epoch_end` attribute is now computed dynamically every epoch, accounting for changes to the validation dataloaders (#15300)
- The Trainer now raises an error if it is given multiple stateful callbacks of the same type with colliding state keys (#15634)
- `MLFlowLogger` now logs hyperparameters and metrics in batched API calls (#15915)
- Overriding the `on_train_batch_{start,end}` hooks in conjunction with taking a `dataloader_iter` in the `training_step` no longer errors out and instead shows a warning (#16062)
- Moved `tensorboardX` to extra dependencies; the `CSVLogger` is used by default (#16349)
- Dropped PyTorch 1.9 support (#15347)
Deprecated
- Deprecated `description`, `env_prefix` and `env_parse` parameters in `LightningCLI.__init__` in favour of giving them through `parser_kwargs` (#15651)
- Deprecated `pytorch_lightning.profiler` in favor of `pytorch_lightning.profilers` (#16059)
- Deprecated `Trainer(auto_select_gpus=...)` in favor of `pytorch_lightning.accelerators.find_usable_cuda_devices` (#16147; see the sketch after this list)
- Deprecated `pytorch_lightning.tuner.auto_gpu_select.{pick_single_gpu,pick_multiple_gpus}` in favor of `pytorch_lightning.accelerators.find_usable_cuda_devices` (#16147)
- `nvidia/apex` deprecation (#16039):
  - Deprecated `pytorch_lightning.plugins.NativeMixedPrecisionPlugin` in favor of `pytorch_lightning.plugins.MixedPrecisionPlugin`
  - Deprecated the `LightningModule.optimizer_step(using_native_amp=...)` argument
  - Deprecated the `Trainer(amp_backend=...)` argument
  - Deprecated the `Trainer.amp_backend` property
  - Deprecated the `Trainer(amp_level=...)` argument
  - Deprecated the `pytorch_lightning.plugins.ApexMixedPrecisionPlugin` class
  - Deprecated the `pytorch_lightning.utilities.enums.AMPType` enum
  - Deprecated the `DeepSpeedPrecisionPlugin(amp_type=..., amp_level=...)` arguments
- `horovod` deprecation (#16141):
  - Deprecated `Trainer(strategy="horovod")`
  - Deprecated the `HorovodStrategy` class
- Deprecated `pytorch_lightning.lite.LightningLite` in favor of `lightning.fabric.Fabric` (#16314)
- `FairScale` deprecation, in favor of PyTorch's FSDP implementation (#16353):
  - Deprecated the `pytorch_lightning.overrides.fairscale.LightningShardedDataParallel` class
  - Deprecated the `pytorch_lightning.plugins.precision.fully_sharded_native_amp.FullyShardedNativeMixedPrecisionPlugin` class
  - Deprecated the `pytorch_lightning.plugins.precision.sharded_native_amp.ShardedNativeMixedPrecisionPlugin` class
  - Deprecated the `pytorch_lightning.strategies.fully_sharded.DDPFullyShardedStrategy` class
  - Deprecated the `pytorch_lightning.strategies.sharded.DDPShardedStrategy` class
  - Deprecated the `pytorch_lightning.strategies.sharded_spawn.DDPSpawnShardedStrategy` class
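For the `auto_select_gpus` deprecation, migrating to the new utility is a one-line change; a minimal sketch:

```python
from pytorch_lightning import Trainer
from pytorch_lightning.accelerators import find_usable_cuda_devices

# pick 2 CUDA devices that are currently free and usable,
# replacing the deprecated Trainer(auto_select_gpus=True)
trainer = Trainer(accelerator="cuda", devices=find_usable_cuda_devices(2))
```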
Removed
- Removed deprecated `pytorch_lightning.utilities.memory.get_gpu_memory_map` in favor of `pytorch_lightning.accelerators.cuda.get_nvidia_gpu_stats` (#15617)
- Temporarily removed support for Hydra multi-run (#15737)
- Removed deprecated `pytorch_lightning.profiler.base.AbstractProfiler` in favor of `pytorch_lightning.profilers.profiler.Profiler` (#15637)
- Removed deprecated `pytorch_lightning.profiler.base.BaseProfiler` in favor of `pytorch_lightning.profilers.profiler.Profiler` (#15637)
- Removed deprecated code in `pytorch_lightning.utilities.meta` (#16038)
- Removed the deprecated `LightningDeepSpeedModule` (#16041)
- Removed the deprecated `pytorch_lightning.accelerators.GPUAccelerator` in favor of `pytorch_lightning.accelerators.CUDAAccelerator` (#16050)
- Removed the deprecated `pytorch_lightning.profiler.*` classes in favor of `pytorch_lightning.profilers` (#16059; see the import sketch after this list)
- Removed the deprecated `pytorch_lightning.utilities.cli` module in favor of `pytorch_lightning.cli` (#16116)
- Removed the deprecated `pytorch_lightning.loggers.base` module in favor of `pytorch_lightning.loggers.logger` (#16120)
- Removed the deprecated `pytorch_lightning.loops.base` module in favor of `pytorch_lightning.loops.loop` (#16142)
- Removed the deprecated `pytorch_lightning.core.lightning` module in favor of `pytorch_lightning.core.module` (#16318)
- Removed the deprecated `pytorch_lightning.callbacks.base` module in favor of `pytorch_lightning.callbacks.callback` (#16319)
- Removed the deprecated `Trainer.reset_train_val_dataloaders()` in favor of `Trainer.reset_{train,val}_dataloader` (#16131)
- Removed support for `LightningCLI(seed_ever...
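In practice, most of the removals above are import-path migrations; for example, using module names taken from the list above:

```python
# old (removed): from pytorch_lightning.profiler import SimpleProfiler
from pytorch_lightning.profilers import SimpleProfiler

# old (removed): from pytorch_lightning.utilities.cli import LightningCLI
from pytorch_lightning.cli import LightningCLI
```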
Weekly patch release
App
Added
- Added partial support for fastapi `Request` annotation in `configure_api` handlers (#16047)
- Added a nicer UI with URL and examples for the autoscaler component (#16063)
- Enabled users to have more control over scaling out/in intervals (#16093; see the sketch after this list)
- Added more datatypes to the serving component (#16018)
- Added `work.delete` method to delete the work (#16103)
- Added `display_name` property to LightningWork for the cloud (#16095)
- Added `ColdStartProxy` to the AutoScaler (#16094)
- Added status endpoint, enable `ready` (#16075)
- Implemented `ready` for components (#16129)
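A sketch of how the autoscaling controls above might fit together. The interval parameter names are assumed from the PR descriptions and `MyServer` is a placeholder work class, so treat this as illustrative rather than exact:

```python
import lightning as L
from lightning_app.components import AutoScaler


class MyServer(L.LightningWork):  # placeholder for a serving work
    def run(self):
        ...


app = L.LightningApp(
    AutoScaler(
        MyServer,
        min_replicas=0,         # no longer fails with zero replicas (#16092)
        max_replicas=4,
        scale_out_interval=10,  # seconds between scale-out decisions (#16093)
        scale_in_interval=10,   # seconds between scale-in decisions (#16093)
    )
)
```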
Changed
- The default `start_method` for creating Work processes locally on macOS is now 'spawn' (previously 'fork') (#16089)
- The utility `lightning.app.utilities.cloud.is_running_in_cloud` now returns `True` during the loading of the app locally when running with `--cloud` (#16045)
- Updated Multinode Warning (#16091)
- Updated app testing (#16000)
- Changed overwrite to `True` (#16009)
- Simplified messaging in cloud dispatch (#16160)
- Added annotations endpoint (#16159)
Fixed
- Fixed `PythonServer` messaging "Your app has started" (#15989)
- Fixed auto-batching so that requests arriving after the batch interval but already waiting in the queue are still batched (#16110)
- Fixed a bug where `AutoScaler` would fail with `min_replica=0` (#16092)
- Fixed a non-thread-safe deepcopy in the scheduler (#16114)
- Fixed HTTP Queue sleeping for 1 sec by default if no delta was found (#16114)
- Fixed the endpoint info tab not showing up in the `AutoScaler` UI (#16128)
- Fixed an issue where an exception would be raised in the logs when using a recent version of `streamlit` (#16139)
- Fixed e2e tests (#16146)
Full Changelog: 1.8.5.post0...1.8.6
Minor patch release
App
- Fixed install/upgrade by removing a single quote (#16079)
- Fixed a bug where components that are re-instantiated several times failed to initialize if they were modifying `self.lightningignore` (#16080)
- Fixed a bug where apps that had previously been deleted could not be run again from the CLI (#16082)
PyTorch
- Added a function to remove checkpoints, allowing extended classes to override it (#16067)
Full Changelog: 1.8.5...1.8.5.post0
Weekly patch release
App
Added
- Added `Lightning{Flow,Work}.lightningignores` attributes to programmatically ignore files before uploading to the cloud (#15818; see the sketch after this list)
- Added a progress bar while connecting to an app through the CLI (#16035)
- Support running on multiple clusters (#16016)
- Added guards to cluster deletion from the CLI (#16053)
- Added creation of the default `.lightningignore` that ignores `venv` (#16056)
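A minimal sketch of the programmatic ignore attribute; the attribute name follows the fix note above (#16080) and the patterns are illustrative:

```python
import lightning as L


class Flow(L.LightningFlow):
    def __init__(self):
        super().__init__()
        # files matching these patterns are skipped when uploading to the cloud
        self.lightningignore = ("data/", "*.ckpt")
```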
Changed
- Cleaned up cluster waiting (#16054)
Fixed
- Fixed `DDPStrategy` import in app framework (#16029)
- Fixed `AutoScaler` raising an exception when non-default cloud compute is specified (#15991)
- Fixed and improved the login flow (#16052)
- Fixed the debugger detection mechanism for the lightning App in VSCode (#16068)
PyTorch
- Some minor cleanups
Full Changelog: 1.8.4.post0...1.8.5
Minor patch release
App
- Fixed MultiNode Component to use separate cloud computes (#15965)
- Fixed registration for CloudComputes of Works in `L.app.structures` (#15964)
- Fixed a bug where auto-upgrading to the latest lightning via the CLI could get stuck in a loop (#15984)
PyTorch
- Fixed the `XLAProfiler` not recording anything due to mismatching of action names (#15885)
Full Changelog: 1.8.4...1.8.4.post0
Dependency hotfix
Weekly patch release
App
Added
- Added `code_dir` argument to tracer run (#15771)
- Added the CLI command `lightning run model` to launch a `LightningLite` accelerated script (#15506)
- Added the CLI command `lightning delete app` to delete a lightning app on the cloud (#15783)
- Added a `CloudMultiProcessBackend` which enables running a child App from within the Flow in the cloud (#15800)
- Added a utility for pickling work objects safely even from a child process (#15836)
- Added `AutoScaler` component (#15769)
- Added the property `ready` of the LightningFlow to inform when the `Open App` button should be visible (#15921)
- Added private work attribute `_start_method` to customize how to start the works (#15923)
- Added a `configure_layout` method to the `LightningWork` which can be used to control how the work is handled in the layout of a parent flow (#15926)
- Added the ability to run a Lightning App or Component directly from the Gallery using `lightning run app organization/name` (#15941)
- Added automatic conversion of list and dict of works and flows to structures (#15961)
Changed
- The `MultiNode` components now warn the user when running with `num_nodes > 1` locally (#15806)
- Cluster creation and deletion now waits by default (#15458)
- Running an app without a UI locally no longer opens the browser (#15875)
- Show a message when `BuildConfig(requirements=[...])` is passed but a `requirements.txt` file is already present in the Work (#15799; see the sketch after this list)
- Show a message when `BuildConfig(dockerfile="...")` is passed but a `Dockerfile` file is already present in the Work (#15799)
- Dropped the name column from the cluster list (#15721)
- Apps without UIs no longer activate the "Open App" button when running in the cloud (#15875)
- Wait for the full file to be transferred in Path / Payload (#15934)
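A sketch of a Work that passes an explicit `BuildConfig`, the case the two messages above are about; the `TrainingWork` class, the `cloud_build_config` argument, and the requirement list are assumptions for illustration:

```python
import lightning as L
from lightning_app import BuildConfig


class TrainingWork(L.LightningWork):
    def __init__(self):
        # per #15799, a message is now shown if the Work's directory already
        # contains a requirements.txt (or a Dockerfile when using
        # BuildConfig(dockerfile="..."))
        super().__init__(cloud_build_config=BuildConfig(requirements=["torch"]))

    def run(self):
        ...
```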
Removed
- Removed the `SingleProcessRuntime` (#15933)
Fixed
- Fixed SSH CLI command listing stopped components (#15810)
- Fixed bug when launching apps on multiple clusters (#15484)
- Fixed Sigterm Handler causing thread lock which caused KeyboardInterrupt to hang (#15881)
- Fixed MPS error for multinode component (defaults to CPU on MPS devices now, as distributed operations are not supported by PyTorch on MPS) (#15748)
- Fixed the work not being stopped on success when passed directly to the LightningApp (#15801)
- Fixed the PyTorch Inference locally on GPU (#15813)
- Fixed the `enable_spawn` method of the `WorkRunExecutor` (#15812)
- Fixed require/import decorator (#15849)
- Fixed a bug where using `L.app.structures` would cause multiple apps to be opened and fail with an error in the cloud (#15911)
- Fixed PythonServer generating noise on M1 (#15949)
- Fixed multiprocessing breakpoint (#15950)
- Fixed detection of a Lightning App running in debug mode (#15951)
- Fixed `ImportError` on Multinode if package not present (#15963)
Lite
- Fixed `shuffle=False` having no effect when using DDP/DistributedSampler (#15931)
PyTorch
Changed
- Direct support for compiled models (#15922)
Fixed
- Fixed issue with unsupported `torch.inference_mode()` on HPU backends (#15918)
- Fixed `LRScheduler` import for PyTorch 2.0 (#15940)
- Fixed `fit_loop.restarting` to be `False` for the lr finder (#15620)
- Fixed `torch.jit.script`-ing a LightningModule causing an unintended error message about the deprecated `use_amp` property (#15947)
Full Changelog: 1.8.3...1.8.4
Hotfix for Python Server
Hotfix for requirements
- Revert s3fs (#15792)
Weekly patch release
App
Changed
- Deduplicated top-level lightning CLI command groups (#15761):
  - The `lightning add ssh-key` CLI command has been transitioned to `lightning create ssh-key`
  - The `lightning remove ssh-key` CLI command has been transitioned to `lightning delete ssh-key`
- Set Torch inference mode for prediction (#15719)
- Improved `LightningTrainerScript` start-up time (#15751)
- Disabled XSRF protection in `StreamlitFrontend` to support uploads on localhost (#15684)
Lite
Changed
- Temporarily removed support for Hydra multi-run (#15737)
PyTorch
Changed
- Temporarily removed support for Hydra multi-run (#15737)
- Switched from `tensorboard` to `tensorboardx` in `TensorBoardLogger` (#15728)
Full Changelog: 1.8.2...1.8.3