crnn_seq2seq_ocr_pytorch

This software implements a Convolutional Recurrent Neural Network (CRNN), a combination of a CNN and a sequence-to-sequence model with attention, for image-based sequence recognition tasks such as scene text recognition and OCR. It is written in PyTorch using PyTorch Lightning.

Dependencies

The dependencies that need to be installed are as follows:

  • numpy
  • opencv-python
  • Pillow
  • torch
  • torchvision
  • torchmetrics
  • pytorch-lightning
  • jsonargparse[signatures]
  • rapidfuzz

Required packages can be installed with

pip3 install -r requirements.txt

Train a new model

  • Create train_list.txt and val_list.txt in the following format, one image path and its transcription per line:
data/dataset/20210420_093920_rst-l11.jpg निद्रातः स्वयमेव भो अनुचरा! उबुध्य चात्मेच्छया,
data/dataset/20210420_093920_rst-l12.jpg प्रोत्थास्यामि न कश्चिदेत्य हि पुमान् मां जातुचिद् बोधयेत् । 
data/dataset/20210420_093920_rst-l13.jpg प्रोच्यैवं शयनं चकार नृपतिस्तत्पादपद्म पुन 
data/dataset/20210420_093920_rst-l14.jpg र्दास्यौ द्व अनुकूलमेव विनयात् संवाहयांचक्रतुः ॥१४॥
  • Modify the YAML config and start training (an example config sketch is shown after these steps):
python3 trainer.py --config default_config.yaml
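
A full configuration listing every option and its default can be generated with the built-in --print_config flag (documented in the help output below), for example:

python3 trainer.py --print_config > default_config.yaml

As a minimal hand-written sketch, default_config.yaml could look like the following. The keys mirror the CLI argument groups listed under Help below; the values are only illustrative (mostly the defaults from that help output), so adjust the dataset paths and sizes for your own data.

# default_config.yaml (illustrative sketch, not the repository's shipped config)
seed_everything: 42
trainer:
  gpus: 1
  max_epochs: 3
  check_val_every_n_epoch: 5
model:
  hidden_size: 256
  max_enc_seq_len: 129
  teaching_forcing_prob: 0.5
  learning_rate: 0.0001
  dropout_p: 0.1
data:
  train_list: data/dataset/train_list.txt
  val_list: data/dataset/val_list.txt
  img_height: 32
  img_width: 512
  batch_size: 4
  num_workers: 2
  random_sampler: true
early_stopping:
  monitor: train_loss
  patience: 5
  mode: min
model_checkpoint:
  monitor: train_loss
  save_top_k: 1
  mode: min
encoder_optimizer:
  lr: 0.001
decoder_optimizer:
  lr: 0.001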

Help - YAML Config & CLI

➜ python trainer.py -h
usage: trainer.py [-h] [--config CONFIG] [--print_config [={comments,skip_null}+]] [--seed_everything SEED_EVERYTHING] [--trainer.logger LOGGER]
                  [--trainer.checkpoint_callback {true,false}] [--trainer.callbacks CALLBACKS] [--trainer.default_root_dir DEFAULT_ROOT_DIR]
                  [--trainer.gradient_clip_val GRADIENT_CLIP_VAL] [--trainer.gradient_clip_algorithm GRADIENT_CLIP_ALGORITHM] [--trainer.process_position PROCESS_POSITION]
                  [--trainer.num_nodes NUM_NODES] [--trainer.num_processes NUM_PROCESSES] [--trainer.gpus GPUS] [--trainer.auto_select_gpus {true,false}]
                  [--trainer.tpu_cores TPU_CORES] [--trainer.ipus IPUS] [--trainer.log_gpu_memory LOG_GPU_MEMORY] [--trainer.progress_bar_refresh_rate PROGRESS_BAR_REFRESH_RATE]
                  [--trainer.overfit_batches OVERFIT_BATCHES] [--trainer.track_grad_norm TRACK_GRAD_NORM] [--trainer.check_val_every_n_epoch CHECK_VAL_EVERY_N_EPOCH]
                  [--trainer.fast_dev_run FAST_DEV_RUN] [--trainer.accumulate_grad_batches ACCUMULATE_GRAD_BATCHES] [--trainer.max_epochs MAX_EPOCHS]
                  [--trainer.min_epochs MIN_EPOCHS] [--trainer.max_steps MAX_STEPS] [--trainer.min_steps MIN_STEPS] [--trainer.max_time MAX_TIME]
                  [--trainer.limit_train_batches LIMIT_TRAIN_BATCHES] [--trainer.limit_val_batches LIMIT_VAL_BATCHES] [--trainer.limit_test_batches LIMIT_TEST_BATCHES]
                  [--trainer.limit_predict_batches LIMIT_PREDICT_BATCHES] [--trainer.val_check_interval VAL_CHECK_INTERVAL]
                  [--trainer.flush_logs_every_n_steps FLUSH_LOGS_EVERY_N_STEPS] [--trainer.log_every_n_steps LOG_EVERY_N_STEPS] [--trainer.accelerator ACCELERATOR]
                  [--trainer.sync_batchnorm {true,false}] [--trainer.precision PRECISION] [--trainer.weights_summary WEIGHTS_SUMMARY] [--trainer.weights_save_path WEIGHTS_SAVE_PATH]
                  [--trainer.num_sanity_val_steps NUM_SANITY_VAL_STEPS] [--trainer.truncated_bptt_steps TRUNCATED_BPTT_STEPS]
                  [--trainer.resume_from_checkpoint RESUME_FROM_CHECKPOINT] [--trainer.profiler PROFILER] [--trainer.benchmark {true,false}] [--trainer.deterministic {true,false}]
                  [--trainer.reload_dataloaders_every_n_epochs RELOAD_DATALOADERS_EVERY_N_EPOCHS] [--trainer.reload_dataloaders_every_epoch {true,false}]
                  [--trainer.auto_lr_find AUTO_LR_FIND] [--trainer.replace_sampler_ddp {true,false}] [--trainer.terminate_on_nan {true,false}]
                  [--trainer.auto_scale_batch_size AUTO_SCALE_BATCH_SIZE] [--trainer.prepare_data_per_node {true,false}] [--trainer.plugins PLUGINS]
                  [--trainer.amp_backend AMP_BACKEND] [--trainer.amp_level AMP_LEVEL] [--trainer.distributed_backend DISTRIBUTED_BACKEND] [--trainer.move_metrics_to_cpu {true,false}]
                  [--trainer.multiple_trainloader_mode MULTIPLE_TRAINLOADER_MODE] [--trainer.stochastic_weight_avg {true,false}] [--model.hidden_size HIDDEN_SIZE]
                  [--model.max_enc_seq_len MAX_ENC_SEQ_LEN] [--model.teaching_forcing_prob TEACHING_FORCING_PROB] [--model.learning_rate LEARNING_RATE] [--model.dropout_p DROPOUT_P]
                  [--model.encoder_pth ENCODER_PTH] [--model.decoder_pth DECODER_PTH] [--model.save_model_dir SAVE_MODEL_DIR] [--data.train_list TRAIN_LIST]
                  [--data.val_list VAL_LIST] [--data.img_height IMG_HEIGHT] [--data.img_width IMG_WIDTH] [--data.num_workers NUM_WORKERS] [--data.batch_size BATCH_SIZE]
                  [--data.random_sampler {true,false}] [--early_stopping.monitor MONITOR] [--early_stopping.min_delta MIN_DELTA] [--early_stopping.patience PATIENCE]
                  [--early_stopping.verbose {true,false}] [--early_stopping.mode MODE] [--early_stopping.strict {true,false}] [--early_stopping.check_finite {true,false}]
                  [--early_stopping.stopping_threshold STOPPING_THRESHOLD] [--early_stopping.divergence_threshold DIVERGENCE_THRESHOLD]
                  [--early_stopping.check_on_train_epoch_end {true,false}] [--model_checkpoint.dirpath DIRPATH] [--model_checkpoint.filename FILENAME]
                  [--model_checkpoint.monitor MONITOR] [--model_checkpoint.verbose {true,false}] [--model_checkpoint.save_last {true,false,null}]
                  [--model_checkpoint.save_top_k SAVE_TOP_K] [--model_checkpoint.save_weights_only {true,false}] [--model_checkpoint.mode MODE]
                  [--model_checkpoint.auto_insert_metric_name {true,false}] [--model_checkpoint.every_n_train_steps EVERY_N_TRAIN_STEPS]
                  [--model_checkpoint.train_time_interval.help CLASS] [--model_checkpoint.train_time_interval TRAIN_TIME_INTERVAL]
                  [--model_checkpoint.every_n_val_epochs EVERY_N_VAL_EPOCHS] [--model_checkpoint.period PERIOD] [--encoder_optimizer.lr LR] [--encoder_optimizer.betas BETAS]
                  [--encoder_optimizer.eps EPS] [--encoder_optimizer.weight_decay WEIGHT_DECAY] [--encoder_optimizer.amsgrad AMSGRAD] [--decoder_optimizer.lr LR]
                  [--decoder_optimizer.betas BETAS] [--decoder_optimizer.eps EPS] [--decoder_optimizer.weight_decay WEIGHT_DECAY] [--decoder_optimizer.amsgrad AMSGRAD]

pytorch-lightning trainer command line tool

optional arguments:
  -h, --help            Show this help message and exit.
  --config CONFIG       Path to a configuration file in json or yaml format.
  --print_config [={comments,skip_null}+]
                        Print configuration and exit.
  --seed_everything SEED_EVERYTHING
                        Set to an int to run seed_everything with this value before classes instantiation (type: Union[int, null], default: null)

Customize every aspect of training via flags:
  --trainer.logger LOGGER
                        Logger (or iterable collection of loggers) for experiment tracking. A ``True`` value uses the default ``TensorBoardLogger``. ``False`` will disable logging.
                        (type: Union[LightningLoggerBase, Iterable[LightningLoggerBase], bool], default: True)
  --trainer.checkpoint_callback {true,false}
                        If ``True``, enable checkpointing. It will configure a default ModelCheckpoint callback if there is no user-defined ModelCheckpoint in
                        :paramref:`~pytorch_lightning.trainer.trainer.Trainer.callbacks`. (type: bool, default: True)
  --trainer.callbacks CALLBACKS
                        Add a callback or list of callbacks. (type: Union[List[Callback], Callback, null], default: null)
  --trainer.default_root_dir DEFAULT_ROOT_DIR
                        Default path for logs and weights when no logger/ckpt_callback passed. Default: ``os.getcwd()``. Can be remote file paths such as `s3://mybucket/path` or
                        'hdfs://path/' (type: Union[str, null], default: null)
  --trainer.gradient_clip_val GRADIENT_CLIP_VAL
                        0 means don't clip. (type: float, default: 0.0)
  --trainer.gradient_clip_algorithm GRADIENT_CLIP_ALGORITHM
                        'value' means clip_by_value, 'norm' means clip_by_norm. Default: 'norm' (type: str, default: norm)
  --trainer.process_position PROCESS_POSITION
                        orders the progress bar when running multiple models on same machine. (type: int, default: 0)
  --trainer.num_nodes NUM_NODES
                        number of GPU nodes for distributed training. (type: int, default: 1)
  --trainer.num_processes NUM_PROCESSES
                        number of processes for distributed training with distributed_backend="ddp_cpu" (type: int, default: 1)
  --trainer.gpus GPUS   number of gpus to train on (int) or which GPUs to train on (list or str) applied per node (type: Union[int, str, List[int], null], default: null)
  --trainer.auto_select_gpus {true,false}
                        If enabled and `gpus` is an integer, pick available gpus automatically. This is especially useful when GPUs are configured to be in "exclusive mode", such
                        that only one process at a time can access them. (type: bool, default: False)
  --trainer.tpu_cores TPU_CORES
                        How many TPU cores to train on (1 or 8) / Single TPU to train on [1] (type: Union[int, str, List[int], null], default: null)
  --trainer.ipus IPUS   How many IPUs to train on. (type: Union[int, null], default: null)
  --trainer.log_gpu_memory LOG_GPU_MEMORY
                        None, 'min_max', 'all'. Might slow performance (type: Union[str, null], default: null)
  --trainer.progress_bar_refresh_rate PROGRESS_BAR_REFRESH_RATE
                        How often to refresh progress bar (in steps). Value ``0`` disables progress bar. Ignored when a custom progress bar is passed to
                        :paramref:`~Trainer.callbacks`. Default: None, means a suitable value will be chosen based on the environment (terminal, Google COLAB, etc.). (type:
                        Union[int, null], default: null)
  --trainer.overfit_batches OVERFIT_BATCHES
                        Overfit a fraction of training data (float) or a set number of batches (int). (type: Union[int, float], default: 0.0)
  --trainer.track_grad_norm TRACK_GRAD_NORM
                        -1 no tracking. Otherwise tracks that p-norm. May be set to 'inf' infinity-norm. (type: Union[int, float, str], default: -1)
  --trainer.check_val_every_n_epoch CHECK_VAL_EVERY_N_EPOCH
                        Check val every n train epochs. (type: int, default: 5)
  --trainer.fast_dev_run FAST_DEV_RUN
                        runs n if set to ``n`` (int) else 1 if set to ``True`` batch(es) of train, val and test to find any bugs (ie: a sort of unit test). (type: Union[int, bool],
                        default: False)
  --trainer.accumulate_grad_batches ACCUMULATE_GRAD_BATCHES
                        Accumulates grads every k batches or as set up in the dict. (type: Union[int, Dict[int, int], List[list]], default: 1)
  --trainer.max_epochs MAX_EPOCHS
                        Stop training once this number of epochs is reached. Disabled by default (None). If both max_epochs and max_steps are not specified, defaults to
                        ``max_epochs`` = 1000. (type: Union[int, null], default: 3)
  --trainer.min_epochs MIN_EPOCHS
                        Force training for at least these many epochs. Disabled by default (None). If both min_epochs and min_steps are not specified, defaults to ``min_epochs`` = 1.
                        (type: Union[int, null], default: null)
  --trainer.max_steps MAX_STEPS
                        Stop training after this number of steps. Disabled by default (None). (type: Union[int, null], default: null)
  --trainer.min_steps MIN_STEPS
                        Force training for at least these number of steps. Disabled by default (None). (type: Union[int, null], default: null)
  --trainer.max_time MAX_TIME
                        Stop training after this amount of time has passed. Disabled by default (None). The time duration can be specified in the format DD:HH:MM:SS (days, hours,
                        minutes seconds), as a :class:`datetime.timedelta`, or a dictionary with keys that will be passed to :class:`datetime.timedelta`. (type: Union[str, timedelta,
                        Dict[str, int], null], default: null)
  --trainer.limit_train_batches LIMIT_TRAIN_BATCHES
                        How much of training dataset to check (float = fraction, int = num_batches) (type: Union[int, float], default: 1.0)
  --trainer.limit_val_batches LIMIT_VAL_BATCHES
                        How much of validation dataset to check (float = fraction, int = num_batches) (type: Union[int, float], default: 1.0)
  --trainer.limit_test_batches LIMIT_TEST_BATCHES
                        How much of test dataset to check (float = fraction, int = num_batches) (type: Union[int, float], default: 1.0)
  --trainer.limit_predict_batches LIMIT_PREDICT_BATCHES
                        How much of prediction dataset to check (float = fraction, int = num_batches) (type: Union[int, float], default: 1.0)
  --trainer.val_check_interval VAL_CHECK_INTERVAL
                        How often to check the validation set. Use float to check within a training epoch, use int to check every n steps (batches). (type: Union[int, float],
                        default: 1.0)
  --trainer.flush_logs_every_n_steps FLUSH_LOGS_EVERY_N_STEPS
                        How often to flush logs to disk (defaults to every 100 steps). (type: int, default: 100)
  --trainer.log_every_n_steps LOG_EVERY_N_STEPS
                        How often to log within steps (defaults to every 50 steps). (type: int, default: 5)
  --trainer.accelerator ACCELERATOR
                        Previously known as distributed_backend (dp, ddp, ddp2, etc...). Can also take in an accelerator object for custom hardware. (type: Union[str, Accelerator,
                        null], default: null)
  --trainer.sync_batchnorm {true,false}
                        Synchronize batch norm layers between process groups/whole world. (type: bool, default: False)
  --trainer.precision PRECISION
                        Double precision (64), full precision (32) or half precision (16). Can be used on CPU, GPU or TPUs. (type: int, default: 32)
  --trainer.weights_summary WEIGHTS_SUMMARY
                        Prints a summary of the weights when training begins. (type: Union[str, null], default: top)
  --trainer.weights_save_path WEIGHTS_SAVE_PATH
                        Where to save weights if specified. Will override default_root_dir for checkpoints only. Use this if for whatever reason you need the checkpoints stored in a
                        different place than the logs written in `default_root_dir`. Can be remote file paths such as `s3://mybucket/path` or 'hdfs://path/' Defaults to
                        `default_root_dir`. (type: Union[str, null], default: null)
  --trainer.num_sanity_val_steps NUM_SANITY_VAL_STEPS
                        Sanity check runs n validation batches before starting the training routine. Set it to `-1` to run all batches in all validation dataloaders. (type: int,
                        default: 2)
  --trainer.truncated_bptt_steps TRUNCATED_BPTT_STEPS
                        Deprecated in v1.3 to be removed in 1.5. Please use :paramref:`~pytorch_lightning.core.lightning.LightningModule.truncated_bptt_steps` instead. (type:
                        Union[int, null], default: null)
  --trainer.resume_from_checkpoint RESUME_FROM_CHECKPOINT
                        Path/URL of the checkpoint from which training is resumed. If there is no checkpoint file at the path, start from scratch. If resuming from mid-epoch
                        checkpoint, training will start from the beginning of the next epoch. (type: Union[str, Path, null], default: null)
  --trainer.profiler PROFILER
                        To profile individual steps during training and assist in identifying bottlenecks. (type: Union[BaseProfiler, str, null], default: pytorch)
  --trainer.benchmark {true,false}
                        If true enables cudnn.benchmark. (type: bool, default: True)
  --trainer.deterministic {true,false}
                        If true enables cudnn.deterministic. (type: bool, default: False)
  --trainer.reload_dataloaders_every_n_epochs RELOAD_DATALOADERS_EVERY_N_EPOCHS
                        Set to a non-negative integer to reload dataloaders every n epochs. Default: 0 (type: int, default: 0)
  --trainer.reload_dataloaders_every_epoch {true,false}
                        Set to True to reload dataloaders every epoch. .. deprecated:: v1.4 ``reload_dataloaders_every_epoch`` has been deprecated in v1.4 and will be removed in
                        v1.6. Please use ``reload_dataloaders_every_n_epochs``. (type: bool, default: False)
  --trainer.auto_lr_find AUTO_LR_FIND
                        If set to True, will make trainer.tune() run a learning rate finder, trying to optimize initial learning for faster convergence. trainer.tune() method will
                        set the suggested learning rate in self.lr or self.learning_rate in the LightningModule. To use a different key set a string instead of True with the key
                        name. (type: Union[bool, str], default: False)
  --trainer.replace_sampler_ddp {true,false}
                        Explicitly enables or disables sampler replacement. If not specified this will toggled automatically when DDP is used. By default it will add ``shuffle=True``
                        for train sampler and ``shuffle=False`` for val/test sampler. If you want to customize it, you can set ``replace_sampler_ddp=False`` and add your own
                        distributed sampler. (type: bool, default: True)
  --trainer.terminate_on_nan {true,false}
                        If set to True, will terminate training (by raising a `ValueError`) at the end of each training batch, if any of the parameters or the loss are NaN or +/-inf.
                        (type: bool, default: False)
  --trainer.auto_scale_batch_size AUTO_SCALE_BATCH_SIZE
                        If set to True, will `initially` run a batch size finder trying to find the largest batch size that fits into memory. The result will be stored in
                        self.batch_size in the LightningModule. Additionally, can be set to either `power` that estimates the batch size through a power search or `binsearch` that
                        estimates the batch size through a binary search. (type: Union[str, bool], default: False)
  --trainer.prepare_data_per_node {true,false}
                        If True, each LOCAL_RANK=0 will call prepare data. Otherwise only NODE_RANK=0, LOCAL_RANK=0 will prepare data (type: bool, default: True)
  --trainer.plugins PLUGINS
                        Plugins allow modification of core behavior like ddp and amp, and enable custom lightning plugins. (type: Union[List[Union[Plugin, ClusterEnvironment, str]],
                        Plugin, ClusterEnvironment, str, null], default: null)
  --trainer.amp_backend AMP_BACKEND
                        The mixed precision backend to use ("native" or "apex") (type: str, default: native)
  --trainer.amp_level AMP_LEVEL
                        The optimization level to use (O1, O2, etc...). (type: str, default: O2)
  --trainer.distributed_backend DISTRIBUTED_BACKEND
                        deprecated. Please use 'accelerator' (type: Union[str, null], default: null)
  --trainer.move_metrics_to_cpu {true,false}
                        Whether to force internal logged metrics to be moved to cpu. This can save some gpu memory, but can make training slower. Use with attention. (type: bool,
                        default: False)
  --trainer.multiple_trainloader_mode MULTIPLE_TRAINLOADER_MODE
                        How to loop over the datasets when there are multiple train loaders. In 'max_size_cycle' mode, the trainer ends one epoch when the largest dataset is
                        traversed, and smaller datasets reload when running out of their data. In 'min_size' mode, all the datasets reload when reaching the minimum length of
                        datasets. (type: str, default: max_size_cycle)
  --trainer.stochastic_weight_avg {true,false}
                        Whether to use `Stochastic Weight Averaging (SWA) <https://pytorch.org/blog/pytorch-1.6-now-includes-stochastic-weight-averaging/>_` (type: bool, default:
                        False)

OCR model:
  --model.hidden_size HIDDEN_SIZE
                        size of the lstm hidden state (type: int, default: 256)
  --model.max_enc_seq_len MAX_ENC_SEQ_LEN
                        the width of the feature map out from cnn (type: int, default: 129)
  --model.teaching_forcing_prob TEACHING_FORCING_PROB
                        percentage of samples to apply teach forcing (type: float, default: 0.5)
  --model.learning_rate LEARNING_RATE
                        learning_rate (type: float, default: 0.0001)
  --model.dropout_p DROPOUT_P
                        Dropout probability in Decoder Dropout layer (type: float, default: 0.1)
  --model.encoder_pth ENCODER_PTH
                        path to encoder (to continue training) (type: Union[str, null], default: null)
  --model.decoder_pth DECODER_PTH
                        path to decoder (to continue training) (type: Union[str, null], default: null)
  --model.save_model_dir SAVE_MODEL_DIR
                        Where to store samples and models (type: Union[str, null], default: null)

<class 'aocr.OCRDataModule'>:
  --data.train_list TRAIN_LIST
                        path to train dataset list file (type: Union[str, null], default: data/dataset/train_list.txt)
  --data.val_list VAL_LIST
                        path to validation dataset list file (type: Union[str, null], default: data/dataset/train_list.txt)
  --data.img_height IMG_HEIGHT
                        the height of the input image to network (type: int, default: 32)
  --data.img_width IMG_WIDTH
                        the width of the input image to network (type: int, default: 512)
  --data.num_workers NUM_WORKERS
                        number of data loading num_workers (type: int, default: 2)
  --data.batch_size BATCH_SIZE
                        input batch size (type: int, default: 4)
  --data.random_sampler {true,false}
                        whether to sample the dataset with random sampler (type: bool, default: True)

Linked arguments:
  model.batch_size <-- data.batch_size [applied on parse]
                        input batch size (type: int)
  model.img_height <-- data.img_height [applied on parse]
                        the height of the input image to network (type: int)
  model.img_width <-- data.img_width [applied on parse]
                        the width of the input image to network (type: int)
  model.encoder_optimizer_args <-- add_class_path(encoder_optimizer) [applied on parse]
                        (type: Union[dict, null])
  model.decoder_optimizer_args <-- add_class_path(decoder_optimizer) [applied on parse]
                        (type: Union[dict, null])

Monitor a metric and stop training when it stops improving:
  --early_stopping.monitor MONITOR
                        quantity to be monitored. (type: Union[str, null], default: train_loss)
  --early_stopping.min_delta MIN_DELTA
                        minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than `min_delta`, will count as no improvement. (type:
                        float, default: 0.0)
  --early_stopping.patience PATIENCE
                        number of checks with no improvement after which training will be stopped. Under the default configuration, one check happens after every training epoch.
                        However, the frequency of validation can be modified by setting various parameters on the ``Trainer``, for example ``check_val_every_n_epoch`` and
                        ``val_check_interval``. .. note:: It must be noted that the patience parameter counts the number of validation checks with no improvement, and not the number
                        of training epochs. Therefore, with parameters ``check_val_every_n_epoch=10`` and ``patience=3``, the trainer will perform at least 40 training epochs before
                        being stopped. (type: int, default: 5)
  --early_stopping.verbose {true,false}
                        verbosity mode. (type: bool, default: False)
  --early_stopping.mode MODE
                        one of ``'min'``, ``'max'``. In ``'min'`` mode, training will stop when the quantity monitored has stopped decreasing and in ``'max'`` mode it will stop when
                        the quantity monitored has stopped increasing. (type: str, default: min)
  --early_stopping.strict {true,false}
                        whether to crash the training if `monitor` is not found in the validation metrics. (type: bool, default: True)
  --early_stopping.check_finite {true,false}
                        When set ``True``, stops training when the monitor becomes NaN or infinite. (type: bool, default: True)
  --early_stopping.stopping_threshold STOPPING_THRESHOLD
                        Stop training immediately once the monitored quantity reaches this threshold. (type: Union[float, null], default: null)
  --early_stopping.divergence_threshold DIVERGENCE_THRESHOLD
                        Stop training as soon as the monitored quantity becomes worse than this threshold. (type: Union[float, null], default: null)
  --early_stopping.check_on_train_epoch_end {true,false}
                        whether to run early stopping at the end of the training epoch. If this is ``False``, then the check runs at the end of the validation epoch. (type: bool,
                        default: True)

Save the model periodically by monitoring a quantity. Every metric logged with:
  --model_checkpoint.dirpath DIRPATH
                        directory to save the model file. Example:: # custom path # saves a file like: my/path/epoch=0-step=10.ckpt >>> checkpoint_callback =
                        ModelCheckpoint(dirpath='my/path/') By default, dirpath is ``None`` and will be set at runtime to the location specified by
                        :class:`~pytorch_lightning.trainer.trainer.Trainer`'s :paramref:`~pytorch_lightning.trainer.trainer.Trainer.default_root_dir` or
                        :paramref:`~pytorch_lightning.trainer.trainer.Trainer.weights_save_path` arguments, and if the Trainer uses a logger, the path will also contain logger name
                        and version. (type: Union[str, Path, null], default: null)
  --model_checkpoint.filename FILENAME
                        checkpoint filename. Can contain named formatting options to be auto-filled. Example:: # save any arbitrary metrics like `val_loss`, etc. in name # saves a
                        file like: my/path/epoch=2-val_loss=0.02-other_metric=0.03.ckpt >>> checkpoint_callback = ModelCheckpoint( ... dirpath='my/path', ...
                        filename='{epoch}-{val_loss:.2f}-{other_metric:.2f}' ... ) By default, filename is ``None`` and will be set to ``'{epoch}-{step}'``. (type: Union[str, null],
                        default: null)
  --model_checkpoint.monitor MONITOR
                        quantity to monitor. By default it is ``None`` which saves a checkpoint only for the last epoch. (type: Union[str, null], default: train_loss)
  --model_checkpoint.verbose {true,false}
                        verbosity mode. Default: ``False``. (type: bool, default: False)
  --model_checkpoint.save_last {true,false,null}
                        When ``True``, always saves the model at the end of the epoch to a file `last.ckpt`. Default: ``None``. (type: Union[bool, null], default: null)
  --model_checkpoint.save_top_k SAVE_TOP_K
                        if ``save_top_k == k``, the best k models according to the quantity monitored will be saved. if ``save_top_k == 0``, no models are saved. if ``save_top_k ==
                        -1``, all models are saved. Please note that the monitors are checked every ``period`` epochs. if ``save_top_k >= 2`` and the callback is called multiple
                        times inside an epoch, the name of the saved file will be appended with a version count starting with ``v1``. (type: int, default: 1)
  --model_checkpoint.save_weights_only {true,false}
                        if ``True``, then only the model's weights will be saved (``model.save_weights(filepath)``), else the full model is saved (``model.save(filepath)``). (type:
                        bool, default: False)
  --model_checkpoint.mode MODE
                        one of {min, max}. If ``save_top_k != 0``, the decision to overwrite the current save file is made based on either the maximization or the minimization of the
                        monitored quantity. For ``'val_acc'``, this should be ``'max'``, for ``'val_loss'`` this should be ``'min'``, etc. (type: str, default: min)
  --model_checkpoint.auto_insert_metric_name {true,false}
                        When ``True``, the checkpoints filenames will contain the metric name. For example, ``filename='checkpoint_{epoch:02d}-{acc:02d}`` with epoch 1 and acc 80
                        will resolve to ``checkpoint_epoch=01-acc=80.ckp``. Is useful to set it to ``False`` when metric names contain ``/`` as this will result in extra folders.
                        (type: bool, default: True)
  --model_checkpoint.every_n_train_steps EVERY_N_TRAIN_STEPS
                        Number of training steps between checkpoints. If ``every_n_train_steps == None or every_n_train_steps == 0``, we skip saving during training. To disable, set
                        ``every_n_train_steps = 0``. This value must be ``None`` or non-negative. This must be mutually exclusive with ``train_time_interval`` and
                        ``every_n_val_epochs``. (type: Union[int, null], default: null)
  --model_checkpoint.train_time_interval.help CLASS
                        Show the help for the given subclass of timedelta and exit.
  --model_checkpoint.train_time_interval TRAIN_TIME_INTERVAL
                        Checkpoints are monitored at the specified time interval. For all practical purposes, this cannot be smaller than the amount of time it takes to process a
                        single training batch. This is not guaranteed to execute at the exact time specified, but should be close. This must be mutually exclusive with
                        ``every_n_train_steps`` and ``every_n_val_epochs``. (type: Union[timedelta, null], default: null)
  --model_checkpoint.every_n_val_epochs EVERY_N_VAL_EPOCHS
                        Number of validation epochs between checkpoints. If ``every_n_val_epochs == None or every_n_val_epochs == 0``, we skip saving on validation end. To disable,
                        set ``every_n_val_epochs = 0``. This value must be ``None`` or non-negative. This must be mutually exclusive with ``every_n_train_steps`` and
                        ``train_time_interval``. Setting both ``ModelCheckpoint(..., every_n_val_epochs=V)`` and ``Trainer(max_epochs=N, check_val_every_n_epoch=M)`` will only save
                        checkpoints at epochs 0 < E <= N where both values for ``every_n_val_epochs`` and ``check_val_every_n_epoch`` evenly divide E. (type: Union[int, null],
                        default: null)
  --model_checkpoint.period PERIOD
                        Interval (number of epochs) between checkpoints. .. warning:: This argument has been deprecated in v1.3 and will be removed in v1.5. Use
                        ``every_n_val_epochs`` instead. (type: Union[int, null], default: null)

Implements Adam algorithm:
  --encoder_optimizer.lr LR
                        learning rate (default: 1e-3) (type: Any, default: 0.001)
  --encoder_optimizer.betas BETAS
                        coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999)) (type: Any, default: (0.9, 0.999))
  --encoder_optimizer.eps EPS
                        term added to the denominator to improve numerical stability (default: 1e-8) (type: Any, default: 1e-08)
  --encoder_optimizer.weight_decay WEIGHT_DECAY
                        weight decay (L2 penalty) (default: 0) (type: Any, default: 0)
  --encoder_optimizer.amsgrad AMSGRAD
                        whether to use the AMSGrad variant of this algorithm from the paper `On the Convergence of Adam and Beyond`_ (default: False) (type: Any, default: False)

Implements Adam algorithm:
  --decoder_optimizer.lr LR
                        learning rate (default: 1e-3) (type: Any, default: 0.001)
  --decoder_optimizer.betas BETAS
                        coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999)) (type: Any, default: (0.9, 0.999))
  --decoder_optimizer.eps EPS
                        term added to the denominator to improve numerical stability (default: 1e-8) (type: Any, default: 1e-08)
  --decoder_optimizer.weight_decay WEIGHT_DECAY
                        weight decay (L2 penalty) (default: 0) (type: Any, default: 0)
  --decoder_optimizer.amsgrad AMSGRAD
                        whether to use the AMSGrad variant of this algorithm from the paper `On the Convergence of Adam and Beyond`_ (default: False) (type: Any, default: False)
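
Because the CLI is built with jsonargparse, any value from the YAML config can also be overridden directly on the command line: flags given after --config override the corresponding values from the file. For example (flag names as listed above, values and paths purely illustrative):

python3 trainer.py --config default_config.yaml --model.hidden_size 512 --data.batch_size 8 --trainer.gpus 1

python3 trainer.py --config default_config.yaml --trainer.resume_from_checkpoint checkpoints/last.ckpt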
