Updated NumPy SDE requirement #3442

Merged 5 commits into main from numpy-fix on Jan 14, 2022
Conversation

@vsl9 (Collaborator) commented Jan 14, 2022

Signed-off-by: Vitaly Lavrukhin <[email protected]>

@vsl9 vsl9 merged commit e05c43c into main Jan 14, 2022
@vsl9 vsl9 deleted the numpy-fix branch January 14, 2022 20:24
yzhang123 added a commit that referenced this pull request Feb 23, 2022
* cache_hf (#3406)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Learning annealing scheduler fix (#3400)

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* T5 Pre-training in NeMo using Megatron (#3036)

* add vocab_file and merge_file to megatron init

Signed-off-by: ericharper <[email protected]>

* add forward

Signed-off-by: ericharper <[email protected]>

* add train loss

Signed-off-by: ericharper <[email protected]>

* add optimizer

Signed-off-by: ericharper <[email protected]>

* add exp_manager

Signed-off-by: ericharper <[email protected]>

* multi-gpu is working

Signed-off-by: ericharper <[email protected]>

* adding val loop

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* adding val loop

Signed-off-by: ericharper <[email protected]>

* fix ranks

Signed-off-by: ericharper <[email protected]>

* fix model parallel checkpoint saving

Signed-off-by: ericharper <[email protected]>

* fix _del_model

Signed-off-by: ericharper <[email protected]>

* Initial megatron dataset port

Signed-off-by: MaximumEntropy <[email protected]>

* added megatron batch sampler

Signed-off-by: ericharper <[email protected]>

* try to fix num steps

Signed-off-by: ericharper <[email protected]>

* add wandb to config

Signed-off-by: ericharper <[email protected]>

* log lr

Signed-off-by: ericharper <[email protected]>

* add warmup ratio to config

Signed-off-by: ericharper <[email protected]>

* update configs

Signed-off-by: ericharper <[email protected]>

* update configs

Signed-off-by: ericharper <[email protected]>

* Fix merge conflicts

Signed-off-by: MaximumEntropy <[email protected]>

* add cpu init to args

Signed-off-by: ericharper <[email protected]>

* update config

Signed-off-by: ericharper <[email protected]>

* License fixes and megatron model porting

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* More fixes to import from nemo rather than megatron

Signed-off-by: MaximumEntropy <[email protected]>

* Fix circular imports

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Revert config file

Signed-off-by: MaximumEntropy <[email protected]>

* update config

Signed-off-by: ericharper <[email protected]>

* Restructure further to avoid circular imports

Signed-off-by: MaximumEntropy <[email protected]>

* add Makefile

Signed-off-by: ericharper <[email protected]>

* Add megatron modules

Signed-off-by: MaximumEntropy <[email protected]>

* Add data makefile

Signed-off-by: MaximumEntropy <[email protected]>

* add license

Signed-off-by: ericharper <[email protected]>

* Port from latest megatron

Signed-off-by: MaximumEntropy <[email protected]>

* update cfg

Signed-off-by: ericharper <[email protected]>

* update config

Signed-off-by: ericharper <[email protected]>

* add _del_model_without_trainer

Signed-off-by: ericharper <[email protected]>

* add data preprocessing script

Signed-off-by: ericharper <[email protected]>

* update config

Signed-off-by: ericharper <[email protected]>

* use apex mpu

Signed-off-by: ericharper <[email protected]>

* replace print_rank_0 with nemo utils logging

Signed-off-by: ericharper <[email protected]>

* use apex mpu

Signed-off-by: ericharper <[email protected]>

* use apex mpu

Signed-off-by: ericharper <[email protected]>

* add use_cpu_initialization

Signed-off-by: ericharper <[email protected]>

* fixing autoresume in progress

Signed-off-by: ericharper <[email protected]>

* properly removing last checkpoint

Signed-off-by: ericharper <[email protected]>

* log consumed samples

Signed-off-by: ericharper <[email protected]>

* fix mp autoresume

Signed-off-by: ericharper <[email protected]>

* Megatron GPT training with NeMo tokenizers (#2818)

* Update files from megatron repo

Signed-off-by: MaximumEntropy <[email protected]>

* Remove non NLP data related files from megatron

Signed-off-by: MaximumEntropy <[email protected]>

* Merge megatron and nemo tokenizers

Signed-off-by: MaximumEntropy <[email protected]>

* Remove get_tokenizer() calls from gpt model

Signed-off-by: MaximumEntropy <[email protected]>

* Update tokenizer yaml config

Signed-off-by: MaximumEntropy <[email protected]>

* add NLPSaveRestoreConnector

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* update config

Signed-off-by: ericharper <[email protected]>

* make init_method_std configurable

Signed-off-by: ericharper <[email protected]>

* make gpu init work by setting random seed earlier

Signed-off-by: ericharper <[email protected]>

* fix gpu init after removing debug print in mpu

Signed-off-by: ericharper <[email protected]>

* add fused_adam

Signed-off-by: ericharper <[email protected]>

* check ds is not none before logging len

Signed-off-by: ericharper <[email protected]>

* set fp16 arg to true and fix enum conflict

Signed-off-by: ericharper <[email protected]>

* make fp16 arg configurable

Signed-off-by: ericharper <[email protected]>

* add grad clip from megatron

Signed-off-by: ericharper <[email protected]>

* Linear warmup with cosine annealing and constant holding (#2846)

* Testing cosine schedule

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes

Signed-off-by: MaximumEntropy <[email protected]>

* More fixes

Signed-off-by: MaximumEntropy <[email protected]>

* update config for constant steps in schedule

Signed-off-by: ericharper <[email protected]>

* temporarily import enum from megatron

Signed-off-by: ericharper <[email protected]>

* add grad clip for fp32

Signed-off-by: ericharper <[email protected]>

* update check for _del_model_without_trainer

Signed-off-by: ericharper <[email protected]>

* updating restore for model parallel

Signed-off-by: ericharper <[email protected]>

* add predict script

Signed-off-by: ericharper <[email protected]>

* update test iters

Signed-off-by: ericharper <[email protected]>

* add barrier

Signed-off-by: ericharper <[email protected]>

* return if clip_val is 0 or None

Signed-off-by: ericharper <[email protected]>

* when using amp clip grads after they are unscaled

Signed-off-by: ericharper <[email protected]>

* make native amp scaler hyperparams configurable

Signed-off-by: ericharper <[email protected]>

* (1) nvfuser, (2) amp-casting decoration (#2894)

* (1) nvfuser, (2) amp-casting decoration

Signed-off-by: Sangkug Lym <[email protected]>

* support bf16

Signed-off-by: Sangkug Lym <[email protected]>

* update package info

Signed-off-by: ericharper <[email protected]>

* add set device to constructor

Signed-off-by: ericharper <[email protected]>

* set_device in constructor

Signed-off-by: ericharper <[email protected]>

* [BigNLP] Remove megatron-lm dependency. (#2910)

* remove args

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* remove args in progress

Signed-off-by: ericharper <[email protected]>

* remove args in progress

Signed-off-by: ericharper <[email protected]>

* remove args in progress

Signed-off-by: ericharper <[email protected]>

* remove args in progress

Signed-off-by: ericharper <[email protected]>

* add load_fused_kernels

Signed-off-by: ericharper <[email protected]>

* add load_fused_kernels

Signed-off-by: ericharper <[email protected]>

* update megatron_init

Signed-off-by: ericharper <[email protected]>

* add fused kernels

Signed-off-by: ericharper <[email protected]>

* add fused kernels

Signed-off-by: ericharper <[email protected]>

* update process batch

Signed-off-by: ericharper <[email protected]>

* remove erroneous import

Signed-off-by: ericharper <[email protected]>

* remove erroneous import

Signed-off-by: ericharper <[email protected]>

* remove erroneous import

Signed-off-by: ericharper <[email protected]>

* add megatron clip_grad

Signed-off-by: ericharper <[email protected]>

* trying to resolve circular import error

Signed-off-by: ericharper <[email protected]>

* rename file

Signed-off-by: ericharper <[email protected]>

* remove non-gpt models and datasets from __init__ files

Signed-off-by: ericharper <[email protected]>

* set device in constructor for gpu init

Signed-off-by: ericharper <[email protected]>

* set device in constructor for gpu init

Signed-off-by: ericharper <[email protected]>

* set_device in constructor

Signed-off-by: ericharper <[email protected]>

* clean config

Signed-off-by: ericharper <[email protected]>

* update MegatronDataset

Signed-off-by: ericharper <[email protected]>

* clean up MegatronModule

Signed-off-by: ericharper <[email protected]>

* clean up MegatronModule

Signed-off-by: ericharper <[email protected]>

* rename fp16 and bf16 flags to fused_softmax_input_in_fp16/bf16

Signed-off-by: ericharper <[email protected]>

* rename to fused_fp16

Signed-off-by: ericharper <[email protected]>

* add fused_fp16 arg to LayerNorm calls

Signed-off-by: ericharper <[email protected]>

* fix arg name

Signed-off-by: ericharper <[email protected]>

* fix arg name

Signed-off-by: ericharper <[email protected]>

* fix import

Signed-off-by: ericharper <[email protected]>

* update arg

Signed-off-by: ericharper <[email protected]>

* skip warmup default to True

Signed-off-by: ericharper <[email protected]>

* skip warmup default to True

Signed-off-by: ericharper <[email protected]>

* Adding complete method to MegatronGPTModel (#2935)

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* make ffn_hidden_size mandatory

Signed-off-by: ericharper <[email protected]>

* Manually migrating timing of step into branch (#2937)

* 1. Manually migrating timing of step into branch.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated file name and content.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated to latest code.

Signed-off-by: Micha Livne <[email protected]>

Co-authored-by: Micha Livne <[email protected]>

* remove unused imports

Signed-off-by: ericharper <[email protected]>

* remove unused import

Signed-off-by: ericharper <[email protected]>

* remove unused import

Signed-off-by: ericharper <[email protected]>

* remove unused import

Signed-off-by: ericharper <[email protected]>

* check fused_fp16 and fused_bf16 are not both True

Signed-off-by: ericharper <[email protected]>

* update predict script for model parallel .nemo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Micha Livne <[email protected]>

* NVfuser (#2943)

* activation checkpoint recompute

Signed-off-by: Sangkug Lym <[email protected]>

* selective nvfuser setup

* Megatron gpt bfloat support (#2926)

* Save/restore fix

Signed-off-by: MaximumEntropy <[email protected]>

* Another merge

Signed-off-by: MaximumEntropy <[email protected]>

* Bf16 args in init

Signed-off-by: MaximumEntropy <[email protected]>

* Set precision

Signed-off-by: MaximumEntropy <[email protected]>

* Remove debug stuff

Signed-off-by: MaximumEntropy <[email protected]>

* add bf16 casting decorator

Signed-off-by: Sangkug Lym <[email protected]>

* Bfloat layernorm propagation

Signed-off-by: MaximumEntropy <[email protected]>

* activation checkpoint recompute

Signed-off-by: Sangkug Lym <[email protected]>

* selective nvfuser setup

* More arg removal

Signed-off-by: MaximumEntropy <[email protected]>

* Remove BERTDataset

Signed-off-by: MaximumEntropy <[email protected]>

* update to latest apex and patch transformer autocast

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: ericharper <[email protected]>

* don't set jit for bf16

Signed-off-by: ericharper <[email protected]>

* replace apex.mpu

Signed-off-by: ericharper <[email protected]>

* fix grad clip

Signed-off-by: ericharper <[email protected]>

* NVFuser fixes (#2951)

* Fuser fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Remove dummy handler

Signed-off-by: MaximumEntropy <[email protected]>

* Remove PTL plugin based logic for fusion

Signed-off-by: MaximumEntropy <[email protected]>

* remove duplicated file

Signed-off-by: ericharper <[email protected]>

* T5 model initial changes

Signed-off-by: MaximumEntropy <[email protected]>

* typo (#2960)

Signed-off-by: ericharper <[email protected]>

* [BigNLP] Script to convert GPT checkpoint to .nemo (#2958)

* remove args

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* remove args in progress

Signed-off-by: ericharper <[email protected]>

* remove args in progress

Signed-off-by: ericharper <[email protected]>

* remove args in progress

Signed-off-by: ericharper <[email protected]>

* remove args in progress

Signed-off-by: ericharper <[email protected]>

* add load_fused_kernels

Signed-off-by: ericharper <[email protected]>

* add load_fused_kernels

Signed-off-by: ericharper <[email protected]>

* update megatron_init

Signed-off-by: ericharper <[email protected]>

* add fused kernels

Signed-off-by: ericharper <[email protected]>

* add fused kernels

Signed-off-by: ericharper <[email protected]>

* update process batch

Signed-off-by: ericharper <[email protected]>

* remove erroneous import

Signed-off-by: ericharper <[email protected]>

* remove erroneous import

Signed-off-by: ericharper <[email protected]>

* remove erroneous import

Signed-off-by: ericharper <[email protected]>

* add megatron clip_grad

Signed-off-by: ericharper <[email protected]>

* trying to resolve circular import error

Signed-off-by: ericharper <[email protected]>

* rename file

Signed-off-by: ericharper <[email protected]>

* remove non-gpt models and datasets from __init__ files

Signed-off-by: ericharper <[email protected]>

* set device in constructor for gpu init

Signed-off-by: ericharper <[email protected]>

* set device in constructor for gpu init

Signed-off-by: ericharper <[email protected]>

* set_device in constructor

Signed-off-by: ericharper <[email protected]>

* clean config

Signed-off-by: ericharper <[email protected]>

* update MegatronDataset

Signed-off-by: ericharper <[email protected]>

* clean up MegatronModule

Signed-off-by: ericharper <[email protected]>

* clean up MegatronModule

Signed-off-by: ericharper <[email protected]>

* rename fp16 and bf16 flags to fused_softmax_input_in_fp16/bf16

Signed-off-by: ericharper <[email protected]>

* rename to fused_fp16

Signed-off-by: ericharper <[email protected]>

* add fused_fp16 arg to LayerNorm calls

Signed-off-by: ericharper <[email protected]>

* fix arg name

Signed-off-by: ericharper <[email protected]>

* fix arg name

Signed-off-by: ericharper <[email protected]>

* fix import

Signed-off-by: ericharper <[email protected]>

* update arg

Signed-off-by: ericharper <[email protected]>

* skip warmup default to True

Signed-off-by: ericharper <[email protected]>

* skip warmup default to True

Signed-off-by: ericharper <[email protected]>

* Adding complete method to MegatronGPTModel (#2935)

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* make ffn_hidden_size mandatory

Signed-off-by: ericharper <[email protected]>

* Manually migrating timing of step into branch (#2937)

* 1. Manually migrating timing of step into branch.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated file name and content.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated to latest code.

Signed-off-by: Micha Livne <[email protected]>

Co-authored-by: Micha Livne <[email protected]>

* remove unused imports

Signed-off-by: ericharper <[email protected]>

* remove unused import

Signed-off-by: ericharper <[email protected]>

* remove unused import

Signed-off-by: ericharper <[email protected]>

* remove unused import

Signed-off-by: ericharper <[email protected]>

* check fused_fp16 and fused_bf16 are not both True

Signed-off-by: ericharper <[email protected]>

* update predict script for model parallel .nemo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* add script to convert .ckpt to .nemo

Signed-off-by: ericharper <[email protected]>

* in progress

Signed-off-by: ericharper <[email protected]>

* update

Signed-off-by: ericharper <[email protected]>

* convert mp checkpoints to nemo

Signed-off-by: ericharper <[email protected]>

* update help

Signed-off-by: ericharper <[email protected]>

* add safeguard for model parallel save_to

Signed-off-by: ericharper <[email protected]>

* adjust NLPModel save_to to be safer for model parallel

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* [BigNLP] Update GPT evaluation to work with tensor model parallel  (#2959)

* in progress

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add request dataset

Signed-off-by: ericharper <[email protected]>

* tokenize request

Signed-off-by: ericharper <[email protected]>

* in progress

Signed-off-by: ericharper <[email protected]>

* able to run

Signed-off-by: ericharper <[email protected]>

* reduce logits

Signed-off-by: ericharper <[email protected]>

* capture response

Signed-off-by: ericharper <[email protected]>

* squeeze and unsqueeze

Signed-off-by: ericharper <[email protected]>

* handle non model parallel case

Signed-off-by: ericharper <[email protected]>

* clean imports

Signed-off-by: ericharper <[email protected]>

* add file

Signed-off-by: ericharper <[email protected]>

* convert logits to log_probs

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* rename logits to log_probs

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>

* More changes

Signed-off-by: MaximumEntropy <[email protected]>

* Missing import

Signed-off-by: MaximumEntropy <[email protected]>

* Tokenizer fixes and adafactor

Signed-off-by: MaximumEntropy <[email protected]>

* Add adafactor

Signed-off-by: MaximumEntropy <[email protected]>

* Add training and conf scripts

Signed-off-by: MaximumEntropy <[email protected]>

* Add megatron t5 model

Signed-off-by: MaximumEntropy <[email protected]>

* t5 config to fp32

Signed-off-by: MaximumEntropy <[email protected]>

* [BigNLP] Remove fused kernel code instead use Apex (#2984)

* remove fused_kernels

Signed-off-by: ericharper <[email protected]>

* remove fused_kernels

Signed-off-by: ericharper <[email protected]>

* remove fused layer norm and fused softmax and use apex instead

Signed-off-by: ericharper <[email protected]>

* update imports

Signed-off-by: ericharper <[email protected]>

* remove comment

Signed-off-by: ericharper <[email protected]>

* use apex enums

Signed-off-by: ericharper <[email protected]>

* use apex enums

Signed-off-by: ericharper <[email protected]>

* Timer with sliding window (#3002)

Co-authored-by: Micha Livne <[email protected]>

* check for rank zero

Signed-off-by: ericharper <[email protected]>

* Remove ict dataset import

Signed-off-by: MaximumEntropy <[email protected]>

* Remove fused kernels

Signed-off-by: MaximumEntropy <[email protected]>

* style fix

Signed-off-by: ericharper <[email protected]>

* fix consumed_samples when resuming

Signed-off-by: ericharper <[email protected]>

* T5 consumed samples fix

Signed-off-by: MaximumEntropy <[email protected]>

* Remove megatron dep

Signed-off-by: MaximumEntropy <[email protected]>

* Change checkpoint filename format

Signed-off-by: MaximumEntropy <[email protected]>

* Log consumed samples in T5

Signed-off-by: MaximumEntropy <[email protected]>

* T5 lr scheduler

Signed-off-by: MaximumEntropy <[email protected]>

* Checkpoint conversion and data preproc updates for t5

Signed-off-by: MaximumEntropy <[email protected]>

* Denoising eval

Signed-off-by: MaximumEntropy <[email protected]>

* Clean up denoising example to explicitly provide mask positions

Signed-off-by: MaximumEntropy <[email protected]>

* Better logging of results

Signed-off-by: MaximumEntropy <[email protected]>

* Better printing of results

Signed-off-by: MaximumEntropy <[email protected]>

* Minor changes

Signed-off-by: MaximumEntropy <[email protected]>

* Update config

Signed-off-by: MaximumEntropy <[email protected]>

* Update config

Signed-off-by: MaximumEntropy <[email protected]>

* properly removing last checkpoint

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* add predict script

Signed-off-by: ericharper <[email protected]>

* T5 model initial changes

Signed-off-by: MaximumEntropy <[email protected]>

* Add adafactor

Signed-off-by: MaximumEntropy <[email protected]>

* Add training and conf scripts

Signed-off-by: MaximumEntropy <[email protected]>

* Add megatron t5 model

Signed-off-by: MaximumEntropy <[email protected]>

* t5 config to fp32

Signed-off-by: MaximumEntropy <[email protected]>

* Remove fused kernels

Signed-off-by: MaximumEntropy <[email protected]>

* fix consumed_samples when resuming

Signed-off-by: ericharper <[email protected]>

* T5 consumed samples fix

Signed-off-by: MaximumEntropy <[email protected]>

* Remove megatron dep

Signed-off-by: MaximumEntropy <[email protected]>

* Change checkpoint filename format

Signed-off-by: MaximumEntropy <[email protected]>

* Log consumed samples in T5

Signed-off-by: MaximumEntropy <[email protected]>

* T5 lr scheduler

Signed-off-by: MaximumEntropy <[email protected]>

* Checkpoint conversion and data preproc updates for t5

Signed-off-by: MaximumEntropy <[email protected]>

* Denoising eval

Signed-off-by: MaximumEntropy <[email protected]>

* Clean up denoising example to explicitly provide mask positions

Signed-off-by: MaximumEntropy <[email protected]>

* Better logging of results

Signed-off-by: MaximumEntropy <[email protected]>

* Minor changes

Signed-off-by: MaximumEntropy <[email protected]>

* Update config

Signed-off-by: MaximumEntropy <[email protected]>

* Update config

Signed-off-by: MaximumEntropy <[email protected]>

* Merge main into megatron_t5

Signed-off-by: MaximumEntropy <[email protected]>

* Dataset preproc script

Signed-off-by: MaximumEntropy <[email protected]>

* Remove biencoder file

Signed-off-by: MaximumEntropy <[email protected]>

* Remove another unused file

Signed-off-by: MaximumEntropy <[email protected]>

* Remove preprocess script since it has moved

Signed-off-by: MaximumEntropy <[email protected]>

* Remove ICT dataset

Signed-off-by: MaximumEntropy <[email protected]>

* Remove orqa dataset

Signed-off-by: MaximumEntropy <[email protected]>

* Remove realm dataset

Signed-off-by: MaximumEntropy <[email protected]>

* More file removing

Signed-off-by: MaximumEntropy <[email protected]>

* Fix 2 files

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Rename checkpoint fname

Signed-off-by: MaximumEntropy <[email protected]>

* Loss averaging fixes in t5

Signed-off-by: MaximumEntropy <[email protected]>

* Minor changes

Signed-off-by: MaximumEntropy <[email protected]>

* add megatron gpt pretraining

Signed-off-by: ericharper <[email protected]>
Signed-off-by: MaximumEntropy <[email protected]>

* Remove weight decay stuff

Signed-off-by: MaximumEntropy <[email protected]>

* Training script update for PTL 1.5

Signed-off-by: MaximumEntropy <[email protected]>

* Update grad clip

Signed-off-by: MaximumEntropy <[email protected]>

* Update config

Signed-off-by: MaximumEntropy <[email protected]>

* Add barrier

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes and adding more stuff

Signed-off-by: MaximumEntropy <[email protected]>

* Missed merge conflict fix

Signed-off-by: MaximumEntropy <[email protected]>

* Unittest fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Style fix

Signed-off-by: MaximumEntropy <[email protected]>

* Inference changes

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Fix reinstall script

Signed-off-by: MaximumEntropy <[email protected]>

* T5 CI tests

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Minor fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Minor fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Tokenizer arg fix

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Helpers fix

Signed-off-by: MaximumEntropy <[email protected]>

* Style fix

Signed-off-by: MaximumEntropy <[email protected]>

* PR review changes

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Refactor bert dataset stuff

Signed-off-by: MaximumEntropy <[email protected]>

* Fix typo

Signed-off-by: MaximumEntropy <[email protected]>

* Fix request dataset variable

Signed-off-by: MaximumEntropy <[email protected]>

* Fix sched params in CI test

Signed-off-by: MaximumEntropy <[email protected]>

* Change to kwargs and Jenkins test for inference

Signed-off-by: MaximumEntropy <[email protected]>

* PR review related changes

Signed-off-by: MaximumEntropy <[email protected]>

* More fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Test helper building

Signed-off-by: MaximumEntropy <[email protected]>

* Restore helper compilation everywhere

Signed-off-by: MaximumEntropy <[email protected]>

* Fix PR comments

Signed-off-by: MaximumEntropy <[email protected]>

* PR comments

Signed-off-by: MaximumEntropy <[email protected]>

* Add docstring to additional_special_tokens

Signed-off-by: MaximumEntropy <[email protected]>

* Improve docstring

Signed-off-by: MaximumEntropy <[email protected]>

* Fix resume from checkpoint path

Signed-off-by: MaximumEntropy <[email protected]>

* Fix for TP>1

Signed-off-by: MaximumEntropy <[email protected]>

* Remove fused fp16 and bf16 args

Signed-off-by: MaximumEntropy <[email protected]>

* Add missed file

Signed-off-by: MaximumEntropy <[email protected]>

* Learning annealing scheduler fix

Signed-off-by: MaximumEntropy <[email protected]>

* Change default optim and scheduler to adam

Signed-off-by: MaximumEntropy <[email protected]>

* dummy for CI restart

Signed-off-by: MaximumEntropy <[email protected]>

* Remove constant steps after switch to adam

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: ericharper <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Updates on ASR with diarization util files (#3359)

* Initial commit

Signed-off-by: Taejin Park <[email protected]>

* Update LM part and multiscale part in README.

Signed-off-by: Taejin Park <[email protected]>

* Removed redundant parts

Signed-off-by: Taejin Park <[email protected]>

* modified example script

Signed-off-by: Taejin Park <[email protected]>

* Revised doc strings

Signed-off-by: Taejin Park <[email protected]>

* Changed paths_to_manifest.py script

Signed-off-by: Taejin Park <[email protected]>

* Reflected PR comments and revised tutorials

Signed-off-by: Taejin Park <[email protected]>

* Added ASR models and kenlm installation 

Signed-off-by: [email protected]

* Added ASR models and kenlm installation 

Signed-off-by: [email protected]
Signed-off-by: Taejin Park <[email protected]>

* Changed docstrings and style fix

Signed-off-by: Taejin Park <[email protected]>

* Fixed unused import and vars

Signed-off-by: Taejin Park <[email protected]>

* Added LM part in ASR_diar tutorial.

Signed-off-by: Taejin Park <[email protected]>

Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* update docs and replace speakernet with titanet in tutorials (#3405)

* update docs and replace speakernet with titanet in tutorials

Signed-off-by: nithinraok <[email protected]>

* update dataset usage description

Signed-off-by: nithinraok <[email protected]>

* updated based on comments

Signed-off-by: nithinraok <[email protected]>

* spell fix

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Update Mixer-TTS, FastPitch and TTSDataset (#3366)

* update tts dataset, fastpitch and mixer tts

Signed-off-by: Oktai Tatanov <[email protected]>

* fix style and notebooks

Signed-off-by: Oktai Tatanov <[email protected]>

* update notebooks

Signed-off-by: Oktai Tatanov <[email protected]>

* update mixer-tts, mixer-tts-x and fastpitch configs

Signed-off-by: Oktai Tatanov <[email protected]>

* update notebooks and configs

Signed-off-by: Oktai Tatanov <[email protected]>

* update configs

Signed-off-by: Oktai Tatanov <[email protected]>

* add links, update README, fix tutorials

Signed-off-by: Oktai Tatanov <[email protected]>

* fix style

Signed-off-by: Oktai Tatanov <[email protected]>

* remove unnecessary code from fastpitch model

Signed-off-by: Oktai Tatanov <[email protected]>

* update jenkinsfile and fastpitch typo fix

Signed-off-by: Oktai Tatanov <[email protected]>

* fix configs

Signed-off-by: Oktai Tatanov <[email protected]>

* revert jenkinsfile

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Asr fr (#3404)

* Pushing WFST_tutorial for open draft. (Still need to review Colab code.)

Signed-off-by: tbartley94 <[email protected]>

* Checked tutorial code for WFST_Tutorial is properly functioning. Also included some formatting edits.

Signed-off-by: tbartley94 <[email protected]>

* Responding to editorial comments for WFST_tutorial

Signed-off-by: tbartley94 <[email protected]>

* Added images to folder and wrote README for tutorials

Signed-off-by: tbartley94 <[email protected]>

* Few more editorial changes to explain permutations in classification.

Signed-off-by: tbartley94 <[email protected]>

* Updated tutorials documentation page.

Signed-off-by: tbartley94 <[email protected]>

* Forgot links for README

Signed-off-by: tbartley94 <[email protected]>

* TOC links were dead

Signed-off-by: tbartley94 <[email protected]>

* More dead links to fix.

Signed-off-by: tbartley94 <[email protected]>

* removing Colab install and appending a warning instead.

Signed-off-by: tbartley94 <[email protected]>

* Update WFST_Tutorial.ipynb

Signed-off-by: tbartley94 <[email protected]>

* Adding pretrained French models to ctc_bpe_models and rnnt_bpe_models available models listing

Signed-off-by: tbartley94 <[email protected]>

* Updating ctc_bpe_models import for updated Fr Conformer Ctc version.

Signed-off-by: tbartley94 <[email protected]>

* Added new French ASR models to documentation and imports: conformer transducer and conformer ctc trained without hyphenization.

Signed-off-by: tbartley94 <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* [fix] for resume training on SLURM multi-node multi-gpu (#3374)

* [fix] for resume training on SLURM multi-node multi-gpu

On SLURM, resuming training in a multi-node multi-gpu setting fails: when `LOCAL_RANK` is undefined, `is_global_rank_zero()` returns true on all processes running on node 0. In that case `exp_manager.py` (https://github.com/NVIDIA/NeMo/blob/f83b2c5524a787be21ffea170850c4b5486eac2b/nemo/utils/exp_manager.py#L446) creates multiple `run_*` folders, which eventually leads to failure (missing files, because other processes have already moved them).

Also checking `SLURM_PROCID` solves this issue, as that environment variable contains the global rank id.

Signed-off-by: Iztok Lebar Bajec <[email protected]>

* Update get_rank.py

In a SLURM environment, return the SLURM global rank (SLURM_PROCID); otherwise fall back to the previous behaviour.

Signed-off-by: Iztok Lebar Bajec <[email protected]>
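A minimal sketch of the idea behind the two commits above, assuming hypothetical helper names (an illustration only, not the actual nemo/utils/get_rank.py code): consult SLURM_PROCID first, and only fall back to the launcher-provided rank variables when it is absent.

```python
import os


def get_global_rank_sketch() -> int:
    """Sketch of the rank lookup described above (assumed helper names;
    not the actual nemo/utils/get_rank.py implementation)."""
    # On SLURM, SLURM_PROCID already holds the global rank across all nodes.
    slurm_rank = os.environ.get("SLURM_PROCID")
    if slurm_rank is not None:
        return int(slurm_rank)
    # Otherwise fall back to the launcher-provided variables.
    for var in ("RANK", "LOCAL_RANK"):
        value = os.environ.get(var)
        if value is not None:
            return int(value)
    # Nothing set: assume a single-process run.
    return 0


def is_global_rank_zero_sketch() -> bool:
    # Without the SLURM_PROCID check, every process on node 0 with an unset
    # LOCAL_RANK would report itself as global rank zero, and exp_manager
    # would create multiple run_* folders.
    return get_global_rank_sketch() == 0
```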

* style

Signed-off-by: Jason <[email protected]>

* Solved bug where either RANK or SLURM_PROCID returns 0 and the conditionals return False

Signed-off-by: Iztok Lebar Bajec <[email protected]>

Co-authored-by: Jason <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Fix running token classification in multinode setting (#3413)

* fix: master device check

Signed-off-by: PeganovAnton <[email protected]>

* Fix bug with use_cache parameter

Signed-off-by: PeganovAnton <[email protected]>

* create pickled features file regardless of value of use_cache

Signed-off-by: PeganovAnton <[email protected]>

* Improve docs

Signed-off-by: PeganovAnton <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Fix order of lang checking to ignore input langs (#3417)

* Fix order of lang checking

Signed-off-by: MaximumEntropy <[email protected]>

* Fix == error

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: PeganovAnton <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Refactor ASR Examples Directory (#3392)

* Begin refactor of ASR files

Signed-off-by: smajumdar <[email protected]>

* Update jenkins paths for ASR

Signed-off-by: smajumdar <[email protected]>

* Update speech_to_text_ctc

Signed-off-by: smajumdar <[email protected]>

* Update speech_to_text_ctc_bpe

Signed-off-by: smajumdar <[email protected]>

* Lowercase all directories

Signed-off-by: smajumdar <[email protected]>

* Fix RNNT num_workers

Signed-off-by: smajumdar <[email protected]>

* Fix RNNT num_workers

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* NMT MIM mean variance fix (#3385)

* 1. Updated default NMT bottleneck encoder to be non-autoregressive

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed mean/variance being tied when latent and hidden dimensions are the same.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

Co-authored-by: Micha Livne <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* update to 21.12 (#3424)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Working around Pytorch exporter issue with expand() (#3422)

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* update copyright (#3426)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* remove apex (#3428)

Signed-off-by: ekmb <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* vad infer refactor (#3394)

* vad infer refactor

Signed-off-by: fayejf <[email protected]>

* remove duplicate in write_long_audio_manifest

Signed-off-by: fayejf <[email protected]>

* remove duplicate in script vad_overlap_posterior

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* fix nb

Signed-off-by: fayejf <[email protected]>

* small fix

Signed-off-by: fayejf <[email protected]>

* fix

Signed-off-by: fayejf <[email protected]>

* small fixes

Signed-off-by: fayejf <[email protected]>

* reflect taejin's review

Signed-off-by: fayejf <[email protected]>

* update tutorial about rename

Signed-off-by: fayejf <[email protected]>

* small fix

Signed-off-by: fayejf <[email protected]>

* merge main and fix

Signed-off-by: fayejf <[email protected]>

* tiny path fix

Signed-off-by: fayejf <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* doc update for refactory (#3430)

Signed-off-by: fayejf <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Update LJSpeech preprocessing (#3423)

* update lj speech preprocessing

Signed-off-by: Oktai Tatanov <[email protected]>

* update lj speech preprocessing 2

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* NMT Shared Embeddings Weights (#3340)

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Implemented encoder deocder embedding weights tie.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

Co-authored-by: Micha Livne <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* [BigNLP] Make saving .nemo during on_train_end configurable (#3427)

* make save nemo configurable on train end

Signed-off-by: ericharper <[email protected]>

* add warning when save_best_model is True

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Jason <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Preprocess an entire folder of .json or .json.gz files into a single .bin and .idx file. (#3425)

* Folder preproc

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Fix useless enumerate

Signed-off-by: MaximumEntropy <[email protected]>

* Address PR comments

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Update speaker diarization docs (#3419)

* Initial commit

Signed-off-by: Taejin Park <[email protected]>

* Fixed minor mistakes

Signed-off-by: Taejin Park <[email protected]>

* Some changes regarding diarization utils

Signed-off-by: Taejin Park <[email protected]>

* Fixed minor typos

Signed-off-by: Taejin Park <[email protected]>

* Reflected PR comments

Signed-off-by: Taejin Park <[email protected]>

* Reflected PR comments

Signed-off-by: Taejin Park <[email protected]>

* Reflected addtional comments

Signed-off-by: Taejin Park <[email protected]>

* Changed pics and refined text

Signed-off-by: Taejin Park <[email protected]>

* Minor typos

Signed-off-by: Taejin Park <[email protected]>

* Minor change on dataset

Signed-off-by: Taejin Park <[email protected]>

* Minor change on dataset 2

Signed-off-by: Taejin Park <[email protected]>

* Changed manifest input to yaml format

Signed-off-by: Taejin Park <[email protected]>

* Capitalization of titles

Signed-off-by: Taejin Park <[email protected]>

* Last commit

Signed-off-by: Taejin Park <[email protected]>

Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Update ContextNet models trained on more datasets (#3440)

* Update ContextNet models trained on more datasets

Signed-off-by: smajumdar <[email protected]>

* Update ContextNet models trained on more datasets

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* 1. Updated default buffer_size for TimingCallback to 1. (#3439)

Signed-off-by: Micha Livne <[email protected]>

Co-authored-by: Micha Livne <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Fix bug for missing variable (#3437)

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Extending input_example() to take max batch and dimension arguments (#3429)

* Extending input_example() to take max batch and dimension arguments

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing conformer size reconfig, extending export script, some refactoring

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressing comments

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing test issue

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing DecoderJoint input example

Signed-off-by: Boris Fomitchev <[email protected]>

* Removing soon-deprecated external format option addition

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing indentation typo

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Byte-level Multilingual NMT (#3368)

* init

Signed-off-by: Abhinav Khattar <[email protected]>

* style

Signed-off-by: Abhinav Khattar <[email protected]>

* rm debug stuff

Signed-off-by: Abhinav Khattar <[email protected]>

* changes

Signed-off-by: Abhinav Khattar <[email protected]>

* fix

Signed-off-by: Abhinav Khattar <[email protected]>

* fix

Signed-off-by: Abhinav Khattar <[email protected]>

* error fix

Signed-off-by: Abhinav Khattar <[email protected]>

* make spl tokens optional

Signed-off-by: Abhinav Khattar <[email protected]>

Co-authored-by: Micha Livne <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Asr patches (#3443)

* Fix issues with num_workers for transcribe

Signed-off-by: smajumdar <[email protected]>

* During inference use full context of chunk

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Updated NumPy SDE requirement (#3442)

Signed-off-by: Vitaly Lavrukhin <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* refactor data preprocessing script (#3444)

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Prompt tuning loss mask fix (#3438)

* Switched to calculate loss on answer only

Signed-off-by: Virginia Adams <[email protected]>

* Added CI tests and unit tests for prompt tuning dataset

Signed-off-by: Virginia Adams <[email protected]>

* Fixed Jenkinsfile typo

Signed-off-by: Virginia Adams <[email protected]>

* fixed Jenkinsfile typo

Signed-off-by: Virginia Adams <[email protected]>

* Fixed more typos so CI tests run all the way through

Signed-off-by: Virginia Adams <[email protected]>

* Fixed code formatting

Signed-off-by: Virginia Adams <[email protected]>

* Needed to add save nemo file on train end flag to CI test

Signed-off-by: Virginia Adams <[email protected]>

* Added save .nemo on train end flag to example script

Signed-off-by: Virginia Adams <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* BioMegatron token classification tutorial fix to be compatible with current Megatron BERT (#3435)

* fixed the tokenizer

Signed-off-by: Yi Dong <[email protected]>

* training is working

Signed-off-by: Yi Dong <[email protected]>

* fixed text

Signed-off-by: Yi Dong <[email protected]>

* fixed text

Signed-off-by: Yi Dong <[email protected]>

* working notebook

Signed-off-by: Yi Dong <[email protected]>

* style fix

Signed-off-by: Yi Dong <[email protected]>

* fixed text

Signed-off-by: Yi Dong <[email protected]>

* handles the different megatron-lm checkpoint versions

Signed-off-by: Yi Dong <[email protected]>

* fixed the text classification notebook

Signed-off-by: Yi Dong <[email protected]>

* fixed key error

Signed-off-by: Yi Dong <[email protected]>

* more key error

Signed-off-by: Yi Dong <[email protected]>

* replace the old notebooks

Signed-off-by: Yi Dong <[email protected]>

* register vocab to nemo file

Signed-off-by: Yi Dong <[email protected]>

* added the missing notebook

Signed-off-by: Yi Dong <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* (1) O2-style mixed precision recipe, (2) Persistent layer-norm, (3) Grad scale hysteresis, (4) gradient_as_bucket_view (#3259)

* half precision training w/o autocast using master param

stage fp16 working version

fix: fp32 grad accumulation

bf16 support

Signed-off-by: Sangkug Lym <[email protected]>

add closure fn at bf16

* change autocast compatible with latest pytorch version

Signed-off-by: Sangkug Lym <[email protected]>

* add module to the state_dict naming

Signed-off-by: Sangkug Lym <[email protected]>

* cleanup arguments

Signed-off-by: Sangkug Lym <[email protected]>

* fix module state matching upon checkpoint resume

Signed-off-by: Sangkug Lym <[email protected]>

* persistent layer norm and dependency check

Signed-off-by: Sangkug Lym <[email protected]>

check container version instead of pytorch version

Signed-off-by: Sangkug Lym <[email protected]>

update config

* dependency check

Signed-off-by: Sangkug Lym <[email protected]>

* add gradient_as_bucket_view arg to config

Signed-off-by: Sangkug Lym <[email protected]>

* (1) add hysteresis to grad scaler, and (2) add grad_scaler to TB

Signed-off-by: Sangkug Lym <[email protected]>

* doc link fixes (#3264)

Signed-off-by: nithinraok <[email protected]>

* escape chars fix (#3253)

* escape chars fix

Signed-off-by: ekmb <[email protected]>

* bug fixes

Signed-off-by: ekmb <[email protected]>

* review

Signed-off-by: ekmb <[email protected]>

Co-authored-by: Yang Zhang <[email protected]>

* Improve data pipeline for punctuation capitalization model and make other useful changes (#3159)

* Fix: inference on short sequences problem

Signed-off-by: PeganovAnton <[email protected]>

* Add draft of new punctuation and capitalization model

Signed-off-by: PeganovAnton <[email protected]>

* Fix debug config

Signed-off-by: PeganovAnton <[email protected]>

* Add parameter check

Signed-off-by: PeganovAnton <[email protected]>

* Update punctuation training script

Signed-off-by: PeganovAnton <[email protected]>

* Fix head config parameter names

Signed-off-by: PeganovAnton <[email protected]>

* Fix ds_item and class_label parameters in config

Signed-off-by: PeganovAnton <[email protected]>

* Fix dataloader shuffling for tarred dataset

Signed-off-by: PeganovAnton <[email protected]>

* Reduce validation batch

Signed-off-by: PeganovAnton <[email protected]>

* Add debug print

Signed-off-by: PeganovAnton <[email protected]>

* Fix metrics initialization

Signed-off-by: PeganovAnton <[email protected]>

* Fix minor bug

Signed-off-by: PeganovAnton <[email protected]>

* Fix device problem

Signed-off-by: PeganovAnton <[email protected]>

* Add debug print

Signed-off-by: PeganovAnton <[email protected]>

* Register metrics properly

Signed-off-by: PeganovAnton <[email protected]>

* Put metrics setup after module init

Signed-off-by: PeganovAnton <[email protected]>

* Reduce model size

Signed-off-by: PeganovAnton <[email protected]>

* Add wandb logging

Signed-off-by: PeganovAnton <[email protected]>

* Change wandb name

Signed-off-by: PeganovAnton <[email protected]>

* Fix logging names for metrics

Signed-off-by: PeganovAnton <[email protected]>

* Add debug print

Signed-off-by: PeganovAnton <[email protected]>

* Add returning from eval steps

Signed-off-by: PeganovAnton <[email protected]>

* Add second dev dataset

Signed-off-by: PeganovAnton <[email protected]>

* Move config

Signed-off-by: PeganovAnton <[email protected]>

* Fix path to dataset

Signed-off-by: PeganovAnton <[email protected]>

* Add more tokenizer parameters

Signed-off-by: PeganovAnton <[email protected]>

* Add debug script for more tokenizer in creating tarred dataset

Signed-off-by: PeganovAnton <[email protected]>

* Update output path in debug script

Signed-off-by: PeganovAnton <[email protected]>

* Fix minor bug in typing

Signed-off-by: PeganovAnton <[email protected]>

* Fix bug in parsing arguments

Signed-off-by: PeganovAnton <[email protected]>

* Do not pass tokenizer through queue

Signed-off-by: PeganovAnton <[email protected]>

* Set hf tokenizer in debug script

Signed-off-by: PeganovAnton <[email protected]>

* Try char vocabulary

Signed-off-by: PeganovAnton <[email protected]>

* Fix typo

Signed-off-by: PeganovAnton <[email protected]>

* Improve error message

Signed-off-by: PeganovAnton <[email protected]>

* Fix OOV problem

Signed-off-by: PeganovAnton <[email protected]>

* Add label ids creation and getting

Signed-off-by: PeganovAnton <[email protected]>

* fix: add missing parameter

Signed-off-by: PeganovAnton <[email protected]>

* Improve error message for label ids building

Signed-off-by: PeganovAnton <[email protected]>

* Add short tar files repacking

Signed-off-by: PeganovAnton <[email protected]>

* Fix minor bug and add more security

Signed-off-by: PeganovAnton <[email protected]>

* Fix minor bug

Signed-off-by: PeganovAnton <[email protected]>

* fix: replace Path with str

Signed-off-by: PeganovAnton <[email protected]>

* fix: iter datasets

Signed-off-by: PeganovAnton <[email protected]>

* Improve logging

Signed-off-by: PeganovAnton <[email protected]>

* Turn off repacking

Signed-off-by: PeganovAnton <[email protected]>

* Turn off repacking

Signed-off-by: PeganovAnton <[email protected]>

* Turn on repacking

Signed-off-by: PeganovAnton <[email protected]>

* Turn off repacking

Signed-off-by: PeganovAnton <[email protected]>

* Add debug print

Signed-off-by: PeganovAnton <[email protected]>

* Improve unexpected removal

Signed-off-by: PeganovAnton <[email protected]>

* Turn on repacking

Signed-off-by: PeganovAnton <[email protected]>

* fix: remove repacked files

Signed-off-by: PeganovAnton <[email protected]>

* Add default config for testing

Signed-off-by: PeganovAnton <[email protected]>

* Improve code style in evaluate script

Signed-off-by: PeganovAnton <[email protected]>

* Add docstrings

Signed-off-by: PeganovAnton <[email protected]>

* Remove debug config

Signed-off-by: PeganovAnton <[email protected]>

* Remove commented code

Signed-off-by: PeganovAnton <[email protected]>

* Fix code style in doc string

Signed-off-by: PeganovAnton <[email protected]>

* Fix usage of parser.error function

Signed-off-by: PeganovAnton <[email protected]>

* Improve working with config and fix restoring of old checkpoints

Signed-off-by: PeganovAnton <[email protected]>

* Do not demand cfg as dataclass

Signed-off-by: PeganovAnton <[email protected]>

* Add backward compatibility for absence of use_tarred_dataset

Signed-off-by: PeganovAnton <[email protected]>

* Fight for backwards compatibility

Signed-off-by: PeganovAnton <[email protected]>

* Add tokens_in_batch backward compatibility

Signed-off-by: PeganovAnton <[email protected]>

* Undo unintentional changes in tutorial

Signed-off-by: PeganovAnton <[email protected]>

* Do not allow more workers than queries

Signed-off-by: PeganovAnton <[email protected]>

* Fix metric names in tests

Signed-off-by: PeganovAnton <[email protected]>

* Fix metric location

Signed-off-by: PeganovAnton <[email protected]>

* Fix metric location

Signed-off-by: PeganovAnton <[email protected]>

* Require ds_item or data_dir

Signed-off-by: PeganovAnton <[email protected]>

* Disable multiprocessing data preparation by default

Signed-off-by: PeganovAnton <[email protected]>

* Disable multiprocessing data preparation by default

Signed-off-by: PeganovAnton <[email protected]>

* Disable multiprocessing data preparation by default

Signed-off-by: PeganovAnton <[email protected]>

* Make minor improvements in docstrings and typing

Signed-off-by: PeganovAnton <[email protected]>

* Fix finetuning code

Signed-off-by: PeganovAnton <[email protected]>

* Fix shuffle train dataset config parameter

Signed-off-by: PeganovAnton <[email protected]>

* Fix evaluation script

Signed-off-by: PeganovAnton <[email protected]>

* Update test

Signed-off-by: PeganovAnton <[email protected]>

* Add new test and make minor changes

Signed-off-by: PeganovAnton <[email protected]>

* Fix repacked file names

Signed-off-by: PeganovAnton <[email protected]>

* Add assertion error

Signed-off-by: PeganovAnton <[email protected]>

* Fix minor bug in regex

Signed-off-by: PeganovAnton <[email protected]>

* Improve Jenkins command

Signed-off-by: PeganovAnton <[email protected]>

* Fix code style

Signed-off-by: PeganovAnton <[email protected]>

* fix: add name to Jenkins stage

Signed-off-by: PeganovAnton <[email protected]>

* fix: add steps block to Jenkins stage

Signed-off-by: PeganovAnton <[email protected]>

* fix: move nemo_experiments removal to post section

Previously I encountered a weird error:

+ rm -rf nemo_experiments
rm: cannot remove 'nemo_experiments': Directory not empty
script returned exit code 1

I suspect that this could be because two parallel stages try to
remove the same directory simultaneously.

Signed-off-by: PeganovAnton <[email protected]>

* Turn off cache usage in Jenkins for token classification models

Signed-off-by: PeganovAnton <[email protected]>

* Stop pickling features

Signed-off-by: PeganovAnton <[email protected]>

* Reference webdataset in docs

Signed-off-by: PeganovAnton <[email protected]>

* Make multiple minor improvements

Signed-off-by: PeganovAnton <[email protected]>

* Add parameters tokens_in_batch, repack to documentation

Signed-off-by: PeganovAnton <[email protected]>

* Refactoring and improving readability

Signed-off-by: PeganovAnton <[email protected]>

* Make tar_shuffle_n optional parameter

Signed-off-by: PeganovAnton <[email protected]>

* Fix path to label vocab files

Signed-off-by: PeganovAnton <[email protected]>

* Fix metadata label vocab key

Signed-off-by: PeganovAnton <[email protected]>

* Create for_nemo directory

Signed-off-by: PeganovAnton <[email protected]>

* Fix tar_shuffle_n default value

Signed-off-by: PeganovAnton <[email protected]>

* First round of review fixes

Signed-off-by: PeganovAnton <[email protected]>

* Return tokens_in_batch default value

Signed-off-by: PeganovAnton <[email protected]>

* Remove duplicate parameters in `CommonDatasetParameters`

Signed-off-by: PeganovAnton <[email protected]>

* Remove duplicate parameters in config

Signed-off-by: PeganovAnton <[email protected]>

* Refactor user interface

Signed-off-by: PeganovAnton <[email protected]>

* fix: add missing parameter in calling setting dataloader up

Signed-off-by: PeganovAnton <[email protected]>

* fix: replace data config with model config

Signed-off-by: PeganovAnton <[email protected]>

* fix: typo in config parameter name

Signed-off-by: PeganovAnton <[email protected]>

* fix: location of label ids parameters in config

Signed-off-by: PeganovAnton <[email protected]>

* fix: transforming not first legacy data config

Signed-off-by: PeganovAnton <[email protected]>

* fix: num_samples can be negative

Signed-off-by: PeganovAnton <[email protected]>

* fix: create directory for nemo ids files

Signed-off-by: PeganovAnton <[email protected]>

* fix: remove unremoved with_label

Signed-off-by: PeganovAnton <[email protected]>

* fix: features contain ids if loaded from pickle

Signed-off-by: PeganovAnton <[email protected]>

* Fix kwargs parameters

Signed-off-by: PeganovAnton <[email protected]>

* Add label setting for testing case

Signed-off-by: PeganovAnton <[email protected]>

* Fix: change parameter location in config

Signed-off-by: PeganovAnton <[email protected]>

* Fix: transform legacy config in init

Signed-off-by: PeganovAnton <[email protected]>

* Fix: make minor improvement in checking config

Signed-off-by: PeganovAnton <[email protected]>

* fix: check label ids for None before checking pad label id

Signed-off-by: PeganovAnton <[email protected]>

* fix: set labels when restoring

Signed-off-by: PeganovAnton <[email protected]>

* fix: place where label ids are taken

Signed-off-by: PeganovAnton <[email protected]>

* Fix minor bug

Signed-off-by: PeganovAnton <[email protected]>

* fix: register artifacts in set_label_ids

Signed-off-by: PeganovAnton <[email protected]>

* fix: perform checking only if label ids are not set

Signed-off-by: PeganovAnton <[email protected]>

* fix: set label_ids_are_set

Signed-off-by: PeganovAnton <[email protected]>

* Fix using of dataset in create tarred dataset

Signed-off-by: PeganovAnton <[email protected]>

* fix: manipulate label ids if fragment_idx is zero

Signed-off-by: PeganovAnton <[email protected]>

* fix: remove directory correctly

Signed-off-by: PeganovAnton <[email protected]>

* fix: vocab file names

Signed-off-by: PeganovAnton <[email protected]>

* fix: vocab file names

Signed-off-by: PeganovAnton <[email protected]>

* Add debug print

Signed-off-by: PeganovAnton <[email protected]>

* Add directories for cache and label info

Signed-off-by: PeganovAnton <[email protected]>

* Minor fixes

Signed-off-by: PeganovAnton <[email protected]>

* Minor fix

Signed-off-by: PeganovAnton <[email protected]>

* Minor fix

Signed-off-by: PeganovAnton <[email protected]>

* Improve debug config

Signed-off-by: PeganovAnton <[email protected]>

* Create missing directories

Signed-off-by: PeganovAnton <[email protected]>

* Improve feature pkl file name

Signed-off-by: PeganovAnton <[email protected]>

* WORKING VERSION OF VOCAB CONFIG

Signed-off-by: PeganovAnton <[email protected]>

* Improve vocab file extraction

Signed-off-by: PeganovAnton <[email protected]>

* Fix config

Signed-off-by: PeganovAnton <[email protected]>

* Improve vocab file extraction

Signed-off-by: PeganovAnton <[email protected]>

* fix register artifact calls

Signed-off-by: PeganovAnton <[email protected]>

* fix: add class_labels to legacy fixing

Signed-off-by: PeganovAnton <[email protected]>

* fix: add missing method

Signed-off-by: PeganovAnton <[email protected]>

* Add support for checkpoints without class labels artifact

Signed-off-by: PeganovAnton <[email protected]>

* fix: add missing return values to function

Signed-off-by: PeganovAnton <[email protected]>

* fix saving label ids in creation of tarred dataset

Signed-off-by: PeganovAnton <[email protected]>

* fix: adjust tarred dataset consistency check

Signed-off-by: PeganovAnton <[email protected]>

* fix: consistency check call

Signed-off-by: PeganovAnton <[email protected]>

* Try checking labels every time dataloader is set

Signed-off-by: PeganovAnton <[email protected]>

* fi…
fayejf added a commit that referenced this pull request Mar 2, 2022
* cache_hf (#3406)

Signed-off-by: ekmb <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Learning annealing scheduler fix (#3400)

Signed-off-by: MaximumEntropy <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* T5 Pre-training in NeMo using Megatron (#3036)

* add vocab_file and merge_file to megatron init

Signed-off-by: ericharper <[email protected]>

* add forward

Signed-off-by: ericharper <[email protected]>

* add train loss

Signed-off-by: ericharper <[email protected]>

* add optimizer

Signed-off-by: ericharper <[email protected]>

* add exp_manager

Signed-off-by: ericharper <[email protected]>

* multi-gpu is working

Signed-off-by: ericharper <[email protected]>

* adding val loop

Signed-off-by: ericharper <[email protected]>

* style

Signed-off-by: ericharper <[email protected]>

* adding val loop

Signed-off-by: ericharper <[email protected]>

* fix ranks

Signed-off-by: ericharper <[email protected]>

* fix model parallel checkpoint saving

Signed-off-by: ericharper <[email protected]>

* fix _del_model

Signed-off-by: ericharper <[email protected]>

* Initial megatron dataset port

Signed-off-by: MaximumEntropy <[email protected]>

* added megatron batch sampler

Signed-off-by: ericharper <[email protected]>

* try to fix num steps

Signed-off-by: ericharper <[email protected]>

* add wandb to config

Signed-off-by: ericharper <[email protected]>

* log lr

Signed-off-by: ericharper <[email protected]>

* add warmup ratio to config

Signed-off-by: ericharper <[email protected]>

* update configs

Signed-off-by: ericharper <[email protected]>

* update configs

Signed-off-by: ericharper <[email protected]>

* Fix merge conflicts

Signed-off-by: MaximumEntropy <[email protected]>

* add cpu init to args

Signed-off-by: ericharper <[email protected]>

* update config

Signed-off-by: ericharper <[email protected]>

* License fixes and megatron model porting

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* More fixes to import from nemo rather than megatron

Signed-off-by: MaximumEntropy <[email protected]>

* Fix circular imports

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Revert config file

Signed-off-by: MaximumEntropy <[email protected]>

* update config

Signed-off-by: ericharper <[email protected]>

* Restructure further to avoid circular imports

Signed-off-by: MaximumEntropy <[email protected]>

* add Makefile

Signed-off-by: ericharper <[email protected]>

* Add megatron modules

Signed-off-by: MaximumEntropy <[email protected]>

* Add data makefile

Signed-off-by: MaximumEntropy <[email protected]>

* add license

Signed-off-by: ericharper <[email protected]>

* Port from latest megatron

Signed-off-by: MaximumEntropy <[email protected]>

* update cfg

Signed-off-by: ericharper <[email protected]>

* update config

Signed-off-by: ericharper <[email protected]>

* add _del_model_without_trainer

Signed-off-by: ericharper <[email protected]>

* add data preprocessing script

Signed-off-by: ericharper <[email protected]>

* update config

Signed-off-by: ericharper <[email protected]>

* use apex mpu

Signed-off-by: ericharper <[email protected]>

* replace print_rank_0 with nemo utils logging

Signed-off-by: ericharper <[email protected]>

* use apex mpu

Signed-off-by: ericharper <[email protected]>

* use apex mpu

Signed-off-by: ericharper <[email protected]>

* add use_cpu_initialization

Signed-off-by: ericharper <[email protected]>

* fixing autoresume in progress

Signed-off-by: ericharper <[email protected]>

* properly removing last checkpoint

Signed-off-by: ericharper <[email protected]>

* log consumed samples

Signed-off-by: ericharper <[email protected]>

* fix mp autoresume

Signed-off-by: ericharper <[email protected]>

* Megatron GPT training with NeMo tokenizers (#2818)

* Update files from megatron repo

Signed-off-by: MaximumEntropy <[email protected]>

* Remove non NLP data related files from megatron

Signed-off-by: MaximumEntropy <[email protected]>

* Merge megatron and nemo tokenizers

Signed-off-by: MaximumEntropy <[email protected]>

* Remove get_tokenizer() calls from gpt model

Signed-off-by: MaximumEntropy <[email protected]>

* Update tokenizer yaml config

Signed-off-by: MaximumEntropy <[email protected]>

* add NLPSaveRestoreConnector

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* update config

Signed-off-by: ericharper <[email protected]>

* make init_method_std configurable

Signed-off-by: ericharper <[email protected]>

* make gpu init work by setting random seed earlier

Signed-off-by: ericharper <[email protected]>

* fix gpu init after removing debug print in mpu

Signed-off-by: ericharper <[email protected]>

* add fused_adam

Signed-off-by: ericharper <[email protected]>

* check ds is not none before logging len

Signed-off-by: ericharper <[email protected]>

* set fp16 arg to true and fix enum conflict

Signed-off-by: ericharper <[email protected]>

* make fp16 arg configurable

Signed-off-by: ericharper <[email protected]>

* add grad clip from megatron

Signed-off-by: ericharper <[email protected]>

* Linear warmup with cosine annealing and constant holding (#2846)

* Testing cosine schedule

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Fixes

Signed-off-by: MaximumEntropy <[email protected]>

* More fixes

Signed-off-by: MaximumEntropy <[email protected]>

* update config for constant steps in schedule

Signed-off-by: ericharper <[email protected]>
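
For readers unfamiliar with the schedule named in this commit series, here is a minimal sketch of its shape (linear warmup, cosine annealing, then a constant hold at the minimum learning rate), assuming step-based scheduling; it is an illustration only, not the actual NeMo scheduler class:

```python
import math

def lr_at_step(step, max_lr, min_lr, warmup_steps, decay_steps):
    """Linear warmup to max_lr, cosine decay to min_lr, then hold min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / max(1, warmup_steps)
    progress = min((step - warmup_steps) / max(1, decay_steps), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes 1 -> 0
    return min_lr + (max_lr - min_lr) * cosine           # stays at min_lr once progress == 1.0

# e.g. any step past warmup_steps + decay_steps keeps returning min_lr (the "constant" phase)
```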

* temporarily import enum from megatron

Signed-off-by: ericharper <[email protected]>

* add grad clip for fp32

Signed-off-by: ericharper <[email protected]>

* update check for _del_model_without_trainer

Signed-off-by: ericharper <[email protected]>

* updating restore for model parallel

Signed-off-by: ericharper <[email protected]>

* add predict script

Signed-off-by: ericharper <[email protected]>

* update test iters

Signed-off-by: ericharper <[email protected]>

* add barrier

Signed-off-by: ericharper <[email protected]>

* return if clip_val is 0 or None

Signed-off-by: ericharper <[email protected]>

* when using amp clip grads after they are unscaled

Signed-off-by: ericharper <[email protected]>

* make native amp scaler hyperparams configurable

Signed-off-by: ericharper <[email protected]>

* (1) nvfuser, (2) amp-casting decoration (#2894)

* (1) nvfuser, (2) amp-casting decoration

Signed-off-by: Sangkug Lym <[email protected]>

* support bf16

Signed-off-by: Sangkug Lym <[email protected]>

* update package info

Signed-off-by: ericharper <[email protected]>

* add set device to constructor

Signed-off-by: ericharper <[email protected]>

* set_device in constructor

Signed-off-by: ericharper <[email protected]>

* [BigNLP] Remove megatron-lm dependency. (#2910)

* remove args

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* remove args in progress

Signed-off-by: ericharper <[email protected]>

* remove args in progress

Signed-off-by: ericharper <[email protected]>

* remove args in progress

Signed-off-by: ericharper <[email protected]>

* remove args in progress

Signed-off-by: ericharper <[email protected]>

* add load_fused_kernels

Signed-off-by: ericharper <[email protected]>

* add load_fused_kernels

Signed-off-by: ericharper <[email protected]>

* update megatron_init

Signed-off-by: ericharper <[email protected]>

* add fused kernels

Signed-off-by: ericharper <[email protected]>

* add fused kernels

Signed-off-by: ericharper <[email protected]>

* update process batch

Signed-off-by: ericharper <[email protected]>

* remove erroneous import

Signed-off-by: ericharper <[email protected]>

* remove erroneous import

Signed-off-by: ericharper <[email protected]>

* remove erroneous import

Signed-off-by: ericharper <[email protected]>

* add megatron clip_grad

Signed-off-by: ericharper <[email protected]>

* trying to resolve circular import error

Signed-off-by: ericharper <[email protected]>

* rename file

Signed-off-by: ericharper <[email protected]>

* remove non-gpt models and datasets from __init__ files

Signed-off-by: ericharper <[email protected]>

* set device in constructor for gpu init

Signed-off-by: ericharper <[email protected]>

* set device in constructor for gpu init

Signed-off-by: ericharper <[email protected]>

* set_device in constructor

Signed-off-by: ericharper <[email protected]>

* clean config

Signed-off-by: ericharper <[email protected]>

* update MegatronDataset

Signed-off-by: ericharper <[email protected]>

* clean up MegatronModule

Signed-off-by: ericharper <[email protected]>

* clean up MegatronModule

Signed-off-by: ericharper <[email protected]>

* rename fp16 and bf16 flags to fused_softmax_input_in_fp16/bf16

Signed-off-by: ericharper <[email protected]>

* rename to fused_fp16

Signed-off-by: ericharper <[email protected]>

* add fused_fp16 arg to LayerNorm calls

Signed-off-by: ericharper <[email protected]>

* fix arg name

Signed-off-by: ericharper <[email protected]>

* fix arg name

Signed-off-by: ericharper <[email protected]>

* fix import

Signed-off-by: ericharper <[email protected]>

* update arg

Signed-off-by: ericharper <[email protected]>

* skip warmup default to True

Signed-off-by: ericharper <[email protected]>

* skip warmup default to True

Signed-off-by: ericharper <[email protected]>

* Adding complete method to MegatronGPTModel (#2935)

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* make ffn_hidden_size mandatory

Signed-off-by: ericharper <[email protected]>

* Manually migrating timing of step into branch (#2937)

* 1. Manually migrating timing of step into branch.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated file name and content.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated to latest code.

Signed-off-by: Micha Livne <[email protected]>

Co-authored-by: Micha Livne <[email protected]>

* remove unused imports

Signed-off-by: ericharper <[email protected]>

* remove unused import

Signed-off-by: ericharper <[email protected]>

* remove unused import

Signed-off-by: ericharper <[email protected]>

* remove unused import

Signed-off-by: ericharper <[email protected]>

* check fused_fp16 and fused_bf16 are not both True

Signed-off-by: ericharper <[email protected]>

* update predict script for model parallel .nemo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Micha Livne <[email protected]>

* NVfuser (#2943)

* activation checkpoint recompute

Signed-off-by: Sangkug Lym <[email protected]>

* selective nvfuser setup

* Megatron gpt bfloat support (#2926)

* Save/restore fix

Signed-off-by: MaximumEntropy <[email protected]>

* Another merge

Signed-off-by: MaximumEntropy <[email protected]>

* Bf16 args in init

Signed-off-by: MaximumEntropy <[email protected]>

* Set precision

Signed-off-by: MaximumEntropy <[email protected]>

* Remove debug stuff

Signed-off-by: MaximumEntropy <[email protected]>

* add bf16 casting decorator

Signed-off-by: Sangkug Lym <[email protected]>

* Bfloat layernorm propagation

Signed-off-by: MaximumEntropy <[email protected]>

* activation checkpoint recompute

Signed-off-by: Sangkug Lym <[email protected]>

* selective nvfuser setup

* More arg removal

Signed-off-by: MaximumEntropy <[email protected]>

* Remove BERTDataset

Signed-off-by: MaximumEntropy <[email protected]>

* update to latest apex and patch transformer autocast

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: ericharper <[email protected]>

* don't set jit for bf16

Signed-off-by: ericharper <[email protected]>

* replace apex.mpu

Signed-off-by: ericharper <[email protected]>

* fix grad clip

Signed-off-by: ericharper <[email protected]>

* NVFuser fixes (#2951)

* Fuser fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Remove dummy handler

Signed-off-by: MaximumEntropy <[email protected]>

* Remove PTL plugin based logic for fusion

Signed-off-by: MaximumEntropy <[email protected]>

* remove duplicated file

Signed-off-by: ericharper <[email protected]>

* T5 model initial changes

Signed-off-by: MaximumEntropy <[email protected]>

* typo (#2960)

Signed-off-by: ericharper <[email protected]>

* [BigNLP] Script to convert GPT checkpoint to .nemo (#2958)

* remove args

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* remove args

Signed-off-by: ericharper <[email protected]>

* remove args in progress

Signed-off-by: ericharper <[email protected]>

* remove args in progress

Signed-off-by: ericharper <[email protected]>

* remove args in progress

Signed-off-by: ericharper <[email protected]>

* remove args in progress

Signed-off-by: ericharper <[email protected]>

* add load_fused_kernels

Signed-off-by: ericharper <[email protected]>

* add load_fused_kernels

Signed-off-by: ericharper <[email protected]>

* update megatron_init

Signed-off-by: ericharper <[email protected]>

* add fused kernels

Signed-off-by: ericharper <[email protected]>

* add fused kernels

Signed-off-by: ericharper <[email protected]>

* update process batch

Signed-off-by: ericharper <[email protected]>

* remove erroneous import

Signed-off-by: ericharper <[email protected]>

* remove erroneous import

Signed-off-by: ericharper <[email protected]>

* remove erroneous import

Signed-off-by: ericharper <[email protected]>

* add megatron clip_grad

Signed-off-by: ericharper <[email protected]>

* trying to resolve circular import error

Signed-off-by: ericharper <[email protected]>

* rename file

Signed-off-by: ericharper <[email protected]>

* remove non-gpt models and datasets from __init__ files

Signed-off-by: ericharper <[email protected]>

* set device in constructor for gpu init

Signed-off-by: ericharper <[email protected]>

* set device in constructor for gpu init

Signed-off-by: ericharper <[email protected]>

* set_device in constructor

Signed-off-by: ericharper <[email protected]>

* clean config

Signed-off-by: ericharper <[email protected]>

* update MegatronDataset

Signed-off-by: ericharper <[email protected]>

* clean up MegatronModule

Signed-off-by: ericharper <[email protected]>

* clean up MegatronModule

Signed-off-by: ericharper <[email protected]>

* rename fp16 and bf16 flags to fused_softmax_input_in_fp16/bf16

Signed-off-by: ericharper <[email protected]>

* rename to fused_fp16

Signed-off-by: ericharper <[email protected]>

* add fused_fp16 arg to LayerNorm calls

Signed-off-by: ericharper <[email protected]>

* fix arg name

Signed-off-by: ericharper <[email protected]>

* fix arg name

Signed-off-by: ericharper <[email protected]>

* fix import

Signed-off-by: ericharper <[email protected]>

* update arg

Signed-off-by: ericharper <[email protected]>

* skip warmup default to True

Signed-off-by: ericharper <[email protected]>

* skip warmup default to True

Signed-off-by: ericharper <[email protected]>

* Adding complete method to MegatronGPTModel (#2935)

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* make ffn_hidden_size mandatory

Signed-off-by: ericharper <[email protected]>

* Manually migrating timing of step into branch (#2937)

* 1. Manually migrating timing of step into branch.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated file name and content.

Signed-off-by: Micha Livne <[email protected]>

* 1. Updated to latest code.

Signed-off-by: Micha Livne <[email protected]>

Co-authored-by: Micha Livne <[email protected]>

* remove unused imports

Signed-off-by: ericharper <[email protected]>

* remove unused import

Signed-off-by: ericharper <[email protected]>

* remove unused import

Signed-off-by: ericharper <[email protected]>

* remove unused import

Signed-off-by: ericharper <[email protected]>

* check fused_fp16 and fused_bf16 are not both True

Signed-off-by: ericharper <[email protected]>

* update predict script for model parallel .nemo

Signed-off-by: ericharper <[email protected]>

* typo

Signed-off-by: ericharper <[email protected]>

* add script to convert .ckpt to .nemo

Signed-off-by: ericharper <[email protected]>

* in progress

Signed-off-by: ericharper <[email protected]>

* update

Signed-off-by: ericharper <[email protected]>

* convert mp checkpoints to nemo

Signed-off-by: ericharper <[email protected]>

* update help

Signed-off-by: ericharper <[email protected]>

* add safeguard for model parallel save_to

Signed-off-by: ericharper <[email protected]>

* adjust NLPModel save_to to be safer for model parallel

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>

* [BigNLP] Update GPT evaluation to work with tensor model parallel  (#2959)

* in progress

Signed-off-by: ericharper <[email protected]>

* update args

Signed-off-by: ericharper <[email protected]>

* add request dataset

Signed-off-by: ericharper <[email protected]>

* tokenize request

Signed-off-by: ericharper <[email protected]>

* in progress

Signed-off-by: ericharper <[email protected]>

* able to run

Signed-off-by: ericharper <[email protected]>

* reduce logits

Signed-off-by: ericharper <[email protected]>

* capture response

Signed-off-by: ericharper <[email protected]>

* squeeze and unsqueeze

Signed-off-by: ericharper <[email protected]>

* handle non model parallel case

Signed-off-by: ericharper <[email protected]>

* clean imports

Signed-off-by: ericharper <[email protected]>

* add file

Signed-off-by: ericharper <[email protected]>

* convert logits to log_probs

Signed-off-by: Oleksii Kuchaiev <[email protected]>

* rename logits to log_probs

Signed-off-by: Oleksii Kuchaiev <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>

* More changes

Signed-off-by: MaximumEntropy <[email protected]>

* Missing import

Signed-off-by: MaximumEntropy <[email protected]>

* Tokenizer fixes and adafactor

Signed-off-by: MaximumEntropy <[email protected]>

* Add adafactor

Signed-off-by: MaximumEntropy <[email protected]>

* Add training and conf scripts

Signed-off-by: MaximumEntropy <[email protected]>

* Add megatron t5 model

Signed-off-by: MaximumEntropy <[email protected]>

* t5 config to fp32

Signed-off-by: MaximumEntropy <[email protected]>

* [BigNLP] Remove fused kernel code instead use Apex (#2984)

* remove fused_kernels

Signed-off-by: ericharper <[email protected]>

* remove fused_kernels

Signed-off-by: ericharper <[email protected]>

* remove fused layer norm and fused softmax and use apex instead

Signed-off-by: ericharper <[email protected]>

* update imports

Signed-off-by: ericharper <[email protected]>

* remove comment

Signed-off-by: ericharper <[email protected]>

* use apex enums

Signed-off-by: ericharper <[email protected]>

* use apex enums

Signed-off-by: ericharper <[email protected]>

* Timer with sliding window (#3002)

Co-authored-by: Micha Livne <[email protected]>

* check for rank zero

Signed-off-by: ericharper <[email protected]>

* Remove ict dataset import

Signed-off-by: MaximumEntropy <[email protected]>

* Remove fused kernels

Signed-off-by: MaximumEntropy <[email protected]>

* style fix

Signed-off-by: ericharper <[email protected]>

* fix consumed_samples when resuming

Signed-off-by: ericharper <[email protected]>

* T5 consumed samples fix

Signed-off-by: MaximumEntropy <[email protected]>

* Remove megatron dep

Signed-off-by: MaximumEntropy <[email protected]>

* Change checkpoint filename format

Signed-off-by: MaximumEntropy <[email protected]>

* Log consumed samples in T5

Signed-off-by: MaximumEntropy <[email protected]>

* T5 lr scheduler

Signed-off-by: MaximumEntropy <[email protected]>

* Checkpoint conversion and data preproc updates for t5

Signed-off-by: MaximumEntropy <[email protected]>

* Denoising eval

Signed-off-by: MaximumEntropy <[email protected]>

* Clean up denoising example to explicitly provide mask positions

Signed-off-by: MaximumEntropy <[email protected]>

* Better logging of results

Signed-off-by: MaximumEntropy <[email protected]>

* Better printing of results

Signed-off-by: MaximumEntropy <[email protected]>

* Minor changes

Signed-off-by: MaximumEntropy <[email protected]>

* Update config

Signed-off-by: MaximumEntropy <[email protected]>

* Update config

Signed-off-by: MaximumEntropy <[email protected]>

* properly removing last checkpoint

Signed-off-by: ericharper <[email protected]>

* add todo

Signed-off-by: ericharper <[email protected]>

* add predict script

Signed-off-by: ericharper <[email protected]>

* T5 model initial changes

Signed-off-by: MaximumEntropy <[email protected]>

* Add adafactor

Signed-off-by: MaximumEntropy <[email protected]>

* Add training and conf scripts

Signed-off-by: MaximumEntropy <[email protected]>

* Add megatron t5 model

Signed-off-by: MaximumEntropy <[email protected]>

* t5 config to fp32

Signed-off-by: MaximumEntropy <[email protected]>

* Remove fused kernels

Signed-off-by: MaximumEntropy <[email protected]>

* fix consumed_samples when resuming

Signed-off-by: ericharper <[email protected]>

* T5 consumed samples fix

Signed-off-by: MaximumEntropy <[email protected]>

* Remove megatron dep

Signed-off-by: MaximumEntropy <[email protected]>

* Change checkpoint filename format

Signed-off-by: MaximumEntropy <[email protected]>

* Log consumed samples in T5

Signed-off-by: MaximumEntropy <[email protected]>

* T5 lr scheduler

Signed-off-by: MaximumEntropy <[email protected]>

* Checkpoint conversion and data preproc updates for t5

Signed-off-by: MaximumEntropy <[email protected]>

* Denoising eval

Signed-off-by: MaximumEntropy <[email protected]>

* Clean up denoising example to explicitly provide mask positions

Signed-off-by: MaximumEntropy <[email protected]>

* Better logging of results

Signed-off-by: MaximumEntropy <[email protected]>

* Minor changes

Signed-off-by: MaximumEntropy <[email protected]>

* Update config

Signed-off-by: MaximumEntropy <[email protected]>

* Update config

Signed-off-by: MaximumEntropy <[email protected]>

* Merge main into megatron_t5

Signed-off-by: MaximumEntropy <[email protected]>

* Dataset preproc script

Signed-off-by: MaximumEntropy <[email protected]>

* Remove biencoder file

Signed-off-by: MaximumEntropy <[email protected]>

* Remove another unused file

Signed-off-by: MaximumEntropy <[email protected]>

* Remove preprocess script since it has moved

Signed-off-by: MaximumEntropy <[email protected]>

* Remove ICT dataset

Signed-off-by: MaximumEntropy <[email protected]>

* Remove orqa dataset

Signed-off-by: MaximumEntropy <[email protected]>

* Remove realm dataset

Signed-off-by: MaximumEntropy <[email protected]>

* More file removing

Signed-off-by: MaximumEntropy <[email protected]>

* Fix 2 files

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Rename checkpoint fname

Signed-off-by: MaximumEntropy <[email protected]>

* Loss averaging fixes in t5

Signed-off-by: MaximumEntropy <[email protected]>

* Minor changes

Signed-off-by: MaximumEntropy <[email protected]>

* add megatron gpt pretraining

Signed-off-by: ericharper <[email protected]>
Signed-off-by: MaximumEntropy <[email protected]>

* Remove weight decay stuff

Signed-off-by: MaximumEntropy <[email protected]>

* Training script update for PTL 1.5

Signed-off-by: MaximumEntropy <[email protected]>

* Update grad clip

Signed-off-by: MaximumEntropy <[email protected]>

* Update config

Signed-off-by: MaximumEntropy <[email protected]>

* Add barrier

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes and adding more stuff

Signed-off-by: MaximumEntropy <[email protected]>

* Missed merge conflict fix

Signed-off-by: MaximumEntropy <[email protected]>

* Unittest fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Style fix

Signed-off-by: MaximumEntropy <[email protected]>

* Inference changes

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Fix reinstall script

Signed-off-by: MaximumEntropy <[email protected]>

* T5 CI tests

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Minor fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Minor fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Tokenizer arg fix

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Helpers fix

Signed-off-by: MaximumEntropy <[email protected]>

* Style fix

Signed-off-by: MaximumEntropy <[email protected]>

* PR review changes

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Refactor bert dataset stuff

Signed-off-by: MaximumEntropy <[email protected]>

* Fix typo

Signed-off-by: MaximumEntropy <[email protected]>

* Fix request dataset variable

Signed-off-by: MaximumEntropy <[email protected]>

* Fix sched params in CI test

Signed-off-by: MaximumEntropy <[email protected]>

* Change to kwargs and Jenkins test for inference

Signed-off-by: MaximumEntropy <[email protected]>

* PR review related changes

Signed-off-by: MaximumEntropy <[email protected]>

* More fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Test helper building

Signed-off-by: MaximumEntropy <[email protected]>

* Restore helper compilation everywhere

Signed-off-by: MaximumEntropy <[email protected]>

* Fix PR comments

Signed-off-by: MaximumEntropy <[email protected]>

* PR comments

Signed-off-by: MaximumEntropy <[email protected]>

* Add docstring to additional_special_tokens

Signed-off-by: MaximumEntropy <[email protected]>

* Improve docstring

Signed-off-by: MaximumEntropy <[email protected]>

* Fix resume from checkpoint path

Signed-off-by: MaximumEntropy <[email protected]>

* Fix for TP>1

Signed-off-by: MaximumEntropy <[email protected]>

* Remove fused fp16 and bf16 args

Signed-off-by: MaximumEntropy <[email protected]>

* Add missed file

Signed-off-by: MaximumEntropy <[email protected]>

* Learning annealing scheduler fix

Signed-off-by: MaximumEntropy <[email protected]>

* Change default optim and scheduler to adam

Signed-off-by: MaximumEntropy <[email protected]>

* dummy for CI restart

Signed-off-by: MaximumEntropy <[email protected]>

* Remove constant steps after switch to adam

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: ericharper <[email protected]>
Co-authored-by: Sangkug Lym <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Micha Livne <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Updates on ASR with diarization util files (#3359)

* Initial commit

Signed-off-by: Taejin Park <[email protected]>

* Update LM part and multiscale part in README.

Signed-off-by: Taejin Park <[email protected]>

* Removed redundant parts

Signed-off-by: Taejin Park <[email protected]>

* modified example script

Signed-off-by: Taejin Park <[email protected]>

* Revised doc strings

Signed-off-by: Taejin Park <[email protected]>

* Changed paths_to_manifest.py script

Signed-off-by: Taejin Park <[email protected]>

* Reflected PR comments and revised tutorials

Signed-off-by: Taejin Park <[email protected]>

* Added ASR models and kenlm installation 

Signed-off-by: [email protected]

* Added ASR models and kenlm installation 

Signed-off-by: [email protected]
Signed-off-by: Taejin Park <[email protected]>

* Changed docstrings and style fix

Signed-off-by: Taejin Park <[email protected]>

* Fixed unused import and vars

Signed-off-by: Taejin Park <[email protected]>

* Added LM part in ASR_diar tutorial.

Signed-off-by: Taejin Park <[email protected]>

Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* update docs and replace speakernet with titanet in tutorials (#3405)

* update docs and replace speakernet with titanet in tutorials

Signed-off-by: nithinraok <[email protected]>

* update dataset usage description

Signed-off-by: nithinraok <[email protected]>

* updated based on comments

Signed-off-by: nithinraok <[email protected]>

* spell fix

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Update Mixer-TTS, FastPitch and TTSDataset (#3366)

* update tts dataset, fastpitch and mixer tts

Signed-off-by: Oktai Tatanov <[email protected]>

* fix style and notebooks

Signed-off-by: Oktai Tatanov <[email protected]>

* update notebooks

Signed-off-by: Oktai Tatanov <[email protected]>

* update mixer-tts, mixer-tts-x and fastpitch configs

Signed-off-by: Oktai Tatanov <[email protected]>

* update notebooks and configs

Signed-off-by: Oktai Tatanov <[email protected]>

* update configs

Signed-off-by: Oktai Tatanov <[email protected]>

* add links, update README, fix tutorials

Signed-off-by: Oktai Tatanov <[email protected]>

* fix style

Signed-off-by: Oktai Tatanov <[email protected]>

* remove unnecessary code from fastpitch model

Signed-off-by: Oktai Tatanov <[email protected]>

* update jenkinsfile and fastpitch typo fix

Signed-off-by: Oktai Tatanov <[email protected]>

* fix configs

Signed-off-by: Oktai Tatanov <[email protected]>

* revert jenkinsfile

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Asr fr (#3404)

* Pushing WFST_tutorial for open draft. (Still need to review Colab code.)

Signed-off-by: tbartley94 <[email protected]>

* Checked that the tutorial code for WFST_Tutorial is properly functioning. Also included some formatting edits.

Signed-off-by: tbartley94 <[email protected]>

* Responding to editorial comments for WFST_tutorial

Signed-off-by: tbartley94 <[email protected]>

* Added images to folder and wrote README for tutorials

Signed-off-by: tbartley94 <[email protected]>

* Few more editorial changes to explain permutations in classification.

Signed-off-by: tbartley94 <[email protected]>

* Updated tutorials documentation page.

Signed-off-by: tbartley94 <[email protected]>

* Forgot links for README

Signed-off-by: tbartley94 <[email protected]>

* TOC links were dead

Signed-off-by: tbartley94 <[email protected]>

* More dead links to fix.

Signed-off-by: tbartley94 <[email protected]>

* removing Colab install and appending a warning instead.

Signed-off-by: tbartley94 <[email protected]>

* Update WFST_Tutorial.ipynb

Signed-off-by: tbartley94 <[email protected]>

* Adding pretrained French models to ctc_bpe_models and rnnt_bpe_models available models listing

Signed-off-by: tbartley94 <[email protected]>

* Updating ctc_bpe_models import for updated Fr Conformer Ctc version.

Signed-off-by: tbartley94 <[email protected]>

* Added new French ASR models to documentation and imports: conformer transducer and conformer ctc trained without hyphenization.

Signed-off-by: tbartley94 <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Co-authored-by: Yang Zhang <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* [fix] for resume training on SLURM multi-node multi-gpu (#3374)

* [fix] for resume training on SLURM multi-node multi-gpu

On SLURM, resuming training in a multi-node multi-GPU setting fails: when `LOCAL_RANK` is undefined, `is_global_rank_zero()` returns true on all processes that run on node 0. In this case `exp_manager.py` https://github.com/NVIDIA/NeMo/blob/f83b2c5524a787be21ffea170850c4b5486eac2b/nemo/utils/exp_manager.py#L446 creates multiple `run_*` folders, which eventually leads to failure (missing files, because other processes have moved them already).

Checking also for `SLURM_PROCID` solves this issue, as that environment variable contains the global rank id.

Signed-off-by: Iztok Lebar Bajec <[email protected]>

* Update get_rank.py

In a SLURM environment, return the SLURM global rank (SLURM_PROCID); fall back to the previous behaviour otherwise.

Signed-off-by: Iztok Lebar Bajec <[email protected]>

* style

Signed-off-by: Jason <[email protected]>

* Solved bug when either RANK or SLURM_PROCID returns 0 and the conditionals evaluate to False

Signed-off-by: Iztok Lebar Bajec <[email protected]>

Co-authored-by: Jason <[email protected]>
Signed-off-by: Bonham79 <[email protected]>
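
A minimal sketch of the rank-resolution idea described above, assuming only environment variables are available: prefer `SLURM_PROCID`, and use explicit `is not None` checks so that a legitimate rank of 0 is not treated as a missing value. This illustrates the approach, not the exact code in `nemo/utils/get_rank.py`:

```python
import os

def resolve_global_rank():
    """Hypothetical helper: SLURM global rank first, launcher variables otherwise."""
    slurm_rank = os.environ.get("SLURM_PROCID")
    if slurm_rank is not None:        # do not use truthiness: "0" is a valid rank
        return int(slurm_rank)
    rank = os.environ.get("RANK")
    if rank is not None:
        return int(rank)
    return int(os.environ.get("LOCAL_RANK", 0))
```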

* Fix running token classification in multinode setting (#3413)

* fix: master device check

Signed-off-by: PeganovAnton <[email protected]>

* Fix bug with use_cache parameter

Signed-off-by: PeganovAnton <[email protected]>

* create pickled features file regardless of value of use_cache

Signed-off-by: PeganovAnton <[email protected]>

* Improve docs

Signed-off-by: PeganovAnton <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Fix order of lang checking to ignore input langs (#3417)

* Fix order of lang checking

Signed-off-by: MaximumEntropy <[email protected]>

* Fix == error

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: PeganovAnton <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Refactor ASR Examples Directory (#3392)

* Begin refactor of ASR files

Signed-off-by: smajumdar <[email protected]>

* Update jenkins paths for ASR

Signed-off-by: smajumdar <[email protected]>

* Update speech_to_text_ctc

Signed-off-by: smajumdar <[email protected]>

* Update speech_to_text_ctc_bpe

Signed-off-by: smajumdar <[email protected]>

* Lowercase all directories

Signed-off-by: smajumdar <[email protected]>

* Fix RNNT num_workers

Signed-off-by: smajumdar <[email protected]>

* Fix RNNT num_workers

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* NMT MIM mean variance fix (#3385)

* 1. Updated default NMT bottleneck encoder to be non-autoregressive

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed mean/variance being tied when latent and hidden dimensions are the same.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

Co-authored-by: Micha Livne <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* update to 21.12 (#3424)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Working around Pytorch exporter issue with expand() (#3422)

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* update copyright (#3426)

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* remove apex (#3428)

Signed-off-by: ekmb <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* vad infer refactor (#3394)

* vad infer refactor

Signed-off-by: fayejf <[email protected]>

* remove duplicate in write_long_audio_manifest

Signed-off-by: fayejf <[email protected]>

* remove duplicate in script vad_overlap_posterior

Signed-off-by: fayejf <[email protected]>

* style fix

Signed-off-by: fayejf <[email protected]>

* fix nb

Signed-off-by: fayejf <[email protected]>

* small fix

Signed-off-by: fayejf <[email protected]>

* fix

Signed-off-by: fayejf <[email protected]>

* small fixes

Signed-off-by: fayejf <[email protected]>

* reflect taejin's review

Signed-off-by: fayejf <[email protected]>

* update tutorial about rename

Signed-off-by: fayejf <[email protected]>

* small fix

Signed-off-by: fayejf <[email protected]>

* merge main and fix

Signed-off-by: fayejf <[email protected]>

* tiny path fix

Signed-off-by: fayejf <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* doc update for refactory (#3430)

Signed-off-by: fayejf <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Update LJSpeech preprocessing (#3423)

* update lj speech preprocessing

Signed-off-by: Oktai Tatanov <[email protected]>

* update lj speech preprocessing 2

Signed-off-by: Oktai Tatanov <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* NMT Shared Embeddings Weights (#3340)

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

* 1. Implemented encoder decoder embedding weights tie.

Signed-off-by: Micha Livne <[email protected]>
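
As a reference for what "embedding weights tie" means here, a minimal PyTorch sketch with hypothetical vocabulary and hidden sizes; it only shows the generic technique of sharing one embedding matrix between encoder and decoder, not the NMT model's actual wiring:

```python
import torch.nn as nn

vocab_size, hidden_size = 32000, 512   # hypothetical sizes; tying requires a shared vocabulary
encoder_embedding = nn.Embedding(vocab_size, hidden_size)
decoder_embedding = nn.Embedding(vocab_size, hidden_size)
decoder_embedding.weight = encoder_embedding.weight   # one shared parameter, updated jointly
assert decoder_embedding.weight is encoder_embedding.weight
```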

* 1. Fixed style.

Signed-off-by: Micha Livne <[email protected]>

* 1. Debugging.

Signed-off-by: Micha Livne <[email protected]>

Co-authored-by: Micha Livne <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* [BigNLP] Make saving .nemo during on_train_end configurable (#3427)

* make save nemo configurable on train end

Signed-off-by: ericharper <[email protected]>

* add warning when save_best_model is True

Signed-off-by: ericharper <[email protected]>

Co-authored-by: Jason <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Preprocess an entire folder of .json or .json.gz files into a single .bin and .idx file. (#3425)

* Folder preproc

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

* Fix useless enumerate

Signed-off-by: MaximumEntropy <[email protected]>

* Address PR comments

Signed-off-by: MaximumEntropy <[email protected]>

* Style fixes

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Bonham79 <[email protected]>
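
A rough sketch of the folder-level input handling added here, assuming newline-delimited JSON and using only the standard library; writing the Megatron `.bin`/`.idx` output is deliberately omitted, so this is not the preprocessing script itself:

```python
import glob
import gzip
import json
import os

def iter_documents(folder):
    """Yield one JSON document per non-empty line from every .json / .json.gz file in `folder`."""
    paths = sorted(glob.glob(os.path.join(folder, "*.json"))) + \
            sorted(glob.glob(os.path.join(folder, "*.json.gz")))
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    yield json.loads(line)
```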

* Update speaker diarization docs (#3419)

* Initial commit

Signed-off-by: Taejin Park <[email protected]>

* Fixed minor mistakes

Signed-off-by: Taejin Park <[email protected]>

* Some changes regarding diarization utils

Signed-off-by: Taejin Park <[email protected]>

* Fixed minor typos

Signed-off-by: Taejin Park <[email protected]>

* Reflected PR comments

Signed-off-by: Taejin Park <[email protected]>

* Reflected PR comments

Signed-off-by: Taejin Park <[email protected]>

* Reflected additional comments

Signed-off-by: Taejin Park <[email protected]>

* Changed pics and refined text

Signed-off-by: Taejin Park <[email protected]>

* Minor typos

Signed-off-by: Taejin Park <[email protected]>

* Minor change on dataset

Signed-off-by: Taejin Park <[email protected]>

* Minor change on dataset 2

Signed-off-by: Taejin Park <[email protected]>

* Changed manifest input to yaml format

Signed-off-by: Taejin Park <[email protected]>

* Capitalization of titles

Signed-off-by: Taejin Park <[email protected]>

* Last commit

Signed-off-by: Taejin Park <[email protected]>

Co-authored-by: fayejf <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Update ContextNet models trained on more datasets (#3440)

* Update ContextNet models trained on more datasets

Signed-off-by: smajumdar <[email protected]>

* Update ContextNet models trained on more datasets

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* 1. Updated default buffer_size for TimingCallback to 1. (#3439)

Signed-off-by: Micha Livne <[email protected]>

Co-authored-by: Micha Livne <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Fix bug for missing variable (#3437)

Signed-off-by: MaximumEntropy <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Extending input_example() to take max batch and dimension arguments (#3429)

* Extending input_example() to take max batch and dimension arguments

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing conformer size reconfig, extending export script, some refactoring

Signed-off-by: Boris Fomitchev <[email protected]>

* Addressing comments

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing test issue

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing DecoderJoint input example

Signed-off-by: Boris Fomitchev <[email protected]>

* Removing soon-deprecated external format option addition

Signed-off-by: Boris Fomitchev <[email protected]>

* Fixing indentation typo

Signed-off-by: Boris Fomitchev <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Byte-level Multilingual NMT (#3368)

* init

Signed-off-by: Abhinav Khattar <[email protected]>

* style

Signed-off-by: Abhinav Khattar <[email protected]>

* rm debug stuff

Signed-off-by: Abhinav Khattar <[email protected]>

* changes

Signed-off-by: Abhinav Khattar <[email protected]>

* fix

Signed-off-by: Abhinav Khattar <[email protected]>

* fix

Signed-off-by: Abhinav Khattar <[email protected]>

* error fix

Signed-off-by: Abhinav Khattar <[email protected]>

* make spl tokens optional

Signed-off-by: Abhinav Khattar <[email protected]>

Co-authored-by: Micha Livne <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Asr patches (#3443)

* Fix issues with num_workers for transcribe

Signed-off-by: smajumdar <[email protected]>

* During inference use full context of chunk

Signed-off-by: smajumdar <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Updated NumPy SDE requirement (#3442)

Signed-off-by: Vitaly Lavrukhin <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* refactor data preprocessing script (#3444)

Signed-off-by: Yang Zhang <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* Prompt tuning loss mask fix (#3438)

* Switched to calculating loss on answer only

Signed-off-by: Virginia Adams <[email protected]>
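
A small sketch of what computing the loss on the answer only looks like, with hypothetical tensor shapes: prompt positions are masked out of the token-level cross-entropy before averaging. It illustrates the idea rather than the prompt-tuning model's actual loss code:

```python
import torch
import torch.nn.functional as F

def answer_only_loss(logits, labels, answer_start):
    """logits: [batch, seq, vocab]; labels: [batch, seq]; answer_start[i]: first answer token index."""
    batch, seq, vocab = logits.shape
    positions = torch.arange(seq).unsqueeze(0).expand(batch, seq)
    loss_mask = (positions >= answer_start.unsqueeze(1)).float()  # 0 over the prompt, 1 over the answer
    token_loss = F.cross_entropy(
        logits.reshape(-1, vocab), labels.reshape(-1), reduction="none"
    ).reshape(batch, seq)
    return (token_loss * loss_mask).sum() / loss_mask.sum()

logits = torch.randn(2, 6, 10)
labels = torch.randint(0, 10, (2, 6))
print(answer_only_loss(logits, labels, torch.tensor([3, 4])))
```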

* Added CI tests and unit tests for prompt tuning dataset

Signed-off-by: Virginia Adams <[email protected]>

* Fixed Jenkinsfile typo

Signed-off-by: Virginia Adams <[email protected]>

* fixed Jenkinsfile typo

Signed-off-by: Virginia Adams <[email protected]>

* Fixed more typos so CI tests run all the way through

Signed-off-by: Virginia Adams <[email protected]>

* Fixed code formatting

Signed-off-by: Virginia Adams <[email protected]>

* Needed to add save nemo file on train end flag to CI test

Signed-off-by: Virginia Adams <[email protected]>

* Added save .nemo on train end flag to example script

Signed-off-by: Virginia Adams <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* BioMegatron token classification tutorial fix to be compatible with current Megatron BERT (#3435)

* fixed the tokenizer

Signed-off-by: Yi Dong <[email protected]>

* training is working

Signed-off-by: Yi Dong <[email protected]>

* fixed text

Signed-off-by: Yi Dong <[email protected]>

* fixed text

Signed-off-by: Yi Dong <[email protected]>

* working notebook

Signed-off-by: Yi Dong <[email protected]>

* style fix

Signed-off-by: Yi Dong <[email protected]>

* fixed text

Signed-off-by: Yi Dong <[email protected]>

* handles the different megatron-lm checkpoint versions

Signed-off-by: Yi Dong <[email protected]>

* fixed the text classification notebook

Signed-off-by: Yi Dong <[email protected]>

* fixed key error

Signed-off-by: Yi Dong <[email protected]>

* more key error

Signed-off-by: Yi Dong <[email protected]>

* replace the old notebooks

Signed-off-by: Yi Dong <[email protected]>

* register vocab to nemo file

Signed-off-by: Yi Dong <[email protected]>

* added the missing notebook

Signed-off-by: Yi Dong <[email protected]>

Co-authored-by: Eric Harper <[email protected]>
Signed-off-by: Bonham79 <[email protected]>

* (1) O2-style mixed precision recipe, (2) Persistent layer-norm, (3) Grad scale hysteresis, (4) gradient_as_bucket_view (#3259)

* half precision training w/o autocast using master param

stage fp16 working version

fix: fp32 grad accumulation

bf16 support

Signed-off-by: Sangkug Lym <[email protected]>

add closure fn at bf16
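
For context, a compact sketch of the master-parameter recipe listed above (half-precision forward/backward without autocast, with fp32 gradient accumulation and updates), assuming a toy linear model; loss scaling and the bf16 path are omitted, so this illustrates only the update flow, not the NeMo/Apex implementation:

```python
import torch

use_half = torch.cuda.is_available()                 # run the fp16 path only on GPU
device = "cuda" if use_half else "cpu"
dtype = torch.float16 if use_half else torch.float32

model = torch.nn.Linear(16, 4).to(device=device, dtype=dtype)       # low-precision working copy
master = [p.detach().clone().float() for p in model.parameters()]   # fp32 master parameters
opt = torch.optim.SGD(master, lr=1e-2)

x = torch.randn(8, 16, device=device, dtype=dtype)
y = torch.randn(8, 4, device=device, dtype=dtype)

loss = torch.nn.functional.mse_loss(model(x).float(), y.float())
loss.backward()                                      # gradients land on the working copy

for p, m in zip(model.parameters(), master):
    m.grad = p.grad.detach().float()                 # accumulate and update in fp32
opt.step()
opt.zero_grad(set_to_none=True)

with torch.no_grad():
    for p, m in zip(model.parameters(), master):
        p.copy_(m.to(dtype))                         # copy updated masters back to the working copy
model.zero_grad(set_to_none=True)
```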

* change autocast compatible with latest pytorch version

Signed-off-by: Sangkug Lym <[email protected]>

* add module to the state_dict naming

Signed-off-by: Sangkug Lym <[email protected]>

* cleanup arguments

Signed-off-by: Sangkug Lym <[email protected]>

* fix module state matching upon checkpoint resume

Signed-off-by: Sangkug Lym <[email protected]>

* persistent layer norm and dependency check

Signed-off-by: Sangkug Lym <[email protected]>

check container version instead of pytorch version

Signed-off-by: Sangkug Lym <[email protected]>

update config

* dependency check

Signed-off-by: Sangkug Lym <[email protected]>

* add gradient_as_bucket_view arg to config

Signed-off-by: Sangkug Lym <[email protected]>

* (1) add hysteresis to grad scaler, and (2) add grad_scaler to TB

Signed-off-by: Sangkug Lym <[email protected]>

* doc link fixes (#3264)

Signed-off-by: nithinraok <[email protected]>

* escape chars fix (#3253)

* escape chars fix

Signed-off-by: ekmb <[email protected]>

* bug fixes

Signed-off-by: ekmb <[email protected]>

* review

Signed-off-by: ekmb <[email protected]>

Co-authored-by: Yang Zhang <[email protected]>

* Improve data pipeline for punctuation capitalization model and make other useful changes (#3159)

* Fix: inference on short sequences problem

Signed-off-by: PeganovAnton <[email protected]>

* Add draft of new punctuation and capitalization model

Signed-off-by: PeganovAnton <[email protected]>

* Fix debug config

Signed-off-by: PeganovAnton <[email protected]>

* Add parameter check

Signed-off-by: PeganovAnton <[email protected]>

* Update punctuation training script

Signed-off-by: PeganovAnton <[email protected]>

* Fix head config parameter names

Signed-off-by: PeganovAnton <[email protected]>

* Fix ds_item and class_label parameters in config

Signed-off-by: PeganovAnton <[email protected]>

* Fix dataloader shuffling for tarred dataset

Signed-off-by: PeganovAnton <[email protected]>

* Reduce validation batch

Signed-off-by: PeganovAnton <[email protected]>

* Add debug print

Signed-off-by: PeganovAnton <[email protected]>

* Fix metrics initialization

Signed-off-by: PeganovAnton <[email protected]>

* Fix minor bug

Signed-off-by: PeganovAnton <[email protected]>

* Fix device problem

Signed-off-by: PeganovAnton <[email protected]>

* Add debug print

Signed-off-by: PeganovAnton <[email protected]>

* Register metrics properly

Signed-off-by: PeganovAnton <[email protected]>

* Put metrics setup after module init

Signed-off-by: PeganovAnton <[email protected]>

* Reduce model size

Signed-off-by: PeganovAnton <[email protected]>

* Add wandb logging

Signed-off-by: PeganovAnton <[email protected]>

* Change wandb name

Signed-off-by: PeganovAnton <[email protected]>

* Fix logging names for metrics

Signed-off-by: PeganovAnton <[email protected]>

* Add debug print

Signed-off-by: PeganovAnton <[email protected]>

* Add returning from eval steps

Signed-off-by: PeganovAnton <[email protected]>

* Add second dev dataset

Signed-off-by: PeganovAnton <[email protected]>

* Move config

Signed-off-by: PeganovAnton <[email protected]>

* Fix path to dataset

Signed-off-by: PeganovAnton <[email protected]>

* Add more tokenizer parameters

Signed-off-by: PeganovAnton <[email protected]>

* Add debug script for more tokenizer in creating tarred dataset

Signed-off-by: PeganovAnton <[email protected]>

* Update output path in debug script

Signed-off-by: PeganovAnton <[email protected]>

* Fix minor bug in typing

Signed-off-by: PeganovAnton <[email protected]>

* Fix bug in parsing arguments

Signed-off-by: PeganovAnton <[email protected]>

* Do not pass tokenizer through queue

Signed-off-by: PeganovAnton <[email protected]>

* Set hf tokenizer in debug script

Signed-off-by: PeganovAnton <[email protected]>

* Try char vocabulary

Signed-off-by: PeganovAnton <[email protected]>

* Fix typo

Signed-off-by: PeganovAnton <[email protected]>

* Improve error message

Signed-off-by: PeganovAnton <[email protected]>

* Fix OOV problem

Signed-off-by: PeganovAnton <[email protected]>

* Add label ids creation and getting

Signed-off-by: PeganovAnton <[email protected]>

* fix: add missing parameter

Signed-off-by: PeganovAnton <[email protected]>

* Improve error message for label ids building

Signed-off-by: PeganovAnton <[email protected]>

* Add short tar files repacking

Signed-off-by: PeganovAnton <[email protected]>

* Fix minor bug and add more security

Signed-off-by: PeganovAnton <[email protected]>

* Fix minor bug

Signed-off-by: PeganovAnton <[email protected]>

* fix: replace Path with str

Signed-off-by: PeganovAnton <[email protected]>

* fix: iter datasets

Signed-off-by: PeganovAnton <[email protected]>

* Improve logging

Signed-off-by: PeganovAnton <[email protected]>

* Turn off repacking

Signed-off-by: PeganovAnton <[email protected]>

* Turn off repacking

Signed-off-by: PeganovAnton <[email protected]>

* Turn on repacking

Signed-off-by: PeganovAnton <[email protected]>

* Turn off repacking

Signed-off-by: PeganovAnton <[email protected]>

* Add debug print

Signed-off-by: PeganovAnton <[email protected]>

* Improve unexpected removal

Signed-off-by: PeganovAnton <[email protected]>

* Turn on repacking

Signed-off-by: PeganovAnton <[email protected]>

* fix: remove repacked files

Signed-off-by: PeganovAnton <[email protected]>

* Add default config for testing

Signed-off-by: PeganovAnton <[email protected]>

* Improve code style in evaluate script

Signed-off-by: PeganovAnton <[email protected]>

* Add docstrings

Signed-off-by: PeganovAnton <[email protected]>

* Remove debug config

Signed-off-by: PeganovAnton <[email protected]>

* Remove commented code

Signed-off-by: PeganovAnton <[email protected]>

* Fix code style in doc string

Signed-off-by: PeganovAnton <[email protected]>

* Fix usage of parser.error function

Signed-off-by: PeganovAnton <[email protected]>

* Improve working with config and fix restoring of old checkpoints

Signed-off-by: PeganovAnton <[email protected]>

* Do not demand cfg as dataclass

Signed-off-by: PeganovAnton <[email protected]>

* Add backward compatibility for absence of use_tarred_dataset

Signed-off-by: PeganovAnton <[email protected]>

* Fight for backwards compatibility

Signed-off-by: PeganovAnton <[email protected]>

* Add tokens_in_batch backward compatibility

Signed-off-by: PeganovAnton <[email protected]>

* Undo unintentional changes in tutorial

Signed-off-by: PeganovAnton <[email protected]>

* Do not allow more workers than queries

Signed-off-by: PeganovAnton <[email protected]>

* Fix metric names in tests

Signed-off-by: PeganovAnton <[email protected]>

* Fix metric location

Signed-off-by: PeganovAnton <[email protected]>

* Fix metric location

Signed-off-by: PeganovAnton <[email protected]>

* Require ds_item or data_dir

Signed-off-by: PeganovAnton <[email protected]>

* Disable multiprocessing data preparation by default

Signed-off-by: PeganovAnton <[email protected]>

* Disable multiprocessing data preparation by default

Signed-off-by: PeganovAnton <[email protected]>

* Disable multiprocessing data preparation by default

Signed-off-by: PeganovAnton <[email protected]>

* Make minor improvements in docstrings and typing

Signed-off-by: PeganovAnton <[email protected]>

* Fix finetuning code

Signed-off-by: PeganovAnton <[email protected]>

* Fix shuffle train dataset config parameter

Signed-off-by: PeganovAnton <[email protected]>

* Fix evaluation script

Signed-off-by: PeganovAnton <[email protected]>

* Update test

Signed-off-by: PeganovAnton <[email protected]>

* Add new test and make minor changes

Signed-off-by: PeganovAnton <[email protected]>

* Fix repacked file names

Signed-off-by: PeganovAnton <[email protected]>

* Add assertion error

Signed-off-by: PeganovAnton <[email protected]>

* Fix minor bug in regex

Signed-off-by: PeganovAnton <[email protected]>

* Improve Jenkins command

Signed-off-by: PeganovAnton <[email protected]>

* Fix code style

Signed-off-by: PeganovAnton <[email protected]>

* fix: add name to Jenkins stage

Signed-off-by: PeganovAnton <[email protected]>

* fix: add steps block to Jenkins stage

Signed-off-by: PeganovAnton <[email protected]>

* fix: move nemo_experiments removal to post section

Previously I encountered a weird error

+ rm -rf nemo_experiments
rm: cannot remove 'nemo_experiments': Directory not empty
script returned exit code 1

And I suspect that this could be because two parallel stages try to
remove the same directory simultaneously.

Signed-off-by: PeganovAnton <[email protected]>

* Turn off cache usage in Jenkins for token classification models

Signed-off-by: PeganovAnton <[email protected]>

* Stop pickling features

Signed-off-by: PeganovAnton <[email protected]>

* Reference webdataset in docs

Signed-off-by: PeganovAnton <[email protected]>

* Make multiple minor improvements

Signed-off-by: PeganovAnton <[email protected]>

* Add parameters tokens_in_batch, repack to documentation

Signed-off-by: PeganovAnton <[email protected]>

* Refactoring and improving readability

Signed-off-by: PeganovAnton <[email protected]>

* Make tar_shuffle_n optional parameter

Signed-off-by: PeganovAnton <[email protected]>

* Fix path to label vocab files

Signed-off-by: PeganovAnton <[email protected]>

* Fix metadata label vocab key

Signed-off-by: PeganovAnton <[email protected]>

* Create for_nemo directory

Signed-off-by: PeganovAnton <[email protected]>

* Fix tar_shuffle_n default value

Signed-off-by: PeganovAnton <[email protected]>

* First round of review fixes

Signed-off-by: PeganovAnton <[email protected]>

* Return tokens_in_batch default value

Signed-off-by: PeganovAnton <[email protected]>

* Remove duplicate parameters in `CommonDatasetParameters`

Signed-off-by: PeganovAnton <[email protected]>

* Remove duplicate parameters in config

Signed-off-by: PeganovAnton <[email protected]>

* Refactor user interface

Signed-off-by: PeganovAnton <[email protected]>

* fix: add missing parameter when setting up the dataloader

Signed-off-by: PeganovAnton <[email protected]>

* fix: replace data config with model config

Signed-off-by: PeganovAnton <[email protected]>

* fix: typo in config parameter name

Signed-off-by: PeganovAnton <[email protected]>

* fix: location of label ids parameters in config

Signed-off-by: PeganovAnton <[email protected]>

* fix: transforming not first legacy data config

Signed-off-by: PeganovAnton <[email protected]>

* fix: num_samples can be negative

Signed-off-by: PeganovAnton <[email protected]>

* fix: create directory for nemo ids files

Signed-off-by: PeganovAnton <[email protected]>

* fix: remove unremoved with_label

Signed-off-by: PeganovAnton <[email protected]>

* fix: features contain ids if loaded from pickle

Signed-off-by: PeganovAnton <[email protected]>

* Fix kwargs parameters

Signed-off-by: PeganovAnton <[email protected]>

* Add label setting for testing case

Signed-off-by: PeganovAnton <[email protected]>

* Fix: change parameter location in config

Signed-off-by: PeganovAnton <[email protected]>

* Fix: transform legacy config in init

Signed-off-by: PeganovAnton <[email protected]>

* Fix: make minor improvement in checking config

Signed-off-by: PeganovAnton <[email protected]>

* fix: check label ids for None before checking pad label id

Signed-off-by: PeganovAnton <[email protected]>

* fix: set labels when restoring

Signed-off-by: PeganovAnton <[email protected]>

* fix: place where label ids are taken

Signed-off-by: PeganovAnton <[email protected]>

* Fix minor bug

Signed-off-by: PeganovAnton <[email protected]>

* fix: register artifacts in set_label_ids

Signed-off-by: PeganovAnton <[email protected]>

* fix: perform checking only if label ids are not set

Signed-off-by: PeganovAnton <[email protected]>

* fix: set label_ids_are_set

Signed-off-by: PeganovAnton <[email protected]>

* Fix using of dataset in create tarred dataset

Signed-off-by: PeganovAnton <[email protected]>

* fix: manipulate label ids if fragment_idx is zero

Signed-off-by: PeganovAnton <[email protected]>

* fix: remove directory correctly

Signed-off-by: PeganovAnton <[email protected]>

* fix: vocab file names

Signed-off-by: PeganovAnton <[email protected]>

* fix: vocab file names

Signed-off-by: PeganovAnton <[email protected]>

* Add debug print

Signed-off-by: PeganovAnton <[email protected]>

* Add directories for cache and label info

Signed-off-by: PeganovAnton <[email protected]>

* Minor fixes

Signed-off-by: PeganovAnton <[email protected]>

* Minor fix

Signed-off-by: PeganovAnton <[email protected]>

* Minor fix

Signed-off-by: PeganovAnton <[email protected]>

* Improve debug config

Signed-off-by: PeganovAnton <[email protected]>

* Create missing directories

Signed-off-by: PeganovAnton <[email protected]>

* Improve feature pkl file name

Signed-off-by: PeganovAnton <[email protected]>

* WORKING VERSION OF VOCAB CONFIG

Signed-off-by: PeganovAnton <[email protected]>

* Improve vocab file extraction

Signed-off-by: PeganovAnton <[email protected]>

* Fix config

Signed-off-by: PeganovAnton <[email protected]>

* Improve vocab file extraction

Signed-off-by: PeganovAnton <[email protected]>

* fix register artifact calls

Signed-off-by: PeganovAnton <[email protected]>

* fix: add class_labels to legacy fixing

Signed-off-by: PeganovAnton <[email protected]>

* fix: add missing method

Signed-off-by: PeganovAnton <[email protected]>

* Add support for checkpoints without class labels artifact

Signed-off-by: PeganovAnton <[email protected]>

* fix: add missing return values to function

Signed-off-by: PeganovAnton <[email protected]>

* fix saving label ids in creation of tarred dataset

Signed-off-by: PeganovAnton <[email protected]>

* fix: adjust tarred dataset consistency check

Signed-off-by: PeganovAnton <[email protected]>

* fix: consistency check call

Signed-off-by: PeganovAnton <[email protected]>

* Try checking labels every time dataloader is set

Signed-off-by: PeganovAnton <[email protected]>

* fi…