diff --git a/3d_segmentation/unetr_btcv_segmentation_3d.ipynb b/3d_segmentation/unetr_btcv_segmentation_3d.ipynb index 9d318d6311..c1c093c5fd 100644 --- a/3d_segmentation/unetr_btcv_segmentation_3d.ipynb +++ b/3d_segmentation/unetr_btcv_segmentation_3d.ipynb @@ -33,7 +33,7 @@ "\n", "Under Institutional Review Board (IRB) supervision, 50 abdomen CT scans were randomly selected from a combination of an ongoing colorectal cancer chemotherapy trial and a retrospective ventral hernia study. The 50 scans were captured during the portal venous contrast phase with variable volume sizes (512 x 512 x 85 - 512 x 512 x 198) and fields of view (approx. 280 x 280 x 280 mm3 - 500 x 500 x 650 mm3). The in-plane resolution varies from 0.54 x 0.54 mm2 to 0.98 x 0.98 mm2, while the slice thickness ranges from 2.5 mm to 5.0 mm. \n", "\n", - "Target: 13 abdominal organs including 1. Spleen 2. Right Kidney 3. Left Kideny 4.Gallbladder 5.Esophagus 6. Liver 7. Stomach 8.Aorta 9. IVC 10. Portal and Splenic Veins 11. Pancreas 12 Right adrenal gland 13 Left adrenal gland.\n", + "Target: 13 abdominal organs including 1. Spleen 2. Right Kidney 3. Left Kidney 4. Gallbladder 5. Esophagus 6. Liver 7. Stomach 8. Aorta 9. IVC 10. Portal and Splenic Veins 11. Pancreas 12. Right adrenal gland 13. Left adrenal gland.\n", "\n", "Modality: CT\n", "Size: 30 3D volumes (24 Training + 6 Testing) \n", diff --git a/3d_segmentation/unetr_btcv_segmentation_3d_lightning.ipynb b/3d_segmentation/unetr_btcv_segmentation_3d_lightning.ipynb index b61951452e..d7f24c2fec 100644 --- a/3d_segmentation/unetr_btcv_segmentation_3d_lightning.ipynb +++ b/3d_segmentation/unetr_btcv_segmentation_3d_lightning.ipynb @@ -36,7 +36,7 @@ "\n", "Under Institutional Review Board (IRB) supervision, 50 abdomen CT scans were randomly selected from a combination of an ongoing colorectal cancer chemotherapy trial and a retrospective ventral hernia study. The 50 scans were captured during the portal venous contrast phase with variable volume sizes (512 x 512 x 85 - 512 x 512 x 198) and fields of view (approx. 280 x 280 x 280 mm3 - 500 x 500 x 650 mm3). The in-plane resolution varies from 0.54 x 0.54 mm2 to 0.98 x 0.98 mm2, while the slice thickness ranges from 2.5 mm to 5.0 mm. \n", "\n", - "Target: 13 abdominal organs including 1. Spleen 2. Right Kidney 3. Left Kideny 4.Gallbladder 5.Esophagus 6. Liver 7. Stomach 8.Aorta 9. IVC 10. Portal and Splenic Veins 11. Pancreas 12 Right adrenal gland 13 Left adrenal gland.\n", + "Target: 13 abdominal organs including 1. Spleen 2. Right Kidney 3. Left Kidney 4. Gallbladder 5. Esophagus 6. Liver 7. Stomach 8. Aorta 9. IVC 10. Portal and Splenic Veins 11. 
Pancreas 12. Right adrenal gland 13. Left adrenal gland.\n", "\n", "Modality: CT\n", "Size: 30 3D volumes (24 Training + 6 Testing) \n", diff --git a/auto3dseg/notebooks/auto3dseg_autorunner_ref_api.ipynb b/auto3dseg/notebooks/auto3dseg_autorunner_ref_api.ipynb index fc86dfbee7..a56790987d 100644 --- a/auto3dseg/notebooks/auto3dseg_autorunner_ref_api.ipynb +++ b/auto3dseg/notebooks/auto3dseg_autorunner_ref_api.ipynb @@ -52,7 +52,6 @@ "outputs": [], "source": [ "import os\n", - "import torch\n", "import tempfile\n", "\n", "from monai.apps import download_and_extract\n", @@ -64,7 +63,7 @@ " export_bundle_algo_history,\n", " import_bundle_algo_history,\n", ")\n", - "from monai.auto3dseg import algo_to_pickle, datafold_read\n", + "from monai.auto3dseg import algo_to_pickle\n", "from monai.bundle.config_parser import ConfigParser\n", "from monai.config import print_config\n", "\n", @@ -72,15 +71,13 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Download dataset\n", "\n", - "We provide a toy datalist file that splits a subset of the downloaded datasets into five folds.\n", - "\n", - "> NOTE: Each validation set only has 6 images in one fold of training.\n", - "> Therefore, we need to set a limit on the total number of GPUs we're using in this notebook." + "We provide a toy datalist file that splits a subset of the downloaded datasets into five folds." ] }, { @@ -101,11 +98,7 @@ "if not os.path.exists(dataroot):\n", " download_and_extract(resource, compressed_file, root_dir)\n", "\n", - "datalist_file = os.path.join(\"..\", \"tasks\", \"msd\", msd_task, \"msd_\" + msd_task.lower() + \"_folds.json\")\n", - "\n", - "if torch.cuda.device_count() > 6:\n", - " os.environ[\"CUDA_DEVICE_ORDER\"] = \"PCI_BUS_ID\"\n", - " os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0,1,2,3,4,5\"" + "datalist_file = os.path.join(\"..\", \"tasks\", \"msd\", msd_task, \"msd_\" + msd_task.lower() + \"_folds.json\")" ] }, { @@ -231,6 +224,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -238,7 +232,7 @@ "\n", "If the users continue to train the algorithms on a local system, the history of the algorithm generation can be fetched via the `get_history` method of the `BundleGen` object. There are also scenarios where users need to stop the Python process after the `algo_gen`. For example, the users may need to transfer the files to a remote cluster to start the training. `Auto3DSeg` offers a utility function `export_bundle_algo_history` to dump the history to the hard drive and recall it by `import_bundle_algo_history`. \n", "\n", - "If the files are copied to a remote system, please make sure the algorithm templates are also copied there. Some functions require the path to instantiate the algorithm class properly." + "If the files are copied to a remote system, please ensure the algorithm templates are also copied there. Some functions require the path to instantiate the algorithm class properly." ] }, { @@ -252,6 +246,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -266,7 +261,15 @@ "The users can use either `train()` or `train({})` if no changes are needed.\n", "Then the algorithms will go for the full training and repeat 5 folds.\n", "\n", - "On the other hand, users can also use set `train_param` for each algorithm."
+ "On the other hand, users can also use set `train_param` for each algorithm.\n", + "\n", + "\n", + "For demo purposes, below is a code block to convert num_epoch to iteration style and override all algorithms with the same training parameters.\n", + "The setup works fine for a machine that has GPUs less than or equal to 8.\n", + "The datalist in this example is only using a subset of the original dataset.\n", + "Users need to ensure the number of GPUs is not greater than the number that the training dataset can be partitioned.\n", + "For example, the following code block is not suitable for a 16-GPU system.\n", + "In such cases, please change the code block accordingly." ] }, { @@ -277,24 +280,11 @@ "source": [ "max_epochs = 2 # change epoch number to 2 to cut down the notebook running time\n", "\n", - "# safeguard to ensure max_epochs is greater or equal to 2\n", - "max_epochs = max(max_epochs, 2)\n", - "\n", - "num_gpus = 1 if \"multigpu\" in input and not input[\"multigpu\"] else torch.cuda.device_count()\n", - "\n", - "num_epoch = max_epochs\n", - "num_images_per_batch = 2\n", - "files_train_fold0, _ = datafold_read(datalist_file, \"\", 0)\n", - "n_data = len(files_train_fold0)\n", - "n_iter = int(num_epoch * n_data / num_images_per_batch / max(num_gpus, 1))\n", - "n_iter_val = int(n_iter / 2)\n", - "\n", "train_param = {\n", - " \"num_iterations\": n_iter,\n", - " \"num_iterations_per_validation\": n_iter_val,\n", - " \"num_images_per_batch\": num_images_per_batch,\n", - " \"num_epochs\": num_epoch,\n", - " \"num_warmup_iterations\": n_iter_val,\n", + " \"num_epochs_per_validation\": 1,\n", + " \"num_images_per_batch\": 2,\n", + " \"num_epochs\": max_epochs,\n", + " \"num_warmup_epochs\": 1,\n", "}\n", "\n", "print(train_param)" diff --git a/auto3dseg/notebooks/auto3dseg_hello_world.ipynb b/auto3dseg/notebooks/auto3dseg_hello_world.ipynb index 4d5bf4992e..0762e3822c 100644 --- a/auto3dseg/notebooks/auto3dseg_hello_world.ipynb +++ b/auto3dseg/notebooks/auto3dseg_hello_world.ipynb @@ -54,7 +54,6 @@ "import nibabel as nib\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", - "import torch\n", "\n", "from monai.apps.auto3dseg import AutoRunner\n", "from monai.config import print_config\n", @@ -64,6 +63,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -71,10 +71,7 @@ "\n", "It is well known that AI takes time to train. To provide the \"Hello World!\" experience of Auto3D in this notebook, we will simulate a small dataset and run training only for multiple epochs. Due to the nature of AI, the performance shouldn't be highly expected, but the entire pipeline will be completed within minutes!\n", "\n", - "`sim_datalist` provides the information of the simulated datasets. It lists 12 training and 2 testing images and labels. The training data are split into 3 folds. Each fold will use 8 images to train and 4 images to validate. The size of the dimension is defined by the `sim_dim` .\n", - "\n", - "> NOTE: Each validation set only has 4 images in one fold of training.\n", - "> Therefore, we need to set a limit on the total number of GPUs we're using in this notebook." + "`sim_datalist` provides the information of the simulated datasets. It lists 12 training and 2 testing images and labels. The training data are split into 3 folds. Each fold will use 8 images to train and 4 images to validate. The size of the dimension is defined by the `sim_dim` ." 
] }, { @@ -104,11 +101,7 @@ " ],\n", "}\n", "\n", - "sim_dim = (64, 64, 64)\n", - "\n", - "if torch.cuda.device_count() > 4:\n", - " os.environ[\"CUDA_DEVICE_ORDER\"] = \"PCI_BUS_ID\"\n", - " os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0,1,2,3\"" + "sim_dim = (64, 64, 64)" ] }, { @@ -216,10 +209,15 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "## Override the training parameters so that we can complete the pipeline in minutes" + "## Override the training parameters so that we can complete the pipeline in minutes\n", + "\n", + "For demo purposes, below is a code block that overrides all algorithms with the same epoch-based training parameters.\n", + "If users would like to use more than one GPU, they can change `CUDA_VISIBLE_DEVICES`, or just remove the key to use all available devices.\n", + "Users also need to ensure the number of GPUs does not exceed the number of partitions the training dataset can be split into." ] }, { @@ -230,16 +228,12 @@ "source": [ "max_epochs = 2\n", "\n", - "# safeguard to ensure max_epochs is greater or equal to 2\n", - "max_epochs = max(max_epochs, 2)\n", - "\n", "train_param = {\n", " \"CUDA_VISIBLE_DEVICES\": [0], # use only 1 gpu\n", - " \"num_iterations\": 4 * max_epochs,\n", - " \"num_iterations_per_validation\": 2 * max_epochs,\n", + " \"num_epochs_per_validation\": 1,\n", " \"num_images_per_batch\": 2,\n", " \"num_epochs\": max_epochs,\n", - " \"num_warmup_iterations\": 2 * max_epochs,\n", + " \"num_warmup_epochs\": 1,\n", "}\n", "runner.set_training_params(train_param)\n", "runner.set_num_fold(num_fold=1)" diff --git a/auto3dseg/notebooks/auto_runner.ipynb b/auto3dseg/notebooks/auto_runner.ipynb index 2f23b55d77..37bb2444b6 100644 --- a/auto3dseg/notebooks/auto_runner.ipynb +++ b/auto3dseg/notebooks/auto_runner.ipynb @@ -60,28 +60,24 @@ "source": [ "import os\n", "import tempfile\n", - "import torch\n", "\n", "from monai.bundle.config_parser import ConfigParser\n", "from monai.apps import download_and_extract\n", "\n", "from monai.apps.auto3dseg import AutoRunner\n", - "from monai.auto3dseg import datafold_read\n", "from monai.config import print_config\n", "\n", "print_config()" ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Download dataset\n", "\n", - "We provide a toy datalist file that splits a subset of the downloaded datasets into five folds.\n", - "\n", - "> NOTE: Each validation set only has 6 images in one fold of training.\n", - "> Therefore, we need to set a limit on the total number of GPUs we're using in this notebook." + "We provide a toy datalist file that splits a subset of the downloaded datasets into five folds." ] }, { @@ -102,11 +98,7 @@ "if not os.path.exists(dataroot):\n", " download_and_extract(resource, compressed_file, root_dir)\n", "\n", - "datalist_file = os.path.join(\"..\", \"tasks\", \"msd\", msd_task, \"msd_\" + msd_task.lower() + \"_folds.json\")\n", - "\n", - "if torch.cuda.device_count() > 6:\n", - " os.environ[\"CUDA_DEVICE_ORDER\"] = \"PCI_BUS_ID\"\n", - " os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0,1,2,3,4,5\"" + "datalist_file = os.path.join(\"..\", \"tasks\", \"msd\", msd_task, \"msd_\" + msd_task.lower() + \"_folds.json\")" ] }, { @@ -267,6 +259,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -274,11 +267,18 @@ "\n", "`set_training_params` in `AutoRunner` provides an interface to change all algorithms' training parameters in one line. 
\n", "\n", - "> NOTE **Auto3DSeg** uses MONAI bundle templates to perform training, validation, and inference. The number of epochs/iterations of training is specified by the config files in each template.\n", - "> Users can override these these values in the bundle templates.\n", - "> But users should consider that some bundle templates may use `num_iterations` and other may use `num_epochs` to iterate.\n", + "NOTE: \n", + "**Auto3DSeg** uses MONAI bundle templates to perform training, validation, and inference.\n", + "The number of epochs/iterations of training is specified by the config files in each template.\n", + "Users can override these these values in the bundle templates.\n", + "But users should consider that some bundle templates may use `num_iterations` and other may use `num_epochs` to iterate.\n", "\n", - "For demo purpose, below is a code block to convert num_epoch to iteration style and override all algorithms with the same training parameters for 1-GPU/2-GPU machine. \n" + "For demo purposes, below is a code block to convert num_epoch to iteration style and override all algorithms with the same training parameters.\n", + "The setup works fine for a machine that has GPUs less than or equal to 8.\n", + "The datalist in this example is only using a subset of the original dataset.\n", + "Users need to ensure the number of GPUs is not greater than the number that the training dataset can be partitioned.\n", + "For example, the following code block is not suitable for a 16-GPU system.\n", + "In such cases, please change the code block accordingly.\n" ] }, { @@ -289,25 +289,13 @@ "source": [ "max_epochs = 2\n", "\n", - "# safeguard to ensure max_epochs is greater or equal to 2\n", - "max_epochs = max(max_epochs, 2)\n", - "\n", - "num_gpus = 1 if \"multigpu\" in input_cfg and not input_cfg[\"multigpu\"] else torch.cuda.device_count()\n", - "\n", - "num_epoch = max_epochs\n", - "num_images_per_batch = 2\n", - "files_train_fold0, _ = datafold_read(datalist_file, \"\", 0)\n", - "n_data = len(files_train_fold0)\n", - "n_iter = int(num_epoch * n_data / num_images_per_batch / num_gpus)\n", - "n_iter_val = int(n_iter / 2)\n", - "\n", "train_param = {\n", - " \"num_iterations\": n_iter,\n", - " \"num_iterations_per_validation\": n_iter_val,\n", - " \"num_images_per_batch\": num_images_per_batch,\n", - " \"num_epochs\": num_epoch,\n", - " \"num_warmup_iterations\": n_iter_val,\n", + " \"num_epochs_per_validation\": 1,\n", + " \"num_images_per_batch\": 2,\n", + " \"num_epochs\": max_epochs,\n", + " \"num_warmup_epochs\": 1,\n", "}\n", + "\n", "runner = AutoRunner(input=input)\n", "runner.set_training_params(params=train_param)\n", "# runner.run()" @@ -360,13 +348,14 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Train model with HPO\n", "\n", "**Auto3DSeg** supports hyper parameter optimization (HPO) via `NNI` and `Optuna` backends.\n", - "If you wound like to the use `Optuna`, please check the [notebook](hpo_optuna.ipynb) for detailed usage.\n", + "If you would like to the use `Optuna`, please check the [notebook](hpo_optuna.ipynb) for detailed usage.\n", "\n", "Here we demonstrate the HPO option with `NNI` by Microsoft.\n", "Please install it via `pip install nni` if you hope to execute HPO with it in tutorial and haven't done so in the beginning of the notebook.\n", @@ -374,11 +363,11 @@ "\n", "## Use `AutoRunner` with `NNI` backend to perform grid search\n", "\n", - "After `runner.run()` is executed, `nni` will attempt to start a web service 
using port 8088 by default. If you are running the tutorial on a remote host, please ensure the port is available on the system.\n", "\n", "> NOTE: it is recommended to turn off ensemble if the users are using HPO features.\n", "> By default, all the models are saved under the working directory, including the ones tuned by the HPO package.\n", - "> Users may want to read the HPO results before the taking the next step.\n", + "> Users may want to read the HPO results before taking the next step.\n", "> If the users want to ensemble all the models, the `ensemble` option can be set to True." ] }, { @@ -395,6 +384,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -403,6 +393,7 @@ "The default `NNI` config that `AutoRunner` uses looks like below. Users can override some of the parameters via the `set_hpo_params` interface:\n", "\n", "```python\n", + "import torch\n", "default_nni_config = {\n", " \"trialCodeDirectory\": \".\",\n", " \"trialGpuNumber\": torch.cuda.device_count(),\n", @@ -449,19 +440,19 @@ "outputs": [], "source": [ "runner = AutoRunner(input=input, hpo=True, ensemble=False)\n", + "num_epoch = 2\n", "hpo_params = {\n", " \"maxTrialNumber\": 20,\n", " \"maxExperimentDuration\": \"30m\",\n", - " \"num_iterations\": n_iter,\n", - " \"num_iterations_per_validation\": n_iter_val,\n", - " \"num_images_per_batch\": num_images_per_batch,\n", - " \"num_epochs\": num_epoch,\n", - " \"num_warmup_iterations\": n_iter_val,\n", - " \"training#num_iterations\": n_iter,\n", - " \"training#num_iterations_per_validation\": n_iter_val,\n", - " \"searching#num_iterations\": n_iter,\n", - " \"searching#num_iterations_per_validation\": n_iter_val,\n", - " \"searching#num_warmup_iterations\": n_iter,\n", + " \"num_epochs_per_validation\": 1,\n", + " \"num_images_per_batch\": 1,\n", + " \"num_epochs\": num_epoch,\n", + " \"num_warmup_epochs\": 1,\n", + " \"training#num_epochs\": num_epoch,\n", + " \"training#num_epochs_per_validation\": 1,\n", + " \"searching#num_epochs\": num_epoch,\n", + " \"searching#num_epochs_per_validation\": 1,\n", + " \"searching#num_warmup_epochs\": 1,\n", "}\n", "search_space = {\"learning_rate\": {\"_type\": \"choice\", \"_value\": [0.0001, 0.01]}}\n", "runner.set_num_fold(num_fold=1)\n", diff --git a/auto3dseg/notebooks/data_analyzer.ipynb b/auto3dseg/notebooks/data_analyzer.ipynb index 2a1f0cc337..5bf7be1008 100644 --- a/auto3dseg/notebooks/data_analyzer.ipynb +++ b/auto3dseg/notebooks/data_analyzer.ipynb @@ -214,6 +214,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -223,8 +224,8 @@ "\n", "```bash\n", "python -m monai.apps.auto3dseg DataAnalyzer get_all_case_stats \\\n", - " --datalist=\"{datalist file path}\" \\\n", - " --dataroot=\"${dataroot path}\"\n", + " --datalist=\"<datalist file path>\" \\\n", + " --dataroot=\"<dataroot path>\"\n", "```\n" ] } diff --git a/auto3dseg/notebooks/ensemble_byoc.ipynb b/auto3dseg/notebooks/ensemble_byoc.ipynb index e243718761..9058dfb11b 100644 --- a/auto3dseg/notebooks/ensemble_byoc.ipynb +++ b/auto3dseg/notebooks/ensemble_byoc.ipynb @@ -56,7 +56,6 @@ "import numpy as np\n", "import nibabel as nib\n", "import random\n", - "import torch\n", "\n", "from copy import deepcopy\n", "from pathlib import Path\n", @@ -77,6 +76,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -84,10 
+84,9 @@ "\n", "It is well known that AI takes time to train. To provide the \"Hello World!\" experience of Auto3DSeg in this notebook, we will simulate a small dataset and run training for only a few epochs. Given the tiny dataset and the short training, high performance should not be expected, but the entire pipeline will be completed within minutes!\n", "\n", - "`sim_datalist` provides the information of the simulated datasets. It lists 12 training and 2 testing images and labels. The training data are split into 3 folds. Each fold will use 8 images to train and 4 images to validate. The size of the dimension is defined by the `sim_dim` .\n", - "\n", - "> NOTE: Each validation set only has 4 images in one fold of training.\n", - "> Therefore, we need to set a limit on the total number of GPUs we're using in this notebook." + "`sim_datalist` describes the simulated datasets. It lists 12 training and 2 testing images and labels.\n", + "The training data are split into 3 folds. Each fold will use 8 images to train and 4 images to validate.\n", + "The volume size is defined by `sim_dim`." ] }, { @@ -117,11 +116,7 @@ " ],\n", "}\n", "\n", - "sim_dim = (64, 64, 64)\n", - "\n", - "if torch.cuda.device_count() > 4:\n", - " os.environ[\"CUDA_DEVICE_ORDER\"] = \"PCI_BUS_ID\"\n", - " os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0,1,2,3\"" + "sim_dim = (64, 64, 64)" ] }, { @@ -201,10 +196,15 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ - "## Run Auto3DSeg data analyzer, algo generation, and training" + "## Run Auto3DSeg data analyzer, algo generation, and training\n", + "\n", + "> NOTE: For demo purposes, the snippet below overrides all algorithms with the same epoch-based training parameters `train_param`.\n", + "> If users would like to use more than one GPU, they can change `CUDA_VISIBLE_DEVICES`, or just remove the key to use all available devices.\n", + "> Users also need to ensure the number of GPUs does not exceed the number of partitions the training dataset can be split into." ] }, { @@ -231,16 +231,12 @@ "\n", "max_epochs = 2\n", "\n", - "# safeguard to ensure max_epochs is greater or equal to 2\n", - "max_epochs = max(max_epochs, 2)\n", - "\n", "train_param = {\n", " \"CUDA_VISIBLE_DEVICES\": [0], # use only 1 gpu\n", - " \"num_iterations\": 4 * max_epochs,\n", - " \"num_iterations_per_validation\": 2 * max_epochs,\n", + " \"num_epochs_per_validation\": 1,\n", " \"num_images_per_batch\": 2,\n", " \"num_epochs\": max_epochs,\n", - " \"num_warmup_iterations\": 2 * max_epochs,\n", + " \"num_warmup_epochs\": 1,\n", "}\n", "\n", "for h in history:\n", diff --git a/auto3dseg/notebooks/hpo_nni.ipynb b/auto3dseg/notebooks/hpo_nni.ipynb index c4141fb86b..449deab399 100644 --- a/auto3dseg/notebooks/hpo_nni.ipynb +++ b/auto3dseg/notebooks/hpo_nni.ipynb @@ -67,7 +67,6 @@ "outputs": [], "source": [ "import os\n", - "import torch\n", "import yaml\n", "\n", "import tempfile\n", @@ -82,15 +81,13 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Download dataset\n", "\n", - "We provide a toy datalist file that splits a subset of the downloaded datasets into five folds.\n", - "\n", - "> NOTE: Each validation set only has 6 images in one fold of training.\n", - "> Therefore, we need to set a limit on the total number of GPUs we're using in this notebook." + "We provide a toy datalist file that splits a subset of the downloaded datasets into five folds."
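For context on the toy datalists these notebooks download: each training entry carries a `fold` index, and MONAI's `datafold_read` (visible in the import lines this PR removes) holds one fold out for validation. A sketch under assumed file contents; the file name and image paths are illustrative only:

```python
import json

from monai.auto3dseg import datafold_read

# each training entry is tagged with a fold index in [0, 4]
datalist = {
    "training": [
        {"image": "imagesTr/case_00.nii.gz", "label": "labelsTr/case_00.nii.gz", "fold": 0},
        {"image": "imagesTr/case_01.nii.gz", "label": "labelsTr/case_01.nii.gz", "fold": 1},
        # ... one entry per training image, cycling through folds 0-4
    ],
    "testing": [{"image": "imagesTs/case_90.nii.gz"}],
}
with open("toy_folds.json", "w") as f:
    json.dump(datalist, f)

# entries with fold == 0 become the validation set; the rest are used for training
train_files, val_files = datafold_read("toy_folds.json", basedir="", fold=0)
```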
] }, { @@ -111,11 +108,7 @@ "if not os.path.exists(dataroot):\n", " download_and_extract(resource, compressed_file, root_dir)\n", "\n", - "datalist_file = os.path.join(\"..\", \"tasks\", \"msd\", msd_task, \"msd_\" + msd_task.lower() + \"_folds.json\")\n", - "\n", - "if torch.cuda.device_count() > 6:\n", - " os.environ[\"CUDA_DEVICE_ORDER\"] = \"PCI_BUS_ID\"\n", - " os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0,1,2,3,4,5\"" + "datalist_file = os.path.join(\"..\", \"tasks\", \"msd\", msd_task, \"msd_\" + msd_task.lower() + \"_folds.json\")" ] }, { @@ -363,30 +356,12 @@ "\n", "max_epochs = 2\n", "\n", - "# safeguard to ensure max_epochs is greater or equal to 2\n", - "max_epochs = max(max_epochs, 2)\n", - "\n", - "num_gpus = 1 if \"multigpu\" in input_cfg and not input_cfg[\"multigpu\"] else torch.cuda.device_count()\n", - "\n", - "num_epoch = max_epochs\n", - "num_images_per_batch = 2\n", - "n_data = 24 # total is 30 images, hold out one set (6 images) for cross fold val.\n", - "n_iter = int(num_epoch * n_data / num_images_per_batch / num_gpus)\n", - "n_iter_val = int(n_iter / 2)\n", - "\n", "# for segresnet2d\n", "override_param = {\n", - " \"num_iterations\": n_iter,\n", - " \"num_iterations_per_validation\": n_iter_val,\n", + " \"num_epochs_per_validation\": 1,\n", + " \"num_epochs\": max_epochs,\n", "}\n", "\n", - "# if the system has more than 6 GPUs\n", - "# override_param = {\n", - "# \"CUDA_VISIBLE_DEVICES\": [0, 1, 2, 3, 4, 5],\n", - "# \"num_iterations\": n_iter,\n", - "# \"num_iterations_per_validation\": n_iter_val,\n", - "# }\n", - "\n", "nni_gen = NNIGen(algo=algo, params=override_param)" ] } diff --git a/auto3dseg/notebooks/hpo_optuna.ipynb b/auto3dseg/notebooks/hpo_optuna.ipynb index b6a35f3d52..47cd562309 100644 --- a/auto3dseg/notebooks/hpo_optuna.ipynb +++ b/auto3dseg/notebooks/hpo_optuna.ipynb @@ -237,6 +237,7 @@ ] }, { + "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ @@ -247,7 +248,13 @@ "The previously-generated algorithm will not be touched.\n", "`NNIGen` will create a copy of the algorithm and save the algorithm in a new folder named `{net}_{fold_index}_override`.\n", "\n", - "For more information about creating overriding parameters, please refer to the section \"Override Specific Parameters in the Algorithms before HPO\" in this [tutorial documentation](../docs/hpo.md)" + "For more information about creating overriding parameters, please refer to the section \"Override Specific Parameters in the Algorithms before HPO\" in this [tutorial documentation](../docs/hpo.md).\n", + "\n", + "> NOTE: The setup works on a machine with up to 8 GPUs.\n", + "> The datalist in this example uses only a subset of the original dataset.\n", + "> Users need to ensure the number of GPUs does not exceed the number of partitions the training dataset can be split into.\n", + "> For example, the following code block is not suitable for a 16-GPU system.\n", + "> In such cases, please change the code block accordingly."
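The deleted cells above hard-coded GPU caps such as `torch.cuda.device_count() > 6`. Here is a sketch of the generic guard the new NOTEs describe, mirroring the removed logic; `n_train_files` is an assumed per-fold sample count, not a value taken from the notebooks:

```python
import os

import torch

n_train_files = 24  # assumed: training samples available in one fold
num_gpus = torch.cuda.device_count()

# every GPU needs at least one training sample per pass, so cap the visible
# devices at the partition count (set this before any training process starts)
if num_gpus > n_train_files:
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in range(n_train_files))
```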
] }, { @@ -262,21 +269,10 @@ "\n", "max_epochs = 2\n", "\n", - "# safeguard to ensure max_epochs is greater or equal to 2\n", - "max_epochs = max(max_epochs, 2)\n", - "\n", - "num_gpus = 1 if \"multigpu\" in input_cfg and not input_cfg[\"multigpu\"] else torch.cuda.device_count()\n", - "\n", - "num_epoch = max_epochs\n", - "num_images_per_batch = 2\n", - "n_data = 24 # total is 30 images, hold out one set (6 images) for cross fold val.\n", - "n_iter = int(num_epoch * n_data / num_images_per_batch / num_gpus)\n", - "n_iter_val = int(n_iter / 2)\n", - "\n", "# for segresnet2d\n", "override_param = {\n", - " \"num_iterations\": n_iter,\n", - " \"num_iterations_per_validation\": n_iter_val,\n", + " \"num_epochs_per_validation\": 1,\n", + " \"num_epochs\": max_epochs,\n", "}" ] }, diff --git a/auto3dseg/tasks/btcv/README.md b/auto3dseg/tasks/btcv/README.md index 1602eff202..a1466e716d 100644 --- a/auto3dseg/tasks/btcv/README.md +++ b/auto3dseg/tasks/btcv/README.md @@ -10,7 +10,7 @@ For BTCV dataset, under Institutional Review Board (IRB) supervision, 50 abdomen - Target: 13 abdominal organs including 1. Spleen 2. Right Kidney - 3. Left Kideny + 3. Left Kidney 4. Gallbladder 5. Esophagus 6. Liver diff --git a/auto3dseg/tasks/instance22/README.md b/auto3dseg/tasks/instance22/README.md index 4587d0e628..513978c3de 100644 --- a/auto3dseg/tasks/instance22/README.md +++ b/auto3dseg/tasks/instance22/README.md @@ -21,4 +21,4 @@ The complete command of **Auto3DSeg** can be found [here](../../README.md#refere | DiNTS | 3 | 4 | 2 | 0.6467 | 0.7491 | 0.7306 | 0.6638 | 0.6779 | 0.6936 | | **SegResNet2d** | 2 | 4 | 2 | 0.6320 | 0.7778 | 0.7607 | 0.7006 | 0.7613 | **0.7265** | -The winning solution is fully based on 2D SegResNet because the network clearly has better average validation Dice score comparing to other networks. +The winning solution is fully based on 2D SegResNet because the network has the best average validation Dice score among the candidate networks. diff --git a/auto3dseg/tasks/msd/Task04_Hippocampus/README.md b/auto3dseg/tasks/msd/Task04_Hippocampus/README.md index ac12811e72..63c627cc91 100644 --- a/auto3dseg/tasks/msd/Task04_Hippocampus/README.md +++ b/auto3dseg/tasks/msd/Task04_Hippocampus/README.md @@ -1,3 +1,3 @@ # MSD Dataset Task04 Hippocampus -This repository provides a benmarking guide and recipe to train the template algorithms, validation performance, and is tested and maintained by NVIDIA. +This repository provides a recipe to train the template algorithms, reports validation performance, and is tested and maintained by NVIDIA. diff --git a/auto3dseg/tasks/msd/Task05_Prostate/README.md b/auto3dseg/tasks/msd/Task05_Prostate/README.md index 44172dc95f..f810f24222 100644 --- a/auto3dseg/tasks/msd/Task05_Prostate/README.md +++ b/auto3dseg/tasks/msd/Task05_Prostate/README.md @@ -1,11 +1,11 @@ # MSD Dataset Task05 Prostate -This repository provides a benmarking guide and recipe to train the template algorithms, validation performance, and is tested and maintained by NVIDIA. +This repository provides a benchmarking guide and recipe to train the template algorithms, reports validation performance, and is tested and maintained by NVIDIA. ## Task Overview -The task is the volumetric (3D) segmentation of the prostate central gland and peripheral zone from the multi-contrast MRI (T2, ADC). The segmentation of prostate region is formulated as the voxel-wise 3-class classification. Each voxel is predicted as either foreground (prostate central gland, peripheral zone) or background. 
And the model is optimized with gradient descent method minimizing soft dice loss between the predicted mask and ground truth segmentation. The dataset is from the 2018 MICCAI challenge [Medical Image Segmentation (MSD)](http://medicaldecathlon.com/). +The task is the volumetric (3D) segmentation of the prostate central gland and peripheral zone from multi-contrast MRI (T2, ADC). The segmentation of the prostate region is formulated as voxel-wise 3-class classification. Each voxel is predicted as either foreground (prostate central gland, peripheral zone) or background. The model is optimized with a gradient descent method minimizing the soft Dice loss between the predicted mask and the ground truth segmentation. The dataset is from the 2018 MICCAI challenge [Medical Segmentation Decathlon (MSD)](http://medicaldecathlon.com/). - Target: 1. Prostate central gland diff --git a/auto3dseg/tasks/msd/Task09_Spleen/README.md b/auto3dseg/tasks/msd/Task09_Spleen/README.md index 2266842b6a..1d21a9e177 100644 --- a/auto3dseg/tasks/msd/Task09_Spleen/README.md +++ b/auto3dseg/tasks/msd/Task09_Spleen/README.md @@ -1,11 +1,11 @@ # MSD Dataset Task09 Spleen -This repository provides a benmarking guide and recipe to train the template algorithms, validation performance, and is tested and maintained by NVIDIA. +This repository provides a benchmarking guide and recipe to train the template algorithms, reports validation performance, and is tested and maintained by NVIDIA. ## Task Overview -The task is the volumetric (3D) segmentation of the spleen from CT image. The segmentation of spleen is formulated as the voxel-wise 2-class classification. Each voxel is predicted as either foreground (spleen) or background. And the model is optimized with both Dice loss and Cross Entropy loss between the predicted mask and ground truth segmentation. The dataset is from the 2018 MICCAI challenge [Medical Image Segmentation (MSD)](http://medicaldecathlon.com/). +The task is the volumetric (3D) segmentation of the spleen from a CT image. The segmentation of the spleen is formulated as voxel-wise 2-class classification. Each voxel is predicted as either foreground (spleen) or background. The model is optimized with both Dice loss and Cross Entropy loss between the predicted mask and the ground truth segmentation. The dataset is from the 2018 MICCAI challenge [Medical Segmentation Decathlon (MSD)](http://medicaldecathlon.com/). - Target: 1. spleen
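Finally, several of the new NOTEs warn that some bundle templates still count by `num_iterations` while others use `num_epochs`. For templates of the former kind, here is a sketch of the epoch-to-iteration conversion that the deleted cells performed, with names mirroring the removed code; `datalist_file` is an assumed path to a folds datalist like the one used above:

```python
import torch

from monai.auto3dseg import datafold_read

datalist_file = "msd_task09_spleen_folds.json"  # assumed path to a folds datalist
max_epochs = 2
num_images_per_batch = 2
num_gpus = max(torch.cuda.device_count(), 1)

# iterations per run = epochs * samples / (batch size * GPUs)
files_train_fold0, _ = datafold_read(datalist_file, "", 0)
n_iter = int(max_epochs * len(files_train_fold0) / num_images_per_batch / num_gpus)
n_iter_val = int(n_iter / 2)

train_param = {
    "num_iterations": n_iter,
    "num_iterations_per_validation": n_iter_val,
    "num_images_per_batch": num_images_per_batch,
    "num_epochs": max_epochs,
    "num_warmup_iterations": n_iter_val,
}
```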