diff --git a/PARAMETERS.md b/PARAMETERS.md
new file mode 100644
index 0000000000..94d6379897
--- /dev/null
+++ b/PARAMETERS.md
@@ -0,0 +1,87 @@
+## LoraConfig Parameters
+
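+Adjusting the `LoraConfig` parameters lets you trade off model quality against computational cost in Low-Rank Adaptation (LoRA). The key parameters are summarized below. Note that in the Hugging Face `peft` library, `gradient_accumulation_steps`, `weight_decay`, and `learning_rate` are training (optimizer) settings rather than `LoraConfig` fields; they are listed here because they are usually tuned alongside it.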
+
+**r**
+- **Description**: Rank of the two low-rank matrices whose product approximates the weight update.
+- **Impact**:
+  - **Higher**: More trainable parameters and adapter capacity, at the cost of more memory and compute.
+  - **Lower**: Fewer trainable parameters and more efficient training, with a potential drop in quality if the rank is too small.
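+
+As a rough illustration of the parameter savings (the layer size and rank below are assumed values, not taken from this project): for a weight matrix $W \in \mathbb{R}^{d_{out} \times d_{in}}$, LoRA trains $B \in \mathbb{R}^{d_{out} \times r}$ and $A \in \mathbb{R}^{r \times d_{in}}$ instead of $W$ itself, so the trainable parameter count per adapted matrix is
+
+$$
+r\,(d_{in} + d_{out}) \quad \text{instead of} \quad d_{in}\, d_{out}.
+$$
+
+For example, with $d_{in} = d_{out} = 4096$ and $r = 8$, that is $8 \times 8192 = 65{,}536$ trainable parameters versus $16{,}777{,}216$ for the full matrix, or about 0.4%.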
+
+
+**lora_alpha**
+- **Description**: Scaling factor for the low-rank update; by default the update is multiplied by `lora_alpha/r`.
+- **Impact**:
+  - **Higher**: Stronger adapter influence; may speed up convergence but risks instability or overfitting.
+  - **Lower**: Subtler effect; may require more training steps.
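+
+To make the role of the scaling factor concrete, the standard LoRA forward pass can be written as follows (the symbols $W_0$, $A$, $B$, and $x$ are notation introduced here for illustration):
+
+$$
+h = W_0\,x + \frac{\alpha}{r}\,B A\,x
+$$
+
+Here $W_0$ is the frozen pretrained weight, $BA$ is the trainable low-rank update, $\alpha$ is `lora_alpha`, and $r$ is the rank. With $r = 16$, setting `lora_alpha` to 16 gives a scale of 1, and setting it to 32 gives a scale of 2, so doubling `lora_alpha` doubles the magnitude of the adapter's contribution.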
+
+**lora_dropout**
+- **Description**: Dropout probability applied to the input of the LoRA layers for regularization.
+- **Impact**:
+  - **Higher**: More regularization; helps reduce overfitting but may slow training or degrade performance if set too high.
+  - **Lower**: Less regularization; may speed up training but risks overfitting.
+
+**loftq_config**
+- **Description**: Configuration for LoftQ, a quantization method for the backbone weights and initialization of LoRA layers.
+- **Impact**:
+  - **Not None**: If specified, LoftQ will quantize the backbone weights and initialize the LoRA layers. It requires setting `init_lora_weights='loftq'`.
+  - **None**: LoftQ quantization is not applied.
+  - **Note**: Do not pass an already quantized model when using LoftQ, because LoftQ performs the quantization itself.
+
+
+**use_rslora**
+- **Description**: Enables Rank-Stabilized LoRA (rsLoRA).
+- **Impact**:
+  - **True**: Uses Rank-Stabilized LoRA, setting the adapter scaling factor to `lora_alpha/math.sqrt(r)`, which the [Rank-Stabilized LoRA paper](https://doi.org/10.48550/arXiv.2312.03732) reports works better, particularly at higher ranks.
+  - **False**: Uses the original default scaling factor `lora_alpha/r`.
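+
+A small worked comparison of the two scaling rules may help; the value `lora_alpha = 16` and the ranks below are chosen purely for illustration:
+
+$$
+\text{default: } \frac{\alpha}{r} \qquad\qquad \text{rsLoRA: } \frac{\alpha}{\sqrt{r}}
+$$
+
+| `r` | `lora_alpha/r` | `lora_alpha/math.sqrt(r)` |
+|-----|----------------|---------------------------|
+| 16  | 1              | 4                         |
+| 64  | 0.25           | 2                         |
+| 256 | 0.0625         | 1                         |
+
+With the default rule the adapter's contribution shrinks quickly as the rank grows, while the rank-stabilized rule decays more slowly. A `lora_alpha` tuned under one rule therefore usually needs re-tuning under the other.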
+
+**gradient_accumulation_steps**
+- **Default**: 1
+- **Description**: The number of forward/backward passes over which gradients are accumulated before each optimizer update.
+- **Impact**:
+  - **Higher**: Gradients are accumulated over more steps, increasing the effective batch size without increasing per-step memory. This can improve training stability, especially with large models and limited hardware, at the cost of fewer optimizer updates per epoch.
+  - **Lower**: More frequent optimizer updates, but a smaller effective batch size, which can make training noisier unless the per-device batch size (and thus memory use) is increased.
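+
+The relationship to batch size can be stated as a simple product (the per-device batch size and device count are assumed quantities that are not documented above):
+
+$$
+\text{effective batch size} = \text{per-device batch size} \times \text{gradient accumulation steps} \times \text{number of devices}
+$$
+
+For example, a per-device batch size of 2 with 4 accumulation steps on a single GPU behaves, from the optimizer's point of view, like a batch of $2 \times 4 \times 1 = 8$ examples.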
+
+**weight_decay**
+- **Default**: 0.01
+- **Description**: Regularization technique that shrinks the weights slightly toward zero during training.
+- **Impact**:
+  - **Non-zero Value (e.g., 0.01)**: Penalizes large weights in proportion to their magnitude (as an L2 term in the loss, or applied directly in the update for decoupled optimizers such as AdamW), which helps reduce overfitting.
+  - **Zero**: No weight decay is applied, which may increase the risk of overfitting, especially in large models or with small datasets.
+
+**learning_rate**
+- **Default**: 2e-4
+- **Description**: The step size used when updating the model's parameters during training.
+- **Impact**:
+  - **Higher**: Faster convergence but risks overshooting optimal parameters and causing instability in training.
+  - **Lower**: More stable and precise updates but may slow down convergence, requiring more training steps to achieve good performance.
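+
+The parameters above could be wired together roughly as follows. This is a minimal sketch that assumes the Hugging Face `peft` and `transformers` libraries; the LoRA values and the `output_dir` path are illustrative and not this project's settings.
+
+```python
+# Illustrative values only; adjust them for your model and dataset.
+from peft import LoraConfig
+from transformers import TrainingArguments
+
+lora_config = LoraConfig(
+    r=16,                  # rank of the low-rank update
+    lora_alpha=32,         # scaling numerator (alpha / r by default)
+    lora_dropout=0.05,     # dropout on the LoRA layer input
+    use_rslora=False,      # True switches the scaling to alpha / sqrt(r)
+    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+    task_type="CAUSAL_LM",
+)
+
+# Trainer-side settings, using the defaults listed above.
+training_args = TrainingArguments(
+    output_dir="outputs",  # hypothetical path
+    gradient_accumulation_steps=1,
+    weight_decay=0.01,
+    learning_rate=2e-4,
+)
+```
+
+Keeping the adapter settings in `LoraConfig` and the optimizer settings in `TrainingArguments` mirrors how the two libraries separate these concerns.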
+
+## Target Modules
+
+**q_proj (query projection)**
+- **Description**: Part of the attention mechanism in transformer models, responsible for projecting the input into the query space.
+- **Impact**: Transforms the input into query vectors that are used to compute attention scores.
+
+**k_proj (key projection)**
+- **Description**: Projects the input into the key space in the attention mechanism.
+- **Impact**: Produces key vectors that are compared with query vectors to determine attention weights.
+
+**v_proj (value projection)**
+- **Description**: Projects the input into the value space in the attention mechanism.
+- **Impact**: Produces value vectors that are weighted by the attention scores and combined to form the output.
+
+**o_proj (output projection)**
+- **Description**: Projects the output of the attention mechanism back into the original space.
+- **Impact**: Transforms the combined weighted value vectors back to the input dimension, integrating attention results into the model.
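+
+The four attention projections fit together as in standard scaled dot-product attention. The formula below is a single-head sketch that ignores masking and positional encodings; the symbols $X$, $W_Q$, $W_K$, $W_V$, $W_O$, and $d_k$ are notation introduced here, with each $W$ standing for the weight of the corresponding `*_proj` layer:
+
+$$
+Q = X W_Q, \quad K = X W_K, \quad V = X W_V, \qquad
+\text{Attention}(X) = \text{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V\, W_O
+$$
+
+Listing any of these names in `target_modules` attaches a LoRA adapter to that projection's weight matrix.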
+
+**gate_proj (gate projection)**
+- **Description**: Gating branch of the gated feedforward (MLP) block in LLaMA-style transformer models; it projects the hidden state to the intermediate dimension and passes it through an activation function.
+- **Impact**: The activated output multiplies the `up_proj` output element-wise, controlling how much of each intermediate feature passes through.
+
+**up_proj (up projection)**
+- **Description**: Projects the hidden state up to the larger intermediate dimension of the feedforward (MLP) block.
+- **Impact**: Expands the representation into a higher-dimensional space, where it is combined with the gated branch (`gate_proj`).
+
+**down_proj (down projection)**
+- **Description**: Projects the intermediate feedforward (MLP) activations back down to the model's hidden dimension.
+- **Impact**: Returns the expanded representation to the hidden size so that it can be added back to the residual stream.
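+
+In LLaMA-style models, the three feedforward projections are typically combined as a gated MLP. The formula below is a sketch under that assumption; the symbols are notation introduced here, and $\sigma$ is the block's activation function (commonly SiLU):
+
+$$
+\text{MLP}(x) = W_{down}\big(\sigma(W_{gate}\,x) \odot W_{up}\,x\big)
+$$
+
+Here $W_{gate}$, $W_{up}$, and $W_{down}$ are the weights of `gate_proj`, `up_proj`, and `down_proj`, and $\odot$ denotes element-wise multiplication.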