
QDP Studio

QDP Studio is a comprehensive model compression framework designed to optimize deep learning models through multiple advanced techniques: Quantization, Decomposition, Pruning, and Knowledge Distillation. With support for hybrid compression, QDP Studio enables you to significantly reduce model size, accelerate inference, and maintain high accuracy—all while streamlining deployment across various devices.


Features

  • Quantization
    Leverage different quantization strategies to convert high-precision models into lower-bit representations for faster, more efficient inference (a code sketch of this and the other core techniques follows this feature list). Available modes include:

    • default: Standard quantization pipeline.
    • dynamic: Dynamic quantization for runtime optimizations.
    • static: Static quantization using calibration data.
    • qat: Quantization-aware training for higher accuracy.
  • Pruning
    Reduce model complexity by removing redundant weights using various pruning techniques. Available modes include:

    • default: Standard pruning procedure.
    • unstructured: Prune individual weights without structure.
    • structured: Remove entire neurons or filters for hardware efficiency.
    • iterative: Apply pruning in iterative steps with fine-tuning after each step.
  • Decomposition
    Simplify model layers by decomposing weight matrices or tensors. Available modes include:

    • default: Standard decomposition approach.
    • truncatedSVD: Use truncated Singular Value Decomposition to approximate layers.
    • tensorDecomposition: Apply tensor-based decomposition techniques to compress multi-dimensional weights.
  • Knowledge Distillation
    Transfer knowledge from a large pre-trained network (teacher) to a smaller network (student). Available modes include:

    • default: Standard distillation procedure.
    • teacher_assisted: Enhanced teacher assistance through additional supervision.
    • temperature_scaling: Use temperature scaling to soften outputs and improve transfer.
  • Hybrid Compression Pipeline
    Apply all supported compression techniques sequentially in one unified pipeline. The hybrid approach compounds the benefits of each method while balancing efficiency against accuracy.

  • Comprehensive Evaluation
    Evaluate models using key metrics—including accuracy, inference time, and model size—to directly compare the original and compressed versions.

  • Custom Model & Dataset Support
    Import and utilize your own custom models and datasets. Provide a custom model file path or a custom Python dataset module (which must implement a get_custom_dataset() function returning (train_dataset, val_dataset)).


Getting Started

Prerequisites

Python 3 and pip. A CUDA-capable GPU is optional: the device setting in config.yaml also accepts cpu and mps (Apple Silicon).

Installation

  1. Clone the Repository:

    git clone https://github.com/jaicdev/QDPStudio.git
    cd QDPStudio
  2. Create a Virtual Environment:

    python -m venv venv
    source venv/bin/activate  # On Windows use: venv\Scripts\activate
  3. Install Dependencies:

    pip install -r requirements.txt
  4. Configuration:

    Edit the config.yaml file to set model parameters, device preference, batch size, learning rate, number of epochs, and compression settings (e.g., prune ratio). Example:

    device: "cuda"      # Options: "cuda", "cpu", "mps"
    model_name: "resnet18"
    pretrained: true
    hf_model_name: null
    timm_model_name: null
    prune_ratio: 0.2
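
A file like this is typically read with PyYAML; the snippet below is a generic loading sketch, not necessarily QDP Studio's own loader:

import yaml

# Hedged sketch: load config.yaml and pull out a couple of settings with
# safe fallbacks. The framework's actual loading code may differ.
with open("config.yaml") as f:
    config = yaml.safe_load(f)

device = config.get("device", "cpu")
prune_ratio = config.get("prune_ratio", 0.2)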

Usage

Command-Line Interface

QDP Studio is controlled via main.py, which provides a command-line interface to select the dataset and compression techniques.

Example Command:

python main.py --dataset CIFAR10 --prune --quantize --decompose

This command will:

  • Train a model (default: ResNet18) on the CIFAR10 dataset.
  • Apply pruning, quantization, and decomposition using the selected modes.
  • Evaluate and compare the performance of the original and compressed model variants.

Key Arguments:

  • --dataset: Specify the standard dataset (e.g., CIFAR10, MNIST, ImageNet).
  • --custom_dataset: Python module name for a custom dataset (must implement a get_custom_dataset() function).
  • --batch_size: Define the batch size for training and evaluation.
  • --custom_model: Path to a custom model file (overrides standard model loading via --model_name).
  • --model_name: Name of a torchvision model (default: "resnet18").
  • --prune: Apply pruning.
  • --quantize: Apply quantization.
  • --decompose: Apply decomposition.
  • --all: Run all compression techniques sequentially (hybrid approach).
  • --num_epochs: Number of training epochs.
  • --quantization_mode: Set quantization mode (default | dynamic | static | qat).
  • --pruning_mode: Set pruning mode (default | unstructured | structured | iterative).
  • --decomposition_mode: Set decomposition mode (default | truncatedSVD | tensorDecomposition).
  • --kd_mode: Set knowledge distillation mode (default | teacher_assisted | temperature_scaling).
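
Taken together, these flags could be wired up with argparse roughly as sketched below; treat the defaults as assumptions reconstructed from this list, not a verbatim copy of main.py:

# Sketch of a parser matching the documented flags; main.py's actual
# parser may differ in defaults and help text.
import argparse

parser = argparse.ArgumentParser(description="QDP Studio CLI")
parser.add_argument("--dataset", default="CIFAR10")
parser.add_argument("--custom_dataset")
parser.add_argument("--batch_size", type=int, default=32)
parser.add_argument("--custom_model")
parser.add_argument("--model_name", default="resnet18")
parser.add_argument("--prune", action="store_true")
parser.add_argument("--quantize", action="store_true")
parser.add_argument("--decompose", action="store_true")
parser.add_argument("--all", action="store_true")
parser.add_argument("--num_epochs", type=int, default=5)
parser.add_argument("--quantization_mode", default="default",
                    choices=["default", "dynamic", "static", "qat"])
parser.add_argument("--pruning_mode", default="default",
                    choices=["default", "unstructured", "structured", "iterative"])
parser.add_argument("--decomposition_mode", default="default",
                    choices=["default", "truncatedSVD", "tensorDecomposition"])
parser.add_argument("--kd_mode", default="default",
                    choices=["default", "teacher_assisted", "temperature_scaling"])
args = parser.parse_args()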

Using a Custom Model

If you have a custom model file, use the --custom_model argument:

python main.py --custom_model path/to/your/custom_model.pth --dataset CIFAR10 --prune --quantize --decompose --num_epochs 5
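
The README does not pin down the checkpoint format. One common convention, assumed here purely for illustration, is a fully pickled module produced by torch.save(model, path):

# Assumption: the .pth file holds a full pickled nn.Module saved with
# torch.save(model, path). Verify against main.py's loading code.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU(), nn.Flatten(),
                      nn.LazyLinear(10))
torch.save(model, "custom_model.pth")

# Later, restore the full module (weights_only=False is required for
# pickled modules on recent PyTorch versions).
restored = torch.load("custom_model.pth", weights_only=False)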

Using a Custom Dataset

Create a Python module (e.g., my_dataset.py) that implements a get_custom_dataset() function. For example:

def get_custom_dataset():
    # FakeData generates random images with fake labels; swap in your own
    # torch.utils.data.Dataset objects here.
    from torchvision.datasets import FakeData
    from torchvision.transforms import ToTensor
    train_dataset = FakeData(transform=ToTensor())
    val_dataset = FakeData(transform=ToTensor())
    return train_dataset, val_dataset

Then run:

python main.py --custom_dataset my_dataset --custom_model path/to/your/custom_model.pth --prune --quantize --decompose --num_epochs 5

Hybrid Compression Pipeline

Hybrid compression applies all supported techniques sequentially:

  1. Model Training:
    Train the base model on your chosen dataset to establish strong baseline performance.

  2. Sequential Compression:

    • Pruning: Remove redundant weights using the selected pruning mode.
    • Quantization: Convert model weights to lower precision with the chosen quantization strategy.
    • Decomposition: Simplify model layers using the preferred decomposition method.
    • Knowledge Distillation: Optionally, further compress the model by transferring knowledge using the selected distillation approach.
  3. Post-Compression Fine-Tuning:
    Fine-tune after each compression step to mitigate any loss in accuracy.

  4. Evaluation:
    Compare key metrics—including accuracy, inference time, and model size—between the original and compressed models.

Run the Hybrid Pipeline using the --all flag:

python main.py --dataset CIFAR10 --all
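
Step 4's before/after comparison can be reproduced in a few lines. The sketch below is illustrative, not the framework's built-in evaluator: it contrasts size on disk and average CPU inference latency for an original versus a dynamically quantized model.

# Illustrative original-vs-compressed comparison, not QDP Studio's
# evaluator.
import io
import time

import torch
import torch.nn as nn
from torchvision.models import resnet18

def size_mb(model: nn.Module) -> float:
    # Serialize the state dict to an in-memory buffer and measure it.
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

def latency_ms(model: nn.Module, runs: int = 20) -> float:
    model.eval()
    x = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        model(x)  # warm-up pass
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs * 1e3

original = resnet18(weights=None)  # untrained weights keep the demo offline
compressed = torch.quantization.quantize_dynamic(
    original, {nn.Linear}, dtype=torch.qint8
)

print(f"original:   {size_mb(original):6.1f} MB, {latency_ms(original):6.1f} ms")
print(f"compressed: {size_mb(compressed):6.1f} MB, {latency_ms(compressed):6.1f} ms")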

Logging & Evaluation

  • Logging is implemented via Python’s logging module, with optional Weights & Biases (wandb) integration for experiment tracking.
  • The framework evaluates models on metrics including accuracy, precision, recall, F1-score, and inference latency.
  • Detailed logging enables monitoring the impact of each compression technique and mode.
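
A minimal setup along these lines might look as follows, with wandb treated as optional; this is an assumed sketch, not the framework's exact wiring:

# Hedged sketch: stdlib logging plus optional wandb tracking.
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("qdp")

try:
    import wandb
    run = wandb.init(project="qdp-studio", config={"prune_ratio": 0.2})
except ImportError:
    run = None
    log.info("wandb not installed; using stdlib logging only")

accuracy = 0.91  # placeholder metric for illustration
log.info("post-compression accuracy: %.3f", accuracy)
if run is not None:
    wandb.log({"accuracy": accuracy})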

Troubleshooting & Tips

  • Configuration Issues:
    Ensure your config.yaml is properly formatted. Invalid configurations may result in runtime errors.

  • Custom Module Integration:
    Verify that any custom dataset module is on your Python path and implements the required get_custom_dataset() function.

  • Fusion Configurations:
    For optimal quantization, define a custom fusion configuration mapping if the default does not meet your model's needs (see the sketch after these tips).

  • Testing:
    Run end-to-end tests to confirm that all components integrate cleanly within the compression pipeline.
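
As a starting point for a custom fusion configuration, torch.ao.quantization.fuse_modules accepts lists of submodule names to fold together before quantization. The toy block below is a sketch; adapt the name lists to your own model's layers:

# Hedged sketch of module fusion for quantization; the name lists must
# match your model's actual submodule names.
import torch.nn as nn
from torch.ao.quantization import fuse_modules

class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, 3)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

block = Block().eval()  # conv-bn fusion for inference requires eval mode
fused = fuse_modules(block, [["conv", "bn", "relu"]])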


Acknowledgements

  • Supported by the Science, Technology, and Innovation (STI) Policy of Gujarat Council of Science and Technology, Department of Science and Technology, Government of Gujarat, India (Grant Number: GUJCOST/STI/2021-22/3858).
  • Special thanks to the communities behind PyTorch, Torchvision, TIMM, and Transformers.
  • If you use this repository in your work, please cite:
@misc{jaicdev2025qdpstudio,
  author       = {jaicdev},
  title        = {QDPStudio: A Comprehensive Model Compression Framework},
  year         = {2025},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/jaicdev/QDPStudio}},
  note         = {Released: February 18, 2025}
}
@article{chaudhari2025onboard,
  author  = {Chaudhari, Jay N. and Galiyawala, Hiren and Sharma, Paawan and Shukla, Pancham and Raval, Mehul S.},
  title   = {Onboard Person Retrieval System With Model Compression: A Case Study on Nvidia Jetson Orin AGX},
  journal = {IEEE Access},
  year    = {2025},
  volume  = {13},
  pages   = {8257--8269},
  doi     = {10.1109/ACCESS.2025.3527134},
  issn    = {2169-3536}
}

Contributing

Contributions are welcome! To contribute:

  1. Fork the repository.
  2. Create a new branch:
    git checkout -b feature/my-new-feature
  3. Commit your changes:
    git commit -am 'Add new feature'
  4. Push the branch:
    git push origin feature/my-new-feature
  5. Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.
