Preprocessing Enhanced Image Compression for Machine Vision

[paper] [code]

Guo Lu, Xingtong Ge, Tianxiong Zhong, Qiang Hu, Jing Geng

IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2024

Abstract

Recently, more and more images are compressed and sent to the back-end devices for machine analysis tasks (e.g., object detection) instead of being purely watched by humans. However, most traditional or learned image codecs are designed to minimize the distortion of the human visual system without considering the increased demand from machine vision systems. In this work, we propose a preprocessing enhanced image compression method for machine vision tasks to address this challenge. Instead of relying on the learned image codecs for end-to-end optimization, our framework is built upon the traditional non-differential codecs, which means it is standard compatible and can be easily deployed in practical applications. Specifically, we propose a neural preprocessing module before the encoder to maintain the useful semantic information for the downstream tasks and suppress the irrelevant information for bitrate saving. Furthermore, our neural preprocessing module is quantization adaptive and can be used in different compression ratios. More importantly, to jointly optimize the preprocessing module with the downstream machine vision tasks, we introduce the proxy network for the traditional non-differential codecs in the back-propagation stage. We provide extensive experiments by evaluating our compression method for several representative downstream tasks with different backbone networks. Experimental results show our method achieves a better trade-off between the coding bitrate and the performance of the downstream machine vision tasks by saving about 20% bitrate.

News

  • 2024/8: 🔥 Our paper has been accepted by IEEE TCSVT!

  • 2022/7: 🎉 I won the 2022 Outstanding Graduation Thesis Award of Beijing Institute of Technology (BIT) for this work!

Quick Start

Requirements

  • libbpg

  • mmcv-1.7.0, mmdetection-2.19.1, mmclassification, etc.

Folders

mmdetection_toward

This is the complete model training and testing framework without the QA module.

To set up the environment, follow the official installation guide from MMDetection v2.19.1:
Prerequisites - MMDetection 2.19.1 documentation

After installing mmcv and setting up mmdetection_toward locally, one modification is required in the mmcv source code:

In /mmcv/image/photometric.py, update the imnormalize_ function by replacing the image normalization logic with simple normalization (scaling to [0, 1]). Modify the function as follows:

def imnormalize_(img, mean, std, to_rgb=True):
    """Inplace normalize an image by scaling it to [0, 1]."""
    assert img.dtype != np.uint8
    mean = np.float64(mean.reshape(1, -1))
    stdinv = 1 / np.float64(std.reshape(1, -1))
    if to_rgb:
        cv2.cvtColor(img, cv2.COLOR_BGR2RGB, img)
    # Mean/std normalization is disabled; the arguments are kept so the
    # interface stays unchanged.
    # cv2.subtract(img, mean, img)
    # cv2.multiply(img, stdinv, img)
    img /= 255.0
    return img
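
As a quick sanity check that the patch took effect, the following minimal snippet (assuming mmcv 1.x is installed and exposes imnormalize, which wraps imnormalize_) verifies that images are now scaled to [0, 1] instead of being mean/std normalized:

# Sanity check for the patched imnormalize_ (mean/std are ignored after the patch)
import numpy as np
from mmcv import imnormalize

img = np.random.randint(0, 256, (64, 64, 3)).astype(np.float32)
out = imnormalize(img, mean=np.zeros(3), std=np.ones(3), to_rgb=True)
assert 0.0 <= out.min() and out.max() <= 1.0  # values now lie in [0, 1]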

Once this change is made, training and testing can proceed as described below.

Training and Testing

Training and evaluation are based on the MMDetection framework.

Using FCOS as an example, we start with the config file:
configs/fcos/fcos_x101_64x4d_fpn_gn-head_mstrain_640-800_2x_coco.py

This config file should be customized for NPP training (a hedged config sketch follows the list), including:

  • Loading a pretrained model
  • Setting the number of training steps
  • Choosing optimizer and learning rate
  • Adjusting the loss weights (inherited from ./fcos_r50_caffe_fpn_gn-head_1x_coco.py)
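
For orientation, a hypothetical sketch of such overrides (field names follow standard MMDetection 2.x config conventions; the exact keys and values in this repo's config may differ):

# Hypothetical NPP training overrides in MMDetection 2.x config style
_base_ = './fcos_r50_caffe_fpn_gn-head_1x_coco.py'
load_from = '/path/to/pretrained_fcos.pth'  # load a pretrained detector
optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001)
runner = dict(type='IterBasedRunner', max_iters=90000)  # number of training steps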

The total loss function is defined as:
Loss = Bpp + 0.5 * Loss_pre + λ * L_FCOS
You should set the λ value in the config file before training.
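
For concreteness, a minimal sketch of how the terms combine (the names below are illustrative, not the repo's exact ones; bpp is the rate estimate, loss_pre the preprocessing fidelity term, and loss_fcos the detection loss):

lambda_weight = 1.0  # hypothetical λ; set the actual value in the config file

def total_loss(bpp, loss_pre, loss_fcos):
    # rate term + preprocessing fidelity + λ-weighted detection task loss
    return bpp + 0.5 * loss_pre + lambda_weight * loss_fcos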

Training

CUDA_VISIBLE_DEVICES=0 python tools/train.py \
  configs/fcos/fcos_x101_64x4d_fpn_gn-head_mstrain_640-800_2x_coco.py \
  -p proxy_network_path \
  -pre npp_pretrained_path \
  --work-dir work_dir_path

Here, -pre (a pretrained NPP checkpoint) is optional.

The trained .pth model will be saved in the specified working directory.

Testing

CUDA_VISIBLE_DEVICES=0 python tools/test.py \
  configs/fcos/fcos_x101_64x4d_fpn_gn-head_mstrain_640-800_2x_coco.py \
  pretrained_model_path \
  --eval bbox  # Evaluated using COCO mAP metrics

Core Source Code Overview

mmdet
├── core
├── datasets
├── models
│   ├── backbones
│   │   ├── towards
│   │   │   ├── ImageCompression (Proxy Network)
│   │   │   ├── models (NPP)
│   │   ├── resnet

The NPP_Codec is integrated into the MMDetection framework.
Key logic is implemented in the ResNet class of mmdetection_toward/mmdet/models/backbones/resnet.py.

In the __init__ method, both the proxy network and NPP are initialized. The gradients of the proxy network are frozen:

self.pre_processor = Preprocessor()                 # NPP module
self.image_compressor = ImageCompressor(192, 192)   # proxy network
for param in self.image_compressor.parameters():
    param.requires_grad = False                     # freeze the proxy network

The forward pass performs preprocessing, then encodes/decodes the result with both the proxy network and BPG:

    def forward(self, x):
        # Training: preprocessing followed by the proxy codec and real BPG coding
        img_num = x.size(0)  # batch size
        ori_imgs = x
        x = self.pre_processor(x)
        x = torch.clamp(x, 0., 1.)
        learned_img = x
        # Differentiable proxy codec: provides gradients for back-propagation
        nips_img, mse_loss, bpp_feature, bpp_z, bpp = self.image_compressor(learned_img)
        nips_bpp = bpp
        new_imgs = []
        bpg_bpps = []
        for img_i in range(img_num):
            now_img = learned_img[img_i]
            img_name = str(img_i)
            jpg_path = '/Path/to/mmdetection_toward/testimg/temppng/' + img_name + '_oldtrain.png'
            torchvision.utils.save_image(now_img, jpg_path)
            # BPG encoding/decoding via the bpgenc/bpgdec command-line tools
            bpg_path = '/Path/to/mmdetection_toward/testimg/tempbpg/' + img_name + 'train.bpg'
            convert_jpg_bpg = 'bpgenc -q 34 -o ' + bpg_path + ' ' + jpg_path
            os.system(convert_jpg_bpg)
            new_jpg_path = '/Path/to/mmdetection_toward/testimg/temppng/' + img_name + '_newtrain.png'
            convert_bpg_jpg = 'bpgdec -o ' + new_jpg_path + ' ' + bpg_path
            os.system(convert_bpg_jpg)
            new_img = torch.from_numpy(np.asarray(Image.open(new_jpg_path))).cuda(x.device).permute(2, 0, 1)
            # Actual bitrate: compressed file size in bits over the pixel count
            cmd = 'cat ' + bpg_path + ' | wc -c'
            byte = int(os.popen(cmd).read())
            bpp = float(byte * 8) / float(new_img.size(1) * new_img.size(2))
            bpg_bpps.append(bpp)
            new_imgs += [new_img]
            del new_img
        new_imgs = torch.stack(new_imgs, dim=0).float()
        bpg_bpps = torch.tensor(bpg_bpps).cuda()
        # Straight-through substitution: the forward values come from the real
        # BPG round trip, while gradients flow back through the proxy network.
        nips_img *= 255.0
        nips_img.data = new_imgs
        nips_bpp.data = bpg_bpps
        x = transform(nips_img)
        

        """Forward function."""
        if self.deep_stem:
            x = self.stem(x)
        else:
            x = self.conv1(x)
            x = self.norm1(x)
            x = self.relu(x)
        x = self.maxpool(x)
        outs = []
        for i, layer_name in enumerate(self.res_layers):
            res_layer = getattr(self, layer_name)
            x = res_layer(x)
            if i in self.out_indices:
                outs.append(x)
        # also return learned_img, ori_imgs and bpp for use in the loss computation
        outs.append(learned_img)
        outs.append(ori_imgs)
        outs.append(nips_bpp)
        return tuple(outs)
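
Note the .data assignments after the loop: they implement a straight-through substitution, so the forward pass sees the real (non-differentiable) BPG results while gradients flow back through the proxy network. A self-contained toy illustration (values are arbitrary):

import torch

x = torch.tensor([0.2, 0.8], requires_grad=True)
proxy_out = x * 3.0                   # differentiable proxy output
codec_out = torch.tensor([0.5, 2.0])  # stand-in for the BPG round-trip result
proxy_out.data = codec_out            # forward value replaced; autograd graph kept
proxy_out.sum().backward()
print(x.grad)                         # tensor([3., 3.]): gradients follow the proxy path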

mmdetection_variable

This version builds on the previous setup by incorporating the QA module into the NPP.
Key training logic is also updated to support QA-aware learning.

mmdet
├── core
├── datasets
├── models
│   ├── backbones
│   │   ├── towards
│   │   │   ├── ImageCompression (Proxy Network)
│   │   │   ├── models (NPP) with Scaling_Net
│   │   ├── resnet

The training process remains the same, but there is no need to manually set the loss weight λ.
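
For intuition only, a hypothetical sketch of a quantization-adaptive scaling module (the class name, architecture, and QP normalization are illustrative assumptions; see the Scaling_Net code under models (NPP) for the actual implementation):

import torch
import torch.nn as nn

class ScalingNetSketch(nn.Module):
    """Maps a BPG quality setting to per-channel feature scales."""

    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, channels))

    def forward(self, feat, qp):
        q = torch.tensor([[qp / 51.0]], device=feat.device)  # bpgenc -q ranges over 0-51
        scale = self.fc(q).view(1, -1, 1, 1)
        return feat * scale  # quantization-adaptive feature scaling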

NPP_Codec

This directory contains the standalone NPP module (without the QA module) along with test code for image processing using the NPP module.

NPP_Codec_Variable

This version includes the QA (quantization-adaptive) module integrated into the NPP source code.

Acknowledgments

Our code is developed on top of mmcv, a concise and easily extensible computer vision library.

Citation

If you find our work useful or relevant to your research, please kindly cite our paper:

@ARTICLE{10632166,
  author={Lu, Guo and Ge, Xingtong and Zhong, Tianxiong and Hu, Qiang and Geng, Jing},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={Preprocessing Enhanced Image Compression for Machine Vision}, 
  year={2024},
  volume={34},
  number={12},
  pages={13556-13568},
  keywords={Image coding;Task analysis;Codecs;Machine vision;Bit rate;Optimization;Neural networks;Image compression;machine vision;preprocessing;deep learning},
  doi={10.1109/TCSVT.2024.3441049}}
