Multi-modality Unified Image-sequence Classifier (MUIsC)

Paper: Automatic Generation of Product-Image Sequence in E-commerce
In KDD 2022 ADS

This repo includes the inference code for MUIsC, a main module of our AGPIS (Automatic Generation of Product-Image Sequence) framework for JD.com.

Environment setup

The main code is in ./muisc_inference.py. It contains three functions: build_data processes the input data; load_model loads the model, including the instantiation of the language tower and the image tower and their interaction; and inference runs inference.
muisc_model.py contains the data iterator and the image-tower model.
Main changes to Hugging Face transformers:
- Made GPT2 support optional cross-attention.
- Fixed a bug in the classification head of GPT2DoubleHeadsModel.
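
To illustrate what "optional cross-attention" means, here is a minimal, dependency-free sketch. It is not the modified transformers code: the names attend and block are hypothetical, learned projections, layer norm and the MLP are omitted, and it assumes the language tower supplies the queries while the other tower's outputs (encoder_states) supply keys and values.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Matrix, keys: Matrix, values: Matrix, causal: bool = False) -> Matrix:
    """Single-head scaled dot-product attention without learned projections."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for i, q in enumerate(queries):
        # With a causal mask, position i only sees positions 0..i.
        ks = keys[: i + 1] if causal else keys
        vs = values[: i + 1] if causal else values
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in ks])
        out.append([sum(w * v[d] for w, v in zip(weights, vs)) for d in range(len(vs[0]))])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block(hidden: Matrix, encoder_states: Optional[Matrix] = None) -> Matrix:
    """Causal self-attention, then cross-attention only if encoder_states is given."""
    hidden = add(hidden, attend(hidden, hidden, hidden, causal=True))
    if encoder_states is not None:
        # Queries come from the decoder; keys and values come from the other tower.
        hidden = add(hidden, attend(hidden, encoder_states, encoder_states))
    return hidden


text_states = [[1.0, 0.0], [0.0, 1.0]]   # toy token states
image_states = [[2.0, 2.0]]              # toy image-tower output
text_only = block(text_states)
with_images = block(text_states, image_states)
```

When encoder_states is None the block behaves like a plain decoder layer, so the same code path can serve both text-only and multi-modal inputs.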
Sorry, the model checkpoint is not publicly available.

Performance could be further improved by:
- Simultaneously predicting the product title during training
- Adding the product category to the title
- Training lm_head (a linear layer) from scratch
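
As a rough illustration of the first suggestion, a joint objective could add a title-prediction (language modeling) loss to the classification loss. The sketch below is an assumption about how such a loss might be combined, not code from this repo; joint_loss and lm_weight are hypothetical names, and the logits are plain Python lists so the example runs without any dependencies.

```python
import math
from typing import List


def cross_entropy(logits: List[float], target: int) -> float:
    """Negative log-likelihood of the target index under softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def joint_loss(
    cls_logits: List[float],
    cls_label: int,
    title_logits: List[List[float]],
    title_tokens: List[int],
    lm_weight: float = 0.5,  # hypothetical weighting, to be tuned
) -> float:
    """Classification loss plus a weighted, token-averaged title-prediction loss."""
    cls_loss = cross_entropy(cls_logits, cls_label)
    lm_loss = sum(
        cross_entropy(step_logits, token)
        for step_logits, token in zip(title_logits, title_tokens)
    ) / len(title_tokens)
    return cls_loss + lm_weight * lm_loss


# Toy example: 2 classes, a 2-token title over a 2-word vocabulary.
loss = joint_loss([0.0, 0.0], 1, [[0.0, 0.0], [0.0, 0.0]], [0, 1])
```

Setting lm_weight to 0 recovers the classification-only objective, which makes it easy to compare the two setups.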
