
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Z. Shou, Rama Chellappa, Pengchuan Zhang
ICCV, 2023
arXiv | project page

TL;DR: We introduce the second generation of egocentric video-language pre-training (EgoVLPv2), a significant improvement over the previous generation, achieved by incorporating cross-modal fusion directly into the video and language backbones.
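
To make "fusion in the backbone" concrete, below is a minimal, illustrative sketch of the idea: gated cross-attention inserted inside a uni-modal encoder block, so video and text tokens exchange information during (not after) encoding. The layer names, dimensions, and zero-initialized gating are our own assumptions for illustration, not the repository's actual model code.

    # Illustrative sketch of fusion-in-the-backbone; NOT the repo's model code.
    import torch
    import torch.nn as nn

    class FusionBlock(nn.Module):
        def __init__(self, dim: int = 768, num_heads: int = 12):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.gate = nn.Parameter(torch.zeros(1))  # fusion starts "off"

        def forward(self, x: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
            # Standard self-attention over this modality's tokens.
            h = self.norm1(x)
            x = x + self.self_attn(h, h, h, need_weights=False)[0]
            # Gated cross-attention into the other modality's tokens.
            h = self.norm2(x)
            x = x + self.gate * self.cross_attn(h, other, other, need_weights=False)[0]
            return x

    video = torch.randn(2, 196, 768)  # (batch, video tokens, dim)
    text = torch.randn(2, 32, 768)    # (batch, text tokens, dim)
    fused_video = FusionBlock()(video, text)  # video tokens attend to text tokens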


📢 News

  • [June, 2024] EgoVLPv2 received an EgoVis (Egocentric Vision) 2022/2023 Distinguished Paper Award (news).
  • [November, 2023] EgoVLPv2 serves as a strong baseline for several tasks in Ego-Exo4D.
  • [September, 2023] We release the EgoVLPv2 codebase, checkpoints, and features.
  • [July, 2023] EgoVLPv2 is accepted to ICCV 2023.

📁 Repository Structure

The contents of this repository are structured as follows:

EgoVLPv2
    ├── EgoVLPv2
    │   ├── Pre-training on EgoClip version of Ego4D
    │   ├── Validation on EgoMCQ 
    │   ├── Zero-shot and fine-tuning on EK-100 MIR
    │   ├── Zero-shot and fine-tuning on Charades-Ego
    │   └── Feature extraction on EgoMQ
    ├── EgoTaskQA
    │   └── Fine-tuning on EgoTaskQA direct and indirect splits
    ├── EgoNLQ
    │   └── Feature extraction and head-tuning on EgoNLQ 
    ├── QFVS
    │   └── Feature extraction and head-tuning on QFVS
    └── EgoMQ
        └── Head-tuning on EgoMQ 

Each directory contains data setup instructions, training/inference scripts, and checkpoints. Notably, we provide pre-extracted video and text features to power the Ego4D NLQ & MQ challenges.
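
If you use the pre-extracted features, loading them might look like the following minimal sketch. The file names and tensor layouts below are hypothetical; the actual format and download links are given in each task directory.

    # Hypothetical feature loading; see the per-task READMEs for actual formats.
    import torch

    video_feats = torch.load("features/clip_uid.pt")   # e.g. (num_windows, dim)
    text_feats = torch.load("features/query_uid.pt")   # e.g. (num_tokens, dim)
    print(video_feats.shape, text_feats.shape)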

🛠️ Environment Preparation

conda create -n egovlpv2 python=3.8.13 pip
conda activate egovlpv2
pip install -r requirements.txt
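
As a quick post-install sanity check (our suggestion, not part of the repository's instructions), verify that the PyTorch installed from requirements.txt imports and can see the GPU:

    # Hypothetical sanity check; not part of the repo's setup docs.
    import torch

    print(torch.__version__)          # version installed from requirements.txt
    print(torch.cuda.is_available())  # True if a CUDA-capable GPU is visible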

✉️ Contact

This repository is created and maintained by Shraman. Questions and discussions are welcome via [email protected]. We are happy to merge results if you transfer EgoVLPv2 to other egocentric tasks or datasets.

🙏 Acknowledgements

The codebase for this work builds on the EgoVLP, LAVILA, FIBER, and VSLNet repositories. We thank the respective authors for their contributions, and the Meta AI team for discussions and feedback.

📄 License

EgoVLPv2 is licensed under the MIT License.

🎓 Citing EgoVLPv2

@article{pramanick2023egovlpv2,
  title={EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone},
  author={Pramanick, Shraman and Song, Yale and Nag, Sayan and Lin, Kevin Qinghong and Shah, Hardik and Shou, Mike Zheng and Chellappa, Rama and Zhang, Pengchuan},
  journal={arXiv preprint arXiv:2307.05463},
  year={2023}
}
