
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Z. Shou, Rama Chellappa, Pengchuan Zhang
ICCV, 2023
arXiv | project page

TL;DR: We introduce the second generation of egocentric video-language pre-training (EgoVLPv2), a significant improvement over the previous generation, achieved by incorporating cross-modal fusion directly into the video and language backbones.
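
To make "fusion in the backbone" concrete, below is a minimal, illustrative sketch of the idea: gated cross-attention inserted inside a uni-modal encoder block, so video and text tokens exchange information during (not after) encoding. The layer names, dimensions, and zero-initialized gating are our own assumptions for illustration, not the repository's actual model code.

    # Illustrative sketch of fusion-in-the-backbone; NOT the repo's model code.
    import torch
    import torch.nn as nn

    class FusionBlock(nn.Module):
        def __init__(self, dim: int = 768, num_heads: int = 12):
            super().__init__()
            self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.gate = nn.Parameter(torch.zeros(1))  # fusion starts "off"

        def forward(self, x: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
            # Standard self-attention over this modality's tokens.
            h = self.norm1(x)
            x = x + self.self_attn(h, h, h, need_weights=False)[0]
            # Gated cross-attention into the other modality's tokens.
            h = self.norm2(x)
            x = x + self.gate * self.cross_attn(h, other, other, need_weights=False)[0]
            return x

    video = torch.randn(2, 196, 768)  # (batch, video tokens, dim)
    text = torch.randn(2, 32, 768)    # (batch, text tokens, dim)
    fused_video = FusionBlock()(video, text)  # video tokens attend to text tokens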


📢 News

  • [June, 2024] EgoVLPv2 received an EgoVis (Egocentric Vision) 2022/2023 Distinguished Paper Award (news).
  • [November, 2023] EgoVLPv2 serves as a strong baseline for several tasks in Ego-Exo4D.
  • [September, 2023] We release the EgoVLPv2 codebase, checkpoints, and features.
  • [July, 2023] EgoVLPv2 is accepted to ICCV 2023.

📁 Repository Structure

The contents of this repository are structured as follows:

EgoVLPv2
    ├── EgoVLPv2
    │   ├── Pre-training on EgoClip version of Ego4D
    │   ├── Validation on EgoMCQ 
    │   ├── Zero-shot and fine-tuning on EK-100 MIR
    │   ├── Zero-shot and fine-tuning on Charades-Ego
    │   └── Feature extraction on EgoMQ
    ├── EgoTaskQA
    │   └── Fine-tuning on EgoTaskQA direct and indirect splits
    ├── EgoNLQ
    │   └── Feature extraction and head-tuning on EgoNLQ 
    ├── QFVS
    │   └── Feature extraction and head-tuning on QFVS
    └── EgoMQ
        └── Head-tuning on EgoMQ 

Each directory contains data setup instructions, training/inference scripts, and checkpoints. Notably, we provide pre-extracted video and text features to power the Ego4D NLQ & MQ challenges.
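
If you use the pre-extracted features, loading them might look like the following minimal sketch. The file names and tensor layouts below are hypothetical; the actual format and download links are given in each task directory.

    # Hypothetical feature loading; see the per-task READMEs for actual formats.
    import torch

    video_feats = torch.load("features/clip_uid.pt")   # e.g. (num_windows, dim)
    text_feats = torch.load("features/query_uid.pt")   # e.g. (num_tokens, dim)
    print(video_feats.shape, text_feats.shape)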

🛠️ Environment Preparation

conda create -n egovlpv2 python=3.8.13 pip
conda activate egovlpv2
pip install -r requirements.txt
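
As a quick post-install sanity check (our suggestion, not part of the repository's instructions), verify that the PyTorch installed from requirements.txt imports and can see the GPU:

    # Hypothetical sanity check; not part of the repo's setup docs.
    import torch

    print(torch.__version__)          # version installed from requirements.txt
    print(torch.cuda.is_available())  # True if a CUDA-capable GPU is visible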

✉️ Contact

This repository is created and maintained by Shraman. Questions and discussions are welcome via [email protected]. We are happy to merge results if you transfer EgoVLPv2 to other egocentric tasks or datasets.

🙏 Acknowledgements

The codebase for this work builds on the EgoVLP, LAVILA, FIBER, and VSLNet repositories. We thank the respective authors for their contributions, and the Meta AI team for discussions and feedback.

📄 License

EgoVLPv2 is licensed under the MIT License.

🎓 Citing EgoVLPv2

@article{pramanick2023egovlpv2,
  title={EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone},
  author={Pramanick, Shraman and Song, Yale and Nag, Sayan and Lin, Kevin Qinghong and Shah, Hardik and Shou, Mike Zheng and Chellappa, Rama and Zhang, Pengchuan},
  journal={arXiv preprint arXiv:2307.05463},
  year={2023}
}
