
Try OV-DINO, a more powerful open-vocabulary detector. #172

Open · wanghao9610 opened this issue Jul 30, 2024 · 2 comments

wanghao9610 commented Jul 30, 2024

Thanks for the awesome GLIP! I'd like to share our recent work, 🦖OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion.

  • OV-DINO is a novel unified open-vocabulary detection approach that offers superior performance and effectiveness for practical real-world applications.

  • OV-DINO introduces a Unified Data Integration pipeline that integrates diverse data sources for end-to-end pre-training, and a Language-Aware Selective Fusion module that improves the model's vision-language understanding (a conceptual sketch follows this list).

  • OV-DINO shows significant performance improvements on the COCO and LVIS benchmarks compared to previous methods, achieving relative improvements of +4.3% AP on COCO and +14.1% AP on LVIS over GLIP in zero-shot evaluation.
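
For readers curious what "selective fusion" means concretely, here is a minimal conceptual sketch in PyTorch. This is an illustration under assumed shapes, not the actual OV-DINO implementation; the class name, the top-k selection rule, and the single cross-attention layer are all simplifications, so please refer to the paper and repository for the real module.

```python
import torch
import torch.nn as nn

class SelectiveFusionSketch(nn.Module):
    """Conceptual sketch only, NOT OV-DINO's actual LASF module.

    Idea: score each category-prompt embedding against the current object
    queries, keep only the most relevant prompts, and fuse them back into
    the queries with cross-attention plus a residual connection.
    """

    def __init__(self, dim: int = 256, top_k: int = 4, num_heads: int = 8):
        super().__init__()
        self.top_k = top_k
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, queries: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # queries:     (B, Q, D) object queries from the detector decoder
        # text_embeds: (B, T, D) embeddings of the category prompts
        sim = queries @ text_embeds.transpose(1, 2)   # (B, Q, T) query-text similarity
        relevance = sim.max(dim=1).values             # (B, T) best score per prompt
        k = min(self.top_k, text_embeds.size(1))
        idx = relevance.topk(k, dim=1).indices        # (B, k) most relevant prompts
        idx = idx.unsqueeze(-1).expand(-1, -1, text_embeds.size(-1))
        selected = text_embeds.gather(1, idx)         # (B, k, D) selected prompts
        fused, _ = self.cross_attn(queries, selected, selected)
        return queries + fused                        # residual fusion into the queries


# Quick shape check with random tensors:
fusion = SelectiveFusionSketch()
q, t = torch.randn(2, 100, 256), torch.randn(2, 20, 256)
print(fusion(q, t).shape)  # torch.Size([2, 100, 256])
```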

We have released the evaluation, fine-tuning, and demo code in our project; feel free to try our model in your applications.

Project: https://wanghao9610.github.io/OV-DINO

Paper: https://arxiv.org/abs/2407.07844

Code: https://github.com/wanghao9610/OV-DINO

Demo: http://47.115.200.157:7860/

Everyone is welcome to try our model, and feel free to raise an issue if you encounter any problems.
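
For a quick feel of the zero-shot workflow before cloning the repo, here is a rough sketch of what open-vocabulary inference looks like. Everything here is illustrative: the `ovdino` package name, `load_model`, and `predict` are hypothetical placeholders rather than the real OV-DINO API, and the repository README documents the actual evaluation and demo entry points.

```python
# Hypothetical usage sketch; NOT the real OV-DINO API.
# `ovdino`, `load_model`, and `predict` are illustrative placeholders;
# consult the OV-DINO repository README for the actual entry points.
from PIL import Image

import ovdino  # hypothetical package name

# Category names are free-form text: the detector matches image regions
# against these prompts rather than a fixed, closed label set.
categories = ["dog", "frisbee", "park bench"]

model = ovdino.load_model("ovdino_checkpoint.pth")  # hypothetical checkpoint
image = Image.open("example.jpg").convert("RGB")

# Each detection pairs a box with its best-matching category prompt.
for det in model.predict(image, categories, score_threshold=0.3):
    print(det.category, det.score, det.box)  # e.g. "dog" 0.87 [x0, y0, x1, y1]
```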


crazness commented Aug 1, 2024

How much data did you use to train the model, and what is the number of parameters?

wanghao9610 (Author) commented

@crazness OV-DINO is pre-trained on diverse data sources within a unified framework, including the O365, GoldG, and CC1M‡ datasets. The O365 and GoldG datasets are the same as those used by GLIP; CC1M‡ contains only 1M image-text pairs, far fewer than GLIP's Cap4M / Cap24M, yet OV-DINO achieves better performance. OV-DINO has 166M parameters, while GLIP has 232M. You can find more details in our paper.
