Thanks for the awesome GLIP! I'd like to share our recent work 🦖 OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion.
OV-DINO is a novel unified open-vocabulary detection approach that offers superior performance and effectiveness for practical real-world applications.
OV-DINO entails a Unified Data Integration pipeline that integrates diverse data sources for end-to-end pre-training, and a Language-Aware Selective Fusion module to improve the model's vision-language understanding; a conceptual sketch of the fusion idea is shown below.
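For readers curious what "selective fusion" can look like, here is a minimal, conceptual PyTorch sketch: cross-attention from visual queries to text embeddings, with a learned gate deciding how much of the attended update each query absorbs. This is an illustration under my own simplifying assumptions, not the actual OV-DINO module; the class name, the sigmoid gate, and all shapes are hypothetical, so please refer to the paper and code for the real implementation.

```python
import torch
import torch.nn as nn

class LanguageAwareSelectiveFusionSketch(nn.Module):
    """Illustrative only: cross-attend visual queries to text embeddings,
    then gate the attended update per query so that only language-relevant
    visual content is fused back in. Not the OV-DINO implementation."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())  # hypothetical gating
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # vis: (B, N, D) visual query features; txt: (B, T, D) text embeddings
        attended, _ = self.cross_attn(query=vis, key=txt, value=txt)
        g = self.gate(attended)                # (B, N, 1) relevance score in [0, 1]
        return self.norm(vis + g * attended)   # selectively fused features

# Quick shape check with made-up sizes.
fusion = LanguageAwareSelectiveFusionSketch()
v, t = torch.randn(2, 900, 256), torch.randn(2, 16, 256)
print(fusion(v, t).shape)  # torch.Size([2, 900, 256])
```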
OV-DINO shows significant performance improvements on the COCO and LVIS benchmarks compared to previous methods, achieving relative improvements of +4.3% AP on COCO and +14.1% AP on LVIS over GLIP in zero-shot evaluation.
We have released the evaluation, fine-tuning, and demo code in our project; feel free to try the model in your own applications.
@crazness OV-DINO is pre-trained on diverse data sources within a unified framework, including the O365, GoldG, and CC1M‡ datasets. The O365 and GoldG datasets are the same as those used in GLIP, while CC1M‡ contains only 1M image-text pairs, far fewer than GLIP's Cap4M / Cap24M, yet OV-DINO achieves better performance. OV-DINO also has 166M parameters, compared to GLIP's 232M. You can find more details in our paper.
Project: https://wanghao9610.github.io/OV-DINO
Paper: https://arxiv.org/abs/2407.07844
Code: https://github.com/wanghao9610/OV-DINO
Demo: http://47.115.200.157:7860/
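If you want a quick feel for the intended workflow before diving into the repo, the pattern is: load a pre-trained checkpoint, pass an image together with free-form category names as the text prompt, and keep detections above a score threshold. Everything in the sketch below is hypothetical (`load_model`, `model.detect`, the checkpoint name are placeholders, not the repo's actual API), so please follow the README at the Code link above for real commands.

```python
# Hypothetical sketch of an open-vocabulary detection workflow.
# None of these names come from the OV-DINO repo; see its README for the real API.
from PIL import Image

def run_open_vocab_detection(model, image_path, categories, score_thr=0.3):
    """Categories are free-form class names supplied at inference time."""
    image = Image.open(image_path).convert("RGB")
    detections = model.detect(image, prompts=categories)  # placeholder call
    return [d for d in detections if d["score"] >= score_thr]

# model = load_model("ovdino_checkpoint.pth")  # placeholder loader
# boxes = run_open_vocab_detection(model, "street.jpg",
#                                  ["person", "bicycle", "traffic light"])
```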
Everyone is welcome to try our model, and feel free to raise an issue if you encounter any problems.