An image captioning system that generates a caption for a single image.
- Uses a Transformer architecture.
- Uses EfficientNetB0 for image embeddings and a vocabulary size of 10,000.
- Achieves 73% accuracy on the Flickr8k dataset.
- Deployed with Gradio.
- Install the required packages
python3 -m pip install -r requirements.txt
- Run the Gradio app
gradio app.py
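
To illustrate what a capped vocabulary size of 10,000 means in practice, here is a minimal sketch in plain Python. It is not taken from this project's code: the function names `build_vocab` and `encode`, the special tokens, and the whitespace tokenization are all assumptions made for illustration. The actual pipeline may build its vocabulary differently, for example with a Keras text-vectorization layer.

```python
from collections import Counter

# Hypothetical special tokens; the project's real tokens are not documented here.
SPECIALS = ["<pad>", "<unk>", "<start>", "<end>"]


def build_vocab(captions, max_size=10_000):
    """Map the most frequent words to integer ids, capped at max_size entries."""
    # Count lowercase, whitespace-separated words across all captions.
    counts = Counter(
        word for caption in captions for word in caption.lower().split()
    )
    # Reserve room for the special tokens inside the cap.
    keep = max(0, max_size - len(SPECIALS))
    words = [word for word, _ in counts.most_common(keep)]
    return {token: idx for idx, token in enumerate(SPECIALS + words)}


def encode(caption, vocab):
    """Turn a caption into ids, mapping out-of-vocabulary words to <unk>."""
    unk = vocab["<unk>"]
    ids = [vocab.get(word, unk) for word in caption.lower().split()]
    return [vocab["<start>"]] + ids + [vocab["<end>"]]


if __name__ == "__main__":
    vocab = build_vocab(["A dog runs", "a dog sleeps"], max_size=6)
    print(encode("a dog barks", vocab))
```

Words that fall outside the cap share a single `<unk>` id, so rare words in Flickr8k captions would not each get their own entry in a vocabulary built this way.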