Automated Image Captioning with AI

Project Overview

The Automated Image Captioning with AI project automatically generates descriptive captions for images extracted from a website URL or a local folder. It uses a pretrained BLIP image-captioning model and provides an interactive Gradio web interface where users can enter a URL or select images from a folder, generate captions, edit them, and save the results to a text file for later use.

Key Features

Image Extraction: Automatically extracts image URLs from the provided website URL.
Caption Generation: Utilizes the Salesforce/blip-image-captioning-large model to generate descriptive captions for each image.
Interactive Interface: Provides a user-friendly interface using Gradio for easy interaction.
Modify Captions: Allows users to edit the generated captions.
Save Captions: Allows users to save the generated captions to a text file.
Clear Interface: Includes a "Clear" button to reset the interface and clear all data.
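
As a rough illustration of the Image Extraction feature, the sketch below collects image URLs from a page's HTML. It is not the project's code: the project credits Requests and BeautifulSoup, while this version substitutes the standard library's `html.parser` and `urllib.parse` so it runs without extra dependencies. The names `ImageURLCollector` and `extract_image_urls` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageURLCollector(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        # Skip missing sources and inline data URIs, which are not fetchable URLs.
        if src and not src.startswith("data:"):
            # Resolve relative and protocol-relative paths against the page URL.
            absolute = urljoin(self.base_url, src)
            if absolute not in self.urls:
                self.urls.append(absolute)


def extract_image_urls(html, base_url):
    """Return the absolute image URLs found in `html`, in document order."""
    collector = ImageURLCollector(base_url)
    collector.feed(html)
    return collector.urls
```

The HTML itself would come from an HTTP request to the user-supplied URL; that step is left out here so the parsing logic can be tried on any HTML string.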

Practical Applications

Enhanced Accessibility: Helps visually impaired individuals understand visual content through descriptive captions.
Improved SEO: Helps search engines identify the content of images, improving the page's SEO.
Content Discovery: Enables efficient analysis and categorization of large image databases.
Social Media and Advertising: Automates engaging description generation for visual content.
Education and Research: Assists in understanding and interpreting visual materials.
Data Organization: Helps manage and categorize large sets of visual data.
Time-Saving: Automated captioning is more efficient than manual efforts.
User Engagement: Detailed captions can make visual content more engaging and informative.

Development Process

Situation

Business scenario in news and media:

A news agency publishes hundreds of articles daily, each containing several images relevant to the story. Writing appropriate and descriptive captions for each image manually is a tedious task and might slow down the publication process. The agency needed a solution to expedite this process while ensuring the captions were accurate and contextually relevant.

Task

Our task was to develop an automated image captioning tool that could generate descriptive captions for images extracted from a given website URL or a folder. The tool needed to be user-friendly, efficient, and capable of producing high-quality captions that enhance accessibility and improve SEO.

Action

We implemented an automated image captioning program that works directly from a URL or a folder. The user provides the URL or selects images from a folder, and the code generates captions for the images found. The output is a text file that includes all the image URLs or image file names along with their respective captions.
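
The captioning step described above might look roughly like the following sketch. It assumes the Hugging Face `transformers` library with a PyTorch backend and uses the `Salesforce/blip-image-captioning-large` checkpoint named under Key Features. The function names `load_blip` and `caption_image` and the `max_new_tokens` default are illustrative assumptions, not the project's actual code.

```python
MODEL_NAME = "Salesforce/blip-image-captioning-large"


def load_blip(model_name=MODEL_NAME):
    """Load the BLIP processor and model (downloads the weights on first use)."""
    # Imported lazily so the module can be imported without transformers installed.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_name)
    model = BlipForConditionalGeneration.from_pretrained(model_name)
    return processor, model


def caption_image(image, processor, model, max_new_tokens=50):
    """Generate one caption for a PIL image using an already loaded BLIP model."""
    # The processor resizes and normalizes the image into model-ready tensors.
    inputs = processor(images=image, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (and only) generated sequence back into text.
    return processor.decode(output_ids[0], skip_special_tokens=True).strip()
```

Loading the model once with `load_blip()` and reusing the returned pair for every image avoids reloading the weights per image, which matters when a page contains many images.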

Result

By integrating this automated image captioning tool, the news agency is able to expedite its publication process significantly. The tool ensures that all images come with appropriate descriptions, enhancing accessibility for visually impaired readers and improving the website's SEO. This broadens the agency's reach and engagement with a more diverse audience base.

Usage Instructions

Installation

Clone the Repository:

```
git clone https://github.com/prgrmcode/caption_photos_with_GenAI.git
cd caption_photos_with_GenAI
```

Install Dependencies:
```
pip install -r requirements.txt
```
Run the Application:
```
python image_captioner_advanced.py
```

Step-by-Step Guide

Enter Website URL:
- Input the URL of the website containing images you wish to caption.
Generate Captions:
- Click the "Generate Captions" button to start the process.
Review and Edit Captions:
- Once processing is complete, images and their captions will be displayed. You can modify the captions as needed.
Save Captions:
- Click the "Save Captions" button to save the modified captions to a text file.
Clear Interface:
- Click the "Clear" button to reset the interface and clear all data.
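
For the Save Captions step, a minimal sketch of writing the edited captions to a text file could look like this. The tab-separated, one-line-per-image layout and the name `save_captions` are assumptions made for illustration; the project's actual file format may differ.

```python
from pathlib import Path


def save_captions(captions, output_path):
    """Write (source, caption) pairs to a UTF-8 text file, one pair per line."""
    lines = []
    for source, caption in captions:
        # Collapse whitespace so an edited, multi-line caption stays on one line.
        clean = " ".join(caption.split())
        lines.append(f"{source}\t{clean}")
    text = "\n".join(lines) + ("\n" if lines else "")
    path = Path(output_path)
    path.write_text(text, encoding="utf-8")
    return path
```

Here `source` is either an image URL or a file name, matching the two input modes. A tab is used as the separator because URLs contain colons, which would make a `source: caption` layout ambiguous to parse later.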

Visuals

URL Captioning Interface

Folder Captioning Interface

Contributing

Contributions are welcome! Please feel free to submit a Pull Request or open an Issue to improve the project.

How to Contribute

Fork the Repository:
- Click the "Fork" button at the top right corner of the repository page.

Clone the Forked Repository:

```
git clone <your_forked_repository_url>
cd <repository_name>
```

Create a New Branch:
```
git checkout -b <branch_name>
```
Make Your Changes:
- Implement your changes and commit them with a descriptive message.
Push Your Changes:
```
git push origin <branch_name>
```
Create a Pull Request:
- Go to the original repository and click the "New Pull Request" button.

Future Enhancements

Multilingual Support

Expanding the tool to support multiple languages can make it more versatile and useful for international audiences.

Real-Time Captioning

Integrating real-time captioning capabilities can enhance the tool's applicability in live events and streaming platforms.

Advanced Customization

Allowing users to customize the captioning model and parameters can provide more control over the generated captions, catering to specific needs and preferences.

Acknowledgements

Gradio: For providing an easy-to-use interface for machine learning applications.
Transformers: For the powerful pre-trained models.
Pillow: For image processing capabilities.
Requests: For handling HTTP requests.
BeautifulSoup: For parsing HTML and extracting image URLs.

License

This project is licensed under the MIT License.

Deploy the app with IBM Cloud and IBM Code Engine

Benefits of Using IBM Code Engine

Scalability: Automatically scales your application based on demand.
Simplicity: Simplifies the deployment process with a fully managed, serverless platform.
Serverless Features: Eliminates the need to manage infrastructure, allowing you to focus on your application.

Deploying your application on IBM Cloud using IBM Code Engine allows you to run your containerized workloads seamlessly. Follow these steps to deploy your app:

Step-by-Step Deployment Guide

Step 1: Create the Container Image

Create a Directory for Your App:
```
mkdir myapp
cd myapp
```

Create Required Files:

```
touch demo.py Dockerfile requirements.txt
```

Create requirements.txt: List all the dependencies your app needs. You can use pip freeze to generate this file.
```
pip freeze > requirements.txt
```
Create demo.py: Write a simple Gradio web application in this file.
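
A minimal sketch of what demo.py could contain is shown below, assuming `gradio` and Pillow are listed in requirements.txt. The function `describe_image` is a hypothetical placeholder that only reports the image size; a real app would call the captioning model instead. The server is bound to `0.0.0.0` on port 7860 so that it is reachable from outside the container and matches the `--port 7860` option used in the deployment step.

```python
def describe_image(image):
    """Placeholder caption function (hypothetical); swap in a real model call."""
    if image is None:
        return "No image provided."
    # With type="pil", Gradio passes a PIL image, whose .size is (width, height).
    width, height = image.size
    return f"Received a {width}x{height} image."


def build_demo():
    """Build the Gradio interface; gradio is imported lazily inside the function."""
    import gradio as gr

    return gr.Interface(
        fn=describe_image,
        inputs=gr.Image(type="pil", label="Image"),
        outputs=gr.Textbox(label="Caption"),
        title="Image Captioning Demo",
    )


def main():
    # Listen on all interfaces so the app is reachable from outside the container.
    build_demo().launch(server_name="0.0.0.0", server_port=7860)


# When saving this as demo.py, uncomment the next line so that
# `python demo.py` starts the server:
# main()
```

Gradio's default server name is `127.0.0.1`, which only accepts connections from inside the container, so setting `server_name` explicitly is the part most likely to matter when the app runs on Code Engine.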

Create Dockerfile:

```
FROM python:3.10

WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip3 install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python", "demo.py"]
```

Step 2: Build and Push the Container Image

Log in to IBM Cloud:
```
ibmcloud login
```
Target the appropriate region and resource group:
```
ibmcloud target -r <region> -g <resource_group>
```

Create a Code Engine project:
```
ibmcloud ce project create --name myproject
```
Set the project context:
```
ibmcloud ce project select --name myproject
```
Build the container image:
```
ibmcloud ce build create --name build-local-dockerfile --build-type local --size large --image us.icr.io/${SN_ICR_NAMESPACE}/myapp --registry-secret icr-secret
```
Submit and run the build configuration:
```
ibmcloud ce buildrun submit --name buildrun-local-dockerfile --build build-local-dockerfile --source .
```

Step 3: Deploy the Containerized App

Create the application:
```
ibmcloud ce application create --name demo --image us.icr.io/${SN_ICR_NAMESPACE}/myapp --registry-secret icr-secret --es 2G --port 7860 --minscale 1
```
Access your application:
```
ibmcloud ce app get --name demo --output url
```

Once the create command finishes successfully, your app is deployed. Open the URL returned by the command above in your browser to see the app running.

Troubleshooting Tips

Check Build Status: Use ibmcloud ce buildrun get -n buildrun-local-dockerfile to monitor the build progress.
Verify Application Logs: Use ibmcloud ce application logs --name demo to check the application logs for any errors.
Consult IBM Documentation: Refer to the IBM Code Engine Documentation for detailed guidance and troubleshooting.

Conclusion

Deploying a Gradio application on IBM Cloud with IBM Code Engine combines containerization with a fully managed, serverless platform. This guide covered creating a container image, building and pushing it, and deploying the application on Code Engine. Following these steps takes the AI-powered image captioning tool from local testing to a scalable cloud deployment.
