- Project Overview
- Key Features
- Practical Applications
- Development Process
- Usage Instructions
- Visuals
- Contributing
- Future Enhancements
- Acknowledgements
- License
- Deploy the App with IBM Cloud and IBM Code Engine
The Automated Image Captioning with AI project is designed to automatically generate descriptive captions for images extracted from a given website URL or a local folder. Leveraging state-of-the-art machine learning models, this project provides an interactive web interface using Gradio, making it easy for users to input a URL or select images from a folder, generate captions, and save the results in any format to enable modifying later as well.
- Image Extraction: Automatically extracts image URLs from the provided website URL.
- Caption Generation: Utilizes the
Salesforce/blip-image-captioning-large
model to generate descriptive captions for each image. - Interactive Interface: Provides a user-friendly interface using Gradio for easy interaction.
- Modify Captions: Allows users to edit the generated captions.
- Save Captions: Allows users to save the generated captions to a text file.
- Clear Interface: Includes a "Clear" button to reset the interface and clear all data.
- Enhanced Accessibility: Helps visually impaired individuals understand visual content through descriptive captions.
- Improved SEO: Assists search engines in identifying the content of images, improving the article's SEO.
- Content Discovery: Enables efficient analysis and categorization of large image databases.
- Social Media and Advertising: Automates engaging description generation for visual content.
- Education and Research: Assists in understanding and interpreting visual materials.
- Data Organization: Helps manage and categorize large sets of visual data.
- Time-Saving: Automated captioning is more efficient than manual efforts.
- User Engagement: Detailed captions can make visual content more engaging and informative.
Business scenario on news and media:
A news agency publishes hundreds of articles daily, each containing several images relevant to the story. Writing appropriate and descriptive captions for each image manually is a tedious task and might slow down the publication process. The agency needed a solution to expedite this process while ensuring the captions were accurate and contextually relevant.
Our task was to develop an automated image captioning tool that could generate descriptive captions for images extracted from a given website URL or a folder. The tool needed to be user-friendly, efficient, and capable of producing high-quality captions that enhance accessibility and improve SEO.
We implemented an automated image captioning program that works directly from a URL or a folder. The user provides the URL or selects images from a folder, and the code generates captions for the images found. The output is a text file that includes all the image URLs or image file names along with their respective captions.
By integrating this automated image captioning tool, the news agency is able to expedite its publication process significantly. The tool ensures that all images come with appropriate descriptions, enhancing accessibility for visually impaired readers and improving the website's SEO. This broadens the agency's reach and engagement with a more diverse audience base.
-
Clone the Repository:
git clone https://github.com/prgrmcode/caption_photos_with_GenAI.git cd caption_photos_with_GenAI
-
Install Dependencies:
pip install -r requirements.txt
-
Run the Application:
python image_captioner_advanced.py
-
Enter Website URL:
-
Generate Captions:
- Click the "Generate Captions" button to start the process.
-
Review and Edit Captions:
-
Save Captions:
-
Clear Interface:
- Click the "Clear" button to reset the interface and clear all data.
Contributions are welcome! Please feel free to submit a Pull Request or open an Issue to improve the project.
-
Fork the Repository:
- Click the "Fork" button at the top right corner of the repository page.
-
Clone the Forked Repository:
git clone <your_forked_repository_url> cd <repository_name>
-
Create a New Branch:
git checkout -b <branch_name>
-
Make Your Changes:
- Implement your changes and commit them with a descriptive message.
-
Push Your Changes:
git push origin <branch_name>
-
Create a Pull Request:
- Go to the original repository and click the "New Pull Request" button.
Expanding the tool to support multiple languages can make it more versatile and useful for international audiences.
Integrating real-time captioning capabilities can enhance the tool's applicability in live events and streaming platforms.
Allowing users to customize the captioning model and parameters can provide more control over the generated captions, catering to specific needs and preferences.
- Gradio: For providing an easy-to-use interface for machine learning applications.
- Transformers: For the powerful pre-trained models.
- Pillow: For image processing capabilities.
- Requests: For handling HTTP requests.
- BeautifulSoup: For parsing HTML and extracting image URLs.
This project is licensed under the MIT License.
- Scalability: Automatically scales your application based on demand.
- Simplicity: Simplifies the deployment process with a fully managed, serverless platform.
- Serverless Features: Eliminates the need to manage infrastructure, allowing you to focus on your application.
Deploying your application on IBM Cloud using IBM Code Engine allows you to run your containerized workloads seamlessly. Follow these steps to deploy your app:
-
Create a Directory for Your App:
mkdir myapp cd myapp
-
Create Required Files:
touch demo.py Dockerfile requirements.txt
-
Create requirements.txt: List all the dependencies your app needs. You can use
pip freeze
to generate this file.pip freeze > requirements.txt
-
Create
demo.py
: Write a simple Gradio web application in this file. -
Create
Dockerfile
:FROM python:3.10 WORKDIR /app COPY requirements.txt requirements.txt RUN pip3 install --no-cache-dir -r requirements.txt COPY . . CMD ["python", "demo.py"]
-
Log in to IBM Cloud:
ibmcloud login
-
Target the appropriate region and resource group:
ibmcloud target --cf
-
Create a Code Engine project:
ibmcloud ce project create --name myproject
-
Set the project context:
ibmcloud ce project select --name myproject
-
Build the container image:
ibmcloud ce build create --name build-local-dockerfile --build-type local --size large --image us.icr.io/${SN_ICR_NAMESPACE}/myapp --registry-secret icr-secret
-
Submit and run the build configuration:
ibmcloud ce buildrun submit --name buildrun-local-dockerfile --build build-local-dockerfile --source .
-
Create the application:
ibmcloud ce application create --name demo --image us.icr.io/${SN_ICR_NAMESPACE}/myapp --registry-secret icr-secret --es 2G --port 7860 --minscale 1
-
Access your application:
ibmcloud ce app get --name demo --output url
This means your app has been deployed and you can access it now! To obtain the URL of your app, run ibmcloud ce app get --name demo1 --output url
.
Click on the URL returned, and you should be able to see your app running in your browser!
- Check Build Status: Use
ibmcloud ce buildrun get -n buildrun-local-dockerfile
to monitor the build progress. - Verify Application Logs: Use
ibmcloud ce application logs --name demo
to check the application logs for any errors. - Consult IBM Documentation: Refer to the IBM Code Engine Documentation for detailed guidance and troubleshooting.
Deploying your application on IBM Cloud using Gradio and IBM Code Engine is a seamless process that leverages the power of containerization and serverless technology. This comprehensive guide has walked you through creating a container image, building and pushing the image, and deploying the application on IBM Code Engine. By following these steps, you can ensure a scalable, efficient, and professional deployment for your AI-powered image captioning tool.
This comprehensive process, from local testing to cloud deployment, highlights the seamless integration of Gradio, Docker, and IBM Code Engine, culminating in a successful application deployment on IBM Cloud.