
🌾📸🎙🧠 Voice-Enabled Semantic Crop Intelligence

Search, identify, and explore crops using images, voice, and natural language.

Banner GIF

An intelligent, multi-modal crop analysis and search system built using advanced Vision-Language Models (VLMs) and semantic search techniques.

This tool can detect crops from images, analyze their characteristics, and search similar crop types using natural language or voice.

🧪 Example Use Cases

  • Upload photo → Get crop analysis and description
  • “Show me crops with red flowers and green stems” → Voice or text → View matching images

🚀 Features

🌿 Crop Detection & Analysis

  • Automatically identifies crops from images using Qwen 2.5 Vision, a powerful multimodal large language model.

🔍 Semantic Search

  • Search for similar crops using:
    • Text-based queries (e.g., “green leafy crop with wide leaves”)
    • Powered by CLIP-like embeddings and cosine similarity search using ClickHouse
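A minimal sketch of the similarity mechanism described above, assuming embeddings are plain float vectors. The `crop_embeddings` table and column names are illustrative, not the project's actual schema; `cosineDistance` is ClickHouse's built-in distance function (1 − cosine similarity, so ascending order returns the closest matches first).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Illustrative ClickHouse query using server-side query parameters:
# cosineDistance = 1 - cosine similarity, so ORDER BY ... ASC = most similar first.
TOP_K_QUERY = """
SELECT image_path, crop,
       cosineDistance(embedding, {query_vec:Array(Float32)}) AS dist
FROM crop_embeddings
ORDER BY dist ASC
LIMIT 5
"""
```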

🎙️ Voice-Based Querying

  • Allows users to speak their search queries naturally instead of typing
  • Voice is recorded using sounddevice and saved via scipy.io.wavfile
  • Audio is transcribed using OpenAI's Whisper via the whisper Python package
  • Transcribed text is passed to the semantic search engine for matching crop images
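The record → transcribe → search handoff above can be sketched as follows. This is an illustrative sketch, assuming `sounddevice`, `scipy`, and the `whisper` package are installed; the function names, the 16 kHz sample rate, and the `clean_query` normalization step are my additions, not the project's code.

```python
SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio

def record_query(seconds: float, wav_path: str = "query.wav") -> str:
    """Record from the default microphone and save a WAV file."""
    import sounddevice as sd          # assumed dependency; imported lazily
    from scipy.io import wavfile
    audio = sd.rec(int(seconds * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=1, dtype="int16")
    sd.wait()                         # block until recording finishes
    wavfile.write(wav_path, SAMPLE_RATE, audio)
    return wav_path

def transcribe(wav_path: str, model_name: str = "base") -> str:
    """Transcribe the recording with a local Whisper model."""
    import whisper                    # assumed dependency; imported lazily
    model = whisper.load_model(model_name)
    return clean_query(model.transcribe(wav_path)["text"])

def clean_query(text: str) -> str:
    """Normalize a transcription before passing it to semantic search."""
    return " ".join(text.split()).strip().rstrip(".")
```

The cleaned string can then be embedded and matched against stored crop descriptions exactly like a typed query.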

🧠 Technology Stack

| Component | Technology |
| --- | --- |
| Vision-Language Model | Qwen 2.5 Vision |
| Local Model Serving | Ollama (self-hosted model runner) |
| Speech-to-Text (STT) | Whisper |
| Embeddings | CLIP-style vectors |
| Database | ClickHouse (vector search + metadata) |
| Storage | Local filesystem (image storage) |
| Backend Logic | Python |


Built with ❤︎ by Anantha Raju C and contributors

Explore the docs »

Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Model Recommendation
  3. Sample Output
  4. Contributing
  5. License
  6. Contact

LLM-Vision-Capabilities

The system is designed to:

  • Analyze crop images for health, growth stage, field characteristics, and environmental conditions
  • Generate comprehensive text descriptions for semantic search
  • Create embeddings for similarity search using both text and image features
  • Store everything in ClickHouse for efficient querying
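The storage step above could look something like the following sketch. The table name, columns, and 512-dimension assumption are illustrative, not the project's actual schema; `client` is assumed to be e.g. a `clickhouse_connect` client.

```python
EMBEDDING_DIM = 512  # CLIP-style vectors are commonly 512- or 768-dimensional

CREATE_TABLE_DDL = f"""
CREATE TABLE IF NOT EXISTS crop_analysis (
    image_path   String,
    crop         String,
    description  String,          -- text used for semantic search
    embedding    Array(Float32),  -- {EMBEDDING_DIM}-d CLIP-style vector
    analyzed_at  DateTime DEFAULT now()
)
ENGINE = MergeTree
ORDER BY (crop, image_path)
"""

def insert_analysis(client, row: dict) -> None:
    """Insert one analyzed image; `client` is e.g. a clickhouse_connect client."""
    client.insert(
        "crop_analysis",
        [[row["image_path"], row["crop"], row["description"], row["embedding"]]],
        column_names=["image_path", "crop", "description", "embedding"],
    )
```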

This Python script lets you identify crops in an image by running vision-enabled LLMs, such as llama3.2-vision or qwen2.5vl, locally via an Ollama server, without relying on the Hugging Face Transformers library or cloud-based APIs.

It sends an image and a predefined JSON-format prompt to a selected vision model running locally via Ollama, and returns structured information about the crop detected in the image.
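The request described above can be sketched against Ollama's `/api/generate` endpoint, which accepts base64-encoded images and a `format: "json"` hint. The function names and the default URL are illustrative, assuming a local Ollama server on its default port.

```python
import base64
import json

def build_ollama_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {
        "model": model,                 # e.g. "qwen2.5vl:latest"
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "format": "json",               # ask Ollama to constrain output to JSON
        "stream": False,
    }

def detect_crop(model: str, prompt: str, image_path: str,
                url: str = "http://localhost:11434/api/generate") -> dict:
    """Send the image + prompt to a local Ollama server and parse the reply."""
    import requests                     # assumed dependency; imported lazily
    with open(image_path, "rb") as f:
        body = build_ollama_request(model, prompt, f.read())
    resp = requests.post(url, json=body, timeout=600)
    resp.raise_for_status()
    return json.loads(resp.json()["response"])  # the model's JSON answer
```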

By default, it uses a basic prompt, but more detailed prompts (e.g., for disease detection or richer output) can be saved as .txt files inside the assets/ directory, so you can define multiple prompt types.

These prompts are loaded dynamically and sent to the model, allowing customization without modifying code.

Example JSON Prompt Template

Identify the crop in this image and respond ONLY in the following JSON format:

{
  "crop": "<primary crop name>",
  "alternate_names": ["<alternate name 1>", "<alternate name 2>"],
  "color": ["<color 1>", "<color 2>"],
  "confidence": <confidence score from 0 to 1>
}

If any field is not known, return an empty list or null value as appropriate. Do not include any other text.
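Even with a strict prompt like the one above, vision models sometimes wrap their reply in markdown fences or extra prose, so the response usually needs defensive parsing. A hedged sketch (function and constant names are mine, not the project's):

```python
import json

REQUIRED_KEYS = {"crop", "alternate_names", "color", "confidence"}

def parse_crop_response(raw: str) -> dict:
    """Parse the model's reply into a dict matching the JSON template above.

    Extracts the outermost {...} block first, in case the model wrapped
    its answer in markdown fences or surrounding text.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model response")
    data = json.loads(raw[start:end + 1])
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"response missing fields: {sorted(missing)}")
    return data
```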

(back to top)

Model Recommendation

While the script has been briefly tested with qwen2.5vl:latest and llama3.2-vision:latest, qwen2.5vl:latest is recommended based on local testing due to:

  • Reasonable inference times
  • Reliable structured JSON responses
  • Decent resource usage on a typical commodity laptop

⚠️ Note: These observations are based on running the models locally on a standard laptop. Performance and accuracy may vary depending on your system's hardware (CPU, GPU, RAM, etc.).

(back to top)


Features

  • Uses models like llama3.2-vision and qwen2.5vl via the Ollama API
  • Accepts a local image and outputs structured JSON including:
    • Crop name
    • Alternate crop names
    • Color details
    • Confidence score
    • Metadata like inference time

(back to top)

Demo Image

Demo Image

(back to top)

Output

The result is a structured JSON response, like:

Crop Detection

{
  "crop": "Sugarcane",
  "alternate_names": [
    "Sugar cane",
    "Cane"
  ],
  "color": [
    "Green",
    "Brown"
  ],
  "confidence": 0.95,
  "metadata": {
    "startDateTime": "2025-06-07T20:58:35.196729",
    "endDateTime": "2025-06-07T21:00:36.916434",
    "duration": 121.72
  }
}

Crop Analysis

{
  "crop": "Sugarcane",
  "alternate_names": [
    "Sugar cane",
    "Saccharum officinarum"
  ],
  "color": [
    "green",
    "brown"
  ],
  "confidence": 0.95,
  "overall_description": "The image shows a field of sugarcane with tall, green stalks growing in rows. The field appears to be in a vegetative growth stage, with no visible signs of flowering or fruiting. The soil is visible and appears to be well-tended, indicating a managed agricultural setting.",
  "growth_stage": {
    "stage": "vegetative",
    "estimated_age_months": 6,
    "description": "The sugarcane plants are tall and have a uniform height, indicating they are in the vegetative stage of growth. The presence of young leaves suggests they are not yet mature enough to flower or bear fruit."
  },
  "health_assessment": {
    "overall_health": "good",
    "vigor_score": 0.85,
    "disease_indicators": [
      "empty list"
    ],
    "pest_indicators": [
      "empty list"
    ],
    "stress_indicators": [
      "none_detected"
    ],
    "health_description": "The sugarcane plants appear healthy with no visible signs of disease or pest damage. The leaves are green and there are no signs of yellowing or wilting, indicating good vigor and health."
  },
  "field_characteristics": {
    "planting_pattern": "rows",
    "plant_density": "medium",
    "field_size_estimate": "medium_field",
    "crop_uniformity": "uniform",
    "weed_presence": "none",
    "field_description": "The sugarcane is planted in neat rows, with a consistent spacing between plants. The field appears to be well-maintained, with no visible weeds or other vegetation competing for resources."
  },
  "environmental_context": {
    "setting": "rural",
    "terrain": "flat",
    "surrounding_vegetation": "trees",
    "infrastructure_visible": [
      "irrigation"
    ],
    "weather_conditions": "clear",
    "environment_description": "The field is located in a rural area with a flat terrain and surrounded by trees. There is evidence of irrigation infrastructure, suggesting the field is well-supplied with water. The weather appears clear, indicating favorable growing conditions."
  },
  "growing_conditions": {
    "moisture_level": "adequate",
    "soil_visibility": "clearly_visible",
    "irrigation_evidence": "irrigation",
    "season_indication": "growing_season",
    "conditions_description": "The soil is clearly visible and appears to be well-moistened, indicating adequate irrigation. The growing conditions suggest it is the growing season, with no signs of drought or waterlogging."
  },
  "agricultural_insights": {
    "farming_type": "commercial",
    "management_quality": "good",
    "harvest_readiness": "not_ready",
    "estimated_months_to_harvest": null,
    "management_description": "The sugarcane field is managed with a focus on irrigation, as evidenced by the visible infrastructure. The uniform planting and healthy appearance suggest a good level of management. The field is not yet ready for harvest, as the plants are still in the vegetative stage."
  },
  "recommendations": [
    "Continue with current irrigation practices to ensure adequate moisture levels.",
    "Monitor the field for any signs of pests or diseases and take preventive measures if necessary.",
    "Prepare the field for harvest when the sugarcane reaches the mature stage."
  ],
  "recommendations_summary": "The sugarcane field is in good health and well-managed, with adequate irrigation and uniform planting. The field is not yet ready for harvest, and continued monitoring and irrigation practices are recommended to ensure optimal growth and yield.",
  "image_metadata": {
    "image_quality": "good",
    "lighting_conditions": "natural_daylight",
    "viewing_angle": "ground_level",
    "coverage_area": "field_overview",
    "visual_description": "The image provides a clear overview of the sugarcane field, showing the rows of plants and the surrounding environment."
  },
  "semantic_tags": [
    "sugarcane",
    "vegetative_stage",
    "agricultural_management",
    "irrigation",
    "rural_setting"
  ],
  "search_context": "Sugarcane field in vegetative stage, good health, irrigation managed, rural setting, clear weather",
  "metadata": {
    "startDateTime": "2025-06-08T20:44:27.957857",
    "endDateTime": "2025-06-08T20:50:47.604600",
    "duration": 379.65
  },
  "text_description": "The image shows a Sugarcane crop with colors green, brown. It is in the vegetative stage and approximately 6 months old. Overall health is good, with stress indicators such as none_detected. The field is located in a rural area with flat terrain. Irrigation type is irrigation, and it's currently the growing_season."
}
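The `text_description` field at the end of the analysis reads like a template filled from the structured fields. A hypothetical builder for the first part of that sentence, assuming the field names shown in the sample output above:

```python
def build_text_description(analysis: dict) -> str:
    """Flatten key analysis fields into a sentence suitable for embedding
    and semantic search. The template is illustrative, not the project's."""
    colors = ", ".join(analysis.get("color", []))
    stage = analysis.get("growth_stage", {})
    return (
        f"The image shows a {analysis['crop']} crop with colors {colors}. "
        f"It is in the {stage.get('stage', 'unknown')} stage and approximately "
        f"{stage.get('estimated_age_months', 'unknown')} months old."
    )
```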

(back to top)

Contributing

Contribution Areas

| Component | Primary Files | Testing Requirements | Expertise Needed |
| --- | --- | --- | --- |
| Image Analysis | main.py, image_utils.py | VLM model validation, JSON output parsing | Python, AI/ML, Computer Vision |
| Search Systems | CropSemanticSearch.py, speech_input.py | Vector similarity testing, audio processing | Python, NLP, Speech Processing |
| Database Integration | clickhouse_client.py, schema files | Database connectivity, embedding storage | Python, ClickHouse, Vector Databases |
| AI Model Integration | ollama_client.py, config.py | Model inference testing, prompt validation | Python, LLM Integration, API Design |
| Configuration | .env, assets/prompts/ | Environment setup, template parsing | DevOps, Configuration Management |

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Kindly refer to CONTRIBUTING.md for important details on the pull request process.

  1. In the top-right corner of this page, click Fork.

  2. Clone your fork to your local machine, replacing YOUR-USERNAME with your GitHub username:

    git clone https://github.com/YOUR-USERNAME/LLM-Vision-Capabilities.git

  3. Create a branch:

    git checkout -b <my-new-feature-or-fix>

  4. Make necessary changes and commit those changes:

    git add .

    git commit -m "new feature or fix"

  5. Push your changes, replacing <add-your-branch-name> with the name of the branch you created in step 3:

    git push origin <add-your-branch-name>

  6. Submit your changes for review. On your repository page on GitHub, click the Compare & pull request button, then submit the pull request.

That's it! Soon I'll be merging your changes into the master branch of this project. You will get a notification email once the changes have been merged. Thank you for your contribution.

Kindly follow Conventional Commits to create an explicit commit history, and prefix each commit message with one of the following types:

build: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm)
ci: Changes to CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs)
docs: Documentation-only changes
feat: A new feature
fix: A bug fix
perf: A code change that improves performance
refactor: A code change that neither fixes a bug nor adds a feature
style: Changes that do not affect the meaning of the code (white-space, formatting, missing semicolons, etc.)
test: Adding missing tests or correcting existing tests

(back to top)

Reporting Issues/Suggest Improvements

This project uses GitHub's integrated issue tracking system to record bugs and feature requests. If you want to raise an issue, please follow the recommendations below:

  • Before you log a bug, please search the issue tracker to see if someone has already reported the problem.
  • If the issue doesn't already exist, create a new issue
  • Please provide as much information as possible with the issue report.
  • If you need to paste code or include a stack trace, wrap it in Markdown code fences (```) before and after your text.

(back to top)

License

Distributed under the MIT License. See LICENSE.md for more information.

(back to top)

Contact Channels

  • GitHub Issues: Primary channel for bug reports and feature requests
  • Pull Request Discussions: Technical discussions during code review
  • Email Contact: For code of conduct violations or sensitive issues: Anantha Raju C - @anantharajuc - [email protected]

(back to top)

Star History

Star History Chart

(back to top)
