feat: add Unstract MCP integration for document processing #185
marcusquinn merged 5 commits into main
Add support for Unstract (https://github.com/Zipstack/unstract) as an on-demand MCP server for LLM-powered structured data extraction from unstructured documents. Configured for OpenCode with the disabled-by-default pattern, loading only when document processing tasks are detected.
- New subagent: services/document-processing/unstract.md
- New MCP template: configs/mcp-templates/unstract.json (Docker-based)
- Updated mcp-integrations.md with setup instructions
- Updated subagent-index.toon with new entry
Summary of Changes
Hello @marcusquinn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.
This pull request integrates Unstract, an LLM-driven platform for extracting structured data from diverse document types. The integration lets AI agents perform document analysis tasks, such as invoice processing or KYC, and enables the Unstract service only when it is needed, so it consumes no resources otherwise.
Walkthrough
This pull request introduces the Unstract MCP (Model Context Protocol) integration to the aidevops framework, adding documentation, configuration templates, and subagent registry entries that enable LLM-powered document processing.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ✅ 3 passed
Code Review
This pull request successfully adds a new integration for Unstract MCP, enabling LLM-powered document processing. The changes include new documentation, configuration templates, and updates to the subagent index, all following the established on-demand loading pattern. My review focuses on improving the clarity and consistency of the documentation to ensure a smooth setup experience for users. I've identified a couple of minor inconsistencies in environment variable naming and placeholder values within the configuration examples that should be addressed.
- **Purpose**: Extract structured data from unstructured documents (PDFs, images, DOCX, etc.)
- **MCP Server**: `unstract/mcp-server` (Docker) or `@unstract/mcp-server` (npx)
- **Tool**: `unstract_tool` - submits files to Unstract API, polls for completion, returns structured JSON
- **Credentials**: `UNSTRACT_API_KEY` + `API_BASE_URL` in `~/.config/aidevops/mcp-env.sh`
There's an inconsistency in the environment variable name for the Unstract API base URL. This line mentions API_BASE_URL, but the configuration example in lines 78-79 and the PR description specify UNSTRACT_API_BASE_URL. To avoid confusion for users, it's best to consistently use UNSTRACT_API_BASE_URL in the user-facing documentation. The OpenCode configuration correctly handles mapping this to API_BASE_URL for the container, but the documentation should be consistent.
Suggested change:
- - **Credentials**: `UNSTRACT_API_KEY` + `API_BASE_URL` in `~/.config/aidevops/mcp-env.sh`
+ - **Credentials**: `UNSTRACT_API_KEY` + `UNSTRACT_API_BASE_URL` in `~/.config/aidevops/mcp-env.sh`
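The host-to-container remapping this comment refers to can be sketched as follows. This is a minimal illustration, not the PR's actual configuration: `build_unstract_cmd` is a hypothetical helper name, and it only prints the `docker run` line (taken from the template quoted later in this thread) instead of executing it.

```shell
#!/usr/bin/env bash
# Sketch only: prints the docker invocation instead of running it.
# build_unstract_cmd is a hypothetical name, not part of the PR.
build_unstract_cmd() {
  if [ -z "${UNSTRACT_API_BASE_URL:-}" ]; then
    echo "UNSTRACT_API_BASE_URL is not set" >&2
    return 1
  fi
  # Container-side name (API_BASE_URL) on the left,
  # host-side value (UNSTRACT_API_BASE_URL) on the right.
  printf 'docker run -i --rm -v /tmp:/tmp -e UNSTRACT_API_KEY -e API_BASE_URL=%s unstract/mcp-server unstract\n' \
    "$UNSTRACT_API_BASE_URL"
}
```

With `UNSTRACT_API_BASE_URL` set on the host, the printed command passes the same value into the container under the name `API_BASE_URL`; with it unset, the function reports the problem and returns non-zero.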
"unstract"
],
"env": {
"UNSTRACT_API_KEY": "",
The placeholder for UNSTRACT_API_KEY is an empty string, which could be confusing as it's not immediately obvious that it requires a value. Using a more descriptive placeholder like your_api_key_here would make it clearer for the user that they need to replace it with their actual key. This aligns with the placeholder style used elsewhere in the documentation (e.g., line 78).
Suggested change:
- "UNSTRACT_API_KEY": "",
+ "UNSTRACT_API_KEY": "your_api_key_here",
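Whichever placeholder the template ends up with, a setup script could refuse to proceed while the key is still unset. A minimal sketch, assuming the two placeholder values discussed above (the empty string and `your_api_key_here`); `is_placeholder_key` is a hypothetical helper name, not something the PR defines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when the given value is
# still a placeholder rather than a real API key.
is_placeholder_key() {
  case "${1:-}" in
    ""|your_api_key_here) return 0 ;;  # empty, or the documented placeholder
    *) return 1 ;;                     # anything else is treated as a real key
  esac
}
```

A caller might use it as `if is_placeholder_key "$UNSTRACT_API_KEY"; then echo "set UNSTRACT_API_KEY first" >&2; fi`.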
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Sat Jan 24 23:02:11 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
- Document both cloud and self-hosted (local Docker Compose) deployment options
- Pass DISABLE_TELEMETRY=true in MCP Docker command
- Add analytics/telemetry section confirming MCP server has no telemetry
- Note REACT_APP_ENABLE_POSTHOG=false for self-hosted frontend
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @.agent/aidevops/mcp-integrations.md:
- Around line 212-261: Update .agent/aidevops/mcp-integrations.md to explicitly
document the host-to-container env var remapping and list the UNSTRACT env vars:
add a short note under the OpenCode example stating that UNSTRACT_API_BASE_URL
on the host is passed into the container as API_BASE_URL (reference the OpenCode
JSON block and the API_BASE_URL/UNSTRACT_API_BASE_URL symbols), and add the two
exports (UNSTRACT_API_KEY and UNSTRACT_API_BASE_URL) to the Environment
Variables section (the section that lists other MCP env exports and references
services/document-processing/unstract.md) so the variables are consistent with
the Claude Desktop mcpServers example and the unstract_tool usage.
In @.agent/services/document-processing/unstract.md:
- Around line 23-27: Update the Setup section to use the official environment
variable name API_BASE_URL (instead of UNSTRACT_API_BASE_URL) so it matches the
Quick Reference and Unstract MCP spec; adjust any example export/usage lines
that reference UNSTRACT_API_BASE_URL to API_BASE_URL and ensure UNSTRACT_API_KEY
remains unchanged, and verify references to unstract_tool and the MCP Server
description still reference the corrected variable name.
🧹 Nitpick comments (2)
.agent/services/document-processing/unstract.md (1)
74-80: Add a file-permissions hardening step for API keys.
Given the plaintext API key storage, a simple `chmod 600` keeps credentials from leaking via permissive perms.
🔐 Proposed doc addition
  export UNSTRACT_API_KEY="your_api_key_here"
  export UNSTRACT_API_BASE_URL="https://us-central.unstract.com/deployment/api/your-deployment-id/"
+ chmod 600 ~/.config/aidevops/mcp-env.sh
configs/mcp-templates/unstract.json (1)
4-8: Pin the Docker image tag for supply-chain reproducibility.
Relying on the implicit `latest` tag introduces unpredictability into automation. Unstract does not publish semver versions in public docs, so pin either by digest (OCI best practice) or parameterize with an environment override.
🔧 Proposed refactor (environment-parameterized fallback)
- "source ~/.config/aidevops/mcp-env.sh && docker run -i --rm -v /tmp:/tmp -e UNSTRACT_API_KEY -e API_BASE_URL=\"$UNSTRACT_API_BASE_URL\" unstract/mcp-server unstract"
+ "source ~/.config/aidevops/mcp-env.sh && docker run -i --rm -v /tmp:/tmp -e UNSTRACT_API_KEY -e API_BASE_URL=\"$UNSTRACT_API_BASE_URL\" unstract/mcp-server:${UNSTRACT_MCP_IMAGE_TAG:-latest} unstract"
Alternatively, pull a known-good digest locally and reference it by SHA256 for guaranteed reproducibility across deployments.
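Both pinning options from this nitpick can be combined in one place. The sketch below is an illustration under stated assumptions: `resolve_unstract_image` and the `UNSTRACT_MCP_IMAGE_DIGEST` variable are hypothetical names introduced here, while `UNSTRACT_MCP_IMAGE_TAG` and its `latest` fallback come from the proposed refactor above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the image reference to pass to `docker run`.
# UNSTRACT_MCP_IMAGE_DIGEST is an assumed variable name, not part of the PR.
resolve_unstract_image() {
  if [ -n "${UNSTRACT_MCP_IMAGE_DIGEST:-}" ]; then
    # Digest pinning: reference has the form repo@sha256:<hex>
    printf 'unstract/mcp-server@%s\n' "$UNSTRACT_MCP_IMAGE_DIGEST"
  else
    # Tag pinning with the fallback from the suggestion above
    printf 'unstract/mcp-server:%s\n' "${UNSTRACT_MCP_IMAGE_TAG:-latest}"
  fi
}
```

A digest takes precedence when both are set, since a tag can be moved to a different image later while a digest identifies one specific image.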
🔍 Code Quality Report
Generated on: Sat Jan 24 23:08:32 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
- New helper script: install/start/stop/status/logs/uninstall/configure-llm
- Installs to ~/.aidevops/unstract/ with analytics disabled by default
- configure-llm command detects existing API keys and shows how to add as adapters
- Supports Ollama for fully local/offline operation (no cloud keys needed)
- Added to setup-mcp-integrations.sh as 'unstract' target
- Updated MCP template to default to local self-hosted endpoint
- Updated subagent docs with adapter key table and management commands
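The key detection mentioned for `configure-llm` could work roughly like this. This is a sketch under assumptions, not the helper script from the PR: `detect_llm_keys` is a hypothetical function, and the provider variable names in the usage note are illustrative guesses rather than the list the script actually checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the names (never the values) of the given
# environment variables that are exported and non-empty.
detect_llm_keys() {
  for name in "$@"; do
    # printenv prints nothing for variables missing from the environment
    if [ -n "$(printenv "$name" 2>/dev/null)" ]; then
      printf '%s\n' "$name"
    fi
  done
}
```

For example, `detect_llm_keys OPENAI_API_KEY ANTHROPIC_API_KEY` would list whichever of those two are present, and an empty result would be the cue to fall back to a local option such as Ollama.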
🔍 Code Quality Report
Generated on: Sat Jan 24 23:13:25 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
- Use API_BASE_URL (not UNSTRACT_API_BASE_URL) to match Unstract MCP spec
- Add chmod 600 for mcp-env.sh in setup examples
- Add UNSTRACT_IMAGE_TAG env var for Docker image pinning
- Add DISABLE_TELEMETRY=true to Claude Desktop example
- Update Environment Variables section with Unstract entries
- Fix helper script to use correct env var names
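The first two items in the list above (the `API_BASE_URL` name and the `chmod 600` step) can be sketched together. This is a minimal illustration rather than the PR's setup code: `write_mcp_env` is a hypothetical helper, and the target path is a parameter so the sketch never touches a real `~/.config/aidevops/mcp-env.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes the two exports to a file readable only by
# its owner. Values containing double quotes are not escaped in this sketch.
write_mcp_env() {
  target="$1"; key="$2"; base_url="$3"
  (
    umask 077  # new file is created without group/other access
    printf 'export UNSTRACT_API_KEY="%s"\nexport API_BASE_URL="%s"\n' \
      "$key" "$base_url" > "$target"
  )
  chmod 600 "$target"  # also tightens a file that already existed
}
```

Setting `umask` in a subshell before the write closes the short window in which a freshly created file could otherwise be readable by other users; the trailing `chmod 600` covers the case where the file was created earlier with looser permissions.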
🔍 Code Quality Report
Generated on: Sun Jan 25 00:23:51 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
@coderabbitai review
✅ Actions performed: Review triggered.
@coderabbitai resolve
✅ Actions performed: Comments resolved and changes approved.



Summary
- `unstract/mcp-server` container
Changes
- `.agent/services/document-processing/unstract.md`
- `configs/mcp-templates/unstract.json`
- `.agent/aidevops/mcp-integrations.md`
- `.agent/subagent-index.toon`
Design Decisions
- `enabled: false` globally and activated per-agent via subagent frontmatter
- `unstract/mcp-server` Docker image for isolation and easy updates
- `UNSTRACT_API_KEY` + `UNSTRACT_API_BASE_URL` stored in `~/.config/aidevops/mcp-env.sh`
- `services/document-processing/` for document extraction tools
Testing