feat: add Unstract MCP integration for document processing by marcusquinn · Pull Request #185 · marcusquinn/aidevops

marcusquinn · 2026-01-24T23:00:20Z

Summary

Adds Unstract as an on-demand MCP server for LLM-powered structured data extraction from unstructured documents (PDF, images, DOCX, etc.)
MCP is disabled globally in OpenCode and loads only when document processing tasks are detected (on-demand pattern)
Docker-based execution via unstract/mcp-server container

Changes

File	Purpose
`.agent/services/document-processing/unstract.md`	New subagent with full setup docs, tool reference, and use cases
`configs/mcp-templates/unstract.json`	OpenCode MCP config template (disabled by default)
`.agent/aidevops/mcp-integrations.md`	Added Unstract to integrations list and quick reference
`.agent/subagent-index.toon`	Added document-processing service entry

Design Decisions

On-demand only: Follows the established pattern (like FluentCRM) where MCP is enabled: false globally and activated per-agent via subagent frontmatter
Docker-based: Uses official unstract/mcp-server Docker image for isolation and easy updates
Credential pattern: UNSTRACT_API_KEY + UNSTRACT_API_BASE_URL stored in ~/.config/aidevops/mcp-env.sh
New service category: Created services/document-processing/ for document extraction tools
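
The credential pattern above implies that whatever launches the MCP server first sources `~/.config/aidevops/mcp-env.sh` and then needs both variables to be non-empty. A minimal sketch of such a guard follows. The function name `require_unstract_env` is hypothetical and not part of this PR; the variable names are the ones listed in this description.

```shell
# Hypothetical guard (not part of this PR): fail fast when the Unstract
# credentials named in the description are missing or empty.
require_unstract_env() {
  _missing=""
  [ -n "${UNSTRACT_API_KEY:-}" ] || _missing="$_missing UNSTRACT_API_KEY"
  [ -n "${UNSTRACT_API_BASE_URL:-}" ] || _missing="$_missing UNSTRACT_API_BASE_URL"
  if [ -n "$_missing" ]; then
    # Report only the variable names, never their values.
    echo "missing:$_missing" >&2
    return 1
  fi
  return 0
}

# Possible usage, assuming the env file from the description exists:
#   . ~/.config/aidevops/mcp-env.sh && require_unstract_env && echo "ready"
```

An empty string counts as missing here, so a template placeholder left as `""` would be caught before the container starts.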

Testing

Preflight linters pass (pre-existing ShellCheck/secretlint issues unrelated)
TOON syntax validation passes
Config follows established JSON template pattern

Summary by CodeRabbit

New Features
- Added Unstract MCP integration for document processing with support for invoices, statements, claims, KYC, and contract analysis.
Documentation
- Added comprehensive setup guides, configuration templates, and deployment instructions for the new document processing integration.

Add support for Unstract (https://github.com/Zipstack/unstract) as an on-demand MCP server for LLM-powered structured data extraction from unstructured documents. Configured for OpenCode with disabled-by-default pattern, loading only when document processing tasks are detected.

- New subagent: services/document-processing/unstract.md
- New MCP template: configs/mcp-templates/unstract.json (Docker-based)
- Updated mcp-integrations.md with setup instructions
- Updated subagent-index.toon with new entry

gemini-code-assist · 2026-01-24T23:00:34Z

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the system's capabilities by integrating Unstract, a powerful LLM-driven platform for extracting structured data from diverse document types. This integration allows AI agents to perform complex document analysis tasks, such as invoice processing or KYC, by dynamically enabling the Unstract service only when needed, ensuring efficient resource utilization and expanding the range of automated workflows.

Highlights

Unstract MCP Integration: Introduces Unstract as a new MCP (Model Context Protocol) server for LLM-powered structured data extraction from various unstructured documents (PDFs, images, DOCX, etc.).
On-Demand Loading Pattern: The Unstract MCP is configured to be disabled globally and loads only when document processing tasks are detected, following an on-demand pattern similar to existing integrations like FluentCRM.
Docker-Based Execution: Utilizes the official unstract/mcp-server Docker container for isolated and easily updateable execution of the Unstract service.
New Service Category: A new services/document-processing/ category has been created to house tools related to document extraction, with Unstract being the first entry.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service, and learn how to configure Gemini Code Assist in GitHub. Gemini can make mistakes, so double-check it and use code with caution.

coderabbitai · 2026-01-24T23:00:41Z

Walkthrough

This pull request introduces the Unstract MCP (Model Context Protocol) integration to the aidevops framework, adding comprehensive documentation, configuration templates, and subagent registry entries to enable LLM-powered document processing capabilities across the platform.

Changes

Cohort / File(s)	Summary
Unstract MCP Integration Documentation `.agent/aidevops/mcp-integrations.md`, `.agent/services/document-processing/unstract.md`	Introduces Unstract MCP as a new document processing integration with setup instructions, API key/base URL environment variables, deployment configurations (Docker), and usage examples for OpenCode and Claude Desktop environments. Includes supported file types, workflow details, and integration use cases.
Subagent Registry Update `.agent/subagent-index.toon`	Updates the TOON subagents index count from [31] to [32] and registers the new document-processing subagent with key file reference to unstract.
MCP Configuration Template `configs/mcp-templates/unstract.json`	Adds a local MCP configuration template for running the Unstract service via Docker, disabled by default, with environment variable sourcing and volume mount specifications.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

#140: Adds MCP on-demand/indexing tooling, docs, and generator hooks that complement the MCP integration surface expanded by this PR.
#64: Adds a different MCP integration (FluentCRM) with parallel updates to MCP integrations documentation and mcp-templates configuration area.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely describes the main change: adding Unstract MCP integration for document processing, which directly aligns with all file additions and modifications.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

gemini-code-assist

Code Review

This pull request successfully adds a new integration for Unstract MCP, enabling LLM-powered document processing. The changes include new documentation, configuration templates, and updates to the subagent index, all following the established on-demand loading pattern. My review focuses on improving the clarity and consistency of the documentation to ensure a smooth setup experience for users. I've identified a couple of minor inconsistencies in environment variable naming and placeholder values within the configuration examples that should be addressed.

gemini-code-assist · 2026-01-24T23:01:42Z

.agent/services/document-processing/unstract.md

+- **Purpose**: Extract structured data from unstructured documents (PDFs, images, DOCX, etc.)
+- **MCP Server**: `unstract/mcp-server` (Docker) or `@unstract/mcp-server` (npx)
+- **Tool**: `unstract_tool` - submits files to Unstract API, polls for completion, returns structured JSON
+- **Credentials**: `UNSTRACT_API_KEY` + `API_BASE_URL` in `~/.config/aidevops/mcp-env.sh`


There's an inconsistency in the environment variable name for the Unstract API base URL. This line mentions API_BASE_URL, but the configuration example in lines 78-79 and the PR description specify UNSTRACT_API_BASE_URL. To avoid confusion for users, it's best to consistently use UNSTRACT_API_BASE_URL in the user-facing documentation. The OpenCode configuration correctly handles mapping this to API_BASE_URL for the container, but the documentation should be consistent.

Suggested change

-- **Credentials**: `UNSTRACT_API_KEY` + `API_BASE_URL` in `~/.config/aidevops/mcp-env.sh`
+- **Credentials**: `UNSTRACT_API_KEY` + `UNSTRACT_API_BASE_URL` in `~/.config/aidevops/mcp-env.sh`
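
The mapping this comment refers to, a namespaced variable on the host and a plain `API_BASE_URL` inside the container, can be sketched as below. The command mirrors the `docker run` line from `configs/mcp-templates/unstract.json` as quoted elsewhere in this review. The function name `unstract_docker_cmd` is hypothetical, and the command is printed rather than executed so the mapping can be inspected without Docker.

```shell
# Sketch of the host-to-container variable mapping discussed in this comment.
# Host side:      UNSTRACT_API_BASE_URL (namespaced, kept in mcp-env.sh)
# Container side: API_BASE_URL (the name passed to the container in the template)
unstract_docker_cmd() {
  # "-e UNSTRACT_API_KEY" with no value forwards the host's value unchanged,
  # so the key itself never appears in the printed command line.
  printf '%s\n' "docker run -i --rm -v /tmp:/tmp -e UNSTRACT_API_KEY -e API_BASE_URL=${UNSTRACT_API_BASE_URL:-} unstract/mcp-server unstract"
}

# Possible usage:
#   UNSTRACT_API_BASE_URL="https://example.invalid/api/" unstract_docker_cmd
```

Keeping the namespaced name on the host avoids clashing with other tools that may also read a generic `API_BASE_URL`, which is one argument for documenting the host-side name consistently.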

gemini-code-assist · 2026-01-24T23:01:42Z

.agent/services/document-processing/unstract.md

+        "unstract"
+      ],
+      "env": {
+        "UNSTRACT_API_KEY": "",


The placeholder for UNSTRACT_API_KEY is an empty string, which could be confusing as it's not immediately obvious that it requires a value. Using a more descriptive placeholder like your_api_key_here would make it clearer for the user that they need to replace it with their actual key. This aligns with the placeholder style used elsewhere in the documentation (e.g., line 78).

Suggested change

-        "UNSTRACT_API_KEY": "",
+        "UNSTRACT_API_KEY": "your_api_key_here",

github-actions · 2026-01-24T23:02:12Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 447 code smells

[INFO] Recent monitoring activity:
Sat Jan 24 23:00:47 UTC 2026: Code review monitoring started
Sat Jan 24 23:00:47 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 447
Sat Jan 24 23:00:47 UTC 2026: Qlty - 0 issues found, auto-formatting applied
Sat Jan 24 23:00:49 UTC 2026: Codacy analysis completed with auto-fixes

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 447
VULNERABILITIES: 0

Generated on: Sat Jan 24 23:02:11 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

- Document both cloud and self-hosted (local Docker Compose) deployment options
- Pass DISABLE_TELEMETRY=true in MCP Docker command
- Add analytics/telemetry section confirming MCP server has no telemetry
- Note REACT_APP_ENABLE_POSTHOG=false for self-hosted frontend

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In @.agent/aidevops/mcp-integrations.md:
- Around line 212-261: Update .agent/aidevops/mcp-integrations.md to explicitly
document the host-to-container env var remapping and list the UNSTRACT env vars:
add a short note under the OpenCode example stating that UNSTRACT_API_BASE_URL
on the host is passed into the container as API_BASE_URL (reference the OpenCode
JSON block and the API_BASE_URL/UNSTRACT_API_BASE_URL symbols), and add the two
exports (UNSTRACT_API_KEY and UNSTRACT_API_BASE_URL) to the Environment
Variables section (the section that lists other MCP env exports and references
services/document-processing/unstract.md) so the variables are consistent with
the Claude Desktop mcpServers example and the unstract_tool usage.

In @.agent/services/document-processing/unstract.md:
- Around line 23-27: Update the Setup section to use the official environment
variable name API_BASE_URL (instead of UNSTRACT_API_BASE_URL) so it matches the
Quick Reference and Unstract MCP spec; adjust any example export/usage lines
that reference UNSTRACT_API_BASE_URL to API_BASE_URL and ensure UNSTRACT_API_KEY
remains unchanged, and verify references to unstract_tool and the MCP Server
description still reference the corrected variable name.

🧹 Nitpick comments (2)

.agent/services/document-processing/unstract.md (1)
74-80: Add a file-permissions hardening step for API keys.
Given the plaintext API key storage, a simple chmod 600 keeps credentials from leaking via permissive perms.
🔐 Proposed doc addition
 export UNSTRACT_API_KEY="your_api_key_here"
 export UNSTRACT_API_BASE_URL="https://us-central.unstract.com/deployment/api/your-deployment-id/"
+chmod 600 ~/.config/aidevops/mcp-env.sh
configs/mcp-templates/unstract.json (1)
4-8: Pin the Docker image tag for supply-chain reproducibility.
Relying on the implicit latest tag introduces unpredictability into automation. Unstract does not publish semver versions in public docs, so pin either by digest (OCI best practice) or parameterize with an environment override.
🔧 Proposed refactor (environment-parameterized fallback)
-      "source ~/.config/aidevops/mcp-env.sh && docker run -i --rm -v /tmp:/tmp -e UNSTRACT_API_KEY -e API_BASE_URL=\"$UNSTRACT_API_BASE_URL\" unstract/mcp-server unstract"
+      "source ~/.config/aidevops/mcp-env.sh && docker run -i --rm -v /tmp:/tmp -e UNSTRACT_API_KEY -e API_BASE_URL=\"$UNSTRACT_API_BASE_URL\" unstract/mcp-server:${UNSTRACT_MCP_IMAGE_TAG:-latest} unstract"
Alternatively, pull a known-good digest locally and reference it by SHA256 for guaranteed reproducibility across deployments.
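
The two pinning options described above (an overridable tag with a `latest` fallback, or a fixed digest) can be sketched as a small helper. This is illustrative only: the function name `unstract_image_ref` and the variable `UNSTRACT_MCP_IMAGE_DIGEST` are assumptions made for this sketch, while `UNSTRACT_MCP_IMAGE_TAG` is the variable from the proposed refactor.

```shell
# Hypothetical helper: build the image reference for the Unstract MCP server.
# A digest, when provided, takes precedence over a tag. With neither set it
# falls back to "latest", like ${UNSTRACT_MCP_IMAGE_TAG:-latest} above.
unstract_image_ref() {
  if [ -n "${UNSTRACT_MCP_IMAGE_DIGEST:-}" ]; then
    # Digest form: repository@sha256:<hex>. The content cannot change under it.
    printf '%s\n' "unstract/mcp-server@${UNSTRACT_MCP_IMAGE_DIGEST}"
  else
    printf '%s\n' "unstract/mcp-server:${UNSTRACT_MCP_IMAGE_TAG:-latest}"
  fi
}

# Possible usage in place of the bare image name:
#   docker run -i --rm "$(unstract_image_ref)" unstract
```

The tag override keeps the template working out of the box, while the digest path gives byte-for-byte reproducibility for deployments that need it.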

github-actions · 2026-01-24T23:08:33Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 447 code smells

[INFO] Recent monitoring activity:
Sat Jan 24 23:07:10 UTC 2026: Code review monitoring started
Sat Jan 24 23:07:10 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 447
Sat Jan 24 23:07:10 UTC 2026: Qlty - 0 issues found, auto-formatting applied
Sat Jan 24 23:07:12 UTC 2026: Codacy analysis completed with auto-fixes

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 447
VULNERABILITIES: 0

Generated on: Sat Jan 24 23:08:32 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

- New helper script: install/start/stop/status/logs/uninstall/configure-llm
- Installs to ~/.aidevops/unstract/ with analytics disabled by default
- configure-llm command detects existing API keys and shows how to add as adapters
- Supports Ollama for fully local/offline operation (no cloud keys needed)
- Added to setup-mcp-integrations.sh as 'unstract' target
- Updated MCP template to default to local self-hosted endpoint
- Updated subagent docs with adapter key table and management commands

github-actions · 2026-01-24T23:13:26Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 447 code smells

[INFO] Recent monitoring activity:
Sat Jan 24 23:12:03 UTC 2026: Code review monitoring started
Sat Jan 24 23:12:03 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 447
Sat Jan 24 23:12:03 UTC 2026: Qlty - 0 issues found, auto-formatting applied
Sat Jan 24 23:12:05 UTC 2026: Codacy analysis completed with auto-fixes
Sat Jan 24 23:12:06 UTC 2026: Applied 1 automatic fixes

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 447
VULNERABILITIES: 0

Generated on: Sat Jan 24 23:13:25 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

- Use API_BASE_URL (not UNSTRACT_API_BASE_URL) to match Unstract MCP spec
- Add chmod 600 for mcp-env.sh in setup examples
- Add UNSTRACT_IMAGE_TAG env var for Docker image pinning
- Add DISABLE_TELEMETRY=true to Claude Desktop example
- Update Environment Variables section with Unstract entries
- Fix helper script to use correct env var names
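
As an illustration of the first two items in this commit, the sketch below writes the credentials under the `API_BASE_URL` name and restricts the file to its owner. The function name `write_unstract_env` is hypothetical and is not the helper script from this PR. It also assumes the values contain no single quotes.

```shell
# Hypothetical helper: append Unstract credentials to an env file and make
# the file readable and writable by its owner only.
write_unstract_env() {
  env_file="$1"; api_key="$2"; base_url="$3"
  mkdir -p "$(dirname "$env_file")" || return 1
  # umask 077 in a subshell: a newly created file is never group/world-readable.
  # Appending (>>) leaves any other MCP exports already in the file untouched.
  (
    umask 077
    {
      printf "export UNSTRACT_API_KEY='%s'\n" "$api_key"
      printf "export API_BASE_URL='%s'\n" "$base_url"
    } >> "$env_file"
  ) || return 1
  # Explicit chmod also tightens a file that already existed with looser modes.
  chmod 600 "$env_file"
}

# Possible usage (placeholder values):
#   write_unstract_env ~/.config/aidevops/mcp-env.sh "your_api_key_here" "your_base_url_here"
```

Because the function appends, running it twice leaves duplicate `export` lines; the later line wins when the file is sourced.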

sonarqubecloud · 2026-01-25T00:22:55Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions · 2026-01-25T00:23:52Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 447 code smells

[INFO] Recent monitoring activity:
Sun Jan 25 00:22:28 UTC 2026: Code review monitoring started
Sun Jan 25 00:22:28 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 447
Sun Jan 25 00:22:28 UTC 2026: Qlty - 0 issues found, auto-formatting applied
Sun Jan 25 00:22:30 UTC 2026: Codacy analysis completed with auto-fixes
Sun Jan 25 00:22:31 UTC 2026: Applied 1 automatic fixes

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 447
VULNERABILITIES: 0

Generated on: Sun Jan 25 00:23:51 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

marcusquinn · 2026-01-25T00:24:06Z

@coderabbitai review

coderabbitai · 2026-01-25T00:24:14Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

marcusquinn · 2026-01-25T00:29:21Z

@coderabbitai resolve

coderabbitai · 2026-01-25T00:29:35Z

✅ Actions performed

Comments resolved and changes approved.

gemini-code-assist bot reviewed Jan 24, 2026

coderabbitai bot requested changes Jan 24, 2026

marcusquinn added 2 commits January 25, 2026 00:20

fix: resolve merge conflict with main (serper removal)

2ee6967

coderabbitai bot approved these changes Jan 25, 2026

marcusquinn merged commit b7adf90 into main Jan 25, 2026
9 checks passed

marcusquinn mentioned this pull request Jan 25, 2026

docs: update README with recent PR features #193

Merged

3 tasks
