Documorpher is an intelligent document transformation tool that leverages AI to extract, transform, and structure data across different document formats and modalities. Whether you need to convert DOCX to Excel while preserving context or extract key information from PDFs, Documorpher ensures accurate and structured transformations.
- 📄 Support for DOCX and Excel formats (others in progress)
- 🎯 AI-powered data transformation ensuring contextual accuracy
- 💡 Custom schema definition with multiple data types
- 🔄 Batch processing of multiple documents
- 🎨 Interactive UI for schema building
- 📊 Real-time preview of extracted data
- 🔍 Source text highlighting for extracted values
- ⬇️ Export results in structured JSON, CSV, or Excel format
- 🔒 Bring your own OpenAI API key for AI-driven extraction
Need to transform a set of DOCX documents into an Excel spreadsheet? With Documorpher, you can easily define a mapping of DOCX files to structured Excel columns while preserving contextual meaning. This makes it ideal for:
- Extracting invoices, reports, and structured documents into Excel
- Transforming product descriptions into structured sections
- Ensuring context-aware data extraction from unstructured documents
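As a rough illustration of the DOCX-to-spreadsheet mapping described above, the sketch below flattens extracted records into CSV text that Excel can open. The `ExtractedRecord` type and the `escapeCsvCell` and `recordsToCsv` helpers are hypothetical names chosen for this example, not Documorpher's actual code.

```typescript
// Hypothetical shape: one flat record per source document.
type ExtractedRecord = Record<string, string | number | boolean | null>;

// Quote a cell using the common CSV convention: wrap it in double quotes
// when it contains a comma, quote, or line break, and double embedded quotes.
function escapeCsvCell(value: string | number | boolean | null): string {
  const text = value === null ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text with one column per requested field.
// Fields missing from a record become empty cells.
function recordsToCsv(records: ExtractedRecord[], columns: string[]): string {
  const header = columns.map(escapeCsvCell).join(",");
  const rows = records.map((record) =>
    columns.map((column) => escapeCsvCell(record[column] ?? null)).join(",")
  );
  return [header, ...rows].join("\n");
}
```

Passing the column list explicitly keeps the spreadsheet layout stable even when some documents lack a field.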
- Node.js 18+
- npm or yarn
- OpenAI API key
- Clone the repository:
git clone [email protected]:Bukareszt/Documorpher.git
cd Documorpher
- Install dependencies:
npm install
- Start the development server:
npm run dev
- Open http://localhost:3000 in your browser
Define Your Schema
- Create fields with various data types (string, number, boolean, date, enum, array)
- Set fields as required or optional
- Define nested object structures for arrays
- Preview the expected data structure in real-time
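One way to picture such a schema is as a list of typed field definitions. The sketch below is an assumption about how this could be modeled, not the project's real types: `FieldDefinition`, `previewShape`, and `invoiceSchema` are hypothetical names.

```typescript
type PrimitiveType = "string" | "number" | "boolean" | "date";

// A field is a primitive, an enum with fixed options, or an array whose
// items are either primitives or nested objects described by more fields.
type FieldDefinition =
  | { name: string; required: boolean; type: PrimitiveType }
  | { name: string; required: boolean; type: "enum"; options: string[] }
  | { name: string; required: boolean; type: "array"; items: PrimitiveType | FieldDefinition[] };

// Build a placeholder object that shows the expected output structure.
// Optional primitive and enum fields are marked with "(optional)";
// arrays are shown as a single-element example array.
function previewShape(fields: FieldDefinition[]): Record<string, unknown> {
  const shape: Record<string, unknown> = {};
  for (const field of fields) {
    const suffix = field.required ? "" : " (optional)";
    if (field.type === "enum") {
      shape[field.name] = field.options.join(" | ") + suffix;
    } else if (field.type === "array") {
      shape[field.name] = [
        typeof field.items === "string" ? field.items : previewShape(field.items),
      ];
    } else {
      shape[field.name] = field.type + suffix;
    }
  }
  return shape;
}

// Example schema (illustrative field names only).
const invoiceSchema: FieldDefinition[] = [
  { name: "invoiceNumber", type: "string", required: true },
  { name: "currency", type: "enum", required: false, options: ["USD", "EUR"] },
  {
    name: "lineItems",
    type: "array",
    required: true,
    items: [
      { name: "description", type: "string", required: true },
      { name: "amount", type: "number", required: true },
    ],
  },
];
```

Calling `previewShape(invoiceSchema)` yields a small object that mirrors the nesting of the schema, which is roughly what a live preview would display.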
Upload Documents
- Drag and drop or select files
- Support for multiple documents
- Instant file type validation
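File type validation can be as simple as checking the extension before any parsing starts. The following is a minimal sketch under stated assumptions: the accepted extensions (`docx`, `xlsx`, `pdf`) are guessed from the formats and parsers this README mentions, and `getExtension` and `partitionFiles` are hypothetical helper names.

```typescript
// Assumed list of accepted extensions; the real project may differ.
const SUPPORTED_EXTENSIONS: string[] = ["docx", "xlsx", "pdf"];

// Return the lowercase extension, or "" when the name has none.
// A leading dot alone (for example ".gitignore") does not count as an extension.
function getExtension(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot > 0 ? fileName.slice(dot + 1).toLowerCase() : "";
}

// Split the selected file names into accepted and rejected groups.
function partitionFiles(fileNames: string[]): { accepted: string[]; rejected: string[] } {
  const accepted: string[] = [];
  const rejected: string[] = [];
  for (const fileName of fileNames) {
    const isSupported = SUPPORTED_EXTENSIONS.indexOf(getExtension(fileName)) !== -1;
    (isSupported ? accepted : rejected).push(fileName);
  }
  return { accepted, rejected };
}
```

An extension check is only a first filter; a renamed file can still fail later when its contents are parsed.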
Process Documents
- Enter your OpenAI API key
- Process all documents against your defined schema
- View extraction progress in real-time
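Batch processing with progress reporting might look roughly like the loop below. This is a sketch, not the project's implementation: the `Extractor` function stands in for whatever performs the AI extraction (no OpenAI call is shown), and `processAll` and `BatchResult` are hypothetical names.

```typescript
// Stand-in for the extraction step; assumed to resolve with the extracted data.
type Extractor = (documentText: string) => Promise<Record<string, unknown>>;

interface BatchResult {
  name: string;
  data?: Record<string, unknown>;
  error?: string;
}

// Process documents one at a time so that one failure does not stop the batch,
// reporting progress after each document finishes.
async function processAll(
  documents: { name: string; text: string }[],
  extract: Extractor,
  onProgress: (done: number, total: number) => void
): Promise<BatchResult[]> {
  const results: BatchResult[] = [];
  for (const doc of documents) {
    try {
      results.push({ name: doc.name, data: await extract(doc.text) });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      results.push({ name: doc.name, error: message });
    }
    onProgress(results.length, documents.length);
  }
  return results;
}
```

Sequential processing keeps the progress count simple; running several extractions in parallel would be faster but needs care around rate limits.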
Review and Export
- Interactive view of extracted data
- Side-by-side comparison with source text
- Highlight matching text on hover
- Download individual or all results as JSON
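Highlighting the source text for an extracted value comes down to locating that value in the document. The sketch below assumes a simple case-insensitive substring search and uses hypothetical names (`Highlight`, `findHighlights`); the actual matching logic in Documorpher may be more sophisticated.

```typescript
interface Highlight {
  field: string;
  start: number; // index of the first matched character in the source text
  end: number;   // index just past the last matched character
}

// Find the first case-insensitive occurrence of each extracted value.
// Values that do not appear verbatim in the source are skipped.
// Note: this assumes lowercasing does not change string length, which
// holds for typical text but not for every Unicode character.
function findHighlights(sourceText: string, extracted: Record<string, string>): Highlight[] {
  const haystack = sourceText.toLowerCase();
  const highlights: Highlight[] = [];
  for (const field of Object.keys(extracted)) {
    const needle = extracted[field].trim().toLowerCase();
    if (needle === "") continue;
    const start = haystack.indexOf(needle);
    if (start !== -1) {
      highlights.push({ field, start, end: start + needle.length });
    }
  }
  return highlights;
}
```

The returned offsets can be passed to `sourceText.slice(start, end)` to recover the exact passage to highlight.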
- string: Text values
- number: Numeric values
- boolean: True/false values
- date: Date values
- enum: Predefined set of values
- array: Lists of values or objects
  - Simple arrays (strings, numbers, booleans)
  - Complex arrays of objects with custom schemas
- No server-side data storage
- Client-side document processing
- Your OpenAI API key is never stored
- Documents are processed locally
- Secure data handling with type validation
We welcome contributions! Here's how you can help:
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Please make sure to update tests as appropriate and follow the existing code style.
Found a bug? Please open an issue with:
- Clear bug description
- Steps to reproduce
- Expected vs actual behavior
- Screenshots if applicable
- Your environment details
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with Next.js
- AI powered by OpenAI
- UI styled with Tailwind CSS
- Document parsing:
  - PDF: pdf-parse
  - DOCX: officeparser
- Create an issue for bug reports
- Start a discussion for feature requests
- Star the project if you find it useful!