diff --git a/.gitignore b/.gitignore index 8d10c05a7a..bda69c34f5 100644 --- a/.gitignore +++ b/.gitignore @@ -33,3 +33,6 @@ coverage/ *.local .vercel + + + diff --git a/README.md b/README.md index b222753ab0..49622042e0 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,9 @@ > Privacy-focused, zero-server knowledge graph generator that runs entirely in your browser. -Transform codebases into interactive knowledge graphs using AST parsing, Web Workers, and an embedded KuzuDB WASM database. All processing happens locally - your code never leaves your machine. Next step -> settng up AI Layer : An embedings pipeline using a very small embedings model that can run in browser and a Graph RAG tool using LLMs to generate and execute cyfer queries. Aiming to give rich and complete retrieved context enabling Agent to detect unused code, perform security audits, do a BLAST RADIUS analyses of code changes and for overall codebase understanding and explaination. +Transform codebases into interactive knowledge graphs using AST parsing, Web Workers, and an embedded KuzuDB WASM database. All processing happens locally - your code never leaves your machine. + +**Next up:** Browser-based embeddings + Graph RAG. The cool part? KuzuDB supports native vector indexing, so I can do semantic search AND graph traversal in a single Cypher query. No separate vector DB needed. See [Work in Progress](#-current-work-in-progress) for the full plan. @@ -19,11 +21,116 @@ https://github.com/user-attachments/assets/f375b00a-78cd-4f93-a96c-9ba924455f49 **Actively Building:** - [ ] **Graph RAG Agent** - AI chat with Cypher query generation for intelligent code exploration -- [ ] **Browser Embeddings** - Small embedding model (e.g., gte-small) for semantic node search + LLM-driven RAG +- [ ] **Browser Embeddings** - Small embedding model for semantic node search (see below!) 
- [ ] **Multi-Worker Pool** - Parallel parsing across multiple Web Workers (currently using single worker) - [ ] **Ollama Support** - Local LLM integration - [ ] **CSV Export** - Export node/relationship tables +### 🧠 Graph RAG: The Plan + +Here's what I'm building for the AI layer. The goal: ask questions in plain English, get answers backed by actual graph traversal + semantic understanding. + +**The Problem:** A regular LLM doesn't know your codebase. It can't tell you what calls `handleAuth` or what breaks if you change `UserService`. I need to give it tools to explore the graph. + +**The Solution:** Combine embeddings (for "find relevant code by meaning") with graph queries (for "trace connections"). + +```mermaid +flowchart TD + Q[Your Question] --> EMB[Embed with transformers.js] + EMB --> VS[Vector Search in KuzuDB] + VS --> ENTRY[Entry Point Nodes] + ENTRY --> EXPAND[Graph Traversal via Cypher] + EXPAND --> CTX[Rich Context] + CTX --> LLM[LLM Generates Answer] +``` + +**Embedding Model:** I'm going with `snowflake-arctic-embed-xs` - a tiny 22M parameter model that runs entirely in the browser via [transformers.js](https://huggingface.co/docs/transformers.js). It outputs 384-dimensional vectors and scores 50.15 NDCG@10 on the MTEB retrieval benchmark (comparable to models 5x its size). The model downloads once (~90MB), gets cached, and runs locally from then on. Privacy intact. ✅ + +**The Pipeline:** + +```mermaid +flowchart LR + subgraph Main["Main Pipeline (Blocking)"] + P1[Extract] --> P2[Structure] --> P3[Parse] --> P4[Imports] --> P5[Calls] + end + + P5 --> READY[Graph Ready!<br/>User can explore] + READY --> BG + + subgraph BG["Background (Non-blocking)"] + E1[Load Model] --> E2[Embed Nodes] --> E3[Create Vector Index] + end + + E3 --> AI[AI Search Ready!] +``` + +The idea: you can start exploring the graph immediately after Phase 5. Meanwhile, embeddings are generated in the background. Once done, semantic search unlocks. + +### 💡 A Fun Discovery: Unified Vector + Graph = Superpowers + +While designing this, I stumbled onto something cool. Many Graph RAG systems use **separate databases** - a vector DB (Pinecone, Qdrant) for semantic search and a graph DB (Neo4j) for traversal. This means the LLM has to: + +1. Call vector search → get IDs +2. Take those IDs → call graph DB +3. Coordinate between two systems + +But KuzuDB WASM supports **native vector indexing** (HNSW), which means it's possible to do vector search AND graph traversal **in a single Cypher query**: + +```cypher +// Find code similar to "authentication" AND trace what calls it +// ALL IN ONE QUERY! 🤯 +CALL QUERY_VECTOR_INDEX('CodeNode', 'embedding_idx', $queryVector, 10) +WITH node AS hit, distance +WHERE distance < 0.4 +MATCH (caller:CodeNode)-[r:CodeRelation {type: 'CALLS'}]->(hit) +RETURN hit.name AS found, + caller.name AS called_by, + distance AS relevance +ORDER BY relevance +``` + +This is kind of a big deal. Here's why: + +**Traditional approach (2 queries, 2 systems):** +``` +semantic_search("auth") → ["id1", "id2", "id3"] + ↓ +graph_query("MATCH ... WHERE id IN [...]") → results +``` + +**Unified KuzuDB approach (1 query, 1 system):** +``` +cypher("CALL QUERY_VECTOR_INDEX(...) WITH node MATCH (node)-[...]->() ...") → results +``` + +And because `distance` comes back with every result, this provides **built-in relevance ranking for free**: + +```cypher +// The LLM can dynamically control relevance thresholds!
+CALL QUERY_VECTOR_INDEX('CodeNode', 'idx', $vec, 20) +WITH node, distance, + CASE + WHEN distance < 0.15 THEN 'exact_match' + WHEN distance < 0.30 THEN 'highly_relevant' + ELSE 'related' + END AS tier +WHERE distance < 0.5 +MATCH (node)-[*1..2]-(context) +RETURN node.name AS name, tier, distance, collect(context.name) AS related +ORDER BY distance +``` + +**What this enables:** +- 🎯 **Single query execution** - No round trips between systems +- 📊 **Hierarchical relevance** - LLM sees exact matches vs highly relevant vs merely related hits +- 🌳 **Weighted expansion** - Traverse further from better matches +- ⚡ **Dynamic thresholds** - LLM adjusts `WHERE distance < X` per question type +- 🔄 **No reranker needed** - Distance IS the relevance score + +Basically, the LLM gets to write one smart query that does semantic search, filters by relevance, expands via graph relationships, and returns ranked results. No separate reranker model, no vector DB API calls, no coordination logic. Just Cypher. + +Still wrapping my head around all the query patterns this unlocks, but I'm pretty excited about it. + --- ## ⚡ What's New in V2 @@ -34,13 +141,13 @@ V2 is a major refactor focused on **performance** and **scalability**. Here's wh V1 used D3.js force simulation which worked great for small graphs, but started choking around 2-3k nodes. The browser would freeze, fans would spin, and you'd be staring at a loading spinner. -**V2 uses Sigma.js with WebGL rendering.** This means the GPU does the heavy lifting instead of JavaScript. We've tested graphs with 10k+ nodes and they render smoothly. Pan, zoom, click - all buttery smooth. +**V2 uses Sigma.js with WebGL rendering.** This means the GPU does the heavy lifting instead of JavaScript. I've tested graphs with 10k+ nodes and they render smoothly. Pan, zoom, click - all buttery smooth. The layout algorithm also moved to **ForceAtlas2 running in a Web Worker**, so your UI stays responsive while the graph positions itself.
### 🗂️ Dual HashMap Symbol Table (Goodbye Trie, Hello Speed) -In V1, we used a **Trie** (prefix tree) to store function/class definitions. It was clever - you could do fuzzy lookups and autocomplete. But it was also slow and memory-hungry for large codebases. +In V1, I used a **Trie** (prefix tree) to store function/class definitions. It was clever - you could do fuzzy lookups and autocomplete. But it was also slow and memory-hungry for large codebases. V2 uses a simpler but faster **Dual HashMap** approach: @@ -49,17 +156,17 @@ File-Scoped Index: Map> Global Index: Map ``` -**Why two maps?** When resolving a function call like `handleAuth()`, we first check if it's defined in a file we imported (high confidence). If not, we check the current file. As a last resort, we search globally (useful for framework magic like FastAPI's `@app.get` decorators where the connection isn't explicit in imports). +**Why two maps?** When resolving a function call like `handleAuth()`, the system first checks whether it's defined in a file that the current file imports (high confidence). If not, it checks the current file. As a last resort, it searches globally (useful for framework magic like FastAPI's `@app.get` decorators where the connection isn't explicit in imports). -This change alone gave us **~2x speedup** on the parsing phase. +This change alone provided a **~2x speedup** on the parsing phase. ### 💾 LRU Cache for AST Trees (Memory That Cleans Itself) -Tree-sitter generates AST (Abstract Syntax Tree) objects that live in WASM memory. In V1, we'd keep all of them around, which meant memory usage grew linearly with file count. Parse 5000 files? That's 5000 AST objects eating RAM. +Tree-sitter generates AST (Abstract Syntax Tree) objects that live in WASM memory. In V1, I kept all of them around, which meant memory usage grew linearly with file count. Parse 5000 files? That's 5000 AST objects eating RAM. -V2 uses an **LRU (Least Recently Used) cache** with a cap of 50 entries.
When we need to parse file #51, the oldest unused AST gets evicted and we call `tree.delete()` to free the WASM memory. +V2 uses an **LRU (Least Recently Used) cache** with a cap of 50 entries. When the system needs to parse file #51, the least recently used AST gets evicted and `tree.delete()` is called to free the WASM memory. -The clever part: we parse files in Phase 3, then reuse those ASTs in Phase 4 (imports) and Phase 5 (calls). The LRU cache keeps recently-parsed files hot, so we rarely need to re-parse. +The clever part: files are parsed in Phase 3, then those ASTs are reused in Phase 4 (imports) and Phase 5 (calls). The LRU cache keeps recently parsed files hot, so re-parsing is rarely needed. ### 📊 Overall Results @@ -211,21 +318,21 @@ flowchart TD ### What Each Phase Does -**Phase 1: Extract** - We use JSZip to decompress your ZIP file and store all file contents in a Map. Simple but necessary. +**Phase 1: Extract** - JSZip decompresses your ZIP file, and all file contents are stored in a Map. Simple but necessary. -**Phase 2: Structure** - We walk through all file paths and build a tree of folders and files. A path like `src/components/Button.tsx` creates nodes for `src`, `components`, and `Button.tsx` with `CONTAINS` relationships connecting them. +**Phase 2: Structure** - The system walks through all file paths and builds a tree of folders and files. A path like `src/components/Button.tsx` creates nodes for `src`, `components`, and `Button.tsx` with `CONTAINS` relationships connecting them. -**Phase 3: Parsing** - This is where the magic happens. Tree-sitter parses each file into an AST, and we extract all the interesting bits: functions, classes, interfaces, methods. These get stored in our Symbol Table for later lookup. +**Phase 3: Parsing** - This is where the magic happens. Tree-sitter parses each file into an AST, and the pipeline extracts all the interesting bits: functions, classes, interfaces, methods.
These get stored in the Symbol Table for later lookup. -**Phase 4: Imports** - We find all `import` and `require` statements and figure out which files they point to. `import { foo } from './utils'` might resolve to `./utils.ts`, `./utils/index.ts`, etc. We try common extensions until we find a match. +**Phase 4: Imports** - The pipeline finds all `import` and `require` statements and determines which files they point to. `import { foo } from './utils'` might resolve to `./utils.ts`, `./utils/index.ts`, etc. Common extensions and `index` files are tried until a match is found. -**Phase 5: Calls** - The trickiest phase. We find all function calls and try to figure out what they're calling. We use our resolution strategy (import map → local → global) to link calls to their definitions. +**Phase 5: Calls** - The trickiest phase. The pipeline finds all function calls and determines what each one calls. It uses a resolution strategy (import map → local → global) to link calls to their definitions. --- ## Symbol Resolution: How We Link Function Calls -When we see code like this: +When the system encounters code like this: ```typescript import { validateUser } from './auth'; @@ -235,7 +342,7 @@ function login() { } ``` -We need to figure out that `validateUser()` refers to the function defined in `./auth.ts`. Here's our strategy: +The system needs to figure out that `validateUser()` refers to the function defined in `./auth.ts`. Here's the strategy: ```mermaid flowchart TD @@ -266,13 +373,13 @@ def get_users(): return db.query(User) # Where does 'db' come from? ``` -The `db` object might be injected by the framework, not explicitly imported. Our global search catches these cases (with lower confidence). +The `db` object might be injected by the framework, not explicitly imported. The global search catches these cases (with lower confidence).
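The import map → local → global order described above can be sketched in a few lines of TypeScript. This is a minimal sketch under stated assumptions and is not the project's actual resolver. `SymbolTable`, `SymbolDef`, `resolveCall`, the confidence labels, and the import-map shape (a caller file mapped to the resolved files it imports, roughly what Phase 4 would produce) are all hypothetical names chosen for illustration. The two `Map`s mirror the file-scoped and global indexes from the Dual HashMap section.

```typescript
// Hypothetical types: the README does not spell out the real symbol-table shapes.
interface SymbolDef {
  nodeId: string; // id of the definition's node in the graph
  filePath: string; // file that defines the symbol
}

type Confidence = "import" | "local" | "global";

interface Resolution {
  def: SymbolDef;
  confidence: Confidence;
}

class SymbolTable {
  // File-scoped index: filePath -> (symbol name -> definition)
  private readonly fileIndex = new Map<string, Map<string, SymbolDef>>();
  // Global index: symbol name -> every definition with that name, in insertion order
  private readonly globalIndex = new Map<string, SymbolDef[]>();

  add(name: string, def: SymbolDef): void {
    let perFile = this.fileIndex.get(def.filePath);
    if (perFile === undefined) {
      perFile = new Map<string, SymbolDef>();
      this.fileIndex.set(def.filePath, perFile);
    }
    perFile.set(name, def);

    const defs = this.globalIndex.get(name) ?? [];
    defs.push(def);
    this.globalIndex.set(name, defs);
  }

  lookupInFile(filePath: string, name: string): SymbolDef | undefined {
    return this.fileIndex.get(filePath)?.get(name);
  }

  lookupGlobal(name: string): SymbolDef[] {
    return this.globalIndex.get(name) ?? [];
  }
}

// Hypothetical importMap: caller file -> resolved paths of the files it imports.
function resolveCall(
  table: SymbolTable,
  importMap: Map<string, string[]>,
  callerFile: string,
  calleeName: string,
): Resolution | undefined {
  // 1. Files the caller imports (high confidence)
  for (const importedFile of importMap.get(callerFile) ?? []) {
    const def = table.lookupInFile(importedFile, calleeName);
    if (def !== undefined) return { def, confidence: "import" };
  }

  // 2. The caller's own file
  const local = table.lookupInFile(callerFile, calleeName);
  if (local !== undefined) return { def: local, confidence: "local" };

  // 3. Last resort: any file in the project (lower confidence).
  //    Taking the first registered candidate is a simplification; ties are not resolved.
  const first = table.lookupGlobal(calleeName)[0];
  if (first !== undefined) return { def: first, confidence: "global" };

  return undefined;
}
```

Suppose `validateUser` is registered in both `src/auth.ts` and `src/login.ts`, and `src/login.ts` imports `src/auth.ts`. A call from `src/login.ts` then resolves to the `src/auth.ts` definition with confidence `"import"`, because imported files are checked before the caller's own file. A name that exists only in an unimported file, like the FastAPI `db` case, falls through to `"global"`. Returning the first global candidate is a deliberate simplification. A real resolver would likely need a tie-breaking rule when several files define the same name.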
--- ## LRU AST Cache -Parsing files into ASTs is expensive, and AST objects live in WASM memory (which doesn't get garbage collected like regular JS objects). We use an LRU cache to keep memory bounded: +Parsing files into ASTs is expensive, and AST objects live in WASM memory (which doesn't get garbage collected like regular JS objects). An LRU cache is used to keep memory bounded: ```mermaid flowchart LR @@ -334,7 +441,7 @@ flowchart LR ## KuzuDB Integration -We load the graph into KuzuDB (an embedded graph database) so you can run Cypher queries: +The graph is loaded into KuzuDB (an embedded graph database) so you can run Cypher queries: ```mermaid flowchart TD @@ -363,6 +470,7 @@ RETURN f.name - ✅ Polymorphic schema (single node/edge tables) - ✅ CSV generation and bulk loading - ✅ Cypher query execution +- 🚧 Vector embeddings + HNSW index (WIP) - 🚧 Graph RAG agent (WIP) --- @@ -372,9 +480,10 @@ RETURN f.name - **Frontend**: React 18 + TypeScript + Vite + Tailwind CSS v4 - **Visualization**: Sigma.js + Graphology + ForceAtlas2 (WebGL) - **Parsing**: Tree-sitter WASM (TypeScript, JavaScript, Python) -- **Database**: KuzuDB WASM (in-browser graph database) +- **Database**: KuzuDB WASM (in-browser graph database + vector index) - **Concurrency**: Web Worker + Comlink - **Caching**: lru-cache with WASM memory management +- **AI (WIP)**: transformers.js for browser embeddings, LangChain for agent orchestration --- @@ -428,27 +537,132 @@ Open http://localhost:5173 ### Graph RAG Agent (WIP) -The idea: ask questions in plain English, get answers backed by graph queries. +The idea: ask questions in plain English, get answers backed by graph queries + semantic understanding.
```mermaid -flowchart LR +flowchart TD USER[Your Question] --> LLM[LLM] - LLM --> TOOLS[Pick a Tool] - TOOLS --> CYPHER[Run Cypher] - TOOLS --> SEARCH[Semantic Search] - CYPHER --> CONTEXT[Gather Context] - SEARCH --> CONTEXT - CONTEXT --> LLM + LLM --> |Generates| CYPHER[Unified Cypher Query] + + subgraph KUZU[KuzuDB WASM] + CYPHER --> VEC[Vector Search] + VEC --> GRAPH[Graph Traversal] + GRAPH --> RANK[Ranked Results] + end + + RANK --> CTX[Rich Context + Code Snippets] + CTX --> LLM LLM --> ANSWER[Your Answer] ``` **Example interactions:** -- "What functions call `handleAuth`?" → Generates Cypher, returns list -- "Show me the blast radius if I change `UserService`" → Traverses dependencies -- "Find all files that import from `utils/`" → Pattern matching query +- "What functions call `handleAuth`?" → Vector search finds `handleAuth`, Cypher traces callers +- "Show me the blast radius if I change `UserService`" → Finds the service, traverses 3 hops of dependencies +- "How does authentication work in this codebase?" → Semantic search for auth-related code, returns connected components + +**Why dynamic Cypher generation?** Originally I planned to use pre-built query templates (because LLMs can be... creative with syntax). But with the unified vector + graph approach, the LLM just needs to learn one pattern: + +```cypher +CALL QUERY_VECTOR_INDEX(...) WITH node, distance +WHERE distance < [threshold] +MATCH (node)-[relationship pattern]->(connected) +RETURN [what you need] +ORDER BY distance +``` + +Give the LLM the schema, a few examples, and let it compose queries. The schema is simple enough that modern LLMs (GPT-4, Claude) handle it well. And if a query fails? The error message is usually clear enough for the LLM to self-correct. + +--- + +## 🔬 Deep Dive: Copy-on-Write Woes with In-Memory WASM Databases + +While building the embedding pipeline, I hit an interesting memory problem.
Documenting it here because it's a non-obvious gotcha for anyone doing vector storage in browser-side databases. + +### The Setup + +I wanted to store 384-dimensional embeddings alongside the code nodes. Natural instinct: add an `embedding FLOAT[384]` column to the existing `CodeNode` table, bulk load the graph, then `UPDATE` each node with its embedding. + +```cypher +// Seemed reasonable, right? +MATCH (n:CodeNode {id: $id}) SET n.embedding = $vec +``` + +### The Problem + +Worked fine for ~20 nodes. Exploded at ~1000 nodes with: + +``` +Buffer manager exception: Unable to allocate memory! The buffer pool is full! +``` + +I configured a 512MB buffer pool. 1000 embeddings × 384 floats × 4 bytes = ~1.5MB. Where did 512MB go? + +**Answer: Copy-on-Write (COW).** + +Many databases don't modify records in place. When you `UPDATE`, they create a new version of the record (for transaction rollback, MVCC, etc.). The old version sticks around until commit. + +Our `CodeNode` table had a `content` field averaging ~2KB per node (code snippets). So each `UPDATE`: + +1. Reads the entire node (~2KB) +2. Creates a new copy with the embedding (~3.5KB) +3. Keeps the old version around + +For 1000 nodes: `1000 × 2KB (old) + 1000 × 3.5KB (new) = ~5.5MB`... but that's just user data. KuzuDB's internal structures (indexes, hash tables, page management) multiply this significantly. And since it's an in-memory database, the buffer pool IS the storage - there's no disk to spill to. + +```mermaid +flowchart LR + subgraph Before["Before UPDATE"] + N1[CodeNode<br/>id + name + content<br/>~2KB] + end + + subgraph During["During UPDATE (COW)"] + N1_OLD[Old Version<br/>~2KB] + N1_NEW[New Version<br/>+ embedding<br/>~3.5KB] + end + + subgraph Problem["× 1000 nodes"] + BOOM[💥 Buffer Pool Exhausted] + end + + Before --> During --> Problem +``` + +### The Fix: Separate Table Architecture + +Don't `UPDATE` wide tables. `INSERT` into a narrow one. + +```mermaid +flowchart TD + subgraph Old["❌ Original Design"] + CN1[CodeNode<br/>id, name, content, embedding<br/>~3.5KB per UPDATE copy] + end + + subgraph New["✅ New Design"] + CN2[CodeNode<br/>id, name, content] + CE[CodeEmbedding<br/>nodeId, embedding<br/>~1.5KB INSERT only] + end + + Old -->|"COW copies entire 2KB+ node"| FAIL[Memory Explosion] + New -->|"INSERT into lightweight table"| WIN[Works at scale] +``` + +Now the process is: +1. Bulk load `CodeNode` (no embedding column) +2. `CREATE` rows in `CodeEmbedding` table (just `nodeId` + `embedding`) +3. Vector index lives on `CodeEmbedding` +4. Semantic search JOINs back to `CodeNode` for metadata + +**Trade-off:** Every semantic search needs a JOIN. But it's a primary key lookup (O(1)), so it's only ~1-5ms extra per query. Totally worth it to not explode at 1000 nodes. + +### Lessons Learned + +1. **In-memory WASM DBs have hard limits** - No disk spillover, buffer pool is everything +2. **COW amplifies record size** - That innocent `UPDATE` copies your whole row +3. **Normalize for bulk writes** - Especially for append-only data like embeddings +4. **Profile the pathological case** - 20 nodes worked, 1000 didn't. Always test at scale -**Why pre-built query templates?** LLMs are... creative with Cypher syntax. Instead of letting the LLM generate queries from scratch (and fail half the time), we're building a library of reliable query templates that the LLM can choose from and fill in. +This is one of those "obvious in hindsight" things. Most vector DB tutorials show single-table schemas because they're using databases with disk backing. In-browser WASM land plays by different rules. --- diff --git a/api/proxy.ts b/api/proxy.ts index f59a73e33e..76098144c0 100644 --- a/api/proxy.ts +++ b/api/proxy.ts @@ -78,9 +78,16 @@ export default async function handler(req: VercelRequest, res: VercelResponse) { res.setHeader('Access-Control-Allow-Origin', '*'); res.setHeader('Access-Control-Expose-Headers', '*'); - // Forward response headers + // Forward response headers (except ones that cause issues) + const skipHeaders = [ + 'content-encoding', + 'transfer-encoding', + 'connection', + 'www-authenticate', // IMPORTANT: Strip this to prevent browser's native auth popup!
+ ]; + response.headers.forEach((value, key) => { - if (!['content-encoding', 'transfer-encoding', 'connection'].includes(key.toLowerCase())) { + if (!skipHeaders.includes(key.toLowerCase())) { res.setHeader(key, value); } }); diff --git a/package-lock.json b/package-lock.json index 84f04a83ae..5cdccdc15f 100644 --- a/package-lock.json +++ b/package-lock.json @@ -8,6 +8,7 @@ "name": "gitnexus", "version": "0.0.0", "dependencies": { + "@huggingface/transformers": "^3.0.0", "@isomorphic-git/lightning-fs": "^4.6.2", "@sigma/edge-curve": "^3.1.0", "@tailwindcss/vite": "^4.1.18", @@ -428,6 +429,16 @@ "node": ">=16" } }, + "node_modules/@emnapi/runtime": { + "version": "1.8.1", + "resolved": "https://registry.npmjs.org/@emnapi/runtime/-/runtime-1.8.1.tgz", + "integrity": "sha512-mehfKSMWjjNol8659Z8KxEMrdSJDDot5SXMq00dM8BN4o+CLNXQ0xH2V7EchNHV4RmbZLmmPdEaXZc5H2FXmDg==", + "license": "MIT", + "optional": true, + "dependencies": { + "tslib": "^2.4.0" + } + }, "node_modules/@esbuild/aix-ppc64": { "version": "0.21.5", "resolved": "https://registry.npmjs.org/@esbuild/aix-ppc64/-/aix-ppc64-0.21.5.tgz", @@ -723,87 +734,573 @@ "cpu": [ "x64" ], - "license": "MIT", + "license": "MIT", + "optional": true, + "os": [ + "openbsd" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/sunos-x64": { + "version": "0.21.5", + "resolved": "https://registry.npmjs.org/@esbuild/sunos-x64/-/sunos-x64-0.21.5.tgz", + "integrity": "sha512-6+gjmFpfy0BHU5Tpptkuh8+uw3mnrvgs+dSPQXQOv3ekbordwnzTVEb4qnIvQcYXq6gzkyTnoZ9dZG+D4garKg==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "sunos" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/win32-arm64": { + "version": "0.21.5", + "resolved": "https://registry.npmjs.org/@esbuild/win32-arm64/-/win32-arm64-0.21.5.tgz", + "integrity": "sha512-Z0gOTd75VvXqyq7nsl93zwahcTROgqvuAcYDUr+vOv8uHhNSKROyU961kgtCD1e95IqPKSQKH7tBTslnS3tA8A==", + "cpu": [ + "arm64" + ], + "license": "MIT", + "optional": 
true, + "os": [ + "win32" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/win32-ia32": { + "version": "0.21.5", + "resolved": "https://registry.npmjs.org/@esbuild/win32-ia32/-/win32-ia32-0.21.5.tgz", + "integrity": "sha512-SWXFF1CL2RVNMaVs+BBClwtfZSvDgtL//G/smwAc5oVK/UPu2Gu9tIaRgFmYFFKrmg3SyAjSrElf0TiJ1v8fYA==", + "cpu": [ + "ia32" + ], + "license": "MIT", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/win32-x64": { + "version": "0.21.5", + "resolved": "https://registry.npmjs.org/@esbuild/win32-x64/-/win32-x64-0.21.5.tgz", + "integrity": "sha512-tQd/1efJuzPC6rCFwEvLtci/xNFcTZknmXs98FYDfGE4wP9ClFV98nyKrzJKVPMhdDnjzLhdUyMX4PsQAPjwIw==", + "cpu": [ + "x64" + ], + "license": "MIT", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@fastify/busboy": { + "version": "2.1.1", + "resolved": "https://registry.npmjs.org/@fastify/busboy/-/busboy-2.1.1.tgz", + "integrity": "sha512-vBZP4NlzfOlerQTnba4aqZoMhE/a9HY7HRqoOPaETQcSQuWEIyZMHGfVu6w9wGtGK5fED5qRs2DteVCjOH60sA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=14" + } + }, + "node_modules/@huggingface/jinja": { + "version": "0.5.3", + "resolved": "https://registry.npmjs.org/@huggingface/jinja/-/jinja-0.5.3.tgz", + "integrity": "sha512-asqfZ4GQS0hD876Uw4qiUb7Tr/V5Q+JZuo2L+BtdrD4U40QU58nIRq3ZSgAzJgT874VLjhGVacaYfrdpXtEvtA==", + "license": "MIT", + "engines": { + "node": ">=18" + } + }, + "node_modules/@huggingface/transformers": { + "version": "3.8.1", + "resolved": "https://registry.npmjs.org/@huggingface/transformers/-/transformers-3.8.1.tgz", + "integrity": "sha512-tsTk4zVjImqdqjS8/AOZg2yNLd1z9S5v+7oUPpXaasDRwEDhB+xnglK1k5cad26lL5/ZIaeREgWWy0bs9y9pPA==", + "license": "Apache-2.0", + "dependencies": { + "@huggingface/jinja": "^0.5.3", + "onnxruntime-node": "1.21.0", + "onnxruntime-web": "1.22.0-dev.20250409-89f8206ba4", + "sharp": "^0.34.1" + } + }, + 
"node_modules/@img/colour": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/@img/colour/-/colour-1.0.0.tgz", + "integrity": "sha512-A5P/LfWGFSl6nsckYtjw9da+19jB8hkJ6ACTGcDfEJ0aE+l2n2El7dsVM7UVHZQ9s2lmYMWlrS21YLy2IR1LUw==", + "license": "MIT", + "engines": { + "node": ">=18" + } + }, + "node_modules/@img/sharp-darwin-arm64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-darwin-arm64/-/sharp-darwin-arm64-0.34.5.tgz", + "integrity": "sha512-imtQ3WMJXbMY4fxb/Ndp6HBTNVtWCUI0WdobyheGf5+ad6xX8VIDO8u2xE4qc/fr08CKG/7dDseFtn6M6g/r3w==", + "cpu": [ + "arm64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-darwin-arm64": "1.2.4" + } + }, + "node_modules/@img/sharp-darwin-x64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-darwin-x64/-/sharp-darwin-x64-0.34.5.tgz", + "integrity": "sha512-YNEFAF/4KQ/PeW0N+r+aVVsoIY0/qxxikF2SWdp+NRkmMB7y9LBZAVqQ4yhGCm/H3H270OSykqmQMKLBhBJDEw==", + "cpu": [ + "x64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-darwin-x64": "1.2.4" + } + }, + "node_modules/@img/sharp-libvips-darwin-arm64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-darwin-arm64/-/sharp-libvips-darwin-arm64-1.2.4.tgz", + "integrity": "sha512-zqjjo7RatFfFoP0MkQ51jfuFZBnVE2pRiaydKJ1G/rHZvnsrHAOcQALIi9sA5co5xenQdTugCvtb1cuf78Vf4g==", + "cpu": [ + "arm64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "darwin" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + 
"node_modules/@img/sharp-libvips-darwin-x64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-darwin-x64/-/sharp-libvips-darwin-x64-1.2.4.tgz", + "integrity": "sha512-1IOd5xfVhlGwX+zXv2N93k0yMONvUlANylbJw1eTah8K/Jtpi15KC+WSiaX/nBmbm2HxRM1gZ0nSdjSsrZbGKg==", + "cpu": [ + "x64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "darwin" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linux-arm": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linux-arm/-/sharp-libvips-linux-arm-1.2.4.tgz", + "integrity": "sha512-bFI7xcKFELdiNCVov8e44Ia4u2byA+l3XtsAj+Q8tfCwO6BQ8iDojYdvoPMqsKDkuoOo+X6HZA0s0q11ANMQ8A==", + "cpu": [ + "arm" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linux-arm64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linux-arm64/-/sharp-libvips-linux-arm64-1.2.4.tgz", + "integrity": "sha512-excjX8DfsIcJ10x1Kzr4RcWe1edC9PquDRRPx3YVCvQv+U5p7Yin2s32ftzikXojb1PIFc/9Mt28/y+iRklkrw==", + "cpu": [ + "arm64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linux-ppc64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linux-ppc64/-/sharp-libvips-linux-ppc64-1.2.4.tgz", + "integrity": "sha512-FMuvGijLDYG6lW+b/UvyilUWu5Ayu+3r2d1S8notiGCIyYU/76eig1UfMmkZ7vwgOrzKzlQbFSuQfgm7GYUPpA==", + "cpu": [ + "ppc64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linux-riscv64": { + "version": "1.2.4", + "resolved": 
"https://registry.npmjs.org/@img/sharp-libvips-linux-riscv64/-/sharp-libvips-linux-riscv64-1.2.4.tgz", + "integrity": "sha512-oVDbcR4zUC0ce82teubSm+x6ETixtKZBh/qbREIOcI3cULzDyb18Sr/Wcyx7NRQeQzOiHTNbZFF1UwPS2scyGA==", + "cpu": [ + "riscv64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linux-s390x": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linux-s390x/-/sharp-libvips-linux-s390x-1.2.4.tgz", + "integrity": "sha512-qmp9VrzgPgMoGZyPvrQHqk02uyjA0/QrTO26Tqk6l4ZV0MPWIW6LTkqOIov+J1yEu7MbFQaDpwdwJKhbJvuRxQ==", + "cpu": [ + "s390x" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linux-x64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linux-x64/-/sharp-libvips-linux-x64-1.2.4.tgz", + "integrity": "sha512-tJxiiLsmHc9Ax1bz3oaOYBURTXGIRDODBqhveVHonrHJ9/+k89qbLl0bcJns+e4t4rvaNBxaEZsFtSfAdquPrw==", + "cpu": [ + "x64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linuxmusl-arm64": { + "version": "1.2.4", + "resolved": "https://registry.npmjs.org/@img/sharp-libvips-linuxmusl-arm64/-/sharp-libvips-linuxmusl-arm64-1.2.4.tgz", + "integrity": "sha512-FVQHuwx1IIuNow9QAbYUzJ+En8KcVm9Lk5+uGUQJHaZmMECZmOlix9HnH7n1TRkXMS0pGxIJokIVB9SuqZGGXw==", + "cpu": [ + "arm64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-libvips-linuxmusl-x64": { + "version": "1.2.4", + "resolved": 
"https://registry.npmjs.org/@img/sharp-libvips-linuxmusl-x64/-/sharp-libvips-linuxmusl-x64-1.2.4.tgz", + "integrity": "sha512-+LpyBk7L44ZIXwz/VYfglaX/okxezESc6UxDSoyo2Ks6Jxc4Y7sGjpgU9s4PMgqgjj1gZCylTieNamqA1MF7Dg==", + "cpu": [ + "x64" + ], + "license": "LGPL-3.0-or-later", + "optional": true, + "os": [ + "linux" + ], + "funding": { + "url": "https://opencollective.com/libvips" + } + }, + "node_modules/@img/sharp-linux-arm": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linux-arm/-/sharp-linux-arm-0.34.5.tgz", + "integrity": "sha512-9dLqsvwtg1uuXBGZKsxem9595+ujv0sJ6Vi8wcTANSFpwV/GONat5eCkzQo/1O6zRIkh0m/8+5BjrRr7jDUSZw==", + "cpu": [ + "arm" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linux-arm": "1.2.4" + } + }, + "node_modules/@img/sharp-linux-arm64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linux-arm64/-/sharp-linux-arm64-0.34.5.tgz", + "integrity": "sha512-bKQzaJRY/bkPOXyKx5EVup7qkaojECG6NLYswgktOZjaXecSAeCWiZwwiFf3/Y+O1HrauiE3FVsGxFg8c24rZg==", + "cpu": [ + "arm64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linux-arm64": "1.2.4" + } + }, + "node_modules/@img/sharp-linux-ppc64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linux-ppc64/-/sharp-linux-ppc64-0.34.5.tgz", + "integrity": "sha512-7zznwNaqW6YtsfrGGDA6BRkISKAAE1Jo0QdpNYXNMHu2+0dTrPflTLNkpc8l7MUP5M16ZJcUvysVWWrMefZquA==", + "cpu": [ + "ppc64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + 
"funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linux-ppc64": "1.2.4" + } + }, + "node_modules/@img/sharp-linux-riscv64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linux-riscv64/-/sharp-linux-riscv64-0.34.5.tgz", + "integrity": "sha512-51gJuLPTKa7piYPaVs8GmByo7/U7/7TZOq+cnXJIHZKavIRHAP77e3N2HEl3dgiqdD/w0yUfiJnII77PuDDFdw==", + "cpu": [ + "riscv64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linux-riscv64": "1.2.4" + } + }, + "node_modules/@img/sharp-linux-s390x": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linux-s390x/-/sharp-linux-s390x-0.34.5.tgz", + "integrity": "sha512-nQtCk0PdKfho3eC5MrbQoigJ2gd1CgddUMkabUj+rBevs8tZ2cULOx46E7oyX+04WGfABgIwmMC0VqieTiR4jg==", + "cpu": [ + "s390x" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linux-s390x": "1.2.4" + } + }, + "node_modules/@img/sharp-linux-x64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linux-x64/-/sharp-linux-x64-0.34.5.tgz", + "integrity": "sha512-MEzd8HPKxVxVenwAa+JRPwEC7QFjoPWuS5NZnBt6B3pu7EG2Ge0id1oLHZpPJdn3OQK+BQDiw9zStiHBTJQQQQ==", + "cpu": [ + "x64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linux-x64": "1.2.4" + } + }, + "node_modules/@img/sharp-linuxmusl-arm64": { + "version": "0.34.5", + "resolved": 
"https://registry.npmjs.org/@img/sharp-linuxmusl-arm64/-/sharp-linuxmusl-arm64-0.34.5.tgz", + "integrity": "sha512-fprJR6GtRsMt6Kyfq44IsChVZeGN97gTD331weR1ex1c1rypDEABN6Tm2xa1wE6lYb5DdEnk03NZPqA7Id21yg==", + "cpu": [ + "arm64" + ], + "license": "Apache-2.0", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linuxmusl-arm64": "1.2.4" + } + }, + "node_modules/@img/sharp-linuxmusl-x64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-linuxmusl-x64/-/sharp-linuxmusl-x64-0.34.5.tgz", + "integrity": "sha512-Jg8wNT1MUzIvhBFxViqrEhWDGzqymo3sV7z7ZsaWbZNDLXRJZoRGrjulp60YYtV4wfY8VIKcWidjojlLcWrd8Q==", + "cpu": [ + "x64" + ], + "license": "Apache-2.0", "optional": true, "os": [ - "openbsd" + "linux" ], "engines": { - "node": ">=12" + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-libvips-linuxmusl-x64": "1.2.4" } }, - "node_modules/@esbuild/sunos-x64": { - "version": "0.21.5", - "resolved": "https://registry.npmjs.org/@esbuild/sunos-x64/-/sunos-x64-0.21.5.tgz", - "integrity": "sha512-6+gjmFpfy0BHU5Tpptkuh8+uw3mnrvgs+dSPQXQOv3ekbordwnzTVEb4qnIvQcYXq6gzkyTnoZ9dZG+D4garKg==", + "node_modules/@img/sharp-wasm32": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-wasm32/-/sharp-wasm32-0.34.5.tgz", + "integrity": "sha512-OdWTEiVkY2PHwqkbBI8frFxQQFekHaSSkUIJkwzclWZe64O1X4UlUjqqqLaPbUpMOQk6FBu/HtlGXNblIs0huw==", "cpu": [ - "x64" + "wasm32" ], - "license": "MIT", + "license": "Apache-2.0 AND LGPL-3.0-or-later AND MIT", "optional": true, - "os": [ - "sunos" - ], + "dependencies": { + "@emnapi/runtime": "^1.7.0" + }, "engines": { - "node": ">=12" + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" 
} }, - "node_modules/@esbuild/win32-arm64": { - "version": "0.21.5", - "resolved": "https://registry.npmjs.org/@esbuild/win32-arm64/-/win32-arm64-0.21.5.tgz", - "integrity": "sha512-Z0gOTd75VvXqyq7nsl93zwahcTROgqvuAcYDUr+vOv8uHhNSKROyU961kgtCD1e95IqPKSQKH7tBTslnS3tA8A==", + "node_modules/@img/sharp-win32-arm64": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-win32-arm64/-/sharp-win32-arm64-0.34.5.tgz", + "integrity": "sha512-WQ3AgWCWYSb2yt+IG8mnC6Jdk9Whs7O0gxphblsLvdhSpSTtmu69ZG1Gkb6NuvxsNACwiPV6cNSZNzt0KPsw7g==", "cpu": [ "arm64" ], - "license": "MIT", + "license": "Apache-2.0 AND LGPL-3.0-or-later", "optional": true, "os": [ "win32" ], "engines": { - "node": ">=12" + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" } }, - "node_modules/@esbuild/win32-ia32": { - "version": "0.21.5", - "resolved": "https://registry.npmjs.org/@esbuild/win32-ia32/-/win32-ia32-0.21.5.tgz", - "integrity": "sha512-SWXFF1CL2RVNMaVs+BBClwtfZSvDgtL//G/smwAc5oVK/UPu2Gu9tIaRgFmYFFKrmg3SyAjSrElf0TiJ1v8fYA==", + "node_modules/@img/sharp-win32-ia32": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/@img/sharp-win32-ia32/-/sharp-win32-ia32-0.34.5.tgz", + "integrity": "sha512-FV9m/7NmeCmSHDD5j4+4pNI8Cp3aW+JvLoXcTUo0IqyjSfAZJ8dIUmijx1qaJsIiU+Hosw6xM5KijAWRJCSgNg==", "cpu": [ "ia32" ], - "license": "MIT", + "license": "Apache-2.0 AND LGPL-3.0-or-later", "optional": true, "os": [ "win32" ], "engines": { - "node": ">=12" + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" } }, - "node_modules/@esbuild/win32-x64": { - "version": "0.21.5", - "resolved": "https://registry.npmjs.org/@esbuild/win32-x64/-/win32-x64-0.21.5.tgz", - "integrity": "sha512-tQd/1efJuzPC6rCFwEvLtci/xNFcTZknmXs98FYDfGE4wP9ClFV98nyKrzJKVPMhdDnjzLhdUyMX4PsQAPjwIw==", + "node_modules/@img/sharp-win32-x64": { + "version": "0.34.5", + "resolved": 
"https://registry.npmjs.org/@img/sharp-win32-x64/-/sharp-win32-x64-0.34.5.tgz", + "integrity": "sha512-+29YMsqY2/9eFEiW93eqWnuLcWcufowXewwSNIT6UwZdUUCrM3oFjMWH/Z6/TMmb4hlFenmfAVbpWeup2jryCw==", "cpu": [ "x64" ], - "license": "MIT", + "license": "Apache-2.0 AND LGPL-3.0-or-later", "optional": true, "os": [ "win32" ], "engines": { - "node": ">=12" - } - }, - "node_modules/@fastify/busboy": { - "version": "2.1.1", - "resolved": "https://registry.npmjs.org/@fastify/busboy/-/busboy-2.1.1.tgz", - "integrity": "sha512-vBZP4NlzfOlerQTnba4aqZoMhE/a9HY7HRqoOPaETQcSQuWEIyZMHGfVu6w9wGtGK5fED5qRs2DteVCjOH60sA==", - "dev": true, - "license": "MIT", - "engines": { - "node": ">=14" + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" } }, "node_modules/@isaacs/balanced-match": { @@ -833,7 +1330,6 @@ "version": "4.0.1", "resolved": "https://registry.npmjs.org/@isaacs/fs-minipass/-/fs-minipass-4.0.1.tgz", "integrity": "sha512-wgm9Ehl2jpeqP3zw/7mo3kRHFp5MEDhqAdwy1fTGkHAwnkGOVsgpvQhL8B5n1qlb01jV3n/bI0ZfZp5lWA1k4w==", - "dev": true, "license": "ISC", "dependencies": { "minipass": "^7.0.4" @@ -981,6 +1477,70 @@ "node": ">= 8" } }, + "node_modules/@protobufjs/aspromise": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/@protobufjs/aspromise/-/aspromise-1.1.2.tgz", + "integrity": "sha512-j+gKExEuLmKwvz3OgROXtrJ2UG2x8Ch2YZUxahh+s1F2HZ+wAceUNLkvy6zKCPVRkU++ZWQrdxsUeQXmcg4uoQ==", + "license": "BSD-3-Clause" + }, + "node_modules/@protobufjs/base64": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/@protobufjs/base64/-/base64-1.1.2.tgz", + "integrity": "sha512-AZkcAA5vnN/v4PDqKyMR5lx7hZttPDgClv83E//FMNhR2TMcLUhfRUBHCmSl0oi9zMgDDqRUJkSxO3wm85+XLg==", + "license": "BSD-3-Clause" + }, + "node_modules/@protobufjs/codegen": { + "version": "2.0.4", + "resolved": "https://registry.npmjs.org/@protobufjs/codegen/-/codegen-2.0.4.tgz", + "integrity": 
"sha512-YyFaikqM5sH0ziFZCN3xDC7zeGaB/d0IUb9CATugHWbd1FRFwWwt4ld4OYMPWu5a3Xe01mGAULCdqhMlPl29Jg==", + "license": "BSD-3-Clause" + }, + "node_modules/@protobufjs/eventemitter": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/@protobufjs/eventemitter/-/eventemitter-1.1.0.tgz", + "integrity": "sha512-j9ednRT81vYJ9OfVuXG6ERSTdEL1xVsNgqpkxMsbIabzSo3goCjDIveeGv5d03om39ML71RdmrGNjG5SReBP/Q==", + "license": "BSD-3-Clause" + }, + "node_modules/@protobufjs/fetch": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/@protobufjs/fetch/-/fetch-1.1.0.tgz", + "integrity": "sha512-lljVXpqXebpsijW71PZaCYeIcE5on1w5DlQy5WH6GLbFryLUrBD4932W/E2BSpfRJWseIL4v/KPgBFxDOIdKpQ==", + "license": "BSD-3-Clause", + "dependencies": { + "@protobufjs/aspromise": "^1.1.1", + "@protobufjs/inquire": "^1.1.0" + } + }, + "node_modules/@protobufjs/float": { + "version": "1.0.2", + "resolved": "https://registry.npmjs.org/@protobufjs/float/-/float-1.0.2.tgz", + "integrity": "sha512-Ddb+kVXlXst9d+R9PfTIxh1EdNkgoRe5tOX6t01f1lYWOvJnSPDBlG241QLzcyPdoNTsblLUdujGSE4RzrTZGQ==", + "license": "BSD-3-Clause" + }, + "node_modules/@protobufjs/inquire": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/@protobufjs/inquire/-/inquire-1.1.0.tgz", + "integrity": "sha512-kdSefcPdruJiFMVSbn801t4vFK7KB/5gd2fYvrxhuJYg8ILrmn9SKSX2tZdV6V+ksulWqS7aXjBcRXl3wHoD9Q==", + "license": "BSD-3-Clause" + }, + "node_modules/@protobufjs/path": { + "version": "1.1.2", + "resolved": "https://registry.npmjs.org/@protobufjs/path/-/path-1.1.2.tgz", + "integrity": "sha512-6JOcJ5Tm08dOHAbdR3GrvP+yUUfkjG5ePsHYczMFLq3ZmMkAD98cDgcT2iA1lJ9NVwFd4tH/iSSoe44YWkltEA==", + "license": "BSD-3-Clause" + }, + "node_modules/@protobufjs/pool": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/@protobufjs/pool/-/pool-1.1.0.tgz", + "integrity": "sha512-0kELaGSIDBKvcgS4zkjz1PeddatrjYcmMWOlAuAPwAeccUrPHdUqo/J6LiymHHEiJT5NrF1UVwxY14f+fy4WQw==", + "license": "BSD-3-Clause" + }, + 
"node_modules/@protobufjs/utf8": { + "version": "1.1.0", + "resolved": "https://registry.npmjs.org/@protobufjs/utf8/-/utf8-1.1.0.tgz", + "integrity": "sha512-Vvn3zZrhQZkkBE8LSuW3em98c0FwgO4nxzv6OdSxPKJIEKY2bGbHn+mhGIPerzI4twdxaP8/0+06HBpwf345Lw==", + "license": "BSD-3-Clause" + }, "node_modules/@rolldown/pluginutils": { "version": "1.0.0-beta.47", "resolved": "https://registry.npmjs.org/@rolldown/pluginutils/-/pluginutils-1.0.0-beta.47.tgz", @@ -1991,7 +2551,6 @@ "version": "24.10.1", "resolved": "https://registry.npmjs.org/@types/node/-/node-24.10.1.tgz", "integrity": "sha512-GNWcUTRBgIRJD5zj+Tq0fKOJ5XZajIiBroOF0yvj2bSU1WvNdYS/dn9UxwsujGW4JX06dnHyjV2y9rRaybH0iQ==", - "devOptional": true, "license": "MIT", "dependencies": { "undici-types": "~7.16.0" @@ -2458,6 +3017,13 @@ "file-uri-to-path": "1.0.0" } }, + "node_modules/boolean": { + "version": "3.2.0", + "resolved": "https://registry.npmjs.org/boolean/-/boolean-3.2.0.tgz", + "integrity": "sha512-d0II/GO9uf9lfUHH2BQsjxzRJZBdsjgsBiW4BvhWk/3qoKwQFjIDVN19PfX8F2D/r9PCMTtLWjYVCFrpeYUzsw==", + "deprecated": "Package no longer supported. 
Contact Support at https://www.npmjs.com/support for more info.", + "license": "MIT" + }, "node_modules/brace-expansion": { "version": "1.1.12", "resolved": "https://registry.npmjs.org/brace-expansion/-/brace-expansion-1.1.12.tgz", @@ -2696,7 +3262,6 @@ "version": "3.0.0", "resolved": "https://registry.npmjs.org/chownr/-/chownr-3.0.0.tgz", "integrity": "sha512-+IxzY9BZOQd/XuYPRmrvEVjF/nqj5kgT4kEq7VofrDoM1MxoRjEWkrCC3EtLi59TVawxTAn+orJwFQcrqEN1+g==", - "dev": true, "license": "BlueOak-1.0.0", "engines": { "node": ">=18" @@ -3287,6 +3852,23 @@ "url": "https://github.com/sponsors/ljharb" } }, + "node_modules/define-properties": { + "version": "1.2.1", + "resolved": "https://registry.npmjs.org/define-properties/-/define-properties-1.2.1.tgz", + "integrity": "sha512-8QmQKqEASLd5nx0U1B1okLElbUuuttJ/AnYmRXbbbGDWh6uS208EjD4Xqq/I9wK7u0v6O08XhTWnt5XtEbR6Dg==", + "license": "MIT", + "dependencies": { + "define-data-property": "^1.0.1", + "has-property-descriptors": "^1.0.0", + "object-keys": "^1.1.1" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, "node_modules/delaunator": { "version": "5.0.1", "resolved": "https://registry.npmjs.org/delaunator/-/delaunator-5.0.1.tgz", @@ -3323,6 +3905,12 @@ "node": ">=8" } }, + "node_modules/detect-node": { + "version": "2.1.0", + "resolved": "https://registry.npmjs.org/detect-node/-/detect-node-2.1.0.tgz", + "integrity": "sha512-T0NIuQpnTvFDATNuHN5roPwSBG83rFsuO+MXXH9/3N1eFbn4wcPjttvjMLEPWJ0RGUYgQE7cGgS3tNxbqCGM7g==", + "license": "MIT" + }, "node_modules/devlop": { "version": "1.1.0", "resolved": "https://registry.npmjs.org/devlop/-/devlop-1.1.0.tgz", @@ -3479,6 +4067,12 @@ "node": ">= 0.4" } }, + "node_modules/es6-error": { + "version": "4.1.1", + "resolved": "https://registry.npmjs.org/es6-error/-/es6-error-4.1.1.tgz", + "integrity": "sha512-Um/+FxMr9CISWh0bi5Zv0iOD+4cFh5qLeks1qhAopKVAJw3drgKbKySikp7wGhDL0HPeaja0P5ULZrxLkniUVg==", + "license": "MIT" + }, 
"node_modules/esbuild": { "version": "0.21.5", "resolved": "https://registry.npmjs.org/esbuild/-/esbuild-0.21.5.tgz", @@ -3867,6 +4461,18 @@ "node": ">=6" } }, + "node_modules/escape-string-regexp": { + "version": "4.0.0", + "resolved": "https://registry.npmjs.org/escape-string-regexp/-/escape-string-regexp-4.0.0.tgz", + "integrity": "sha512-TtpcNJ3XAzx3Gq8sWRzJaVajRs0uVxA2YAkdb1jm2YkPz4G6egUFAyA3n5vtEIZefPk5Wa4UXbKuS5fKkJWdgA==", + "license": "MIT", + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, "node_modules/esm": { "version": "3.2.25", "resolved": "https://registry.npmjs.org/esm/-/esm-3.2.25.tgz", @@ -4000,6 +4606,12 @@ "node": ">=8" } }, + "node_modules/flatbuffers": { + "version": "25.9.23", + "resolved": "https://registry.npmjs.org/flatbuffers/-/flatbuffers-25.9.23.tgz", + "integrity": "sha512-MI1qs7Lo4Syw0EOzUl0xjs2lsoeqFku44KpngfIduHBYvzm8h2+7K8YMQh1JtVVVrUvhLpNwqVi4DERegUJhPQ==", + "license": "Apache-2.0" + }, "node_modules/follow-redirects": { "version": "1.15.11", "resolved": "https://registry.npmjs.org/follow-redirects/-/follow-redirects-1.15.11.tgz", @@ -4160,6 +4772,51 @@ "node": ">= 6" } }, + "node_modules/global-agent": { + "version": "3.0.0", + "resolved": "https://registry.npmjs.org/global-agent/-/global-agent-3.0.0.tgz", + "integrity": "sha512-PT6XReJ+D07JvGoxQMkT6qji/jVNfX/h364XHZOWeRzy64sSFr+xJ5OX7LI3b4MPQzdL4H8Y8M0xzPpsVMwA8Q==", + "license": "BSD-3-Clause", + "dependencies": { + "boolean": "^3.0.1", + "es6-error": "^4.1.1", + "matcher": "^3.0.0", + "roarr": "^2.15.3", + "semver": "^7.3.2", + "serialize-error": "^7.0.1" + }, + "engines": { + "node": ">=10.0" + } + }, + "node_modules/global-agent/node_modules/semver": { + "version": "7.7.3", + "resolved": "https://registry.npmjs.org/semver/-/semver-7.7.3.tgz", + "integrity": "sha512-SdsKMrI9TdgjdweUSR9MweHA4EJ8YxHn8DFaDisvhVlUOe4BF1tLD7GAj0lIqWVl+dPb/rExr0Btby5loQm20Q==", + "license": "ISC", + "bin": { + "semver": "bin/semver.js" 
+ }, + "engines": { + "node": ">=10" + } + }, + "node_modules/globalthis": { + "version": "1.0.4", + "resolved": "https://registry.npmjs.org/globalthis/-/globalthis-1.0.4.tgz", + "integrity": "sha512-DpLKbNU4WylpxJykQujfCcwYWiV/Jhm50Goo0wrVILAv5jOr9d+H+UR3PhSCD2rCCEIg0uc+G+muBTwD54JhDQ==", + "license": "MIT", + "dependencies": { + "define-properties": "^1.2.1", + "gopd": "^1.0.1" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, "node_modules/gopd": { "version": "1.2.0", "resolved": "https://registry.npmjs.org/gopd/-/gopd-1.2.0.tgz", @@ -4242,6 +4899,12 @@ "graphology-types": ">=0.23.0" } }, + "node_modules/guid-typescript": { + "version": "1.0.9", + "resolved": "https://registry.npmjs.org/guid-typescript/-/guid-typescript-1.0.9.tgz", + "integrity": "sha512-Y8T4vYhEfwJOTbouREvG+3XDsjr8E3kIr7uf+JZ0BYloFsttiHU0WfvANVsR7TxNUJa/WpCnw/Ino/p+DeBhBQ==", + "license": "ISC" + }, "node_modules/has-property-descriptors": { "version": "1.0.2", "resolved": "https://registry.npmjs.org/has-property-descriptors/-/has-property-descriptors-1.0.2.tgz", @@ -4742,6 +5405,12 @@ "dev": true, "license": "MIT" }, + "node_modules/json-stringify-safe": { + "version": "5.0.1", + "resolved": "https://registry.npmjs.org/json-stringify-safe/-/json-stringify-safe-5.0.1.tgz", + "integrity": "sha512-ZClg6AaYvamvYEE82d3Iyd3vSSIjQ+odgjaTzRuO3s7toCdFKczob2i0zCh7JE8kWn17yvAWhUVxvqGwUalsRA==", + "license": "ISC" + }, "node_modules/json5": { "version": "2.2.3", "resolved": "https://registry.npmjs.org/json5/-/json5-2.2.3.tgz", @@ -5062,6 +5731,12 @@ "url": "https://opencollective.com/parcel" } }, + "node_modules/long": { + "version": "5.3.2", + "resolved": "https://registry.npmjs.org/long/-/long-5.3.2.tgz", + "integrity": "sha512-mNAgZ1GmyNhD7AuqnTG3/VQ26o760+ZYBPKjPvugO8+nLbYfX6TVpJPseBvopbdY+qpZ/lKUnmEc1LeZYS3QAA==", + "license": "Apache-2.0" + }, "node_modules/longest-streak": { "version": "3.1.0", "resolved": 
"https://registry.npmjs.org/longest-streak/-/longest-streak-3.1.0.tgz", @@ -5132,6 +5807,18 @@ "dev": true, "license": "ISC" }, + "node_modules/matcher": { + "version": "3.0.0", + "resolved": "https://registry.npmjs.org/matcher/-/matcher-3.0.0.tgz", + "integrity": "sha512-OkeDaAZ/bQCxeFAozM55PKcKU0yJMPGifLwV4Qgjitu+5MoAfSQN4lsLJeXZ1b8w0x+/Emda6MZgXS1jvsapng==", + "license": "MIT", + "dependencies": { + "escape-string-regexp": "^4.0.0" + }, + "engines": { + "node": ">=10" + } + }, "node_modules/math-intrinsics": { "version": "1.1.0", "resolved": "https://registry.npmjs.org/math-intrinsics/-/math-intrinsics-1.1.0.tgz", @@ -5831,7 +6518,6 @@ "version": "7.1.2", "resolved": "https://registry.npmjs.org/minipass/-/minipass-7.1.2.tgz", "integrity": "sha512-qOOzS1cBTWYF4BH8fVePDBOO9iptMnGUEZwNc/cMWnTV2nVLZ7VoNWEPHkYczZA0pdoA7dl6e7FL659nX9S2aw==", - "dev": true, "license": "ISC", "engines": { "node": ">=16 || 14 >=14.17" @@ -5841,7 +6527,6 @@ "version": "3.1.0", "resolved": "https://registry.npmjs.org/minizlib/-/minizlib-3.1.0.tgz", "integrity": "sha512-KZxYo1BUkWD2TVFLr0MQoM8vUUigWD3LlD83a/75BqC+4qE0Hb1Vo5v1FgcfaNXvfXzr+5EhQ6ing/CaBijTlw==", - "dev": true, "license": "MIT", "dependencies": { "minipass": "^7.1.2" @@ -5963,6 +6648,15 @@ "node": ">=0.10.0" } }, + "node_modules/object-keys": { + "version": "1.1.1", + "resolved": "https://registry.npmjs.org/object-keys/-/object-keys-1.1.1.tgz", + "integrity": "sha512-NuAESUOUMrlIXOfHKzD6bpPu3tYt3xvjNdRIQ+FeT0lNb4K8WR70CaDxhuNguS2XG+GjkyMwOzsN5ZktImfhLA==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + } + }, "node_modules/observable-fns": { "version": "0.6.1", "resolved": "https://registry.npmjs.org/observable-fns/-/observable-fns-0.6.1.tgz", @@ -5978,6 +6672,49 @@ "wrappy": "1" } }, + "node_modules/onnxruntime-common": { + "version": "1.21.0", + "resolved": "https://registry.npmjs.org/onnxruntime-common/-/onnxruntime-common-1.21.0.tgz", + "integrity": 
"sha512-Q632iLLrtCAVOTO65dh2+mNbQir/QNTVBG3h/QdZBpns7mZ0RYbLRBgGABPbpU9351AgYy7SJf1WaeVwMrBFPQ==", + "license": "MIT" + }, + "node_modules/onnxruntime-node": { + "version": "1.21.0", + "resolved": "https://registry.npmjs.org/onnxruntime-node/-/onnxruntime-node-1.21.0.tgz", + "integrity": "sha512-NeaCX6WW2L8cRCSqy3bInlo5ojjQqu2fD3D+9W5qb5irwxhEyWKXeH2vZ8W9r6VxaMPUan+4/7NDwZMtouZxEw==", + "hasInstallScript": true, + "license": "MIT", + "os": [ + "win32", + "darwin", + "linux" + ], + "dependencies": { + "global-agent": "^3.0.0", + "onnxruntime-common": "1.21.0", + "tar": "^7.0.1" + } + }, + "node_modules/onnxruntime-web": { + "version": "1.22.0-dev.20250409-89f8206ba4", + "resolved": "https://registry.npmjs.org/onnxruntime-web/-/onnxruntime-web-1.22.0-dev.20250409-89f8206ba4.tgz", + "integrity": "sha512-0uS76OPgH0hWCPrFKlL8kYVV7ckM7t/36HfbgoFw6Nd0CZVVbQC4PkrR8mBX8LtNUFZO25IQBqV2Hx2ho3FlbQ==", + "license": "MIT", + "dependencies": { + "flatbuffers": "^25.1.24", + "guid-typescript": "^1.0.9", + "long": "^5.2.3", + "onnxruntime-common": "1.22.0-dev.20250409-89f8206ba4", + "platform": "^1.3.6", + "protobufjs": "^7.2.4" + } + }, + "node_modules/onnxruntime-web/node_modules/onnxruntime-common": { + "version": "1.22.0-dev.20250409-89f8206ba4", + "resolved": "https://registry.npmjs.org/onnxruntime-common/-/onnxruntime-common-1.22.0-dev.20250409-89f8206ba4.tgz", + "integrity": "sha512-vDJMkfCfb0b1A836rgHj+ORuZf4B4+cc2bASQtpeoJLueuFc5DuYwjIZUBrSvx/fO5IrLjLz+oTrB3pcGlhovQ==", + "license": "MIT" + }, "node_modules/p-map": { "version": "7.0.4", "resolved": "https://registry.npmjs.org/p-map/-/p-map-7.0.4.tgz", @@ -6099,6 +6836,12 @@ "node": ">=6" } }, + "node_modules/platform": { + "version": "1.3.6", + "resolved": "https://registry.npmjs.org/platform/-/platform-1.3.6.tgz", + "integrity": "sha512-fnWVljUchTro6RiCFvCXBbNhJc2NijN7oIQxbwsyL0buWJPG85v81ehlHI9fXrJsMNgTofEoWIQeClKpgxFLrg==", + "license": "MIT" + }, "node_modules/possible-typed-array-names": { "version": "1.1.0", 
"resolved": "https://registry.npmjs.org/possible-typed-array-names/-/possible-typed-array-names-1.1.0.tgz", @@ -6186,6 +6929,30 @@ "url": "https://github.com/sponsors/wooorm" } }, + "node_modules/protobufjs": { + "version": "7.5.4", + "resolved": "https://registry.npmjs.org/protobufjs/-/protobufjs-7.5.4.tgz", + "integrity": "sha512-CvexbZtbov6jW2eXAvLukXjXUW1TzFaivC46BpWc/3BpcCysb5Vffu+B3XHMm8lVEuy2Mm4XGex8hBSg1yapPg==", + "hasInstallScript": true, + "license": "BSD-3-Clause", + "dependencies": { + "@protobufjs/aspromise": "^1.1.2", + "@protobufjs/base64": "^1.1.2", + "@protobufjs/codegen": "^2.0.4", + "@protobufjs/eventemitter": "^1.1.0", + "@protobufjs/fetch": "^1.1.0", + "@protobufjs/float": "^1.0.2", + "@protobufjs/inquire": "^1.1.0", + "@protobufjs/path": "^1.1.2", + "@protobufjs/pool": "^1.1.0", + "@protobufjs/utf8": "^1.1.0", + "@types/node": ">=13.7.0", + "long": "^5.0.0" + }, + "engines": { + "node": ">=12.0.0" + } + }, "node_modules/proxy-from-env": { "version": "1.1.0", "resolved": "https://registry.npmjs.org/proxy-from-env/-/proxy-from-env-1.1.0.tgz", @@ -6413,6 +7180,23 @@ "node": ">=0.10.0" } }, + "node_modules/roarr": { + "version": "2.15.4", + "resolved": "https://registry.npmjs.org/roarr/-/roarr-2.15.4.tgz", + "integrity": "sha512-CHhPh+UNHD2GTXNYhPWLnU8ONHdI+5DI+4EYIAOaiD63rHeYlZvyh8P+in5999TTSFgUYuKUAjzRI4mdh/p+2A==", + "license": "BSD-3-Clause", + "dependencies": { + "boolean": "^3.0.1", + "detect-node": "^2.0.4", + "globalthis": "^1.0.1", + "json-stringify-safe": "^5.0.1", + "semver-compare": "^1.0.0", + "sprintf-js": "^1.1.2" + }, + "engines": { + "node": ">=8.0" + } + }, "node_modules/robust-predicates": { "version": "3.0.2", "resolved": "https://registry.npmjs.org/robust-predicates/-/robust-predicates-3.0.2.tgz", @@ -6521,6 +7305,27 @@ "semver": "bin/semver.js" } }, + "node_modules/semver-compare": { + "version": "1.0.0", + "resolved": "https://registry.npmjs.org/semver-compare/-/semver-compare-1.0.0.tgz", + "integrity": 
"sha512-YM3/ITh2MJ5MtzaM429anh+x2jiLVjqILF4m4oyQB18W7Ggea7BfqdH/wGMK7dDiMghv/6WG7znWMwUDzJiXow==", + "license": "MIT" + }, + "node_modules/serialize-error": { + "version": "7.0.1", + "resolved": "https://registry.npmjs.org/serialize-error/-/serialize-error-7.0.1.tgz", + "integrity": "sha512-8I8TjW5KMOKsZQTvoxjuSIa7foAwPWGOts+6o7sgjz41/qMD9VQHEDxi6PBvK2l0MXUmqZyNpUK+T2tQaaElvw==", + "license": "MIT", + "dependencies": { + "type-fest": "^0.13.1" + }, + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, "node_modules/set-function-length": { "version": "1.2.2", "resolved": "https://registry.npmjs.org/set-function-length/-/set-function-length-1.2.2.tgz", @@ -6584,6 +7389,62 @@ ], "license": "MIT" }, + "node_modules/sharp": { + "version": "0.34.5", + "resolved": "https://registry.npmjs.org/sharp/-/sharp-0.34.5.tgz", + "integrity": "sha512-Ou9I5Ft9WNcCbXrU9cMgPBcCK8LiwLqcbywW3t4oDV37n1pzpuNLsYiAV8eODnjbtQlSDwZ2cUEeQz4E54Hltg==", + "hasInstallScript": true, + "license": "Apache-2.0", + "dependencies": { + "@img/colour": "^1.0.0", + "detect-libc": "^2.1.2", + "semver": "^7.7.3" + }, + "engines": { + "node": "^18.17.0 || ^20.3.0 || >=21.0.0" + }, + "funding": { + "url": "https://opencollective.com/libvips" + }, + "optionalDependencies": { + "@img/sharp-darwin-arm64": "0.34.5", + "@img/sharp-darwin-x64": "0.34.5", + "@img/sharp-libvips-darwin-arm64": "1.2.4", + "@img/sharp-libvips-darwin-x64": "1.2.4", + "@img/sharp-libvips-linux-arm": "1.2.4", + "@img/sharp-libvips-linux-arm64": "1.2.4", + "@img/sharp-libvips-linux-ppc64": "1.2.4", + "@img/sharp-libvips-linux-riscv64": "1.2.4", + "@img/sharp-libvips-linux-s390x": "1.2.4", + "@img/sharp-libvips-linux-x64": "1.2.4", + "@img/sharp-libvips-linuxmusl-arm64": "1.2.4", + "@img/sharp-libvips-linuxmusl-x64": "1.2.4", + "@img/sharp-linux-arm": "0.34.5", + "@img/sharp-linux-arm64": "0.34.5", + "@img/sharp-linux-ppc64": "0.34.5", + "@img/sharp-linux-riscv64": "0.34.5", + 
"@img/sharp-linux-s390x": "0.34.5", + "@img/sharp-linux-x64": "0.34.5", + "@img/sharp-linuxmusl-arm64": "0.34.5", + "@img/sharp-linuxmusl-x64": "0.34.5", + "@img/sharp-wasm32": "0.34.5", + "@img/sharp-win32-arm64": "0.34.5", + "@img/sharp-win32-ia32": "0.34.5", + "@img/sharp-win32-x64": "0.34.5" + } + }, + "node_modules/sharp/node_modules/semver": { + "version": "7.7.3", + "resolved": "https://registry.npmjs.org/semver/-/semver-7.7.3.tgz", + "integrity": "sha512-SdsKMrI9TdgjdweUSR9MweHA4EJ8YxHn8DFaDisvhVlUOe4BF1tLD7GAj0lIqWVl+dPb/rExr0Btby5loQm20Q==", + "license": "ISC", + "bin": { + "semver": "bin/semver.js" + }, + "engines": { + "node": ">=10" + } + }, "node_modules/sigma": { "version": "3.0.2", "resolved": "https://registry.npmjs.org/sigma/-/sigma-3.0.2.tgz", @@ -6671,6 +7532,12 @@ "url": "https://github.com/sponsors/wooorm" } }, + "node_modules/sprintf-js": { + "version": "1.1.3", + "resolved": "https://registry.npmjs.org/sprintf-js/-/sprintf-js-1.1.3.tgz", + "integrity": "sha512-Oo+0REFV59/rz3gfJNKQiBlwfHaSESl1pcGyABQsnnIfWOFt6JNj5gCog2U6MLZ//IGYD+nA8nI+mTShREReaA==", + "license": "BSD-3-Clause" + }, "node_modules/string_decoder": { "version": "1.1.1", "resolved": "https://registry.npmjs.org/string_decoder/-/string_decoder-1.1.1.tgz", @@ -6735,7 +7602,6 @@ "version": "7.5.2", "resolved": "https://registry.npmjs.org/tar/-/tar-7.5.2.tgz", "integrity": "sha512-7NyxrTE4Anh8km8iEy7o0QYPs+0JKBTj5ZaqHg6B39erLg0qYXN3BijtShwbsNSvQ+LN75+KV+C4QR/f6Gwnpg==", - "dev": true, "license": "BlueOak-1.0.0", "dependencies": { "@isaacs/fs-minipass": "^4.0.0", @@ -6752,7 +7618,6 @@ "version": "5.0.0", "resolved": "https://registry.npmjs.org/yallist/-/yallist-5.0.0.tgz", "integrity": "sha512-YgvUTfwqyc7UXVMrB+SImsVYSmTS8X/tSrtdNZMImM+n7+QTriRXyXim0mBrTXNeqzVF0KWGgHPeiyViFFrNDw==", - "dev": true, "license": "BlueOak-1.0.0", "engines": { "node": ">=18" @@ -6998,6 +7863,18 @@ "license": "0BSD", "optional": true }, + "node_modules/type-fest": { + "version": "0.13.1", + "resolved": 
"https://registry.npmjs.org/type-fest/-/type-fest-0.13.1.tgz", + "integrity": "sha512-34R7HTnG0XIJcBSn5XhDd7nNFPRcXYRZrBB2O2jdKqYODldSzBAqzsWoZYYvduky73toYS/ESqxPvkDf/F0XMg==", + "license": "(MIT OR CC0-1.0)", + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, "node_modules/typed-array-buffer": { "version": "1.0.3", "resolved": "https://registry.npmjs.org/typed-array-buffer/-/typed-array-buffer-1.0.3.tgz", @@ -7058,7 +7935,6 @@ "version": "7.16.0", "resolved": "https://registry.npmjs.org/undici-types/-/undici-types-7.16.0.tgz", "integrity": "sha512-Zz+aZWSj8LE6zoxD+xrjh4VfkIG8Ya6LvYkZqtUQGJPZjYl53ypCaUwWqo7eI0x66KBGeRo+mlBEkMSeSZ38Nw==", - "devOptional": true, "license": "MIT" }, "node_modules/unified": { diff --git a/package.json b/package.json index 270f95ff7b..79910530e1 100644 --- a/package.json +++ b/package.json @@ -9,6 +9,7 @@ "preview": "vite preview" }, "dependencies": { + "@huggingface/transformers": "^3.0.0", "@isomorphic-git/lightning-fs": "^4.6.2", "@sigma/edge-curve": "^3.1.0", "@tailwindcss/vite": "^4.1.18", diff --git a/src/components/DropZone.tsx b/src/components/DropZone.tsx index fea369299b..b89dbdfbe8 100644 --- a/src/components/DropZone.tsx +++ b/src/components/DropZone.tsx @@ -91,8 +91,14 @@ export const DropZone = ({ onFileSelect, onGitClone }: DropZoneProps) => { console.error('Clone failed:', err); const message = err instanceof Error ? err.message : 'Failed to clone repository'; // Provide helpful error for auth failures - if (message.includes('401') || message.includes('403')) { - setError('Authentication failed. Check your token or ensure the repo is accessible.'); + if (message.includes('401') || message.includes('403') || message.includes('Authentication')) { + if (!githubToken) { + setError('๐Ÿ”’ This looks like a private repo. Add a GitHub PAT (Personal Access Token) to access it.'); + } else { + setError('๐Ÿ”‘ Authentication failed. 
Check your token permissions (needs repo access).'); + } + } else if (message.includes('404') || message.includes('not found')) { + setError('Repository not found. Check the URL or it might be private (needs PAT).'); } else { setError(message); } @@ -330,7 +336,7 @@ export const DropZone = ({ onFileSelect, onGitClone }: DropZoneProps) => { {/* Security note */} {githubToken && (

- 🔒 Token stays in your browser only, never sent to our servers + 🔒 Token stays in your browser only, never sent to any server

)} diff --git a/src/components/EmbeddingStatus.tsx b/src/components/EmbeddingStatus.tsx new file mode 100644 index 0000000000..2af94c2841 --- /dev/null +++ b/src/components/EmbeddingStatus.tsx @@ -0,0 +1,195 @@ +import { Brain, Loader2, Check, AlertCircle, Zap, FlaskConical } from 'lucide-react'; +import { useAppState } from '../hooks/useAppState'; +import { useState } from 'react'; +import { WebGPUFallbackDialog } from './WebGPUFallbackDialog'; + +/** + * Embedding status indicator and trigger button + * Shows in header when graph is loaded + */ +export const EmbeddingStatus = () => { + const { + embeddingStatus, + embeddingProgress, + startEmbeddings, + graph, + viewMode, + testArrayParams, + } = useAppState(); + + const [testResult, setTestResult] = useState(null); + const [showFallbackDialog, setShowFallbackDialog] = useState(false); + + // Only show when exploring a loaded graph + if (viewMode !== 'exploring' || !graph) return null; + + const nodeCount = graph.nodes.length; + + const handleStartEmbeddings = async (forceDevice?: 'webgpu' | 'wasm') => { + try { + await startEmbeddings(forceDevice); + } catch (error: any) { + // Check if it's a WebGPU not available error + if (error?.name === 'WebGPUNotAvailableError' || + error?.message?.includes('WebGPU not available')) { + setShowFallbackDialog(true); + } else { + console.error('Embedding failed:', error); + } + } + }; + + const handleUseCPU = () => { + setShowFallbackDialog(false); + handleStartEmbeddings('wasm'); + }; + + const handleSkipEmbeddings = () => { + setShowFallbackDialog(false); + // Just close - user can try again later if they want + }; + + const handleTestArrayParams = async () => { + setTestResult('Testing...'); + const result = await testArrayParams(); + if (result.success) { + setTestResult('✅ Array params WORK!'); + console.log('✅ Array params test passed!'); + } else { + setTestResult(`❌ ${result.error}`); + console.error('❌ Array params test failed:', result.error); + } + }; + 
// WebGPU fallback dialog - rendered independently of state + const fallbackDialog = ( + setShowFallbackDialog(false)} + onUseCPU={handleUseCPU} + onSkip={handleSkipEmbeddings} + nodeCount={nodeCount} + /> + ); + + // Idle state - show button to start + if (embeddingStatus === 'idle') { + return ( + <> +
+ {/* Test button (dev only) */} + {import.meta.env.DEV && ( + + )} + + +
+ {fallbackDialog} + + ); + } + + // Loading model + if (embeddingStatus === 'loading') { + const downloadPercent = embeddingProgress?.modelDownloadPercent ?? 0; + return ( + <> +
+ +
+ Loading AI model... +
+
+
+
+
+ {fallbackDialog} + + ); + } + + // Embedding in progress + if (embeddingStatus === 'embedding') { + const processed = embeddingProgress?.nodesProcessed ?? 0; + const total = embeddingProgress?.totalNodes ?? 0; + const percent = embeddingProgress?.percent ?? 0; + + return ( +
+ +
+ + Embedding {processed}/{total} nodes + +
+
+
+
+
+ ); + } + + // Indexing + if (embeddingStatus === 'indexing') { + return ( +
+ + Creating vector index... +
+ ); + } + + // Ready + if (embeddingStatus === 'ready') { + return ( +
+ + Semantic Ready +
+ ); + } + + // Error + if (embeddingStatus === 'error') { + return ( + <> + + {fallbackDialog} + + ); + } + + return null; +}; + diff --git a/src/components/Header.tsx b/src/components/Header.tsx index 0857160526..89ab6267ff 100644 --- a/src/components/Header.tsx +++ b/src/components/Header.tsx @@ -2,6 +2,7 @@ import { Search, Settings, HelpCircle, Sparkles } from 'lucide-react'; import { useAppState } from '../hooks/useAppState'; import { useState, useMemo, useRef, useEffect } from 'react'; import { GraphNode } from '../core/graph/types'; +import { EmbeddingStatus } from './EmbeddingStatus'; // Color mapping for node types in search results const NODE_TYPE_COLORS: Record = { @@ -184,12 +185,15 @@ export const Header = ({ onFocusNode }: HeaderProps) => {
{/* Stats */} {graph && ( -
+
{nodeCount} nodes {edgeCount} edges
)} + {/* Embedding Status */} + + {/* Icon buttons */} + +
+ {/* Animated emoji */} +
setIsAnimating(false)} + onClick={() => setIsAnimating(true)} + > + 🤔 +
+
+

+ WebGPU said "nope" +

+

+ Your browser doesn't support GPU acceleration +

+
+
+
+ + {/* Content */} +
+

+ Couldn't create embeddings with WebGPU, so semantic search (Graph RAG) + won't be as smart. The graph still works fine though! +

+ +
+

+ Your options: +

+
    +
  • + + + Use CPU - Works but {isSmallCodebase ? 'a bit' : 'way'} slower + {nodeCount > 0 && ( + (~{estimatedMinutes} min for {nodeCount} nodes) + )} + +
  • +
  • + + + Skip it - Graph works, just no AI semantic search + +
  • +
+
+ + {isSmallCodebase && ( +

+ + Small codebase detected! CPU should be fine. +

+ )} + +

+ 💡 Tip: Try Chrome or Edge for WebGPU support +

+
+ + {/* Actions */} +
+ + +
+
+
+ ); +}; + diff --git a/src/core/embeddings/embedder.ts b/src/core/embeddings/embedder.ts new file mode 100644 index 0000000000..1188945835 --- /dev/null +++ b/src/core/embeddings/embedder.ts @@ -0,0 +1,302 @@ +/** + * Embedder Module + * + * Singleton factory for transformers.js embedding pipeline. + * Handles model loading, caching, and both single and batch embedding operations. + * + * Uses snowflake-arctic-embed-xs by default (22M params, 384 dims, ~90MB) + */ + +import { pipeline, env, type FeatureExtractionPipeline } from '@huggingface/transformers'; +import { DEFAULT_EMBEDDING_CONFIG, type EmbeddingConfig, type ModelProgress } from './types'; + +// Module-level state for singleton pattern +let embedderInstance: FeatureExtractionPipeline | null = null; +let isInitializing = false; +let initPromise: Promise | null = null; +let currentDevice: 'webgpu' | 'wasm' | null = null; + +/** + * Progress callback type for model loading + */ +export type ModelProgressCallback = (progress: ModelProgress) => void; + +/** + * Custom error thrown when WebGPU is not available + * Allows UI to prompt user for fallback choice + */ +export class WebGPUNotAvailableError extends Error { + constructor(originalError?: Error) { + super('WebGPU not available in this browser'); + this.name = 'WebGPUNotAvailableError'; + this.cause = originalError; + } +} + +/** + * Check if WebGPU is available in this browser + * Quick check without loading the model + */ +export const checkWebGPUAvailability = async (): Promise => { + try { + // Cast to any to avoid WebGPU types not being available in all TS configs + const nav = navigator as any; + if (!nav.gpu) { + return false; + } + const adapter = await nav.gpu.requestAdapter(); + if (!adapter) { + return false; + } + // Try to get a device - this is where it usually fails + const device = await adapter.requestDevice(); + device.destroy(); // Clean up + return true; + } catch { + return false; + } +}; + +/** + * Get the current device being used 
for inference + */ +export const getCurrentDevice = (): 'webgpu' | 'wasm' | null => currentDevice; + +/** + * Initialize the embedding model + * Uses singleton pattern - only loads once, subsequent calls return cached instance + * + * @param onProgress - Optional callback for model download progress + * @param config - Optional configuration override + * @param forceDevice - Force a specific device (bypasses WebGPU check) + * @returns Promise resolving to the embedder pipeline + * @throws WebGPUNotAvailableError if WebGPU is requested but unavailable + */ +export const initEmbedder = async ( + onProgress?: ModelProgressCallback, + config: Partial = {}, + forceDevice?: 'webgpu' | 'wasm' +): Promise => { + // Return existing instance if available + if (embedderInstance) { + return embedderInstance; + } + + // If already initializing, wait for that promise + if (isInitializing && initPromise) { + return initPromise; + } + + isInitializing = true; + + const finalConfig = { ...DEFAULT_EMBEDDING_CONFIG, ...config }; + const requestedDevice = forceDevice || finalConfig.device; + + initPromise = (async () => { + try { + // Configure transformers.js environment + env.allowLocalModels = false; + + if (import.meta.env.DEV) { + console.log(`🧠 Loading embedding model: ${finalConfig.modelId}`); + } + + const progressCallback = onProgress ?
(data: any) => { + const progress: ModelProgress = { + status: data.status || 'progress', + file: data.file, + progress: data.progress, + loaded: data.loaded, + total: data.total, + }; + onProgress(progress); + } : undefined; + + // If WebGPU is requested (default), check availability first + if (requestedDevice === 'webgpu') { + if (import.meta.env.DEV) { + console.log('🔧 Checking WebGPU availability...'); + } + + const webgpuAvailable = await checkWebGPUAvailability(); + + if (!webgpuAvailable) { + if (import.meta.env.DEV) { + console.warn('⚠️ WebGPU not available'); + } + isInitializing = false; + initPromise = null; + throw new WebGPUNotAvailableError(); + } + + // Try WebGPU + try { + if (import.meta.env.DEV) { + console.log('🔧 Initializing WebGPU backend...'); + } + + // Type assertion needed due to complex union types in transformers.js + embedderInstance = await (pipeline as any)( + 'feature-extraction', + finalConfig.modelId, + { + device: 'webgpu', + dtype: 'fp32', + progress_callback: progressCallback, + } + ); + currentDevice = 'webgpu'; + + if (import.meta.env.DEV) { + console.log('✅ Using WebGPU backend'); + } + } catch (err) { + if (import.meta.env.DEV) { + console.warn('⚠️ WebGPU initialization failed:', err); + } + isInitializing = false; + initPromise = null; + embedderInstance = null; + throw new WebGPUNotAvailableError(err as Error); + } + } else { + // WASM mode requested (user chose fallback) + if (import.meta.env.DEV) { + console.log('🔧 Initializing WASM backend (this will be slower)...'); + } + + // Type assertion needed due to complex union types in transformers.js + embedderInstance = await (pipeline as any)( + 'feature-extraction', + finalConfig.modelId, + { + device: 'wasm', // WASM-based CPU execution + dtype: 'fp32', + progress_callback: progressCallback, + } + ); + currentDevice = 'wasm'; + + if (import.meta.env.DEV) { + console.log('✅ Using WASM backend'); + } + } + + if (import.meta.env.DEV) { + console.log('✅
Embedding model loaded successfully'); + } + + return embedderInstance!; + } catch (error) { + // Re-throw WebGPUNotAvailableError as-is + if (error instanceof WebGPUNotAvailableError) { + throw error; + } + isInitializing = false; + initPromise = null; + embedderInstance = null; + throw error; + } finally { + isInitializing = false; + } + })(); + + return initPromise; +}; + +/** + * Check if the embedder is initialized and ready + */ +export const isEmbedderReady = (): boolean => { + return embedderInstance !== null; +}; + +/** + * Get the embedder instance (throws if not initialized) + */ +export const getEmbedder = (): FeatureExtractionPipeline => { + if (!embedderInstance) { + throw new Error('Embedder not initialized. Call initEmbedder() first.'); + } + return embedderInstance; +}; + +/** + * Embed a single text string + * + * @param text - Text to embed + * @returns Float32Array of embedding vector (384 dimensions) + */ +export const embedText = async (text: string): Promise => { + const embedder = getEmbedder(); + + const result = await embedder(text, { + pooling: 'mean', + normalize: true, + }); + + // Result is a Tensor, convert to Float32Array + return new Float32Array(result.data as ArrayLike); +}; + +/** + * Embed multiple texts in a single batch + * More efficient than calling embedText multiple times + * + * @param texts - Array of texts to embed + * @returns Array of Float32Array embedding vectors + */ +export const embedBatch = async (texts: string[]): Promise => { + if (texts.length === 0) { + return []; + } + + const embedder = getEmbedder(); + + // Process batch + const result = await embedder(texts, { + pooling: 'mean', + normalize: true, + }); + + // Result shape is [batch_size, dimensions] + // Need to split into individual vectors + const data = result.data as ArrayLike; + const dimensions = DEFAULT_EMBEDDING_CONFIG.dimensions; + const embeddings: Float32Array[] = []; + + for (let i = 0; i < texts.length; i++) { + const start = i * 
dimensions; + const end = start + dimensions; + embeddings.push(new Float32Array(Array.prototype.slice.call(data, start, end))); + } + + return embeddings; +}; + +/** + * Convert Float32Array to regular number array (for KuzuDB storage) + */ +export const embeddingToArray = (embedding: Float32Array): number[] => { + return Array.from(embedding); +}; + +/** + * Cleanup the embedder (free memory) + * Call this when done with embeddings + */ +export const disposeEmbedder = async (): Promise => { + if (embedderInstance) { + // transformers.js pipelines may have a dispose method + try { + if ('dispose' in embedderInstance && typeof embedderInstance.dispose === 'function') { + await embedderInstance.dispose(); + } + } catch { + // Ignore disposal errors + } + embedderInstance = null; + initPromise = null; + } +}; + diff --git a/src/core/embeddings/embedding-pipeline.ts b/src/core/embeddings/embedding-pipeline.ts new file mode 100644 index 0000000000..6a0d8619a6 --- /dev/null +++ b/src/core/embeddings/embedding-pipeline.ts @@ -0,0 +1,351 @@ +/** + * Embedding Pipeline Module + * + * Orchestrates the background embedding process: + * 1. Query embeddable nodes from KuzuDB + * 2. Generate text representations + * 3. Batch embed using transformers.js + * 4. Update KuzuDB with embeddings + * 5. 
Create vector index for semantic search + */ + +import { initEmbedder, embedBatch, embedText, embeddingToArray, isEmbedderReady } from './embedder'; +import { generateBatchEmbeddingTexts, generateEmbeddingText } from './text-generator'; +import { + type EmbeddingProgress, + type EmbeddingConfig, + type EmbeddableNode, + type SemanticSearchResult, + type ModelProgress, + DEFAULT_EMBEDDING_CONFIG, + EMBEDDABLE_LABELS, +} from './types'; + +/** + * Progress callback type + */ +export type EmbeddingProgressCallback = (progress: EmbeddingProgress) => void; + +/** + * Query all embeddable nodes from KuzuDB + */ +const queryEmbeddableNodes = async ( + executeQuery: (cypher: string) => Promise +): Promise => { + // Build WHERE clause for embeddable labels + const labelConditions = EMBEDDABLE_LABELS + .map(label => `n.label = '${label}'`) + .join(' OR '); + + const cypher = ` + MATCH (n:CodeNode) + WHERE ${labelConditions} + RETURN n.id AS id, n.name AS name, n.label AS label, + n.filePath AS filePath, n.content AS content, + n.startLine AS startLine, n.endLine AS endLine + `; + + const rows = await executeQuery(cypher); + + return rows.map(row => ({ + id: row.id ?? row[0], + name: row.name ?? row[1], + label: row.label ?? row[2], + filePath: row.filePath ?? row[3], + content: row.content ?? row[4] ?? '', + startLine: row.startLine ?? row[5], + endLine: row.endLine ?? row[6], + })); +}; + +/** + * Batch INSERT embeddings into separate CodeEmbedding table + * Using a separate lightweight table avoids copy-on-write overhead + * that occurs when UPDATEing nodes with large content fields + */ +const batchInsertEmbeddings = async ( + executeWithReusedStatement: ( + cypher: string, + paramsList: Array> + ) => Promise, + updates: Array<{ id: string; embedding: number[] }> +): Promise => { + // INSERT into separate embedding table - much more memory efficient! 
+ const cypher = `CREATE (e:CodeEmbedding {nodeId: $nodeId, embedding: $embedding})`; + const paramsList = updates.map(u => ({ nodeId: u.id, embedding: u.embedding })); + await executeWithReusedStatement(cypher, paramsList); +}; + +/** + * Create the vector index for semantic search + * Now indexes the separate CodeEmbedding table + */ +const createVectorIndex = async ( + executeQuery: (cypher: string) => Promise +): Promise => { + const cypher = ` + CALL CREATE_VECTOR_INDEX('CodeEmbedding', 'code_embedding_idx', 'embedding', metric := 'cosine') + `; + + try { + await executeQuery(cypher); + } catch (error) { + // Index might already exist + if (import.meta.env.DEV) { + console.warn('Vector index creation warning:', error); + } + } +}; + +/** + * Run the embedding pipeline + * + * @param executeQuery - Function to execute Cypher queries against KuzuDB + * @param executeWithReusedStatement - Function to execute with reused prepared statement + * @param onProgress - Callback for progress updates + * @param config - Optional configuration override + */ +export const runEmbeddingPipeline = async ( + executeQuery: (cypher: string) => Promise, + executeWithReusedStatement: (cypher: string, paramsList: Array>) => Promise, + onProgress: EmbeddingProgressCallback, + config: Partial = {} +): Promise => { + const finalConfig = { ...DEFAULT_EMBEDDING_CONFIG, ...config }; + + try { + // Phase 1: Load embedding model + onProgress({ + phase: 'loading-model', + percent: 0, + modelDownloadPercent: 0, + }); + + await initEmbedder((modelProgress: ModelProgress) => { + // Report model download progress + const downloadPercent = modelProgress.progress ?? 
0; + onProgress({ + phase: 'loading-model', + percent: Math.round(downloadPercent * 0.2), // 0-20% for model loading + modelDownloadPercent: downloadPercent, + }); + }, finalConfig); + + onProgress({ + phase: 'loading-model', + percent: 20, + modelDownloadPercent: 100, + }); + + if (import.meta.env.DEV) { + console.log('🔍 Querying embeddable nodes...'); + } + + // Phase 2: Query embeddable nodes + const nodes = await queryEmbeddableNodes(executeQuery); + const totalNodes = nodes.length; + + if (import.meta.env.DEV) { + console.log(`📊 Found ${totalNodes} embeddable nodes`); + } + + if (totalNodes === 0) { + onProgress({ + phase: 'ready', + percent: 100, + nodesProcessed: 0, + totalNodes: 0, + }); + return; + } + + // Phase 3: Batch embed nodes + const batchSize = finalConfig.batchSize; + const totalBatches = Math.ceil(totalNodes / batchSize); + let processedNodes = 0; + + onProgress({ + phase: 'embedding', + percent: 20, + nodesProcessed: 0, + totalNodes, + currentBatch: 0, + totalBatches, + }); + + for (let batchIndex = 0; batchIndex < totalBatches; batchIndex++) { + const start = batchIndex * batchSize; + const end = Math.min(start + batchSize, totalNodes); + const batch = nodes.slice(start, end); + + // Generate texts for this batch + const texts = generateBatchEmbeddingTexts(batch, finalConfig); + + // Embed the batch + const embeddings = await embedBatch(texts); + + // Update KuzuDB with embeddings + const updates = batch.map((node, i) => ({ + id: node.id, + embedding: embeddingToArray(embeddings[i]), + })); + + await batchInsertEmbeddings(executeWithReusedStatement, updates); + + processedNodes += batch.length; + + // Report progress (20-90% for embedding phase) + const embeddingProgress = 20 + ((processedNodes / totalNodes) * 70); + onProgress({ + phase: 'embedding', + percent: Math.round(embeddingProgress), + nodesProcessed: processedNodes, + totalNodes, + currentBatch: batchIndex + 1, + totalBatches, + }); + } + + // Phase 4: Create vector index +
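The loop above slices the node list into `batchSize` chunks and maps overall progress onto fixed bands: model download scaled into 0-20%, embedding into 20-90%, 90% when indexing starts and 100% when ready. A small pure sketch of that arithmetic, with hypothetical helper names (`batchRanges`, `modelLoadPercent`, `embeddingPercent`), makes the bands easy to check in isolation:

```typescript
// Hypothetical helpers that reproduce the batching and progress arithmetic
// used in runEmbeddingPipeline (DEFAULT_EMBEDDING_CONFIG.batchSize is 16).

// Half-open [start, end) index ranges, one per batch; the last may be short.
function batchRanges(totalNodes: number, batchSize: number): Array<[number, number]> {
  const ranges: Array<[number, number]> = [];
  const totalBatches = Math.ceil(totalNodes / batchSize);
  for (let i = 0; i < totalBatches; i++) {
    const start = i * batchSize;
    ranges.push([start, Math.min(start + batchSize, totalNodes)]);
  }
  return ranges;
}

// Model download progress (0-100) is scaled into the 0-20% band.
function modelLoadPercent(downloadPercent: number): number {
  return Math.round(downloadPercent * 0.2);
}

// Embedding progress fills the 20-90% band. The pipeline returns early when
// there are no embeddable nodes, so `total` is never 0 at this point.
function embeddingPercent(processed: number, total: number): number {
  return Math.round(20 + (processed / total) * 70);
}

// Example: 40 nodes with a batch size of 16.
const ranges = batchRanges(40, 16); // -> [[0, 16], [16, 32], [32, 40]]
const percents = ranges.map(([, end]) => embeddingPercent(end, 40)); // -> [48, 76, 90]
console.log(ranges, percents);
```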
onProgress({ + phase: 'indexing', + percent: 90, + nodesProcessed: totalNodes, + totalNodes, + }); + + if (import.meta.env.DEV) { + console.log('📇 Creating vector index...'); + } + + await createVectorIndex(executeQuery); + + // Complete + onProgress({ + phase: 'ready', + percent: 100, + nodesProcessed: totalNodes, + totalNodes, + }); + + if (import.meta.env.DEV) { + console.log('✅ Embedding pipeline complete!'); + } + } catch (error) { + const errorMessage = error instanceof Error ? error.message : 'Unknown error'; + + if (import.meta.env.DEV) { + console.error('❌ Embedding pipeline error:', error); + } + + onProgress({ + phase: 'error', + percent: 0, + error: errorMessage, + }); + + throw error; + } +}; + +/** + * Perform semantic search using the vector index + * + * Uses separate CodeEmbedding table and JOINs with CodeNode for metadata + * + * @param executeQuery - Function to execute Cypher queries + * @param query - Search query text + * @param k - Number of results to return (default: 10) + * @param maxDistance - Maximum distance threshold (default: 0.5) + * @returns Array of search results ordered by relevance + */ +export const semanticSearch = async ( + executeQuery: (cypher: string) => Promise, + query: string, + k: number = 10, + maxDistance: number = 0.5 +): Promise => { + if (!isEmbedderReady()) { + throw new Error('Embedding model not initialized.
Run embedding pipeline first.'); + } + + // Embed the query + const queryEmbedding = await embedText(query); + const queryVec = embeddingToArray(queryEmbedding); + const queryVecStr = `[${queryVec.join(',')}]`; + + // Query the vector index on CodeEmbedding, then JOIN with CodeNode for metadata + const cypher = ` + CALL QUERY_VECTOR_INDEX('CodeEmbedding', 'code_embedding_idx', + CAST(${queryVecStr} AS FLOAT[384]), ${k}) + YIELD node AS emb, distance + WHERE distance < ${maxDistance} + MATCH (n:CodeNode {id: emb.nodeId}) + RETURN n.id AS nodeId, n.name AS name, n.label AS label, + n.filePath AS filePath, distance, + n.startLine AS startLine, n.endLine AS endLine + ORDER BY distance + `; + + const rows = await executeQuery(cypher); + + return rows.map(row => ({ + nodeId: row.nodeId ?? row[0], + name: row.name ?? row[1], + label: row.label ?? row[2], + filePath: row.filePath ?? row[3], + distance: row.distance ?? row[4], + startLine: row.startLine ?? row[5], + endLine: row.endLine ?? row[6], + })); +}; + +/** + * Semantic search with graph expansion + * Finds similar nodes AND their connections + * + * Uses separate CodeEmbedding table and JOINs with CodeNode + * + * @param executeQuery - Function to execute Cypher queries + * @param query - Search query text + * @param k - Number of initial results + * @param hops - Number of hops to expand (default: 2) + * @returns Search results with connected nodes + */ +export const semanticSearchWithContext = async ( + executeQuery: (cypher: string) => Promise, + query: string, + k: number = 5, + hops: number = 2 +): Promise => { + if (!isEmbedderReady()) { + throw new Error('Embedding model not initialized. 
Run embedding pipeline first.'); + } + + // Embed the query + const queryEmbedding = await embedText(query); + const queryVec = embeddingToArray(queryEmbedding); + const queryVecStr = `[${queryVec.join(',')}]`; + + // Query embedding table, JOIN with CodeNode, then expand graph + const cypher = ` + CALL QUERY_VECTOR_INDEX('CodeEmbedding', 'code_embedding_idx', + CAST(${queryVecStr} AS FLOAT[384]), ${k}) + YIELD node AS emb, distance + WHERE distance < 0.5 + MATCH (match:CodeNode {id: emb.nodeId}) + MATCH (match)-[r:CodeRelation*1..${hops}]-(connected:CodeNode) + RETURN match.id AS matchId, match.name AS matchName, match.label AS matchLabel, + match.filePath AS matchPath, distance, + collect(DISTINCT { + id: connected.id, + name: connected.name, + label: connected.label, + relationType: [rel IN r | rel.type] + }) AS connections + ORDER BY distance + `; + + return executeQuery(cypher); +}; + diff --git a/src/core/embeddings/index.ts b/src/core/embeddings/index.ts new file mode 100644 index 0000000000..5d384c8d53 --- /dev/null +++ b/src/core/embeddings/index.ts @@ -0,0 +1,11 @@ +/** + * Embeddings Module + * + * Re-exports for the embedding pipeline system. + */ + +export * from './types'; +export * from './embedder'; +export * from './text-generator'; +export * from './embedding-pipeline'; + diff --git a/src/core/embeddings/text-generator.ts b/src/core/embeddings/text-generator.ts new file mode 100644 index 0000000000..36594e1a81 --- /dev/null +++ b/src/core/embeddings/text-generator.ts @@ -0,0 +1,235 @@ +/** + * Text Generator Module + * + * Pure functions to generate embedding text from code nodes. + * Combines node metadata with code snippets for semantic matching. 
+ */ + +import type { EmbeddableNode, EmbeddingConfig } from './types'; +import { DEFAULT_EMBEDDING_CONFIG } from './types'; + +/** + * Extract the filename from a file path + */ +const getFileName = (filePath: string): string => { + const parts = filePath.split('/'); + return parts[parts.length - 1] || filePath; +}; + +/** + * Extract the directory path from a file path + */ +const getDirectory = (filePath: string): string => { + const parts = filePath.split('/'); + parts.pop(); + return parts.join('/') || ''; +}; + +/** + * Truncate content to max length, preserving word boundaries + */ +const truncateContent = (content: string, maxLength: number): string => { + if (content.length <= maxLength) { + return content; + } + + // Find last space before maxLength to avoid cutting words + const truncated = content.slice(0, maxLength); + const lastSpace = truncated.lastIndexOf(' '); + + if (lastSpace > maxLength * 0.8) { + return truncated.slice(0, lastSpace) + '...'; + } + + return truncated + '...'; +}; + +/** + * Clean code content for embedding + * Removes excessive whitespace while preserving structure + */ +const cleanContent = (content: string): string => { + return content + // Normalize line endings + .replace(/\r\n/g, '\n') + // Remove excessive blank lines (more than 2) + .replace(/\n{3,}/g, '\n\n') + // Trim each line + .split('\n') + .map(line => line.trimEnd()) + .join('\n') + .trim(); +}; + +/** + * Generate embedding text for a Function node + */ +const generateFunctionText = ( + node: EmbeddableNode, + maxSnippetLength: number +): string => { + const parts: string[] = [ + `Function: ${node.name}`, + `File: ${getFileName(node.filePath)}`, + ]; + + const dir = getDirectory(node.filePath); + if (dir) { + parts.push(`Directory: ${dir}`); + } + + if (node.content) { + const cleanedContent = cleanContent(node.content); + const snippet = truncateContent(cleanedContent, maxSnippetLength); + parts.push('', snippet); + } + + return parts.join('\n'); +}; + +/** + * 
Generate embedding text for a Class node + */ +const generateClassText = ( + node: EmbeddableNode, + maxSnippetLength: number +): string => { + const parts: string[] = [ + `Class: ${node.name}`, + `File: ${getFileName(node.filePath)}`, + ]; + + const dir = getDirectory(node.filePath); + if (dir) { + parts.push(`Directory: ${dir}`); + } + + if (node.content) { + const cleanedContent = cleanContent(node.content); + const snippet = truncateContent(cleanedContent, maxSnippetLength); + parts.push('', snippet); + } + + return parts.join('\n'); +}; + +/** + * Generate embedding text for a Method node + */ +const generateMethodText = ( + node: EmbeddableNode, + maxSnippetLength: number +): string => { + const parts: string[] = [ + `Method: ${node.name}`, + `File: ${getFileName(node.filePath)}`, + ]; + + const dir = getDirectory(node.filePath); + if (dir) { + parts.push(`Directory: ${dir}`); + } + + if (node.content) { + const cleanedContent = cleanContent(node.content); + const snippet = truncateContent(cleanedContent, maxSnippetLength); + parts.push('', snippet); + } + + return parts.join('\n'); +}; + +/** + * Generate embedding text for an Interface node + */ +const generateInterfaceText = ( + node: EmbeddableNode, + maxSnippetLength: number +): string => { + const parts: string[] = [ + `Interface: ${node.name}`, + `File: ${getFileName(node.filePath)}`, + ]; + + const dir = getDirectory(node.filePath); + if (dir) { + parts.push(`Directory: ${dir}`); + } + + if (node.content) { + const cleanedContent = cleanContent(node.content); + const snippet = truncateContent(cleanedContent, maxSnippetLength); + parts.push('', snippet); + } + + return parts.join('\n'); +}; + +/** + * Generate embedding text for a File node + * Uses file name and first N characters of content + */ +const generateFileText = ( + node: EmbeddableNode, + maxSnippetLength: number +): string => { + const parts: string[] = [ + `File: ${node.name}`, + `Path: ${node.filePath}`, + ]; + + if (node.content) { + 
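`truncateContent` backs off to the last space only when that space lies beyond 80% of `maxLength`; otherwise it cuts mid-token. Here is a standalone copy of the function from `text-generator.ts` with two made-up sample inputs showing both branches:

```typescript
// Standalone copy of truncateContent from text-generator.ts.
const truncateContent = (content: string, maxLength: number): string => {
  if (content.length <= maxLength) {
    return content;
  }
  const truncated = content.slice(0, maxLength);
  const lastSpace = truncated.lastIndexOf(' ');
  // Back off to a word boundary only if it keeps more than 80% of the budget.
  if (lastSpace > maxLength * 0.8) {
    return truncated.slice(0, lastSpace) + '...';
  }
  // Otherwise cut at exactly maxLength, even mid-token.
  return truncated + '...';
};

// Space at index 16 > 19 * 0.8, so the partial word "de" is dropped.
console.log(truncateContent('alpha beta gamma delta', 19)); // "alpha beta gamma..."
// Last space at index 13 is not > 16, so the cut lands mid-identifier.
console.log(truncateContent('const value = computeSomething()', 20)); // "const value = comput..."
```

Because the ellipsis is appended after slicing, the result can be up to three characters longer than `maxLength`, so the setting acts as a soft cap on snippet size rather than a strict bound.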
const cleanedContent = cleanContent(node.content); + // For files, use a shorter snippet since they can be very long + const snippet = truncateContent(cleanedContent, Math.min(maxSnippetLength, 300)); + parts.push('', snippet); + } + + return parts.join('\n'); +}; + +/** + * Generate embedding text for any embeddable node + * Dispatches to the appropriate generator based on node label + * + * @param node - The node to generate text for + * @param config - Optional configuration for max snippet length + * @returns Text suitable for embedding + */ +export const generateEmbeddingText = ( + node: EmbeddableNode, + config: Partial = {} +): string => { + const maxSnippetLength = config.maxSnippetLength ?? DEFAULT_EMBEDDING_CONFIG.maxSnippetLength; + + switch (node.label) { + case 'Function': + return generateFunctionText(node, maxSnippetLength); + case 'Class': + return generateClassText(node, maxSnippetLength); + case 'Method': + return generateMethodText(node, maxSnippetLength); + case 'Interface': + return generateInterfaceText(node, maxSnippetLength); + case 'File': + return generateFileText(node, maxSnippetLength); + default: + // Fallback for any other embeddable type + return `${node.label}: ${node.name}\nPath: ${node.filePath}`; + } +}; + +/** + * Generate embedding texts for a batch of nodes + * + * @param nodes - Array of nodes to generate text for + * @param config - Optional configuration + * @returns Array of texts in the same order as input nodes + */ +export const generateBatchEmbeddingTexts = ( + nodes: EmbeddableNode[], + config: Partial = {} +): string[] => { + return nodes.map(node => generateEmbeddingText(node, config)); +}; + diff --git a/src/core/embeddings/types.ts b/src/core/embeddings/types.ts new file mode 100644 index 0000000000..e4a04222b2 --- /dev/null +++ b/src/core/embeddings/types.ts @@ -0,0 +1,117 @@ +/** + * Embedding Pipeline Types + * + * Type definitions for the embedding generation and semantic search system. 
+ */ + +/** + * Node labels that should be embedded for semantic search + * These are code elements that benefit from semantic matching + */ +export const EMBEDDABLE_LABELS = [ + 'Function', + 'Class', + 'Method', + 'Interface', + 'File', +] as const; + +export type EmbeddableLabel = typeof EMBEDDABLE_LABELS[number]; + +/** + * Check if a label should be embedded + */ +export const isEmbeddableLabel = (label: string): label is EmbeddableLabel => + EMBEDDABLE_LABELS.includes(label as EmbeddableLabel); + +/** + * Embedding pipeline phases + */ +export type EmbeddingPhase = + | 'idle' + | 'loading-model' + | 'embedding' + | 'indexing' + | 'ready' + | 'error'; + +/** + * Progress information for the embedding pipeline + */ +export interface EmbeddingProgress { + phase: EmbeddingPhase; + percent: number; + modelDownloadPercent?: number; + nodesProcessed?: number; + totalNodes?: number; + currentBatch?: number; + totalBatches?: number; + error?: string; +} + +/** + * Configuration for the embedding pipeline + */ +export interface EmbeddingConfig { + /** Model identifier for transformers.js */ + modelId: string; + /** Number of nodes to embed in each batch */ + batchSize: number; + /** Embedding vector dimensions */ + dimensions: number; + /** Device to use for inference: 'webgpu' for GPU acceleration, 'wasm' for WASM-based CPU */ + device: 'webgpu' | 'wasm'; + /** Maximum characters of code snippet to include */ + maxSnippetLength: number; +} + +/** + * Default embedding configuration + * Uses snowflake-arctic-embed-xs for browser efficiency + * Tries WebGPU first (fast), user can choose WASM fallback if unavailable + */ +export const DEFAULT_EMBEDDING_CONFIG: EmbeddingConfig = { + modelId: 'Snowflake/snowflake-arctic-embed-xs', + batchSize: 16, + dimensions: 384, + device: 'webgpu', // WebGPU preferred, WASM fallback available if user chooses + maxSnippetLength: 500, +}; + +/** + * Result from semantic search + */ +export interface SemanticSearchResult { + nodeId: 
string; + name: string; + label: string; + filePath: string; + distance: number; + startLine?: number; + endLine?: number; +} + +/** + * Node data for embedding (minimal structure from KuzuDB query) + */ +export interface EmbeddableNode { + id: string; + name: string; + label: string; + filePath: string; + content: string; + startLine?: number; + endLine?: number; +} + +/** + * Model download progress from transformers.js + */ +export interface ModelProgress { + status: 'initiate' | 'download' | 'progress' | 'done' | 'ready'; + file?: string; + progress?: number; + loaded?: number; + total?: number; +} + diff --git a/src/core/ingestion/ast-cache.ts b/src/core/ingestion/ast-cache.ts index cd2244eefd..61775416a3 100644 --- a/src/core/ingestion/ast-cache.ts +++ b/src/core/ingestion/ast-cache.ts @@ -1,7 +1,7 @@ import { LRUCache } from 'lru-cache'; import Parser from 'web-tree-sitter'; -// Define the interface for our Cache +// Define the interface for the Cache export interface ASTCache { get: (filePath: string) => Parser.Tree | undefined; set: (filePath: string, tree: Parser.Tree) => void; diff --git a/src/core/ingestion/call-processor.ts b/src/core/ingestion/call-processor.ts index 024e89599a..0706767aa5 100644 --- a/src/core/ingestion/call-processor.ts +++ b/src/core/ingestion/call-processor.ts @@ -90,7 +90,7 @@ export const processCalls = async ( }); }); - // Cleanup if we re-parsed + // Cleanup if re-parsed if (wasReparsed) { tree.delete(); } @@ -133,7 +133,7 @@ const resolveCallTarget = ( /** * Filter out common built-in functions and noise - * that we don't want to track as calls + * that shouldn't be tracked as calls */ const isBuiltInOrNoise = (name: string): boolean => { const builtIns = new Set([ diff --git a/src/core/ingestion/import-processor.ts b/src/core/ingestion/import-processor.ts index 024fd0f9b1..77fba9a801 100644 --- a/src/core/ingestion/import-processor.ts +++ b/src/core/ingestion/import-processor.ts @@ -103,7 +103,7 @@ export const 
processImports = async ( // Clean path (remove quotes) const rawImportPath = sourceNode.text.replace(/['"]/g, ''); - // Resolve to actual file in our system + // Resolve to actual file in the system const resolvedPath = resolveImportPath(file.path, rawImportPath, allFilePaths); if (resolvedPath) { @@ -129,7 +129,7 @@ export const processImports = async ( } }); - // If we re-parsed just for this, delete the tree to save memory + // If re-parsed just for this, delete the tree to save memory if (wasReparsed) { tree.delete(); } diff --git a/src/core/ingestion/symbol-table.ts b/src/core/ingestion/symbol-table.ts index 99e8ffdbfc..c8c35d56f9 100644 --- a/src/core/ingestion/symbol-table.ts +++ b/src/core/ingestion/symbol-table.ts @@ -23,7 +23,7 @@ export interface SymbolTable { lookupFuzzy: (name: string) => SymbolDefinition[]; /** - * Debugging: See how many symbols we have tracked + * Debugging: See how many symbols are tracked */ getStats: () => { fileCount: number; globalSymbolCount: number }; diff --git a/src/core/kuzu/csv-generator.ts b/src/core/kuzu/csv-generator.ts index e3ee3328b3..a3bdea3ad9 100644 --- a/src/core/kuzu/csv-generator.ts +++ b/src/core/kuzu/csv-generator.ts @@ -1,7 +1,7 @@ /** * CSV Generator for KuzuDB * - * Converts our in-memory KnowledgeGraph into CSV format + * Converts the in-memory KnowledgeGraph into CSV format * for bulk loading into KuzuDB. 
* * RFC 4180 Compliant: @@ -149,6 +149,8 @@ const extractContent = ( * Headers: id,label,name,filePath,startLine,endLine,content * * All string fields are quoted for RFC 4180 compliance + * Note: embedding column is NOT included in CSV - it's populated later via UPDATE queries + * by the embedding pipeline after bulk load completes */ export const generateNodeCSV = ( graph: KnowledgeGraph, diff --git a/src/core/kuzu/kuzu-adapter.ts b/src/core/kuzu/kuzu-adapter.ts index a3d0a8f90d..f470031f27 100644 --- a/src/core/kuzu/kuzu-adapter.ts +++ b/src/core/kuzu/kuzu-adapter.ts @@ -8,7 +8,7 @@ */ import { KnowledgeGraph } from '../graph/types'; -import { NODE_SCHEMA, EDGE_SCHEMA, NODE_TABLE_NAME, EDGE_TABLE_NAME } from './schema'; +import { NODE_SCHEMA, EDGE_SCHEMA, EMBEDDING_SCHEMA, NODE_TABLE_NAME, EDGE_TABLE_NAME } from './schema'; import { generateNodeCSV, generateEdgeCSV } from './csv-generator'; // Holds the reference to the dynamically loaded module @@ -34,8 +34,11 @@ export const initKuzu = async () => { // 3. Initialize WASM await kuzu.init(); - // 4. Create Database - db = new kuzu.Database(':memory:'); + // 4. 
Create Database with 512MB buffer pool + // Larger buffer needed for embedding storage (6K+ nodes × 384 floats) + // Constructor: Database(path, bufferPoolSize, maxNumThreads, enableCompression, readOnly) + const BUFFER_POOL_SIZE = 512 * 1024 * 1024; // 512MB + db = new kuzu.Database(':memory:', BUFFER_POOL_SIZE); conn = new kuzu.Connection(db); if (import.meta.env.DEV) console.log('✅ KuzuDB WASM Initialized'); @@ -44,6 +47,7 @@ export const initKuzu = async () => { try { await conn.query(NODE_SCHEMA); await conn.query(EDGE_SCHEMA); + await conn.query(EMBEDDING_SCHEMA); if (import.meta.env.DEV) console.log('✅ KuzuDB Schema Created'); } catch { // Schema might already exist, skip @@ -84,9 +88,10 @@ export const loadGraphToKuzu = async ( await fs.writeFile(edgesPath, edgesCSV); - // Use HEADER=true because our CSV generator adds headers + // Use HEADER=true because the CSV generator adds headers // Use PARALLEL=false because content field has quoted newlines - await conn.query(`COPY ${NODE_TABLE_NAME} FROM "${nodesPath}" (HEADER=true, PARALLEL=false)`); + // Explicitly list columns since CSV doesn't include 'embedding' (populated later via UPDATE) + await conn.query(`COPY ${NODE_TABLE_NAME}(id, label, name, filePath, startLine, endLine, content) FROM "${nodesPath}" (HEADER=true, PARALLEL=false)`); await conn.query(`COPY ${EDGE_TABLE_NAME} FROM "${edgesPath}" (HEADER=true, PARALLEL=false)`); // Verify results @@ -189,3 +194,160 @@ export const closeKuzu = async (): Promise => { } kuzu = null; }; + +/** + * Execute a prepared statement with parameters + * @param cypher - Cypher query with $param placeholders + * @param params - Object mapping param names to values + * @returns Query results + */ +export const executePrepared = async ( + cypher: string, + params: Record +): Promise => { + if (!conn) { + await initKuzu(); + } + + try { + // Note: conn.prepare is async in kuzu-wasm + const stmt = await conn.prepare(cypher); + if (!stmt.isSuccess()) { + const errMsg =
await stmt.getErrorMessage(); + throw new Error(`Prepare failed: ${errMsg}`); + } + + const result = await conn.execute(stmt, params); + + // Collect all rows + const rows: any[] = []; + while (await result.hasNext()) { + const row = await result.getNext(); + rows.push(row); + } + + await stmt.close(); + return rows; + } catch (error) { + if (import.meta.env.DEV) console.error('Prepared query failed:', error); + throw error; + } +}; + +/** + * Execute a prepared statement with multiple parameter sets in small sub-batches + * Recreates statement every SUB_BATCH_SIZE executions to allow memory cleanup + * @param cypher - Cypher query with $param placeholders + * @param paramsList - Array of parameter objects to execute + */ +export const executeWithReusedStatement = async ( + cypher: string, + paramsList: Array> +): Promise => { + if (!conn) { + await initKuzu(); + } + + if (paramsList.length === 0) return; + + // Small sub-batch to allow memory cleanup between statement recreations + const SUB_BATCH_SIZE = 4; + + for (let i = 0; i < paramsList.length; i += SUB_BATCH_SIZE) { + const subBatch = paramsList.slice(i, i + SUB_BATCH_SIZE); + + // Create fresh statement for each sub-batch + const stmt = await conn.prepare(cypher); + if (!stmt.isSuccess()) { + const errMsg = await stmt.getErrorMessage(); + throw new Error(`Prepare failed: ${errMsg}`); + } + + try { + for (const params of subBatch) { + await conn.execute(stmt, params); + } + } finally { + await stmt.close(); + } + + // Small delay to allow garbage collection between sub-batches + if (i + SUB_BATCH_SIZE < paramsList.length) { + await new Promise(r => setTimeout(r, 0)); + } + } +}; + +/** + * Test if array parameters work with prepared statements + * This is a diagnostic function to check KuzuDB WASM capabilities + */ +export const testArrayParams = async (): Promise<{ success: boolean; error?: string }> => { + if (!conn) { + await initKuzu(); + } + + try { + // Test with a simple array parameter + const 
testEmbedding = new Array(384).fill(0).map((_, i) => i / 384); + + // First, get any node ID to test with + const nodeResult = await conn.query(`MATCH (n:${NODE_TABLE_NAME}) RETURN n.id AS id LIMIT 1`); + const nodeRow = await nodeResult.getNext(); + + if (!nodeRow) { + return { success: false, error: 'No nodes found to test with' }; + } + + const testNodeId = nodeRow.id ?? nodeRow[0]; + + if (import.meta.env.DEV) { + console.log('🧪 Testing array params with node:', testNodeId); + console.log('🧪 Embedding sample (first 5):', testEmbedding.slice(0, 5)); + } + + // Try using prepared statement with array param + // Note: conn.prepare is async in kuzu-wasm + const cypher = `MATCH (n:${NODE_TABLE_NAME} {id: $nodeId}) SET n.embedding = $embedding`; + const stmt = await conn.prepare(cypher); + + // In async API, isSuccess() returns boolean directly + if (!stmt.isSuccess()) { + const errMsg = await stmt.getErrorMessage(); + return { success: false, error: `Prepare failed: ${errMsg}` }; + } + + // Execute with array parameter + await conn.execute(stmt, { + nodeId: testNodeId, + embedding: testEmbedding, + }); + + await stmt.close(); + + // Verify it was stored + const verifyResult = await conn.query( + `MATCH (n:${NODE_TABLE_NAME} {id: '${testNodeId}'}) RETURN n.embedding AS emb` + ); + const verifyRow = await verifyResult.getNext(); + const storedEmb = verifyRow?.emb ?? verifyRow?.[0]; + + if (storedEmb && Array.isArray(storedEmb) && storedEmb.length === 384) { + if (import.meta.env.DEV) { + console.log('✅ Array params WORK! Stored embedding length:', storedEmb.length); + } + return { success: true }; + } else { + return { + success: false, + error: `Embedding not stored correctly. Got: ${typeof storedEmb}, length: ${storedEmb?.length}` + }; + } + } catch (error) { + const errorMsg = error instanceof Error ?
error.message : String(error); + if (import.meta.env.DEV) { + console.error('❌ Array params test failed:', errorMsg); + } + return { success: false, error: errorMsg }; + } +}; diff --git a/src/core/kuzu/schema.ts b/src/core/kuzu/schema.ts index 1285008cc7..b849136add 100644 --- a/src/core/kuzu/schema.ts +++ b/src/core/kuzu/schema.ts @@ -10,10 +10,12 @@ export const NODE_TABLE_NAME = 'CodeNode'; export const EDGE_TABLE_NAME = 'CodeRelation'; +export const EMBEDDING_TABLE_NAME = 'CodeEmbedding'; /** * Node table schema * Stores all code elements: Files, Functions, Classes, etc. + * Note: Embeddings stored separately to avoid copy-on-write overhead */ export const NODE_SCHEMA = ` CREATE NODE TABLE ${NODE_TABLE_NAME} ( @@ -27,6 +29,26 @@ CREATE NODE TABLE ${NODE_TABLE_NAME} ( PRIMARY KEY (id) )`; +/** + * Separate embedding table - lightweight structure for vector storage + * This avoids copy-on-write issues when storing embeddings + * (UPDATEing nodes with large content fields would copy entire node) + */ +export const EMBEDDING_SCHEMA = ` +CREATE NODE TABLE ${EMBEDDING_TABLE_NAME} ( + nodeId STRING, + embedding FLOAT[384], + PRIMARY KEY (nodeId) +)`; + +/** + * Create vector index for semantic search + * Uses HNSW (Hierarchical Navigable Small World) algorithm with cosine similarity + */ +export const CREATE_VECTOR_INDEX_QUERY = ` +CALL CREATE_VECTOR_INDEX('${EMBEDDING_TABLE_NAME}', 'code_embedding_idx', 'embedding', metric := 'cosine') +`; + /** * Edge table schema * Stores all relationships: CALLS, IMPORTS, CONTAINS, DEFINES @@ -40,5 +62,5 @@ CREATE REL TABLE ${EDGE_TABLE_NAME} ( /** * All schema creation queries in order */ -export const SCHEMA_QUERIES = [NODE_SCHEMA, EDGE_SCHEMA]; +export const SCHEMA_QUERIES = [NODE_SCHEMA, EDGE_SCHEMA, EMBEDDING_SCHEMA]; diff --git a/src/core/tree-sitter/parser-loader.ts b/src/core/tree-sitter/parser-loader.ts index 4a45f69005..e6092f8b6d 100644 --- a/src/core/tree-sitter/parser-loader.ts +++
b/src/core/tree-sitter/parser-loader.ts @@ -3,7 +3,7 @@ import { SupportedLanguages } from '../../config/supported-languages'; let parser: Parser | null = null; -// Cache the compiled Language objects so we never fetch/compile twice +// Cache the compiled Language objects to avoid fetching/compiling twice const languageCache = new Map(); export const loadParser = async (): Promise => { diff --git a/src/hooks/useAppState.tsx b/src/hooks/useAppState.tsx index 02c97a7bb1..6336e7a47c 100644 --- a/src/hooks/useAppState.tsx +++ b/src/hooks/useAppState.tsx @@ -6,9 +6,11 @@ import { createKnowledgeGraph } from '../core/graph/graph'; import { DEFAULT_VISIBLE_LABELS } from '../lib/constants'; import type { IngestionWorkerApi } from '../workers/ingestion.worker'; import type { FileEntry } from '../services/zip'; +import type { EmbeddingProgress, SemanticSearchResult } from '../core/embeddings/types'; export type ViewMode = 'onboarding' | 'loading' | 'exploring'; export type RightPanelTab = 'code' | 'chat'; +export type EmbeddingStatus = 'idle' | 'loading' | 'embedding' | 'indexing' | 'ready' | 'error'; export interface QueryResult { rows: Record[]; @@ -67,6 +69,19 @@ interface AppState { runPipelineFromFiles: (files: FileEntry[], onProgress: (p: PipelineProgress) => void) => Promise; runQuery: (cypher: string) => Promise; isDatabaseReady: () => Promise; + + // Embedding state + embeddingStatus: EmbeddingStatus; + embeddingProgress: EmbeddingProgress | null; + + // Embedding methods + startEmbeddings: (forceDevice?: 'webgpu' | 'wasm') => Promise; + semanticSearch: (query: string, k?: number) => Promise; + semanticSearchWithContext: (query: string, k?: number, hops?: number) => Promise; + isEmbeddingReady: boolean; + + // Debug/test methods + testArrayParams: () => Promise<{ success: boolean; error?: string }>; } const AppStateContext = createContext(null); @@ -116,6 +131,10 @@ export const AppStateProvider = ({ children }: { children: ReactNode }) => { // Project info const 
[projectName, setProjectName] = useState(''); + + // Embedding state + const [embeddingStatus, setEmbeddingStatus] = useState('idle'); + const [embeddingProgress, setEmbeddingProgress] = useState(null); // Worker (single instance shared across app) const workerRef = useRef(null); @@ -177,6 +196,76 @@ export const AppStateProvider = ({ children }: { children: ReactNode }) => { } }, []); + // Embedding methods + const startEmbeddings = useCallback(async (forceDevice?: 'webgpu' | 'wasm'): Promise => { + const api = apiRef.current; + if (!api) throw new Error('Worker not initialized'); + + setEmbeddingStatus('loading'); + setEmbeddingProgress(null); + + try { + const proxiedOnProgress = Comlink.proxy((progress: EmbeddingProgress) => { + setEmbeddingProgress(progress); + + // Update status based on phase + switch (progress.phase) { + case 'loading-model': + setEmbeddingStatus('loading'); + break; + case 'embedding': + setEmbeddingStatus('embedding'); + break; + case 'indexing': + setEmbeddingStatus('indexing'); + break; + case 'ready': + setEmbeddingStatus('ready'); + break; + case 'error': + setEmbeddingStatus('error'); + break; + } + }); + + await api.startEmbeddingPipeline(proxiedOnProgress, forceDevice); + } catch (error: any) { + // Check if it's WebGPU not available - let caller handle the dialog + if (error?.name === 'WebGPUNotAvailableError' || + error?.message?.includes('WebGPU not available')) { + setEmbeddingStatus('idle'); // Reset to idle so user can try again + } else { + setEmbeddingStatus('error'); + } + throw error; + } + }, []); + + const semanticSearch = useCallback(async ( + query: string, + k: number = 10 + ): Promise => { + const api = apiRef.current; + if (!api) throw new Error('Worker not initialized'); + return api.semanticSearch(query, k); + }, []); + + const semanticSearchWithContext = useCallback(async ( + query: string, + k: number = 5, + hops: number = 2 + ): Promise => { + const api = apiRef.current; + if (!api) throw new Error('Worker not 
initialized'); + return api.semanticSearchWithContext(query, k, hops); + }, []); + + const testArrayParams = useCallback(async (): Promise<{ success: boolean; error?: string }> => { + const api = apiRef.current; + if (!api) return { success: false, error: 'Worker not initialized' }; + return api.testArrayParams(); + }, []); + const toggleLabelVisibility = useCallback((label: NodeLabel) => { setVisibleLabels(prev => { if (prev.includes(label)) { @@ -219,6 +308,15 @@ export const AppStateProvider = ({ children }: { children: ReactNode }) => { runPipelineFromFiles, runQuery, isDatabaseReady, + // Embedding state and methods + embeddingStatus, + embeddingProgress, + startEmbeddings, + semanticSearch, + semanticSearchWithContext, + isEmbeddingReady: embeddingStatus === 'ready', + // Debug + testArrayParams, }; return ( diff --git a/src/hooks/useSigma.ts b/src/hooks/useSigma.ts index ae93848e87..e2e923dd90 100644 --- a/src/hooks/useSigma.ts +++ b/src/hooks/useSigma.ts @@ -71,7 +71,7 @@ interface UseSigmaReturn { refreshHighlights: () => void; } -// Noverlap for final cleanup - minimal since we start with good positions +// Noverlap for final cleanup - minimal since it starts with good positions const NOVERLAP_SETTINGS = { maxIterations: 20, // Reduced - less cleanup needed ratio: 1.1, diff --git a/src/lib/graph-adapter.ts b/src/lib/graph-adapter.ts index e7fc744e04..c23d6baa52 100644 --- a/src/lib/graph-adapter.ts +++ b/src/lib/graph-adapter.ts @@ -33,7 +33,7 @@ export interface SigmaEdgeAttributes { */ const getScaledNodeSize = (baseSize: number, nodeCount: number): number => { // Scale factor decreases as graph gets larger - // But we use a minimum that preserves relative differences + // But a minimum is used that preserves relative differences if (nodeCount > 50000) return Math.max(1, baseSize * 0.4); if (nodeCount > 20000) return Math.max(1.5, baseSize * 0.5); if (nodeCount > 5000) return Math.max(2, baseSize * 0.65); @@ -72,7 +72,7 @@ const getNodeMass = (nodeType: 
NodeLabel, nodeCount: number): number => { }; /** - * Converts our KnowledgeGraph to a graphology Graph for Sigma.js + * Converts the KnowledgeGraph to a graphology Graph for Sigma.js * Folders are positioned in a wide spread, children positioned NEAR their parents */ export const knowledgeGraphToGraphology = ( @@ -208,7 +208,7 @@ export const knowledgeGraphToGraphology = ( if (!visited.has(childId)) { visited.add(childId); addNodeWithPosition(childId); - queue.push(childId); // Add to queue so we process ITS children too + queue.push(childId); // Add to queue so its children are processed too } } } diff --git a/src/services/git-clone.ts b/src/services/git-clone.ts index 8dcec1b506..7924462521 100644 --- a/src/services/git-clone.ts +++ b/src/services/git-clone.ts @@ -17,11 +17,11 @@ const initFS = () => { return fsName; }; -// Use public proxy in development, our own proxy in production +// Use public proxy in development, a custom proxy in production const USE_OWN_PROXY = !import.meta.env.DEV; /** - * Custom HTTP client that uses our query-param based proxy in production + * Custom HTTP client that uses a query-param based proxy in production * isomorphic-git's default corsProxy appends URL as path, which doesn't work * well with Vercel's file-based routing. 
*/ @@ -31,10 +31,10 @@ const createProxiedHttp = (): typeof http => { return http; } - // In production, wrap the HTTP client to use our proxy + // In production, wrap the HTTP client to use the custom proxy return { request: async (config) => { - // Rewrite the URL to go through our proxy + // Rewrite the URL to go through the proxy const proxyUrl = `/api/proxy?url=${encodeURIComponent(config.url)}`; // Call the original http.request with the proxied URL diff --git a/src/workers/ingestion.worker.ts b/src/workers/ingestion.worker.ts index 98bc910f41..1c6ccb86c6 100644 --- a/src/workers/ingestion.worker.ts +++ b/src/workers/ingestion.worker.ts @@ -2,6 +2,14 @@ import * as Comlink from 'comlink'; import { runIngestionPipeline, runPipelineFromFiles } from '../core/ingestion/pipeline'; import { PipelineProgress, SerializablePipelineResult, serializePipelineResult } from '../types/pipeline'; import { FileEntry } from '../services/zip'; +import { + runEmbeddingPipeline, + semanticSearch as doSemanticSearch, + semanticSearchWithContext as doSemanticSearchWithContext, + type EmbeddingProgressCallback, +} from '../core/embeddings/embedding-pipeline'; +import { isEmbedderReady, disposeEmbedder } from '../core/embeddings/embedder'; +import type { EmbeddingProgress, SemanticSearchResult } from '../core/embeddings/types'; // Lazy import for Kuzu to avoid breaking worker if SharedArrayBuffer unavailable let kuzuAdapter: typeof import('../core/kuzu/kuzu-adapter') | null = null; @@ -12,11 +20,15 @@ const getKuzuAdapter = async () => { return kuzuAdapter; }; +// Embedding state +let embeddingProgress: EmbeddingProgress | null = null; +let isEmbeddingComplete = false; + /** * Worker API exposed via Comlink * * Note: The onProgress callback is passed as a Comlink.proxy() from the main thread, - * allowing us to call it from the worker and have it execute on the main thread. + * allowing it to be called from the worker and have it execute on the main thread. 
*/ const workerApi = { /** @@ -49,8 +61,8 @@ const workerApi = { await kuzu.loadGraphToKuzu(result.graph, result.fileContents); if (import.meta.env.DEV) { - const stats = await kuzu.getKuzuStats(); - console.log('KuzuDB loaded:', stats); + const stats = await kuzu.getKuzuStats(); + console.log('KuzuDB loaded:', stats); } } catch { // KuzuDB is optional - silently continue without it @@ -145,6 +157,134 @@ const workerApi = { // Convert to serializable format for transfer back to main thread return serializePipelineResult(result); }, + + // ============================================================ + // Embedding Pipeline Methods + // ============================================================ + + /** + * Start the embedding pipeline in the background + * Generates embeddings for all embeddable nodes and creates vector index + * @param onProgress - Proxied callback for embedding progress updates + * @param forceDevice - Force a specific device ('webgpu' or 'wasm') + */ + async startEmbeddingPipeline( + onProgress: (progress: EmbeddingProgress) => void, + forceDevice?: 'webgpu' | 'wasm' + ): Promise { + const kuzu = await getKuzuAdapter(); + if (!kuzu.isKuzuReady()) { + throw new Error('Database not ready. Please load a repository first.'); + } + + // Reset state + embeddingProgress = null; + isEmbeddingComplete = false; + + const progressCallback: EmbeddingProgressCallback = (progress) => { + embeddingProgress = progress; + if (progress.phase === 'ready') { + isEmbeddingComplete = true; + } + onProgress(progress); + }; + + await runEmbeddingPipeline( + kuzu.executeQuery, + kuzu.executeWithReusedStatement, + progressCallback, + forceDevice ? 
{ device: forceDevice } : {} + ); + }, + + /** + * Perform semantic search on the codebase + * @param query - Natural language search query + * @param k - Number of results to return (default: 10) + * @param maxDistance - Maximum distance threshold (default: 0.5) + * @returns Array of search results ordered by relevance + */ + async semanticSearch( + query: string, + k: number = 10, + maxDistance: number = 0.5 + ): Promise { + const kuzu = await getKuzuAdapter(); + if (!kuzu.isKuzuReady()) { + throw new Error('Database not ready. Please load a repository first.'); + } + if (!isEmbeddingComplete) { + throw new Error('Embeddings not ready. Please wait for embedding pipeline to complete.'); + } + + return doSemanticSearch(kuzu.executeQuery, query, k, maxDistance); + }, + + /** + * Perform semantic search with graph expansion + * Finds similar nodes AND their connections + * @param query - Natural language search query + * @param k - Number of initial results (default: 5) + * @param hops - Number of graph hops to expand (default: 2) + * @returns Search results with connected nodes + */ + async semanticSearchWithContext( + query: string, + k: number = 5, + hops: number = 2 + ): Promise { + const kuzu = await getKuzuAdapter(); + if (!kuzu.isKuzuReady()) { + throw new Error('Database not ready. Please load a repository first.'); + } + if (!isEmbeddingComplete) { + throw new Error('Embeddings not ready. 
Please wait for embedding pipeline to complete.'); + } + + return doSemanticSearchWithContext(kuzu.executeQuery, query, k, hops); + }, + + /** + * Check if the embedding model is loaded and ready + */ + isEmbeddingModelReady(): boolean { + return isEmbedderReady(); + }, + + /** + * Check if embeddings are fully generated and indexed + */ + isEmbeddingComplete(): boolean { + return isEmbeddingComplete; + }, + + /** + * Get current embedding progress + */ + getEmbeddingProgress(): EmbeddingProgress | null { + return embeddingProgress; + }, + + /** + * Cleanup embedding model resources + */ + async disposeEmbeddingModel(): Promise { + await disposeEmbedder(); + isEmbeddingComplete = false; + embeddingProgress = null; + }, + + /** + * Test if KuzuDB supports array parameters in prepared statements + * This is a diagnostic function + */ + async testArrayParams(): Promise<{ success: boolean; error?: string }> { + const kuzu = await getKuzuAdapter(); + if (!kuzu.isKuzuReady()) { + return { success: false, error: 'Database not ready' }; + } + return kuzu.testArrayParams(); + }, }; // Expose the worker API to the main thread diff --git a/tsconfig.app.json b/tsconfig.app.json index ad93a88e21..f429ad3ea2 100644 --- a/tsconfig.app.json +++ b/tsconfig.app.json @@ -13,6 +13,7 @@ "esModuleInterop": true, "allowSyntheticDefaultImports": true, "forceConsistentCasingInFileNames": true, + "skipLibCheck": true, "baseUrl": "./", "paths": { "@/*": ["./src/*"]
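The worker's `semanticSearch(query, k = 10, maxDistance = 0.5)` depends on the HNSW index created by `CREATE_VECTOR_INDEX_QUERY` (`metric := 'cosine'`). HNSW is an approximate method, so its results can be checked against an exact brute-force scan over the same vectors. The sketch below is a hypothetical helper and is not part of this PR or of KuzuDB's API. It assumes that embeddings are plain `number[]` arrays of equal length (384 in this PR) and that the index reports cosine distance as `1 - cosine similarity`. The names `Candidate`, `Hit`, `cosineDistance`, `bruteForceSearch`, and the sample node ids are all made up for illustration.

```typescript
/** A stored vector keyed by node id (hypothetical shape, for illustration only). */
interface Candidate {
  nodeId: string;
  embedding: number[];
}

/** One search result: node id plus its distance to the query. */
interface Hit {
  nodeId: string;
  distance: number;
}

/** Cosine distance = 1 - cosine similarity (assumed to match the index metric). */
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat zero vectors as unrelated rather than dividing by zero.
  if (normA === 0 || normB === 0) return 1;
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Exact k-nearest-neighbour scan that drops hits beyond maxDistance. */
function bruteForceSearch(
  query: number[],
  candidates: Candidate[],
  k = 10,
  maxDistance = 0.5,
): Hit[] {
  return candidates
    .map((c) => ({ nodeId: c.nodeId, distance: cosineDistance(query, c.embedding) }))
    .filter((h) => h.distance <= maxDistance) // distance cutoff, like maxDistance above
    .sort((x, y) => x.distance - y.distance) // closest first
    .slice(0, k); // keep at most k results
}

// Usage: toy 3-d vectors stand in for 384-d embeddings.
const hits: Hit[] = bruteForceSearch(
  [1, 0, 0],
  [
    { nodeId: 'handleAuth', embedding: [0.9, 0.1, 0] },
    { nodeId: 'UserService', embedding: [0, 1, 0] },
    { nodeId: 'auth.ts', embedding: [1, 0, 0] },
  ],
  2,
);
console.log(hits.map((h) => h.nodeId)); // [ 'auth.ts', 'handleAuth' ]
```

Each query costs O(n·d) here, so this only makes sense as a test oracle on small graphs. Running a few sample queries through both this scan and the indexed search is one way to catch a mismatched metric or distance threshold.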