@EwanTauran EwanTauran commented Oct 16, 2025

Adds Airweave integration to Sim, enabling agents to search across 30+ connected data sources including Stripe, GitHub, Notion, Slack, HubSpot, Zendesk, and more through a unified search API.

Airweave makes any app searchable by syncing data from various sources with minimal configuration. This integration allows Sim workflows to query internal company data, customer information, and business metrics from all connected sources in a single search.

What is Airweave?

Airweave is an open-source platform that provides unified search across multiple business applications. It:

  • Connects to 30+ data sources: Stripe, GitHub, Notion, Slack, HubSpot, Zendesk, Linear, Jira, and more
  • Syncs data automatically: Keeps internal knowledge up-to-date with incremental updates
  • Semantic & keyword search: Uses vector embeddings for intelligent search
  • Multi-tenant architecture: Supports OAuth2 and API key authentication
  • AI-powered summaries: Can return raw results or AI-generated answers

Why This Integration Matters

Currently, Sim workflows that need to access company data require:

  • Multiple tool blocks (one per data source)
  • Complex workflow logic to aggregate results
  • Separate API keys and configurations for each service

With Airweave, agents can:

  • Search across all data sources with a single query
  • Get unified, ranked results by relevance
  • Access internal knowledge without knowing which system stores what
  • Receive AI-generated summaries for faster insights

Implementation Details

Files Added

Tools:

  • tools/airweave/types.ts - TypeScript type definitions for Airweave API
  • tools/airweave/search.ts - Search tool implementation with ToolConfig
  • tools/airweave/index.ts - Tool exports

Block:

  • blocks/blocks/airweave.ts - Block configuration with UI elements

Files Modified

  • components/icons.tsx - Added AirweaveIcon (layered stack icon)
  • tools/registry.ts - Registered airweave_search tool
  • blocks/registry.ts - Registered airweave block

Tool Configuration

Tool ID: airweave_search
Block Type: airweave
Category: Tools
Authentication: API Key

Parameters:

  • collectionId (string, required, user-only): The Airweave collection to search
  • query (string, required, user-or-llm): Search query text
  • limit (number, optional, user-only): Maximum results (1-100, default: 10)
  • offset (number, optional, user-only): Pagination offset (default: 0)
  • responseType (string, optional, user-only): 'raw' for results or 'completion' for AI summary
  • recencyBias (number, optional, user-only): Time-weighted ranking (0.0-1.0, default: 0.0)
  • apiKey (string, required, user-only): Airweave API key
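
A minimal sketch of how the defaults and ranges listed above could be applied before a request is sent. The interface and function names are hypothetical and are not copied from tools/airweave/types.ts; the 'raw' default for responseType is an assumption, since the list does not state one.

```typescript
// Hypothetical parameter shape mirroring the parameter list above.
interface AirweaveSearchParams {
  collectionId: string;
  query: string;
  limit?: number;
  offset?: number;
  responseType?: "raw" | "completion";
  recencyBias?: number;
  apiKey: string;
}

type NormalizedSearchParams = Required<AirweaveSearchParams>;

// Fills in documented defaults and rejects values outside the documented ranges.
function normalizeSearchParams(params: AirweaveSearchParams): NormalizedSearchParams {
  const limit = params.limit ?? 10; // documented default: 10
  const offset = params.offset ?? 0; // documented default: 0
  const recencyBias = params.recencyBias ?? 0; // documented default: 0.0
  const responseType = params.responseType ?? "raw"; // assumption: 'raw' when unset

  if (params.collectionId.trim() === "") throw new Error("collectionId is required");
  if (params.query.trim() === "") throw new Error("query is required");
  if (params.apiKey.trim() === "") throw new Error("apiKey is required");
  if (!Number.isInteger(limit) || limit < 1 || limit > 100) {
    throw new RangeError("limit must be an integer between 1 and 100");
  }
  if (!Number.isInteger(offset) || offset < 0) {
    throw new RangeError("offset must be a non-negative integer");
  }
  // Written as a negated conjunction so that NaN is rejected as well.
  if (!(recencyBias >= 0 && recencyBias <= 1)) {
    throw new RangeError("recencyBias must be between 0.0 and 1.0");
  }
  return { ...params, limit, offset, recencyBias, responseType };
}
```

Validating up front keeps range errors local to the block instead of surfacing as API failures.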

Outputs:

  • status (string): Search operation status (success, no_results, no_relevant_results)
  • results (array): Search results with content, metadata, and relevance scores
  • completion (string, optional): AI-generated answer when using completion mode
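
As an illustration of how a downstream block might consume these outputs, here is a small sketch. The result field names (content, metadata, score) and the wording attached to each status are assumptions made for the example, not the definitions in this PR.

```typescript
type AirweaveStatus = "success" | "no_results" | "no_relevant_results";

// Hypothetical result shape: the PR only says results carry content,
// metadata, and relevance scores.
interface AirweaveResult {
  content: string;
  metadata: Record<string, unknown>;
  score: number;
}

interface AirweaveSearchOutput {
  status: AirweaveStatus;
  results: AirweaveResult[];
  completion?: string;
}

// Turns a tool output into a single display string.
function describeOutput(output: AirweaveSearchOutput): string {
  switch (output.status) {
    case "no_results":
      return "No documents matched the query.";
    case "no_relevant_results":
      return "Documents were found, but none were relevant enough.";
    case "success":
      // Completion mode: prefer the AI-generated answer when present.
      if (output.completion !== undefined) return output.completion;
      // Raw mode: list results from highest to lowest score.
      return output.results
        .slice()
        .sort((a, b) => b.score - a.score)
        .map((r, i) => `${i + 1}. [${r.score.toFixed(2)}] ${r.content}`)
        .join("\n");
  }
}
```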

Block Features

  • Visual Design: Purple background (#8B5CF6) with layered stack icon
  • Response Modes: Toggle between raw results and AI-generated summaries
  • Recency Control: Slider to adjust time-weighted search ranking
  • Pagination: Offset and limit controls for large result sets
  • Error Handling: Clear feedback for API errors and empty results
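
The offset and limit controls can be combined to walk a large result set page by page. The sketch below uses an in-memory stand-in for the search call, so every name in it is hypothetical; it only illustrates the stopping rule (a page shorter than limit means there is nothing left to fetch).

```typescript
interface PageRequest {
  offset: number;
  limit: number;
}

// Returns the request for the following page, or null when the page just
// received was short and therefore the last one.
function nextPage(current: PageRequest, returnedCount: number): PageRequest | null {
  if (returnedCount < current.limit) return null;
  return { offset: current.offset + current.limit, limit: current.limit };
}

// In-memory stand-in for a search call; a real block would call the tool here.
function fakeSearch(data: string[], page: PageRequest): string[] {
  return data.slice(page.offset, page.offset + page.limit);
}

// Collects every result by advancing the offset until a short page arrives.
function collectAll(data: string[], limit: number): string[] {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const all: string[] = [];
  let page: PageRequest | null = { offset: 0, limit };
  while (page !== null) {
    const results = fakeSearch(data, page);
    all.push(...results);
    page = nextPage(page, results.length);
  }
  return all;
}
```

When the total is an exact multiple of limit, this costs one extra empty request, which is the usual trade-off when the API does not report a total count.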

Usage Examples

As Standalone Block

Search for customer feedback across all connected platforms:

blocks:
  - id: airweave1
    type: airweave
    params:
      collectionId: "customer-data"
      query: "complaints about billing in the last week"
      responseType: "raw"
      limit: 20
      recencyBias: 0.8
      apiKey: <env.AIRWEAVE_API_KEY>

As Agent Tool

Enable agents to search company knowledge automatically:

blocks:
  - id: agent1
    type: agent
    params:
      model: "openai/gpt-4o"
      systemPrompt: "You are a customer support agent with access to all company data via Airweave."
      tools:
        - type: airweave
          params:
            collectionId: "company-knowledge"
            apiKey: <env.AIRWEAVE_API_KEY>

Example interaction:

User: "Has anyone complained about payment failures recently?"
Agent: [Uses Airweave to search across Slack, Zendesk, Stripe, GitHub issues]
Agent: "Yes, I found 3 recent complaints about payment failures..."

Real-World Scenario: Multi-Source Customer Research

blocks:
  - id: start
    type: start

  - id: agent1
    type: agent
    params:
      model: "openai/gpt-4o"
      systemPrompt: "Research customer information comprehensively"
      userPrompt: <start.input>
      tools:
        - type: airweave
          params:
            collectionId: "customer-360"
            apiKey: <env.AIRWEAVE_API_KEY>
        - type: exa  # For web research
          params:
            apiKey: <env.EXA_API_KEY>

  - id: response1
    type: response
    params:
      content: <agent1.content>

What the agent can do:

  • Search Stripe for payment history
  • Check GitHub for reported issues
  • Review Slack conversations about the customer
  • Look up Zendesk support tickets
  • Access Notion documentation they've requested
  • Cross-reference with web search results

All with a single query like: "What's the history with Acme Corp?"


Integration Benefits

For Workflows

  • Simplified data access: One block instead of multiple API integrations
  • Unified search: Query across all sources with semantic understanding
  • Faster development: No need to learn each data source's API
  • Cost-effective: Single API key, single sync process

For Agents

  • Better context: Access to complete company knowledge
  • Smarter decisions: Can find relevant data without knowing where it lives
  • Flexible queries: Natural language search across structured and unstructured data
  • Time-aware: Recency bias helps surface latest information

For Users

  • Easy setup: Connect data sources once in Airweave, use everywhere
  • Privacy control: Airweave handles OAuth and permissions
  • Live data: Automatic syncing keeps results current
  • AI summaries: Get answers, not just search results

Technical Compliance

Follows SimStudio Guidelines:

  • Proper naming conventions (AirweaveBlock, AirweaveIcon, airweave_search)
  • Parameter visibility system correctly implemented
  • Tool ID format: {provider}_{tool_name}
  • ToolConfig interface properly implemented
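
For illustration, the {provider}_{tool_name} format could be checked with a small helper like the one below. The regular expression is an assumption about what counts as a valid ID (lowercase, provider without underscores); it is not taken from the SimStudio guidelines.

```typescript
// Assumed pattern: lowercase provider, one underscore, then a lowercase
// tool name that may itself contain underscores.
const TOOL_ID_PATTERN = /^([a-z][a-z0-9]*)_([a-z][a-z0-9_]*)$/;

// Splits a tool ID into its provider and tool name, or returns null if the
// ID does not match the assumed pattern.
function parseToolId(id: string): { provider: string; toolName: string } | null {
  const match = TOOL_ID_PATTERN.exec(id);
  if (match === null) return null;
  return { provider: match[1], toolName: match[2] };
}
```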

Code Quality:

  • Zero linter errors
  • Full TypeScript type safety
  • Comprehensive parameter documentation
  • Error handling for API failures

Documentation Ready:

  • JSDoc comments on all parameters
  • Clear descriptions for user-facing fields
  • Links to Airweave docs for setup guidance

Setup Requirements

For Users

  1. Create an Airweave account at https://app.airweave.ai
  2. Get an API key from the Airweave dashboard
  3. Create a collection and connect data sources (Stripe, Slack, Notion, etc.)
  4. Copy collection ID from Airweave
  5. Use in Sim with the Airweave block or as an agent tool

For Developers

No additional dependencies required. All necessary types are included.


Testing Checklist

  • Follows all SimStudio tool development guidelines
  • Proper naming conventions enforced
  • Parameter visibility correctly configured
  • TypeScript types fully defined
  • Tool and block properly registered
  • Zero linter errors
  • Tested in CI/CD environment (awaiting PR checks)
  • Documentation generated via scripts/generate-docs.sh (post-merge potentially? wdyt sim team)

Screenshots

Will be added after successful CI/CD run and local testing in clean environment


Related Resources


Breaking Changes

None - this is a new integration with no impact on existing functionality.


Future Enhancements

Potential follow-ups (probably not in this PR):

  • Advanced search tool with filtering by data source
  • List collections tool for dynamic collection selection
  • Trigger sync tool for manual data refresh
  • OAuth integration for user-level authentication
  • Bulk search tool for batch queries

Addresses issue #1657

waleedlatif1 and others added 6 commits October 11, 2025 22:23
…oai#1608)

* improvement(performance): remove unused source/target indices, add index on snapshot id (simstudioai#1603)

* fix(blog): rename building to blogs with redirect (simstudioai#1604)

* improvement(privacy-policy): updated privacy policy for google (simstudioai#1602)

* updated privacy policy for google

* update terms, privacy, and emails to incl address and update verbiage

* feat(guardrails): added guardrails block/tools and docs (simstudioai#1605)

* Adding guardrails block

* ack PR comments

* cleanup checkbox in dark mode

* cleanup

* fix supabase tools

* fix(inference-billing): fix inference billing when stream is true via API, add drag-and-drop functionality to deployed chat (simstudioai#1606)

* fix(inference): fix inference billing when stream is true via API

* add drag-and-drop to deployed chat

* feat(mistral): added mistral as a provider, updated model prices (simstudioai#1607)

* feat(mistral): added mistral as a provider, updated model prices

* remove the ability for a block to reference its own outputs

* fixed order of responses for guardrails block

* feat(versions): added the ability to rename deployment versions (simstudioai#1610)

* fix(vulns): fix various vulnerabilities and enhanced code security (simstudioai#1611)

* fix(vulns): fix SSRF vulnerabilities

* cleanup

* cleanup

* regen docs

* remove unused deps

* fix failing tests

* cleanup

* update deps

* regen bun lock

vercel bot commented Oct 16, 2025

@EwanTauran is attempting to deploy a commit to the Sim Team on Vercel.

A member of the Team first needs to authorize it.

Contributor

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Summary

This PR adds Airweave search tool integration to enable unified search across 30+ connected data sources (Stripe, GitHub, Notion, Slack, etc.). The implementation follows Sim's tool development patterns with proper TypeScript types, parameter visibility, and registry registration.

Key Changes:

  • Added tool implementation (tools/airweave/) with search API integration
  • Created block configuration (blocks/blocks/airweave.ts) with UI controls
  • Registered tool and block in respective registries
  • Added layered stack icon for visual identity

Issues Found:

  • Missing offset subBlock in UI despite being declared in inputs (affects pagination UX)
  • Unrelated dependencies added (@opentelemetry/*, chalk, zustand) - may belong in separate PR or previous commit

Confidence Score: 4/5

  • Safe to merge with one logical fix needed for pagination UI completeness
  • Implementation follows Sim conventions correctly with proper types, parameter visibility, and tool structure. One logical issue exists where offset input is declared but missing from UI subBlocks. Dependency additions (OpenTelemetry, chalk, zustand) are unrelated to the PR scope and need verification. Core Airweave integration is solid and well-documented.
  • Pay attention to apps/sim/blocks/blocks/airweave.ts (missing offset subBlock) and apps/sim/package.json (unrelated dependencies)

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| apps/sim/tools/airweave/types.ts | 5/5 | Clean type definitions for Airweave API - well-structured with proper TypeScript types |
| apps/sim/tools/airweave/search.ts | 5/5 | Proper tool implementation following Sim conventions, with correct parameter visibility and request structure |
| apps/sim/blocks/blocks/airweave.ts | 4/5 | Block configuration is well-structured, but missing offset subBlock UI despite declaring it in inputs |
| apps/sim/package.json | 3/5 | Added OpenTelemetry, chalk, and zustand dependencies - unclear if needed for Airweave integration |

Sequence Diagram

sequenceDiagram
    participant User
    participant AirweaveBlock
    participant AirweaveSearchTool
    participant AirweaveAPI
    
    User->>AirweaveBlock: Configure search (collectionId, query, responseType)
    AirweaveBlock->>AirweaveSearchTool: Execute airweave_search tool
    AirweaveSearchTool->>AirweaveSearchTool: Build request body (limit, offset, recencyBias)
    AirweaveSearchTool->>AirweaveAPI: POST /v1/collections/{id}/search
    Note over AirweaveSearchTool,AirweaveAPI: Authorization: Bearer {apiKey}
    AirweaveAPI-->>AirweaveSearchTool: JSON response (status, results, completion)
    AirweaveSearchTool->>AirweaveSearchTool: Transform response
    AirweaveSearchTool-->>AirweaveBlock: ToolResponse (status, results[], completion?)
    AirweaveBlock-->>User: Display search results or AI summary
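
The request-building step in the diagram could look roughly like the sketch below. The path and the Bearer header follow the diagram; the base URL, the function name, and the camelCase body field names are assumptions, and the real wire format may differ.

```typescript
// Option names taken from the diagram; the actual body fields are not
// confirmed by this PR.
interface SearchOptions {
  limit?: number;
  offset?: number;
  recencyBias?: number;
  responseType?: "raw" | "completion";
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the POST /v1/collections/{id}/search request without sending it.
function buildSearchRequest(
  baseUrl: string, // assumption: the caller supplies the API host
  collectionId: string,
  apiKey: string,
  query: string,
  options: SearchOptions = {},
): PreparedRequest {
  const host = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${host}/v1/collections/${encodeURIComponent(collectionId)}/search`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      query,
      limit: options.limit ?? 10,
      offset: options.offset ?? 0,
      recencyBias: options.recencyBias ?? 0,
      responseType: options.responseType ?? "raw",
    }),
  };
}

// Placeholder host and key, for illustration only.
const exampleRequest = buildSearchRequest(
  "https://api.example.com",
  "customer-data",
  "test-key",
  "complaints about billing",
  { limit: 20, recencyBias: 0.8 },
);
```

Keeping request construction separate from the network call makes the URL, headers, and body easy to unit test; the result can be passed to fetch(exampleRequest.url, exampleRequest) when a real call is wanted.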

8 files reviewed, 4 comments

inputs: {
collectionId: { type: 'string', description: 'Airweave collection ID' },
query: { type: 'string', description: 'Search query' },
limit: { type: 'number', description: 'Maximum number of results' },

logic: offset input is declared but missing from subBlocks - users can't configure pagination offset in the UI

Suggested change
limit: { type: 'number', description: 'Maximum number of results' },
offset: { type: 'number', description: 'Pagination offset (default: 0)' },
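
A sketch of what the missing UI entry might look like, together with a check that flags inputs lacking a matching subBlock. The SubBlock shape here is simplified and hypothetical; the field names and the 'short-input' type are assumptions and should be compared against the existing entries in apps/sim/blocks/blocks/airweave.ts.

```typescript
interface InputSpec {
  type: "string" | "number";
  description: string;
}

// Simplified, hypothetical subBlock shape used only for this example.
interface SubBlock {
  id: string;
  title: string;
  type: "short-input" | "slider" | "dropdown";
  placeholder?: string;
}

// Lists declared inputs that have no subBlock with the same id.
function inputsWithoutSubBlock(
  inputs: Record<string, InputSpec>,
  subBlocks: SubBlock[],
): string[] {
  const ids = new Set(subBlocks.map((s) => s.id));
  return Object.keys(inputs).filter((name) => !ids.has(name));
}

const declaredInputs: Record<string, InputSpec> = {
  collectionId: { type: "string", description: "Airweave collection ID" },
  query: { type: "string", description: "Search query" },
  limit: { type: "number", description: "Maximum number of results" },
  offset: { type: "number", description: "Pagination offset (default: 0)" },
};

const subBlocksBefore: SubBlock[] = [
  { id: "collectionId", title: "Collection ID", type: "short-input" },
  { id: "query", title: "Query", type: "short-input" },
  { id: "limit", title: "Limit", type: "short-input", placeholder: "10" },
];

// Candidate entry that would expose the pagination offset in the UI.
const offsetSubBlock: SubBlock = {
  id: "offset",
  title: "Offset",
  type: "short-input",
  placeholder: "0",
};

const subBlocksAfter: SubBlock[] = [...subBlocksBefore, offsetSubBlock];
```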
Path: apps/sim/blocks/blocks/airweave.ts
Line: 84:84

Comment on lines +42 to +43
"@opentelemetry/sdk-trace-base": "2.1.0",
"@opentelemetry/sdk-trace-node": "2.1.0",

style: Check if OpenTelemetry dependencies are needed for this PR - they appear unrelated to Airweave integration

Path: apps/sim/package.json
Line: 42:43

"@types/three": "0.177.0",
"better-auth": "1.3.12",
"browser-image-compression": "^2.0.2",
"chalk": "5.6.2",

style: Check if chalk is needed for this PR - not used in Airweave integration files

Path: apps/sim/package.json
Line: 70:70

"uuid": "^11.1.0",
"xlsx": "0.18.5",
"zod": "^3.24.2"
"zod": "^3.24.2",

style: Check if zustand is needed for this PR - not used in Airweave integration files

Path: apps/sim/package.json
Line: 122:122

@waleedlatif1
Collaborator

@EwanTauran can you rebase with staging, & run the generate-docs script from sim/package.json to auto-gen the tool docs. Thanks!!

EwanTauran and others added 5 commits October 17, 2025 15:22
…n gh action for docs (simstudioai#1652)

* fix(i18n): fix SDK and guardrails translation corruption

* re-enable i18n gh action
…e, and improve performance (simstudioai#1651)

* improvement(dashboard): cleanup execution dashboard UI, fix logs trace, and improve performance

* cleanup

* cleaned up

* ack PR comments
@EwanTauran
Author

rebased with staging and generated docs

4 participants