Commit ee77dea

waleedlatif1 authored and waleed committed
feat(guardrails): added guardrails block/tools and docs (#1605)
* Adding guardrails block
* ack PR comments
* cleanup checkbox in dark mode
* cleanup
* fix supabase tools
1 parent bba407b commit ee77dea

32 files changed: +2206 -24 lines changed

Lines changed: 251 additions & 0 deletions
@@ -0,0 +1,251 @@
1+
---
2+
title: Guardrails
3+
---
4+
5+
import { Callout } from 'fumadocs-ui/components/callout'
6+
import { Step, Steps } from 'fumadocs-ui/components/steps'
7+
import { Tab, Tabs } from 'fumadocs-ui/components/tabs'
8+
import { Image } from '@/components/ui/image'
9+
import { Video } from '@/components/ui/video'
10+
11+
The Guardrails block validates and protects your AI workflows by checking content against multiple validation types. Ensure data quality, prevent hallucinations, detect PII, and enforce format requirements before content moves through your workflow.
12+
13+
<div className="flex justify-center">
14+
<Image
15+
src="/static/blocks/guardrails.png"
16+
alt="Guardrails Block"
17+
width={500}
18+
height={350}
19+
className="my-6"
20+
/>
21+
</div>
22+
23+
## Overview
24+
25+
The Guardrails block enables you to:
26+
27+
<Steps>
28+
<Step>
29+
<strong>Validate JSON Structure</strong>: Ensure LLM outputs are valid JSON before parsing
30+
</Step>
31+
<Step>
32+
<strong>Match Regex Patterns</strong>: Verify content matches specific formats (emails, phone numbers, URLs, etc.)
33+
</Step>
34+
<Step>
35+
<strong>Detect Hallucinations</strong>: Use RAG + LLM scoring to validate AI outputs against knowledge base content
36+
</Step>
37+
<Step>
38+
<strong>Detect PII</strong>: Identify and optionally mask personally identifiable information across 40+ entity types
39+
</Step>
40+
</Steps>
41+
42+
## Validation Types
43+
44+
### JSON Validation
45+
46+
Validates that content is properly formatted JSON. Perfect for ensuring structured LLM outputs can be safely parsed.
47+
48+
**Use Cases:**
49+
- Validate JSON responses from Agent blocks before parsing
50+
- Ensure API payloads are properly formatted
51+
- Check structured data integrity
52+
53+
**Output:**
54+
- `passed`: `true` if valid JSON, `false` otherwise
55+
- `error`: Error message if validation fails (e.g., "Invalid JSON: Unexpected token...")
56+
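As a rough illustration of the `passed` / `error` behavior described above, here is a minimal TypeScript sketch. The `JsonCheckResult` interface and `validateJson` function are hypothetical names chosen for this example; they are not taken from the block's actual implementation.

```typescript
// Hypothetical result shape mirroring the documented `passed` and `error` outputs.
interface JsonCheckResult {
  passed: boolean;
  error?: string;
}

// Sketch: attempt to parse the content and surface the parser's message on failure.
function validateJson(content: string): JsonCheckResult {
  try {
    JSON.parse(content);
    return { passed: true };
  } catch (err) {
    // JSON.parse throws a SyntaxError; keep its message for debugging.
    const message = err instanceof Error ? err.message : String(err);
    return { passed: false, error: `Invalid JSON: ${message}` };
  }
}
```

Note that a check of this kind only confirms the text parses; it does not verify that the parsed value has the fields a downstream block expects.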
57+
### Regex Validation
58+
59+
Checks if content matches a specified regular expression pattern.
60+
61+
**Use Cases:**
62+
- Validate email addresses
63+
- Check phone number formats
64+
- Verify URLs or custom identifiers
65+
- Enforce specific text patterns
66+
67+
**Configuration:**
68+
- **Regex Pattern**: The regular expression to match against (e.g., `^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$` for emails)
69+
70+
**Output:**
71+
- `passed`: `true` if content matches pattern, `false` otherwise
72+
- `error`: Error message if validation fails
73+
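The check above can be sketched in TypeScript as follows. The function name `validateRegex`, the result interface, and the error strings are assumptions made for illustration; only the email pattern comes from the configuration example above.

```typescript
// Hypothetical result shape mirroring the documented `passed` and `error` outputs.
interface RegexCheckResult {
  passed: boolean;
  error?: string;
}

// Email pattern from the configuration example (backslash escaped for a string literal).
const EMAIL_PATTERN = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$";

// Sketch: compile the pattern, then test the content against it.
function validateRegex(content: string, pattern: string): RegexCheckResult {
  try {
    const regex = new RegExp(pattern); // throws SyntaxError on a malformed pattern
    return regex.test(content)
      ? { passed: true }
      : { passed: false, error: "Content does not match the pattern" };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { passed: false, error: `Invalid regex pattern: ${message}` };
  }
}
```

Because the pattern is anchored with `^` and `$`, the whole input must be an email address; dropping the anchors would instead pass any text that merely contains one.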
74+
### Hallucination Detection
75+
76+
Uses Retrieval-Augmented Generation (RAG) with LLM scoring to detect when AI-generated content contradicts or isn't grounded in your knowledge base.
77+
78+
**How It Works:**
79+
1. Queries your knowledge base for relevant context
80+
2. Sends both the AI output and retrieved context to an LLM
81+
3. The LLM assigns a confidence score on a 0-10 scale:
82+
- **0** = Full hallucination (completely ungrounded)
83+
- **10** = Fully grounded (completely supported by knowledge base)
84+
4. Validation passes if score ≥ threshold (default: 3)
85+
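The steps above can be sketched in TypeScript as a small pipeline. Everything here is illustrative: `Retrieve` and `Score` are hypothetical stand-ins for the knowledge base query and the scoring LLM call, and `applyThreshold` simply encodes the documented rule that validation passes when the score is at least the threshold (default 3).

```typescript
// Hypothetical shape of what the scoring LLM returns.
interface LlmScore {
  score: number; // 0 = fully hallucinated, 10 = fully grounded
  reasoning: string;
}

interface HallucinationResult extends LlmScore {
  passed: boolean;
}

// Step 4: validation passes when score >= threshold.
function applyThreshold(llm: LlmScore, threshold = 3): HallucinationResult {
  if (Number.isNaN(llm.score) || llm.score < 0 || llm.score > 10) {
    throw new RangeError(`Score must be between 0 and 10, got ${llm.score}`);
  }
  return { ...llm, passed: llm.score >= threshold };
}

// Hypothetical dependencies, injected so the sketch stays self-contained.
type Retrieve = (query: string, topK: number) => Promise<string[]>;
type Score = (output: string, context: string[]) => Promise<LlmScore>;

// Steps 1-4 end to end: retrieve context, score the output, apply the threshold.
async function checkHallucination(
  output: string,
  retrieve: Retrieve,
  score: Score,
  threshold = 3,
  topK = 10,
): Promise<HallucinationResult> {
  const context = await retrieve(output, topK);
  const llm = await score(output, context);
  return applyThreshold(llm, threshold);
}
```

Raising the threshold makes the check stricter: with a threshold of 7, a response scored 6 fails even though it would pass under the default.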
86+
**Configuration:**
87+
- **Knowledge Base**: Select from your existing knowledge bases
88+
- **Model**: Choose the LLM used for scoring (requires strong reasoning; GPT-4o or Claude 3.7 Sonnet recommended)
89+
- **API Key**: Authentication for selected LLM provider (auto-hidden for hosted/Ollama models)
90+
- **Confidence Threshold**: Minimum score to pass (0-10, default: 3)
91+
- **Top K** (Advanced): Number of knowledge base chunks to retrieve (default: 10)
92+
93+
**Output:**
94+
- `passed`: `true` if confidence score ≥ threshold
95+
- `score`: Confidence score (0-10)
96+
- `reasoning`: LLM's explanation for the score
97+
- `error`: Error message if validation fails
98+
99+
**Use Cases:**
100+
- Validate Agent responses against documentation
101+
- Ensure customer support answers are factually accurate
102+
- Verify generated content matches source material
103+
- Quality control for RAG applications
104+
105+
### PII Detection
106+
107+
Detects personally identifiable information using Microsoft Presidio. Supports 40+ entity types across multiple countries and languages.
108+
109+
<div className="mx-auto w-3/5 overflow-hidden rounded-lg">
110+
<Video src="guardrails.mp4" width={500} height={350} />
111+
</div>
112+
113+
**How It Works:**
114+
1. Scans content for PII entities using pattern matching and NLP
115+
2. Returns detected entities with locations and confidence scores
116+
3. Optionally masks detected PII in the output
117+
118+
**Configuration:**
119+
- **PII Types to Detect**: Select from grouped categories via modal selector
120+
- **Common**: Person name, Email, Phone, Credit card, IP address, etc.
121+
- **USA**: SSN, Driver's license, Passport, etc.
122+
- **UK**: NHS number, National insurance number
123+
- **Spain**: NIF, NIE, CIF
124+
- **Italy**: Fiscal code, Driver's license, VAT code
125+
- **Poland**: PESEL, NIP, REGON
126+
- **Singapore**: NRIC/FIN, UEN
127+
- **Australia**: ABN, ACN, TFN, Medicare
128+
- **India**: Aadhaar, PAN, Passport, Voter number
129+
- **Mode**:
130+
- **Detect**: Only identify PII (default)
131+
- **Mask**: Replace detected PII with masked values
132+
- **Language**: Detection language (default: English)
133+
134+
**Output:**
135+
- `passed`: `false` if any selected PII types are detected, `true` otherwise
136+
- `detectedEntities`: Array of detected PII with type, location, and confidence
137+
- `maskedText`: Content with PII masked (only if mode = "Mask")
138+
- `error`: Error message if validation fails
139+
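To make the relationship between `detectedEntities`, `maskedText`, and `passed` concrete, here is a minimal TypeScript sketch. It assumes detection has already produced non-overlapping entity spans; the `start`, `end`, and `score` field names and the `<ENTITY_TYPE>` placeholder format are assumptions for illustration and may differ from what the block actually emits.

```typescript
// Hypothetical entity shape: type, location (start/end offsets), and confidence.
interface DetectedEntity {
  type: string;
  start: number; // inclusive offset into the text
  end: number; // exclusive offset into the text
  score: number;
}

interface PiiResult {
  passed: boolean;
  detectedEntities: DetectedEntity[];
  maskedText?: string;
}

// Replace each span with a placeholder, working right to left so that
// earlier offsets stay valid after each replacement.
function maskEntities(text: string, entities: DetectedEntity[]): string {
  const ordered = [...entities].sort((a, b) => b.start - a.start);
  let masked = text;
  for (const entity of ordered) {
    masked = masked.slice(0, entity.start) + `<${entity.type}>` + masked.slice(entity.end);
  }
  return masked;
}

// Assemble the outputs: `passed` is false as soon as any entity is detected,
// and `maskedText` is only present in "Mask" mode.
function buildPiiResult(
  text: string,
  entities: DetectedEntity[],
  mode: "Detect" | "Mask",
): PiiResult {
  const result: PiiResult = { passed: entities.length === 0, detectedEntities: entities };
  if (mode === "Mask") {
    result.maskedText = maskEntities(text, entities);
  }
  return result;
}
```

Sorting by descending start offset is the key design choice: replacing left to right would shift every later span whenever a placeholder differs in length from the text it replaces.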
140+
**Use Cases:**
141+
- Block content containing sensitive personal information
142+
- Mask PII before logging or storing data
143+
- Support compliance with GDPR, HIPAA, and other privacy regulations
144+
- Sanitize user inputs before processing
145+
146+
## Configuration
147+
148+
### Content to Validate
149+
150+
The input content to validate. This typically comes from:
151+
- Agent block outputs: `<agent.content>`
152+
- Function block results: `<function.output>`
153+
- API responses: `<api.output>`
154+
- Any other block output
155+
156+
### Validation Type
157+
158+
Choose from four validation types:
159+
- **Valid JSON**: Check if content is properly formatted JSON
160+
- **Regex Match**: Verify content matches a regex pattern
161+
- **Hallucination Check**: Validate against knowledge base with LLM scoring
162+
- **PII Detection**: Detect and optionally mask personally identifiable information
163+
164+
## Outputs
165+
166+
All validation types return:
167+
168+
- **`<guardrails.passed>`**: Boolean indicating if validation passed
169+
- **`<guardrails.validationType>`**: The type of validation performed
170+
- **`<guardrails.input>`**: The original input that was validated
171+
- **`<guardrails.error>`**: Error message if validation failed (optional)
172+
173+
Additional outputs by type:
174+
175+
**Hallucination Check:**
176+
- **`<guardrails.score>`**: Confidence score (0-10)
177+
- **`<guardrails.reasoning>`**: LLM's explanation
178+
179+
**PII Detection:**
180+
- **`<guardrails.detectedEntities>`**: Array of detected PII entities
181+
- **`<guardrails.maskedText>`**: Content with PII masked (if mode = "Mask")
182+
183+
## Example Use Cases
184+
185+
### Validate JSON Before Parsing
186+
187+
<div className="mb-4 rounded-md border p-4">
188+
<h4 className="font-medium">Scenario: Ensure Agent output is valid JSON</h4>
189+
<ol className="list-decimal pl-5 text-sm">
190+
<li>Agent generates structured JSON response</li>
191+
<li>Guardrails validates JSON format</li>
192+
<li>Condition block checks `<guardrails.passed>`</li>
193+
<li>If passed → Parse and use data, If failed → Retry or handle error</li>
194+
</ol>
195+
</div>
196+
197+
### Prevent Hallucinations
198+
199+
<div className="mb-4 rounded-md border p-4">
200+
<h4 className="font-medium">Scenario: Validate customer support responses</h4>
201+
<ol className="list-decimal pl-5 text-sm">
202+
<li>Agent generates response to customer question</li>
203+
<li>Guardrails checks against support documentation knowledge base</li>
204+
<li>If confidence score ≥ 3 → Send response</li>
205+
<li>If confidence score \< 3 → Flag for human review</li>
206+
</ol>
207+
</div>
208+
209+
### Block PII in User Inputs
210+
211+
<div className="mb-4 rounded-md border p-4">
212+
<h4 className="font-medium">Scenario: Sanitize user-submitted content</h4>
213+
<ol className="list-decimal pl-5 text-sm">
214+
<li>User submits form with text content</li>
215+
<li>Guardrails detects PII (emails, phone numbers, SSN, etc.)</li>
216+
<li>If PII detected → Reject submission or mask sensitive data</li>
217+
<li>If no PII → Process normally</li>
218+
</ol>
219+
</div>
220+
221+
<div className="mx-auto w-3/5 overflow-hidden rounded-lg">
222+
<Video src="guardrails-example.mp4" width={500} height={350} />
223+
</div>
224+
225+
### Validate Email Format
226+
227+
<div className="mb-4 rounded-md border p-4">
228+
<h4 className="font-medium">Scenario: Check email address format</h4>
229+
<ol className="list-decimal pl-5 text-sm">
230+
<li>Agent extracts email from text</li>
231+
<li>Guardrails validates with regex pattern</li>
232+
<li>If valid → Use email for notification</li>
233+
<li>If invalid → Request correction</li>
234+
</ol>
235+
</div>
236+
237+
## Best Practices
238+
239+
- **Chain with Condition blocks**: Use `<guardrails.passed>` to branch workflow logic based on validation results
240+
- **Use JSON validation before parsing**: Always validate JSON structure before attempting to parse LLM outputs
241+
- **Choose appropriate PII types**: Only select the PII entity types relevant to your use case for better performance
242+
- **Set reasonable confidence thresholds**: For hallucination detection, adjust threshold based on your accuracy requirements (higher = stricter)
243+
- **Use strong models for hallucination detection**: GPT-4o or Claude 3.7 Sonnet provide more accurate confidence scoring
244+
- **Mask PII for logging**: Use "Mask" mode when you need to log or store content that may contain PII
245+
- **Test regex patterns**: Validate your regex patterns thoroughly before deploying to production
246+
- **Monitor validation failures**: Track `<guardrails.error>` messages to identify common validation issues
247+
248+
<Callout type="info">
249+
Guardrails validation happens synchronously in your workflow. For hallucination detection, choose faster models (like GPT-4o-mini) if latency is critical.
250+
</Callout>
251+