-
Notifications
You must be signed in to change notification settings - Fork 37
Renders PlotlyJS HTML plots in the Chat message frontend #17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Vineeth Kalluru <[email protected]>
Signed-off-by: Vineeth Kalluru <[email protected]>
59c0287
to
6b5ece9
Compare
WalkthroughAdds HTML file rendering support in chat: detects HTML links/inline blocks in assistant messages, removes them from displayed text, and renders them via a new HtmlFileRenderer component. Introduces a Next.js API to serve HTML files, adds detection/cleanup utilities, corresponding unit tests, Jest mocks/mapping for remark/rehype plugins, and config updates. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant ChatUI as ChatMessage (React)
participant Detector as htmlFileDetector
participant Renderer as HtmlFileRenderer
User->>ChatUI: View assistant message
ChatUI->>Detector: detectHtmlFileLinks(content)
Detector-->>ChatUI: HtmlFileLink[]
ChatUI->>ChatUI: removeHtmlFileLinksFromContent(content)
ChatUI-->>User: Render cleaned markdown
loop For each HtmlFileLink
ChatUI->>Renderer: Render HtmlFileRenderer(props)
end
sequenceDiagram
autonumber
actor User
participant Renderer as HtmlFileRenderer
participant API as /api/load-html-file
participant FS as Filesystem (server)
Note over Renderer: Initial render
Renderer->>Renderer: normalize path / detect inline HTML
User->>Renderer: Click "Show"
alt Inline HTML
Renderer-->>User: Display iframe via srcDoc
else External .html file
Renderer->>API: GET ?filePath=...
API->>FS: Validate + read .html
alt Exists and readable
FS-->>API: HTML content
API-->>Renderer: 200 text/html
Renderer-->>User: Display iframe via srcDoc
else Not found / invalid / error
API-->>Renderer: 4xx/5xx
Renderer-->>User: Show error guidance
end
end
User->>Renderer: Copy path/content
Renderer-->>User: Clipboard confirmation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🧪 Early access (Sonnet 4.5): enabledWe are currently testing the Sonnet 4.5 model, which is expected to improve code review quality. However, this model may lead to increased noise levels in the review comments. Please disable the early access features if the noise level causes any inconvenience. Note:
Comment |
Signed-off-by: Vineeth Kalluru <[email protected]>
Signed-off-by: Vineeth Kalluru <[email protected]>
Signed-off-by: Vineeth Kalluru <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🧹 Nitpick comments (3)
pages/api/load-html-file.ts (1)
5-11
: Consider adding rate limiting for DOS protection.The endpoint has no rate limiting, making it vulnerable to abuse where attackers could repeatedly request large HTML files to exhaust server resources.
Consider adding rate limiting middleware or using a solution like
next-rate-limit
:import rateLimit from 'express-rate-limit'; const limiter = rateLimit({ windowMs: 60 * 1000, // 1 minute max: 30 // limit each IP to 30 requests per minute });components/Chat/ChatMessage.tsx (1)
217-241
: Consider reducing duplication between prepareContent and prepareContentWithoutHtmlLinks.The two functions share identical logic except for lines 234-236 (removal of HTML file links). This duplication makes maintenance harder—changes to formatting or processing must be applied in both places.
Refactor to eliminate duplication:
- const prepareContent = ({ + const prepareContentCore = ({ message = {} as Message, responseContent = true, intermediateStepsContent = false, role = 'assistant', + removeHtmlLinks = false, } = {}) => { const { content = '', intermediateSteps = [] } = message; if (role === 'user') return content.trim(); let result = ''; if (intermediateStepsContent) { result += generateContentIntermediate(intermediateSteps); } if (responseContent) { - result += result ? `\n\n${content}` : content; + const processedContent = removeHtmlLinks + ? removeHtmlFileLinksFromContent(content) + : content; + result += result ? `\n\n${processedContent}` : processedContent; } // fixing malformed html and removing extra spaces to avoid markdown issues return fixMalformedHtml(result)?.trim()?.replace(/\n\s+/, '\n '); }; + const prepareContent = (options = {}) => prepareContentCore(options); - // Prepare content with HTML file links removed to avoid duplicate display - const prepareContentWithoutHtmlLinks = ({ - message = {} as Message, - responseContent = true, - intermediateStepsContent = false, - role = 'assistant', - } = {}) => { - const { content = '', intermediateSteps = [] } = message; - - if (role === 'user') return content.trim(); - - let result = ''; - if (intermediateStepsContent) { - result += generateContentIntermediate(intermediateSteps); - } - - if (responseContent) { - // Remove HTML file links from content - const cleanContent = removeHtmlFileLinksFromContent(content); - result += result ? `\n\n${cleanContent}` : cleanContent; - } - - // fixing malformed html and removing extra spaces to avoid markdown issues - return fixMalformedHtml(result)?.trim()?.replace(/\n\s+/, '\n '); - }; + const prepareContentWithoutHtmlLinks = (options = {}) => + prepareContentCore({ ...options, removeHtmlLinks: true });utils/app/htmlFileDetector.ts (1)
67-71
: Consider nested tag edge case in title extraction.The regex patterns assume
<title>
and<h1>
content doesn't contain nested HTML tags. For example,<title>My <em>Title</em></title>
would only capture "My ". This is likely acceptable for typical PlotlyJS HTML files, but be aware of this limitation.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
__mocks__/rehype-raw.js
(1 hunks)__mocks__/remark-gfm.js
(1 hunks)__mocks__/remark-math.js
(1 hunks)__tests__/components/Chat/ChatMessage.test.tsx
(1 hunks)__tests__/components/Chat/HtmlFileRenderer.test.tsx
(1 hunks)__tests__/utils/app/htmlFileDetector.test.ts
(1 hunks)components/Chat/ChatMessage.tsx
(5 hunks)components/Chat/HtmlFileRenderer.tsx
(1 hunks)jest.config.js
(1 hunks)pages/api/load-html-file.ts
(1 hunks)utils/app/htmlFileDetector.ts
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
components/Chat/ChatMessage.tsx (4)
utils/app/htmlFileDetector.ts (3)
HtmlFileLink
(1-7)detectHtmlFileLinks
(20-32)removeHtmlFileLinksFromContent
(227-261)types/chat.ts (1)
Message
(1-10)utils/app/helper.ts (2)
generateContentIntermediate
(260-303)fixMalformedHtml
(327-337)components/Chat/HtmlFileRenderer.tsx (1)
HtmlFileRenderer
(12-321)
__tests__/utils/app/htmlFileDetector.test.ts (1)
utils/app/htmlFileDetector.ts (2)
detectHtmlFileLinks
(20-32)removeHtmlFileLinksFromContent
(227-261)
utils/app/htmlFileDetector.ts (1)
pages/api/load-html-file.ts (1)
handler
(5-59)
__tests__/components/Chat/HtmlFileRenderer.test.tsx (1)
components/Chat/HtmlFileRenderer.tsx (1)
HtmlFileRenderer
(12-321)
__tests__/components/Chat/ChatMessage.test.tsx (2)
types/chat.ts (1)
Message
(1-10)components/Chat/ChatMessage.tsx (1)
ChatMessage
(45-447)
🔇 Additional comments (25)
__mocks__/remark-math.js (1)
1-16
: LGTM! Mock implementation is appropriate.The no-op transformer correctly bypasses LaTeX math parsing during tests while maintaining the expected plugin interface. The default export alias on line 16 ensures compatibility with both CommonJS and ESM import patterns.
components/Chat/HtmlFileRenderer.tsx (2)
206-263
: Good: Comprehensive UI with proper visual hierarchy and accessibility.The component header provides clear status indicators, appropriate action buttons, and good visual feedback. The gradient background, badges, and button states enhance UX.
304-304
: The review comment incorrectly assumes PlotlyJS usage.After verifying the codebase, PlotlyJS is not used anywhere in this application. The
HtmlFileRenderer
component is a generic HTML renderer that displays arbitrary HTML content via thesrcDoc
attribute, not specifically Plotly charts. The codebase usesrecharts
andreact-force-graph-2d
for visualizations instead.The sandbox attribute
"allow-scripts allow-same-origin"
is appropriate for rendering generic HTML content. The concerns about PlotlyJS-specific features (zoom/pan interactions, modebar, export functionality) are not applicable here.Likely an incorrect or invalid review comment.
jest.config.js (2)
21-23
: LGTM! Mock mappings support ESM compatibility.The new moduleNameMapper entries correctly route rehype-raw, remark-gfm, and remark-math to their respective CommonJS-compatible mocks, ensuring tests run without ESM import issues in Jest's CommonJS environment.
26-26
: LGTM! Comprehensive transformIgnorePatterns for unified ecosystem.The expanded pattern now covers the full range of unist/mdast/hast utilities and their dependencies, preventing Jest from attempting to transform these ESM-only packages. This is the correct approach for handling the unified markdown/HTML processing ecosystem.
__mocks__/rehype-raw.js (1)
1-16
: LGTM! Mock implementation follows established patterns.The no-op transformer appropriately bypasses HTML parsing during tests while maintaining the expected plugin interface. The implementation is consistent with the other mocks in this PR (remark-math, remark-gfm).
__mocks__/remark-gfm.js (1)
1-17
: LGTM!The mock implementation is appropriate for isolating component behavior in tests. The no-op transformer and dual export pattern correctly handle both CommonJS and default imports.
__tests__/utils/app/htmlFileDetector.test.ts (1)
1-140
: LGTM!The test suite provides comprehensive coverage of HTML file detection and cleanup functionality. The tests validate:
- Multiple detection patterns (HTML anchors, Markdown links, file:// URLs, inline HTML)
- Edge cases (duplicates, special characters, non-HTML links)
- Content cleanup while preserving surrounding text
The assertions correctly verify both detection (link objects with proper fields) and removal (cleaned content).
__tests__/components/Chat/HtmlFileRenderer.test.tsx (4)
1-21
: LGTM!Test setup with mocked fetch and clipboard APIs is appropriate for isolating component behavior.
23-68
: LGTM!Tests correctly verify loading states, successful rendering with iframe srcDoc, non-existent file handling, and error scenarios. The use of
waitFor
properly handles async operations.
83-146
: LGTM!UI interaction tests correctly verify toggle, copy-to-clipboard for both file paths and inline HTML content, and proper alert messages.
70-81
: Verified: inline HTML correctly skips fetch.The component implementation confirms that
checkFileExists
returns early whenisInlineHtml
is true (lines 42-45), settingfileExists
without calling fetch. The test assertion is correct.components/Chat/ChatMessage.tsx (4)
36-37
: LGTM!Imports for the new HtmlFileRenderer component and HTML file detection utilities are correctly added.
188-192
: LGTM!HTML file detection is appropriately scoped to assistant messages only, avoiding unnecessary processing for user messages.
354-385
: LGTM!Correctly using
prepareContentWithoutHtmlLinks
for assistant messages to remove HTML file links from displayed content, preventing duplicate display alongside the HtmlFileRenderer.
387-400
: LGTM!The rendering logic correctly:
- Conditionally renders HtmlFileRenderer only when
htmlFileLinks
is non-empty- Maps over detected links with proper key generation
- Passes all necessary props (filePath, title, isInlineHtml, htmlContent)
__tests__/components/Chat/ChatMessage.test.tsx (3)
1-36
: LGTM!Test setup correctly mocks HtmlFileRenderer and BotAvatar, and provides a proper HomeContext wrapper for rendering ChatMessage components.
38-98
: LGTM!Integration tests correctly verify:
- HtmlFileRenderer renders for assistant messages with HTML links
- HTML links are removed from displayed content
- No renderer for user messages (even with links)
- No renderer for assistant messages without links
- Inline HTML content is handled properly
100-112
: LGTM!Test correctly verifies that multiple HTML files in a single message result in multiple HtmlFileRenderer instances.
utils/app/htmlFileDetector.ts (6)
1-14
: LGTM! Well-structured type definitions.The interfaces and type alias provide clear contracts and good type safety for the detection logic.
20-32
: LGTM! Well-structured detection flow.The function properly orchestrates inline HTML detection, pattern matching, and deduplication. The use of
processedRanges
prevents overlapping matches effectively.
37-62
: LGTM! Inline HTML detection handles complete blocks correctly.The regex pattern correctly identifies complete
<html>...</html>
blocks with non-greedy matching. The synthetic filePath generation ensures unique identifiers for each inline block.
76-109
: LGTM! Pattern matching with proper overlap handling.The pattern/handler architecture provides flexibility while the overlap detection ensures each content region is processed only once.
171-221
: LGTM! Helper functions are well-implemented.The overlap detection logic is correct, and the path/title extraction functions provide good defensive programming and user-friendly formatting.
114-166
: Fix character class in plain file path pattern.Line 159: The character class
[\w\/\-\.]
has the hyphen between\/
and\.
, which could be interpreted as a range rather than a literal hyphen. This may cause unexpected matching behavior.Apply this diff to fix the character class:
- regex: /\s(\/[\w\/\-\.]+\.html)\b/g, + regex: /\s(\/[\w/.\-]+\.html)\b/g,Alternatively, escape the forward slash (though unnecessary in character classes) and place hyphen at the end or escape it:
- regex: /\s(\/[\w\/\-\.]+\.html)\b/g, + regex: /\s(\/[\w/\.\-]+\.html)\b/g,Likely an incorrect or invalid review comment.
// We'll abort quickly to avoid downloading large files | ||
const controller = new AbortController(); | ||
const timeoutId = setTimeout(() => controller.abort(), 3000); // 3 second timeout | ||
|
||
const response = await fetch(`/api/load-html-file?path=${encodeURIComponent(cleanPath)}`, { | ||
signal: controller.signal | ||
}); | ||
|
||
clearTimeout(timeoutId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Abort controller timeout cleanup missing.
If the fetch completes before the 3-second timeout (line 54), the timeout is cleared (line 60). However, if the component unmounts while the fetch is pending, the timeout continues and setFileExists
is called on an unmounted component.
Apply this diff to clean up on unmount:
const checkFileExists = async () => {
// For inline HTML, file always "exists"
if (isInlineHtml && inlineHtmlContent) {
setFileExists(true);
return;
}
try {
const cleanPath = cleanFilePath(filePath);
console.log('Checking if HTML file exists:', cleanPath);
// Use a simple GET request to check if file exists
// We'll abort quickly to avoid downloading large files
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 3000); // 3 second timeout
- const response = await fetch(`/api/load-html-file?path=${encodeURIComponent(cleanPath)}`, {
- signal: controller.signal
- });
-
- clearTimeout(timeoutId);
+ try {
+ const response = await fetch(`/api/load-html-file?path=${encodeURIComponent(cleanPath)}`, {
+ signal: controller.signal
+ });
+
+ clearTimeout(timeoutId);
+ // ... rest of logic
+ } catch (err) {
+ clearTimeout(timeoutId);
+ throw err;
+ }
Better yet, return a cleanup function from the useEffect that calls controller.abort()
.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In components/Chat/HtmlFileRenderer.tsx around lines 52-60, the AbortController
timeout is not cleaned up if the component unmounts while fetch is pending,
risking setFileExists being called on an unmounted component; fix by returning a
cleanup function from the useEffect that calls controller.abort() and
clearTimeout(timeoutId) so the pending fetch is aborted and the timer is
cancelled on unmount, ensuring no state updates occur after unmount.
useEffect(() => { | ||
// First check if file exists | ||
checkFileExists(); | ||
}, [filePath, isInlineHtml, inlineHtmlContent]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing dependency in useEffect causes stale closures.
The effect on line 115-118 calls checkFileExists()
but doesn't include it in the dependency array. This violates the exhaustive-deps rule and can cause stale closures or missed updates.
Apply this diff:
useEffect(() => {
// First check if file exists
checkFileExists();
- }, [filePath, isInlineHtml, inlineHtmlContent]);
+ }, [filePath, isInlineHtml, inlineHtmlContent, checkFileExists]);
Alternatively, wrap checkFileExists
in useCallback
to stabilize its reference:
const checkFileExists = useCallback(async () => {
// ... existing implementation
}, [filePath, isInlineHtml, inlineHtmlContent]);
🤖 Prompt for AI Agents
In components/Chat/HtmlFileRenderer.tsx around lines 115 to 118, the useEffect
calls checkFileExists() but does not include checkFileExists in the dependency
array which can lead to stale closures; fix by either adding checkFileExists to
the effect dependencies or stabilize its identity by wrapping it in useCallback
with the proper dependencies (filePath, isInlineHtml, inlineHtmlContent) and
then include that stable function in the useEffect deps so the effect runs
correctly when inputs change.
useEffect(() => { | ||
// Only load content if file exists and component is expanded | ||
if (isExpanded && fileExists) { | ||
loadHtmlContent(); | ||
} | ||
}, [isExpanded, fileExists]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing dependency in useEffect causes stale closures.
Similar issue: loadHtmlContent()
is not in the dependency array, which can cause the effect to reference stale versions of the function.
Apply this diff:
useEffect(() => {
// Only load content if file exists and component is expanded
if (isExpanded && fileExists) {
loadHtmlContent();
}
- }, [isExpanded, fileExists]);
+ }, [isExpanded, fileExists, loadHtmlContent]);
And wrap loadHtmlContent
in useCallback
:
const loadHtmlContent = useCallback(async () => {
// ... existing implementation
}, [isExpanded, htmlContent, error, fileExists, filePath, isInlineHtml, inlineHtmlContent]);
🤖 Prompt for AI Agents
In components/Chat/HtmlFileRenderer.tsx around lines 120 to 125, the useEffect
calls loadHtmlContent but doesn't include it in the dependency array which can
cause stale closures; wrap loadHtmlContent in useCallback and include the proper
dependencies and then add loadHtmlContent to the useEffect dependencies.
Specifically, refactor loadHtmlContent into useCallback(async () => { /*
existing implementation */ }, [isExpanded, htmlContent, error, fileExists,
filePath, isInlineHtml, inlineHtmlContent]) and update the useEffect dependency
array to [isExpanded, fileExists, loadHtmlContent].
const openInSystemBrowser = () => { | ||
if (isInlineHtml) { | ||
// For inline HTML, create a blob URL and try to open it | ||
try { | ||
const blob = new Blob([inlineHtmlContent || ''], { type: 'text/html' }); | ||
const url = URL.createObjectURL(blob); | ||
window.open(url, '_blank'); | ||
// Clean up the URL after a delay | ||
setTimeout(() => URL.revokeObjectURL(url), 10000); | ||
} catch (error) { | ||
console.error('Error opening inline HTML:', error); | ||
alert('Unable to open inline HTML content in new window.'); | ||
} | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blob URL memory leak potential.
The blob URL created on line 154 is only revoked after 10 seconds (line 157). If the user closes the new window immediately, the blob remains in memory. Additionally, if this function is called multiple times, multiple blob URLs accumulate.
Consider tracking blob URLs and revoking them sooner:
const [activeBlobUrls, setActiveBlobUrls] = useState<string[]>([]);
const openInSystemBrowser = () => {
if (isInlineHtml) {
try {
const blob = new Blob([inlineHtmlContent || ''], { type: 'text/html' });
const url = URL.createObjectURL(blob);
setActiveBlobUrls(prev => [...prev, url]);
window.open(url, '_blank');
// Clean up after a shorter delay or on unmount
setTimeout(() => {
URL.revokeObjectURL(url);
setActiveBlobUrls(prev => prev.filter(u => u !== url));
}, 5000);
} catch (error) {
// ...
}
}
};
// Cleanup on unmount
useEffect(() => {
return () => {
activeBlobUrls.forEach(url => URL.revokeObjectURL(url));
};
}, [activeBlobUrls]);
🤖 Prompt for AI Agents
components/Chat/HtmlFileRenderer.tsx around lines 149-162: the current
implementation creates a blob URL and only revokes it after a fixed 10s timeout,
which can leak memory if windows are closed immediately or the function is
called repeatedly; fix by tracking created blob URLs (use a ref or state array)
and revoking each URL as soon as its opened window is closed (capture the window
handle from window.open and poll/attach unload to detect closure) or revoke
after a much shorter delay if detection fails, and ensure any remaining URLs are
revoked in a cleanup on unmount to prevent accumulation.
<iframe | ||
srcDoc={htmlContent} | ||
className="w-full border-0" | ||
sandbox="allow-scripts allow-same-origin" | ||
title={title || (isInlineHtml ? 'Inline HTML Content' : 'HTML Content')} | ||
style={{ height: '600px', minHeight: '500px' }} | ||
/> | ||
</div> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential XSS vulnerability with unsanitized HTML in iframe srcDoc.
The htmlContent
is rendered directly in an iframe via srcDoc
without sanitization. While the sandbox
attribute provides some protection, a sophisticated XSS payload could still escape or exploit browser vulnerabilities, especially if the HTML comes from untrusted sources.
Consider adding HTML sanitization:
import DOMPurify from 'isomorphic-dompurify';
// In the render section:
<iframe
srcDoc={DOMPurify.sanitize(htmlContent, {
ALLOWED_TAGS: ['html', 'head', 'body', 'script', 'div', 'svg', 'style', ...],
ALLOWED_ATTR: ['class', 'id', 'style', ...]
})}
// ...
/>
However, for PlotlyJS plots, you may need to allow <script>
tags, which limits sanitization effectiveness. Document this risk and ensure HTML files only come from trusted sources.
// Security check: prevent directory traversal attacks | ||
const resolvedPath = path.resolve(filePath); | ||
const workspaceRoot = process.cwd(); | ||
|
||
// Check if the file is within the workspace or in expected output directories | ||
const isInWorkspace = resolvedPath.startsWith(workspaceRoot) || | ||
resolvedPath.includes('/output_data/') || | ||
resolvedPath.includes('/plots/'); | ||
|
||
if (!isInWorkspace) { | ||
return res.status(403).json({ error: 'Access to this file path is not allowed' }); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Critical: Path traversal vulnerability and weak access control.
The current implementation has multiple security issues:
-
Path traversal not prevented:
path.resolve()
alone doesn't stop traversal attacks. An attacker can supply../../../etc/passwd.html
which resolves to an absolute path outside the workspace. -
Weak boundary check (lines 30-32):
includes('/output_data/')
matches anywhere in the path, not just allowed directories. An attacker could use/tmp/output_data/../../../etc/passwd.html
. -
Race condition: Checking
startsWith(workspaceRoot)
AFTER allowing paths with/output_data/
creates inconsistent security boundaries.
Apply this diff to fix the path traversal vulnerability:
- // Security check: prevent directory traversal attacks
- const resolvedPath = path.resolve(filePath);
- const workspaceRoot = process.cwd();
-
- // Check if the file is within the workspace or in expected output directories
- const isInWorkspace = resolvedPath.startsWith(workspaceRoot) ||
- resolvedPath.includes('/output_data/') ||
- resolvedPath.includes('/plots/');
-
- if (!isInWorkspace) {
- return res.status(403).json({ error: 'Access to this file path is not allowed' });
- }
+ // Security check: prevent directory traversal attacks
+ const workspaceRoot = process.cwd();
+ const allowedDirs = [
+ path.join(workspaceRoot, 'output_data'),
+ path.join(workspaceRoot, 'plots'),
+ workspaceRoot
+ ];
+
+ const resolvedPath = path.resolve(workspaceRoot, filePath);
+ const isInAllowedDir = allowedDirs.some(dir => resolvedPath.startsWith(dir + path.sep));
+
+ if (!isInAllowedDir) {
+ return res.status(403).json({ error: 'Access to this file path is not allowed' });
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// Security check: prevent directory traversal attacks | |
const resolvedPath = path.resolve(filePath); | |
const workspaceRoot = process.cwd(); | |
// Check if the file is within the workspace or in expected output directories | |
const isInWorkspace = resolvedPath.startsWith(workspaceRoot) || | |
resolvedPath.includes('/output_data/') || | |
resolvedPath.includes('/plots/'); | |
if (!isInWorkspace) { | |
return res.status(403).json({ error: 'Access to this file path is not allowed' }); | |
} | |
// Security check: prevent directory traversal attacks | |
const workspaceRoot = process.cwd(); | |
const allowedDirs = [ | |
path.join(workspaceRoot, 'output_data'), | |
path.join(workspaceRoot, 'plots'), | |
workspaceRoot | |
]; | |
const resolvedPath = path.resolve(workspaceRoot, filePath); | |
const isInAllowedDir = allowedDirs.some(dir => resolvedPath.startsWith(dir + path.sep)); | |
if (!isInAllowedDir) { | |
return res.status(403).json({ error: 'Access to this file path is not allowed' }); | |
} |
🤖 Prompt for AI Agents
pages/api/load-html-file.ts lines 25-36: the current path checks allow traversal
and misuse of includes() and create a race condition; instead compute
resolvedPath = path.resolve(filePath) and compute rel =
path.relative(workspaceRoot, resolvedPath) ensuring rel does not start with '..'
and is not equal to '..' (use path.sep normalization) to guarantee the file is
inside workspaceRoot, and separately pre-resolve absolute allowed output
directories (e.g., path.resolve(workspaceRoot, 'output_data'),
path.resolve(workspaceRoot, 'plots')) and check that resolvedPath startsWith one
of those canonical allowed dirs (no includes()); perform these checks before any
filesystem access and, to mitigate TOCTOU, open the file using a safe syscall
(or fs.promises.open) and validate the file descriptor's real path (fs.realpath)
against allowed canonical paths before reading.
// Check if file exists | ||
if (!fs.existsSync(resolvedPath)) { | ||
return res.status(404).json({ error: 'File not found' }); | ||
} | ||
|
||
// Read the HTML file | ||
const htmlContent = fs.readFileSync(resolvedPath, 'utf-8'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Major: Use async file system operations to avoid blocking.
Synchronous fs.existsSync()
and fs.readFileSync()
block the Node.js event loop, degrading performance under load. Next.js API routes should use async operations.
Apply this diff:
- // Check if file exists
- if (!fs.existsSync(resolvedPath)) {
- return res.status(404).json({ error: 'File not found' });
- }
-
- // Read the HTML file
- const htmlContent = fs.readFileSync(resolvedPath, 'utf-8');
+ // Check if file exists and read it
+ let htmlContent: string;
+ try {
+ htmlContent = await fs.promises.readFile(resolvedPath, 'utf-8');
+ } catch (err: any) {
+ if (err.code === 'ENOENT') {
+ return res.status(404).json({ error: 'File not found' });
+ }
+ throw err;
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// Check if file exists | |
if (!fs.existsSync(resolvedPath)) { | |
return res.status(404).json({ error: 'File not found' }); | |
} | |
// Read the HTML file | |
const htmlContent = fs.readFileSync(resolvedPath, 'utf-8'); | |
// Check if file exists and read it | |
let htmlContent: string; | |
try { | |
htmlContent = await fs.promises.readFile(resolvedPath, 'utf-8'); | |
} catch (err: any) { | |
if (err.code === 'ENOENT') { | |
return res.status(404).json({ error: 'File not found' }); | |
} | |
throw err; | |
} |
🤖 Prompt for AI Agents
In pages/api/load-html-file.ts around lines 38 to 44, the code uses blocking
fs.existsSync and fs.readFileSync; replace them with non-blocking async
equivalents: use fs.promises.stat or fs.promises.access (with try/catch) to
check existence and permissions, and use await
fs.promises.readFile(resolvedPath, 'utf-8') to read the file; ensure the
surrounding handler is async, wrap the operations in try/catch to return 404 or
500 as appropriate, and avoid using synchronous APIs to prevent blocking the
event loop.
// Set appropriate headers | ||
res.setHeader('Content-Type', 'text/html'); | ||
res.setHeader('Cache-Control', 'no-cache, no-store, must-revalidate'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Add security headers to prevent XSS and clickjacking.
HTML content from files should be served with Content Security Policy (CSP) and X-Frame-Options headers to mitigate XSS risks, especially since this content will be rendered in iframes.
Add these headers:
// Set appropriate headers
res.setHeader('Content-Type', 'text/html');
res.setHeader('Cache-Control', 'no-cache, no-store, must-revalidate');
+ res.setHeader('X-Content-Type-Options', 'nosniff');
+ res.setHeader('Content-Security-Policy', "default-src 'self' 'unsafe-inline' 'unsafe-eval'; script-src 'self' 'unsafe-inline' 'unsafe-eval' https://cdn.plot.ly;");
Note: The CSP allows unsafe-inline
and unsafe-eval
for PlotlyJS functionality - adjust based on your exact security requirements.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// Set appropriate headers | |
res.setHeader('Content-Type', 'text/html'); | |
res.setHeader('Cache-Control', 'no-cache, no-store, must-revalidate'); | |
// Set appropriate headers | |
res.setHeader('Content-Type', 'text/html'); | |
res.setHeader('Cache-Control', 'no-cache, no-store, must-revalidate'); | |
res.setHeader('X-Content-Type-Options', 'nosniff'); | |
res.setHeader('Content-Security-Policy', "default-src 'self' 'unsafe-inline' 'unsafe-eval'; script-src 'self' 'unsafe-inline' 'unsafe-eval' https://cdn.plot.ly;"); |
🤖 Prompt for AI Agents
In pages/api/load-html-file.ts around lines 46 to 48, the response only sets
Content-Type and Cache-Control but lacks security headers; add a
Content-Security-Policy header that restricts sources while allowing
'unsafe-inline' and 'unsafe-eval' for PlotlyJS (e.g. default-src 'none';
script-src 'self' 'unsafe-inline' 'unsafe-eval' https:; style-src 'self'
'unsafe-inline' https:; img-src 'self' data: https:; connect-src 'self' https:;
font-src 'self' https:), and add X-Frame-Options set to SAMEORIGIN (or DENY if
not intended for iframes) to prevent clickjacking; set these headers on the
response before sending the HTML.
export function removeHtmlFileLinksFromContent(content: string): string { | ||
let cleanContent = content; | ||
|
||
// Remove HTML anchor tags to HTML files | ||
cleanContent = cleanContent.replace( | ||
/<a\s+href=["']?(file:\/\/[^"'\s>]+\.html)["']?[^>]*>([^<]+)<\/a>/gi, | ||
'' | ||
); | ||
|
||
// Remove HTML img tags that reference HTML files | ||
cleanContent = cleanContent.replace(/<img\s+[^>]*src=["']?(file:\/\/[^"'\s>]+\.html)["']?[^>]*\/?>/gi, ''); | ||
|
||
// Remove any complete HTML blocks | ||
cleanContent = cleanContent.replace(/<html[^>]*>[\s\S]*?<\/html>/gi, ''); | ||
|
||
// Remove markdown links to HTML files | ||
cleanContent = cleanContent.replace(/\[([^\]]+)\]\((file:\/\/[^)]+\.html)\)/g, ''); | ||
|
||
// Remove standalone file:// URLs | ||
cleanContent = cleanContent.replace(/\s(file:\/\/[^\s"'<>`\)]+\.html)\s*(?=[.,!?;:\n]|$)/g, ''); | ||
cleanContent = cleanContent.replace(/^(file:\/\/[^\s"'<>`\)]+\.html)\s*(?=[.,!?;:\n]|$)/gm, ''); | ||
|
||
// Remove file paths in various quote styles | ||
cleanContent = cleanContent.replace(/`([^`]+\.html)`/g, ''); | ||
cleanContent = cleanContent.replace(/"([^"]+\.html)"/g, ''); | ||
cleanContent = cleanContent.replace(/'([^']+\.html)'/g, ''); | ||
|
||
// Remove plain file paths | ||
cleanContent = cleanContent.replace(/\s(\/[\w\/\-\.]+\.html)\s*(?=[.,!?;:\n]|$)/g, ''); | ||
|
||
// Clean up extra newlines | ||
cleanContent = cleanContent.replace(/\n\s*\n\s*\n+/g, '\n\n'); | ||
|
||
return cleanContent.trim(); | ||
} No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix character class consistency (line 255) and verify removal patterns.
Line 255 has the same character class issue as line 159 in the detection patterns. Additionally, ensure that the removal patterns stay synchronized with the detection patterns to prevent display inconsistencies.
Apply this diff to fix the character class:
- cleanContent = cleanContent.replace(/\s(\/[\w\/\-\.]+\.html)\s*(?=[.,!?;:\n]|$)/g, '');
+ cleanContent = cleanContent.replace(/\s(\/[\w/.\-]+\.html)\s*(?=[.,!?;:\n]|$)/g, '');
Otherwise, the comprehensive removal logic correctly handles various formats and cleans up extra newlines effectively.
Consider extracting the regex patterns into shared constants used by both detectHtmlFileLinks
and removeHtmlFileLinksFromContent
to ensure they stay synchronized. This would prevent potential bugs where detection and removal patterns diverge.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
export function removeHtmlFileLinksFromContent(content: string): string { | |
let cleanContent = content; | |
// Remove HTML anchor tags to HTML files | |
cleanContent = cleanContent.replace( | |
/<a\s+href=["']?(file:\/\/[^"'\s>]+\.html)["']?[^>]*>([^<]+)<\/a>/gi, | |
'' | |
); | |
// Remove HTML img tags that reference HTML files | |
cleanContent = cleanContent.replace(/<img\s+[^>]*src=["']?(file:\/\/[^"'\s>]+\.html)["']?[^>]*\/?>/gi, ''); | |
// Remove any complete HTML blocks | |
cleanContent = cleanContent.replace(/<html[^>]*>[\s\S]*?<\/html>/gi, ''); | |
// Remove markdown links to HTML files | |
cleanContent = cleanContent.replace(/\[([^\]]+)\]\((file:\/\/[^)]+\.html)\)/g, ''); | |
// Remove standalone file:// URLs | |
cleanContent = cleanContent.replace(/\s(file:\/\/[^\s"'<>`\)]+\.html)\s*(?=[.,!?;:\n]|$)/g, ''); | |
cleanContent = cleanContent.replace(/^(file:\/\/[^\s"'<>`\)]+\.html)\s*(?=[.,!?;:\n]|$)/gm, ''); | |
// Remove file paths in various quote styles | |
cleanContent = cleanContent.replace(/`([^`]+\.html)`/g, ''); | |
cleanContent = cleanContent.replace(/"([^"]+\.html)"/g, ''); | |
cleanContent = cleanContent.replace(/'([^']+\.html)'/g, ''); | |
// Remove plain file paths | |
cleanContent = cleanContent.replace(/\s(\/[\w\/\-\.]+\.html)\s*(?=[.,!?;:\n]|$)/g, ''); | |
// Clean up extra newlines | |
cleanContent = cleanContent.replace(/\n\s*\n\s*\n+/g, '\n\n'); | |
return cleanContent.trim(); | |
} | |
export function removeHtmlFileLinksFromContent(content: string): string { | |
let cleanContent = content; | |
// Remove HTML anchor tags to HTML files | |
cleanContent = cleanContent.replace( | |
/<a\s+href=["']?(file:\/\/[^"'\s>]+\.html)["']?[^>]*>([^<]+)<\/a>/gi, | |
'' | |
); | |
// Remove HTML img tags that reference HTML files | |
cleanContent = cleanContent.replace(/<img\s+[^>]*src=["']?(file:\/\/[^"'\s>]+\.html)["']?[^>]*\/?>/gi, ''); | |
// Remove any complete HTML blocks | |
cleanContent = cleanContent.replace(/<html[^>]*>[\s\S]*?<\/html>/gi, ''); | |
// Remove markdown links to HTML files | |
cleanContent = cleanContent.replace(/\[([^\]]+)\]\((file:\/\/[^)]+\.html)\)/g, ''); | |
// Remove standalone file:// URLs | |
cleanContent = cleanContent.replace(/\s(file:\/\/[^\s"'<>`\)]+\.html)\s*(?=[.,!?;:\n]|$)/g, ''); | |
cleanContent = cleanContent.replace(/^(file:\/\/[^\s"'<>`\)]+\.html)\s*(?=[.,!?;:\n]|$)/gm, ''); | |
// Remove file paths in various quote styles | |
cleanContent = cleanContent.replace(/`([^`]+\.html)`/g, ''); | |
cleanContent = cleanContent.replace(/"([^"]+\.html)"/g, ''); | |
cleanContent = cleanContent.replace(/'([^']+\.html)'/g, ''); | |
// Remove plain file paths | |
cleanContent = cleanContent.replace(/\s(\/[\w/.\-]+\.html)\s*(?=[.,!?;:\n]|$)/g, ''); | |
// Clean up extra newlines | |
cleanContent = cleanContent.replace(/\n\s*\n\s*\n+/g, '\n\n'); | |
return cleanContent.trim(); | |
} |
🤖 Prompt for AI Agents
In utils/app/htmlFileDetector.ts around lines 227 to 261, the removal regex at
line 255 uses a different character class than the corresponding detection
pattern (same issue as at line 159) causing mismatched behavior; update the
offending regex to use the exact same character class/character set as the
detection pattern (make the file:// and path character classes identical, e.g.
mirror the allowed characters like [^\s"'<>`\)]+ used in detection), ensure all
other removal patterns match their detection counterparts, and optionally
refactor by extracting the shared regex patterns into constants used by both
detectHtmlFileLinks and removeHtmlFileLinksFromContent so they stay
synchronized.
Renders PlotlyJS HTML plots in the Chat message frontend
Overview
This feature enhances the NeMo Agent Toolkit UI to automatically detect and render PlotlyJS HTML plots within chat messages, providing users with seamless visualization capabilities directly in the conversation interface.
Background
This feature was developed as part of the Predictive Maintenance workflow on NAT. In agentic workflows, AI agents frequently generate data visualizations using the PlotlyJS library to help users understand complex datasets, trends, and insights. Previously, these plots were only accessible as HTML file paths mentioned in chat responses, requiring users to manually navigate to separate files to view the visualizations.
Technical Implementation
The agentic workflow generates plots using the PlotlyJS library and surfaces the HTML file paths to the user via the main LLM, which appears in the chat message. This PR introduces the following key changes:
HtmlFileRenderer
componentKey Features
Use Cases
Files Changed
components/Chat/ChatMessage.tsx
: Enhanced to detect and render HTML plotscomponents/Chat/HtmlFileRenderer.tsx
: New component for rendering HTML contentutils/app/htmlFileDetector.ts
: Utility functions for detecting and parsing HTML file linksjest.config.js
: Updated to handle ES module dependencies__mocks__/
: Added mocks for rehype-raw, remark-gfm, and remark-math pluginsUnit Testing
Test Coverage Summary
This PR includes comprehensive unit tests to ensure reliability and maintainability of the HTML plot rendering feature. All tests pass successfully with robust coverage of core functionality and edge cases.
htmlFileDetector.test.ts
,ChatMessage.test.tsx
rehype-raw.js
,remark-gfm.js
,remark-math.js
Jest Configuration Updates
To support the markdown rendering libraries used in the chat interface, we added mock implementations for ES modules that were incompatible with Jest's CommonJS environment:
These mocks provide no-op transformers that are sufficient for component testing, as we're testing component behavior rather than markdown parsing specifics.
Example Test Output
Testing
Example Images:
Summary by CodeRabbit
New Features
Tests