71 changes: 61 additions & 10 deletions src/core/prompts/tools/read-file.ts
Expand Up @@ -5,15 +5,20 @@ export function getReadFileDescription(args: ToolArgs): string {
const isMultipleReadsEnabled = maxConcurrentReads > 1

return `## read_file
Description: Request to read the contents of ${isMultipleReadsEnabled ? "one or more files" : "a file"}. The tool outputs line-numbered content (e.g. "1 | const x = 1") for easy reference when creating diffs or discussing code.${args.partialReadsEnabled ? " Use line ranges to efficiently read specific portions of large files." : ""} Supports text extraction from PDF and DOCX files, but may not handle other binary files properly.
Description: Request to read the contents of ${isMultipleReadsEnabled ? "one or more files" : "a file"}. The tool outputs line-numbered content (e.g. "1 | const x = 1") for easy reference when creating diffs or discussing code.${args.partialReadsEnabled ? " Use line ranges to efficiently read specific portions of large files." : ""} Use pattern to search for specific content in files. Supports text extraction from PDF and DOCX files, but may not handle other binary files properly.

${isMultipleReadsEnabled ? `**IMPORTANT: You can read a maximum of ${maxConcurrentReads} files in a single request.** If you need to read more files, use multiple sequential read_file requests.` : "**IMPORTANT: Multiple file reads are currently disabled. You can only read one file at a time.**"}

${args.partialReadsEnabled ? `By specifying line ranges, you can efficiently read specific portions of large files without loading the entire file into memory.` : ""}
${args.partialReadsEnabled ? `By specifying line ranges, you can efficiently read specific portions of large files without loading the entire file into memory.` : ""} Use pattern to search for specific content and get lightweight results with match locations - pattern returns line numbers + context, allowing you to then use line_range to read full context around matches.
Parameters:
- args: Contains one or more file elements, where each file contains:
- path: (required) File path (relative to workspace directory ${args.cwd})
${args.partialReadsEnabled ? `- line_range: (optional) One or more line range elements in format "start-end" (1-based, inclusive)` : ""}
${
args.partialReadsEnabled
? `- line_range: (optional) One or more line range elements in format "start-end" (1-based, inclusive)
`
: ""
}- pattern: (optional) Regex pattern to search within the file (lightweight search mode)

Usage:
<read_file>
Expand Down Expand Up @@ -69,17 +74,63 @@ ${isMultipleReadsEnabled ? "3. " : "2. "}Reading an entire file:
</args>
</read_file>

IMPORTANT: You MUST use this Efficient Reading Strategy:
- ${isMultipleReadsEnabled ? `You MUST read all related files and implementations together in a single operation (up to ${maxConcurrentReads} files at once)` : "You MUST read files one at a time, as multiple file reads are currently disabled"}
- You MUST obtain all necessary context before proceeding with changes
${isMultipleReadsEnabled ? "4. " : "3. "}Searching for a pattern (lightweight mode):
<read_file>
<args>
<file>
<path>src/app.ts</path>
<pattern>async function|TODO</pattern>
</file>
</args>
</read_file>
${
args.partialReadsEnabled
? `
${isMultipleReadsEnabled ? "5. " : "4. "}Combining pattern and line_range (search within specific range):
<read_file>
<args>
<file>
<path>src/utils.ts</path>
<line_range>100-500</line_range>
<pattern>export const</pattern>
</file>
</args>
</read_file>`
: ""
}

CRITICAL RULES FOR READING FILES (YOU MUST FOLLOW):

**Pattern Search (Always Available):**
- **Find specific content:** Use pattern to search within a single file, or use search_files to search across multiple files
- **Pattern search workflow:** Use pattern to find matches (returns line numbers + context)${args.partialReadsEnabled ? `, then use line_range to read full context around matches` : ""}
- **Pattern output:** Returns match locations (line numbers) + 2 lines of context per match (max 20 matches)

✅ Pattern Examples:
- Searching for pattern: <pattern>async function|class.*implements</pattern>
${args.partialReadsEnabled ? `- Pattern + range: <line_range>100-500</line_range> and <pattern>TODO|FIXME</pattern>` : ""}
${
args.partialReadsEnabled
? `- You MUST use line ranges to read specific portions of large files, rather than reading entire files when not needed
- You MUST combine adjacent line ranges (<10 lines apart)
- You MUST use multiple ranges for content separated by >10 lines
- You MUST include sufficient line context for planned modifications while keeping ranges minimal
? `
**Line Range Features:**
1. **Large files (>300 lines):** System will auto-limit to first 100 lines if you don't specify line_range, with an educational notice showing you how to use line_range for specific sections
2. **Preview unknown files:** Read first without line_range to see file size and structure, then use line_range for specific sections if needed
3. **Multiple sections:** Use multiple <line_range> tags in ONE request to read non-adjacent sections efficiently

✅ Line Range Examples:
- Reading large file preview: <line_range>1-100</line_range>
- Reading specific function: <line_range>450-520</line_range>
- Reading multiple sections: <line_range>1-50</line_range> and <line_range>200-250</line_range>

📚 How it works:
- Files ≤300 lines: Full content returned
- Files >300 lines without line_range: First 100 lines + notice with usage examples
- Files >300 lines with line_range: Exact ranges you specified

`
: ""
}
- ${isMultipleReadsEnabled ? `You MUST read all related files together in a single operation (up to ${maxConcurrentReads} files at once)` : "You MUST read files one at a time, as multiple file reads are currently disabled"}
- You MUST obtain all necessary context before proceeding with changes
${isMultipleReadsEnabled ? `- When you need to read more than ${maxConcurrentReads} files, prioritize the most critical files first, then use subsequent read_file requests for additional files` : ""}`
}
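
The "How it works" rules in the prompt text above can be sketched as a small decision function. This is only an illustration: `describeReadPlan`, `ReadPlan`, `SketchLineRange`, and the constant names are hypothetical and do not appear in the PR. The 300-line threshold and the 100-line preview are taken from the prompt text. The sketch also assumes that explicit ranges are honored for files of any size.

```typescript
// Hypothetical sketch of the read rules described in the prompt text.
interface SketchLineRange {
	start: number // 1-based, inclusive
	end: number // 1-based, inclusive
}

type ReadPlan =
	| { mode: "full"; lines: SketchLineRange }
	| { mode: "preview"; lines: SketchLineRange; notice: string }
	| { mode: "ranges"; lines: SketchLineRange[] }

const LARGE_FILE_THRESHOLD = 300 // from the prompt: files >300 lines are auto-limited
const PREVIEW_LINES = 100 // from the prompt: first 100 lines are shown

function describeReadPlan(totalLines: number, lineRanges: SketchLineRange[] = []): ReadPlan {
	// Assumption: explicit ranges always win, regardless of file size.
	if (lineRanges.length > 0) {
		return { mode: "ranges", lines: lineRanges }
	}
	// Large file without ranges: preview plus a notice explaining line_range.
	if (totalLines > LARGE_FILE_THRESHOLD) {
		return {
			mode: "preview",
			lines: { start: 1, end: PREVIEW_LINES },
			notice: `File has ${totalLines} lines; showing 1-${PREVIEW_LINES}. Use line_range for other sections.`,
		}
	}
	// Small file: full content.
	return { mode: "full", lines: { start: 1, end: totalLines } }
}
```

For example, `describeReadPlan(1000)` yields a preview of lines 1-100, while `describeReadPlan(1000, [{ start: 450, end: 520 }])` returns exactly the requested range.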
231 changes: 231 additions & 0 deletions src/core/tools/readFileTool.ts
Expand Up @@ -22,6 +22,7 @@ import {
processImageFile,
ImageMemoryTracker,
} from "./helpers/imageHelpers"
import { regexSearchFiles } from "../../services/ripgrep"

export function getReadFileToolDescription(blockName: string, blockParams: any): string {
// Handle both single path and multiple files via args
Expand Down Expand Up @@ -54,6 +55,120 @@ export function getReadFileToolDescription(blockName: string, blockParams: any):
return `[${blockName} with missing path/args]`
}
}

/**
* Result of a pattern search
*/
interface PatternMatch {
startLine: number
endLine: number
matchLine: number // Line number of the actual match
content: string // Context with line numbers (2 lines before and after)
}

/**
* Search for a pattern within a single file
* @param filePath - Full file path
* @param cwd - Working directory
* @param pattern - Regex pattern
* @param rooIgnoreController - Ignore controller
* @returns Array of match results
*/
async function searchPatternInFile(
filePath: string,
cwd: string,
pattern: string,
rooIgnoreController?: any,
): Promise<PatternMatch[]> {
try {
// Search the single file with ripgrep
const searchResults = await regexSearchFiles(cwd, filePath, pattern, undefined, rooIgnoreController)

// Parse the ripgrep output
// Example output format:
// # src/app.ts
// 45 | export async function fetchData() {
// 46 | const response = await fetch(url)
// 47 | return response.json()
// ----

const matches: PatternMatch[] = []
const lines = searchResults.split("\n")

let currentMatchLines: { line: number; text: string }[] = []
let inMatchBlock = false

Unused variable 'inMatchBlock' is declared and updated but never used. Consider removing it.

Suggested change
let inMatchBlock = false


The variable inMatchBlock is set but never used in searchPatternInFile. Removing it would simplify the code.

Suggested change
let inMatchBlock = false


for (let i = 0; i < lines.length; i++) {
const line = lines[i]

// Skip the file-name header line
if (line.startsWith("#")) {
continue
}

// A separator marks the end of a match block
if (line.trim() === "----") {
if (currentMatchLines.length > 0) {
// Extract the line-number range
const lineNumbers = currentMatchLines.map((l) => l.line)
const startLine = Math.min(...lineNumbers)
const endLine = Math.max(...lineNumbers)

// Assemble the content
const content = currentMatchLines
.map((l) => `${String(l.line).padStart(3, " ")} | ${l.text}`)
.join("\n")

// Assume the matching line sits in the middle of the block
const matchLine = currentMatchLines[Math.floor(currentMatchLines.length / 2)].line

matches.push({
startLine,
endLine,
matchLine,
content,
})

currentMatchLines = []
}
inMatchBlock = false
continue
}

// Parse the line number and text
// Format: " 45 | export async function fetchData() {"
const match = line.match(/^\s*(\d+)\s+\|\s+(.*)$/)
if (match) {
const lineNumber = parseInt(match[1], 10)
const lineText = match[2]
currentMatchLines.push({ line: lineNumber, text: lineText })
inMatchBlock = true
}
}

// Handle the last match block (if there is no trailing separator)
if (currentMatchLines.length > 0) {
const lineNumbers = currentMatchLines.map((l) => l.line)
const startLine = Math.min(...lineNumbers)
const endLine = Math.max(...lineNumbers)
const content = currentMatchLines.map((l) => `${String(l.line).padStart(3, " ")} | ${l.text}`).join("\n")
const matchLine = currentMatchLines[Math.floor(currentMatchLines.length / 2)].line

matches.push({
startLine,
endLine,
matchLine,
content,
})
}

return matches
} catch (error) {
console.error(`[searchPatternInFile] Error searching pattern in ${filePath}:`, error)
return []
}
}
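
The parsing loop above repeats its block-flush logic twice and guesses that the match sits in the middle of each block. One possible restructuring is sketched below, assuming the output format shown in the comments (`# file` header, `N | text` lines, `----` separators). The names `parseSearchOutput`, `tryCompileRegex`, and `SketchMatch` are hypothetical. Re-testing each line with a JavaScript `RegExp` is also an assumption: it only helps when the pattern is valid in both ripgrep and JavaScript regex syntax, so the sketch falls back to the middle-of-block guess otherwise.

```typescript
// Hypothetical sketch; mirrors the PatternMatch shape from the PR.
interface SketchMatch {
	startLine: number
	endLine: number
	matchLine: number
	content: string
}

// Returns undefined when the pattern is not valid JavaScript regex syntax.
function tryCompileRegex(pattern: string): RegExp | undefined {
	try {
		return new RegExp(pattern)
	} catch {
		return undefined
	}
}

function parseSearchOutput(output: string, pattern: string): SketchMatch[] {
	const matches: SketchMatch[] = []
	const regex = tryCompileRegex(pattern)
	let block: { line: number; text: string }[] = []

	// Single place that turns the accumulated block into a match.
	const flush = (): void => {
		if (block.length === 0) return
		// Fallback mirrors the PR: assume the match is in the middle of the block.
		let matchLine = block[Math.floor(block.length / 2)].line
		if (regex !== undefined) {
			for (const entry of block) {
				if (regex.test(entry.text)) {
					matchLine = entry.line
					break
				}
			}
		}
		const lineNumbers = block.map((e) => e.line)
		matches.push({
			startLine: Math.min(...lineNumbers),
			endLine: Math.max(...lineNumbers),
			matchLine,
			content: block.map((e) => `${String(e.line).padStart(3, " ")} | ${e.text}`).join("\n"),
		})
		block = []
	}

	for (const raw of output.split("\n")) {
		if (raw.startsWith("#")) continue // file-name header
		if (raw.trim() === "----") {
			flush()
			continue
		}
		// Consume at most one space after "|" so code indentation is kept.
		const m = raw.match(/^\s*(\d+)\s*\|\s?(.*)$/)
		if (m) block.push({ line: parseInt(m[1], 10), text: m[2] })
	}
	flush() // the last block may have no trailing separator
	return matches
}
```

Keeping the flush in one closure also removes the need for a separate "in block" flag, since an empty `block` array already carries that state.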

// Types
interface LineRange {
start: number
Expand All @@ -63,6 +178,7 @@ interface LineRange {
interface FileEntry {
path?: string
lineRanges?: LineRange[]
pattern?: string // New: regex pattern to search for within the file
}

// New interface to track file processing state
Expand All @@ -73,6 +189,7 @@ interface FileResult {
error?: string
notice?: string
lineRanges?: LineRange[]
pattern?: string // Pattern for searching within the file
xmlContent?: string // Final XML content for this file
imageDataUrl?: string // Image data URL for image files
feedbackText?: string // User feedback text from approval/denial
Expand Down Expand Up @@ -137,6 +254,7 @@ export async function readFileTool(
const fileEntry: FileEntry = {
path: file.path,
lineRanges: [],
pattern: file.pattern, // Parse the pattern parameter
}

if (file.line_range) {
Expand Down Expand Up @@ -196,6 +314,7 @@ export async function readFileTool(
path: entry.path || "",
status: "pending",
lineRanges: entry.lineRanges,
pattern: entry.pattern,
}))

// Function to update file result status
Expand Down Expand Up @@ -545,6 +664,118 @@ export async function readFileTool(
continue
}

// Handle pattern search (lightweight search mode)
if (fileResult.pattern) {

The implementation doesn't support combining pattern with line_range as documented in example 5 of the tool description. When both parameters are provided, line_range is processed first (line 651) and returns early with continue (line 664), preventing this pattern logic from executing. This means users cannot search for patterns within a specific line range as the documentation suggests. To fix this, the pattern search logic would need to either: (1) be executed before line_range handling and filter matches to the specified ranges, or (2) read only the specified line ranges first and then search within them.
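
Option (1) from the comment above could look roughly like the following sketch: run the pattern search first, then keep only matches whose match line falls inside one of the requested ranges. `filterMatchesToRanges`, `RangeSketch`, and `MatchSketch` are hypothetical names that mirror the PR's `LineRange` and `PatternMatch` shapes. Where this would be called inside `readFileTool` is not shown and would need to come before the existing line_range early return.

```typescript
// Hypothetical sketch; shapes mirror LineRange and PatternMatch from the PR.
interface RangeSketch {
	start: number // 1-based, inclusive
	end: number // 1-based, inclusive
}

interface MatchSketch {
	startLine: number
	endLine: number
	matchLine: number
	content: string
}

// Keep only matches whose match line lies within at least one range.
// With no ranges, every match is kept (pattern-only mode).
function filterMatchesToRanges(matches: MatchSketch[], ranges: RangeSketch[]): MatchSketch[] {
	if (ranges.length === 0) return matches
	return matches.filter((m) => ranges.some((r) => m.matchLine >= r.start && m.matchLine <= r.end))
}
```

Note that this filters on `matchLine` only, so the context lines of a kept match may still extend slightly outside the requested range.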

const matches = await searchPatternInFile(
fullPath,
cline.cwd,
fileResult.pattern,
cline.rooIgnoreController,
)

// Track file read
await cline.fileContextTracker.trackFileContext(relPath, "read_tool" as RecordSource)

if (matches.length === 0) {
const xmlInfo = `<metadata>
<total_lines>${totalLines}</total_lines>
<pattern>${fileResult.pattern}</pattern>

User input (fileResult.pattern) is embedded directly into XML. For security and well-formed XML, please sanitize or escape XML special characters.
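
A minimal sketch of the escaping this comment asks for, assuming a small standalone helper is acceptable. `escapeXmlText` is a hypothetical name, and the codebase may already have an equivalent utility that should be preferred. It replaces the five characters that have predefined entities in XML, with `&` handled first so entities that were just produced are not escaped again.

```typescript
// Hypothetical helper: escape the five XML special characters.
function escapeXmlText(value: string): string {
	return value
		.replace(/&/g, "&amp;") // must run first
		.replace(/</g, "&lt;")
		.replace(/>/g, "&gt;")
		.replace(/"/g, "&quot;")
		.replace(/'/g, "&apos;")
}

// Usage: wrap the user-supplied pattern before embedding it.
const examplePattern = "Map<string, number>|a && b"
const examplePatternXml = `<pattern>${escapeXmlText(examplePattern)}</pattern>`
```

The same call would apply to the `<notice>` text, which also interpolates the raw pattern.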

<matches_count>0</matches_count>
</metadata>
<notice>No matches found for pattern "${fileResult.pattern}"</notice>`

updateFileResult(relPath, {
xmlContent: `<file><path>${relPath}</path>\n${xmlInfo}\n</file>`,
})
continue
}

// Limit the number of matches returned (at most 20)
const limitedMatches = matches.slice(0, 20)
const hasMore = matches.length > 20

let xmlInfo = `<metadata>
<total_lines>${totalLines}</total_lines>
<pattern>${fileResult.pattern}</pattern>
<matches_count>${matches.length}</matches_count>
${hasMore ? `<showing_matches>20</showing_matches>` : ""}
</metadata>\n`

// Append the content of each match
limitedMatches.forEach((match, idx) => {
const lineAttr = ` lines="${match.startLine}-${match.endLine}"`
xmlInfo += `<search_result${lineAttr}>\n${match.content}\n</search_result>\n`
})

// Append a usage hint
const firstMatch = limitedMatches[0]
const exampleStart = Math.max(1, firstMatch.matchLine - 10)
const exampleEnd = Math.min(totalLines, firstMatch.matchLine + 10)

let notice = `Found ${matches.length} match(es)${hasMore ? `, showing first 20` : ""}.`
notice += `\n\nTo read full context around a match, use line_range:\n`
notice += `<read_file>\n<args>\n <file>\n <path>${relPath}</path>\n`
notice += ` <line_range>${exampleStart}-${exampleEnd}</line_range> <!-- Context around first match -->\n`
notice += ` </file>\n</args>\n</read_file>`

xmlInfo += `<notice>${notice}</notice>`

updateFileResult(relPath, {
xmlContent: `<file><path>${relPath}</path>\n${xmlInfo}\n</file>`,
})
continue
}

// Auto-limit large files without line_range to protect context (Option G)
// This helps weak models learn to use line_range by showing them the correct syntax
if (totalLines > 300 && (!fileResult.lineRanges || fileResult.lineRanges.length === 0)) {
const autoLimitLines = 100
const content = addLineNumbers(await readLines(fullPath, autoLimitLines - 1, 0), 1)
const lineRangeAttr = ` lines="1-${autoLimitLines}"`
let xmlInfo = `<metadata>\n<total_lines>${totalLines}</total_lines>\n<showing_lines>1-${autoLimitLines}</showing_lines>\n</metadata>\n`
xmlInfo += `<content${lineRangeAttr}>\n${content}</content>\n`

// Try to get code definitions to help model locate specific sections
try {
const defResult = await parseSourceCodeDefinitionsForFile(fullPath, cline.rooIgnoreController)
if (defResult) {
xmlInfo += `<list_code_definition_names>${defResult}</list_code_definition_names>\n`
}
} catch (error) {
// Silently ignore definition parsing errors for non-supported languages
if (error instanceof Error && !error.message.startsWith("Unsupported language:")) {
console.warn(`[read_file] Warning parsing definitions: ${error.message}`)
}
}

// Educational notice to teach weak models how to use line_range
const educationalNotice = `⚠️ This file has ${totalLines} lines (exceeds 300-line threshold).
Showing first ${autoLimitLines} lines only to preserve context.

To read specific sections, use line_range parameter:
<read_file>
<args>
<file>
<path>${relPath}</path>
<line_range>1-100</line_range> <!-- First 100 lines -->
<line_range>450-550</line_range> <!-- Lines around specific function -->
</file>
</args>
</read_file>

You can specify multiple <line_range> elements to read non-adjacent sections efficiently.`

xmlInfo += `<notice>${educationalNotice}</notice>\n`

// Track file read
await cline.fileContextTracker.trackFileContext(relPath, "read_tool" as RecordSource)

updateFileResult(relPath, {
xmlContent: `<file><path>${relPath}</path>\n${xmlInfo}</file>`,
})
continue
}

// Handle definitions-only mode
if (maxReadFileLine === 0) {
try {