Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Context: add three new experimental retriever strategies #5494

Merged
merged 21 commits into from
Sep 24, 2024

Conversation

hitesh-1997
Copy link
Contributor

@hitesh-1997 hitesh-1997 commented Sep 7, 2024

Context

The PR adds three major context sources useful for autocomplete and next-edit-suggestion.

Recent Copy retriever.

Developers often copy/cut and paste context and before pasting we show autocomplete suggestions to the user. The PR leverages the copied content on the clipboard by the user as a context source.
I wasn't able to find any vscode api event exposed which triggers when user copy or cut text in the editor. This seems like an open issue: microsoft/vscode#30066

Another alternative I think of is to use a keybinding of Ctrl+x and Ctrl+c, but not fully sure about its implications, since this is one of the most common shortcuts.

As a workaround, The way current PR accomplishes the same is:

  1. Tracks the selection made by the user in the last 1 minutes and keeps tracks of upto 100 most recent selections.
  2. At the time of retrieval, checks if the current clipboard content matches the selection items in the list.

Recent View Ports

This context source captures and utilizes the recently viewed portions of code in the editor. It keeps track of the visible areas of code that the developer has scrolled through or focused on within a specified time frame.

Diagnostics

The Diagnostics context source leverages the diagnostic information provided by VS Code and language servers. It collects and utilizes information about errors, for a file as a context source.

Test plan

  1. Automated test
  • Added CI tests for each of the retrievers
  1. Manual test
  • Override the setting "cody.autocomplete.experimental.graphContext": "recent-copy" in vscode settings.
  • Observe the context events using Autocomplete Trace View

@hitesh-1997 hitesh-1997 marked this pull request as draft September 7, 2024 14:46
@hitesh-1997 hitesh-1997 marked this pull request as ready for review September 9, 2024 15:55
Comment on lines 21 to 24
constructor(
private readonly maxAgeMs: number,
private readonly maxSelections: number
) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use an object here to make the constructor call readable new RecentCopyRetriever(60 * 1000, 100).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 90 to 92
const firstOutdatedChange = this.trackedSelections.findIndex(selection => {
return now - selection.timestamp < this.maxAgeMs
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use > here to find outdates items?

Suggested change
const firstOutdatedChange = this.trackedSelections.findIndex(selection => {
return now - selection.timestamp < this.maxAgeMs
})
const firstOutdatedChange = this.trackedSelections.findIndex(selection => {
return now - selection.timestamp > this.maxAgeMs
})

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add basic unit tests to account for such cases. Sonnet 3.5 can probably cover 90% of cases here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion Valery :)
Added the test case for all the retrievers.

Comment on lines 74 to 75
// Remove the selection from the tracked selections
this.trackedSelections = this.trackedSelections.splice(existingSelectionIndex, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assigns the removed element to trackedSelections instead of removing it. It should be:

Suggested change
// Remove the selection from the tracked selections
this.trackedSelections = this.trackedSelections.splice(existingSelectionIndex, 1)
// Remove the selection from the tracked selections
this.trackedSelections.splice(existingSelectionIndex, 1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected


public async retrieve(): Promise<AutocompleteContextSnippet[]> {
const clipboardContent = await vscode.env.clipboard.readText()
const selectionItem = this.getSelectionItemIfExist(clipboardContent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const selectionItem = this.getSelectionItemIfExist(clipboardContent)
const selectionItem = this.trackedSelections.find(ts => ts.content === text)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 58 to 63
private async onDidChangeTextEditorSelection(
event: vscode.TextEditorSelectionChangeEvent
): Promise<void> {
const editor = event.textEditor
this.addSelectionForTracking(editor.document, editor.selection)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need an async function here?

Suggested change
private async onDidChangeTextEditorSelection(
event: vscode.TextEditorSelectionChangeEvent
): Promise<void> {
const editor = event.textEditor
this.addSelectionForTracking(editor.document, editor.selection)
}
private onDidChangeTextEditorSelection(event: vscode.TextEditorSelectionChangeEvent): void {
const editor = event.textEditor
this.addSelectionForTracking(editor.document, editor.selection)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh right, we dont need async, removed it

Comment on lines 101 to 105
this.trackedSelections = []
for (const disposable of this.disposables) {
disposable.dispose()
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this.trackedSelections = []
for (const disposable of this.disposables) {
disposable.dispose()
}
}
this.trackedSelections = []
for (const disposable of this.disposables) {
disposable.dispose()
}
this.disposables = []
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@hitesh-1997 hitesh-1997 force-pushed the hitesh/autocomplete-copy-retriever branch from 0b2aeeb to 54c4cfa Compare September 21, 2024 00:02
@hitesh-1997 hitesh-1997 marked this pull request as draft September 21, 2024 12:52
@hitesh-1997 hitesh-1997 marked this pull request as ready for review September 22, 2024 15:59
@hitesh-1997 hitesh-1997 marked this pull request as draft September 22, 2024 15:59
@hitesh-1997 hitesh-1997 marked this pull request as ready for review September 23, 2024 00:37
@olafurpg olafurpg changed the title experimental recent copy retriever Context: add three new experimental retriever strategies Sep 23, 2024
Copy link
Member

@olafurpg olafurpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍🏻 Only have nitpick comments on the coding style

Do I understand correctly that these context sources are disabled by default, and they can only be configured to run one at a time (not mixed)? Either way, not a blocking comment, just unclear to me how we will mix these

const diagnostic = [
createDiagnostic(
vscode.DiagnosticSeverity.Error,
new vscode.Range(1, 16, 1, 21),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I personally avoid writing test cases with hardcoded line numbers like these. I find it's better to compute the position based on markers in the original file like this

            function foo() {
                console.log(<DIAGNOSTIC_START/>'foo'<DIAGNOSTIC_END/>)
            }

The benefit of doing it this way is that you can more easily refactor the input code while having the start/end positions update automatically for you.

My experience is that hardcoded numbers like these quickly become out of sync and it's a nightmare to update them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a nice technique, will address in the follow up PR :)

}

public async getClipboardContent(): Promise<string> {
return vscode.env.clipboard.readText()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's better to use toplevel functions instead of class methods when you don't access this inside the method. In this case, you can probably just inline the call to vscode.env.clipboard.readText()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaah, I see we're mocking it out in the tests, which is why it's a method here. Fine by me, but might be worth adding a comment explaining the motivation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the comment

@@ -0,0 +1,97 @@
import type { AutocompleteContextSnippet } from '@sourcegraph/cody-shared'
import { debounce } from 'lodash'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, see #5648

Suggested change
import { debounce } from 'lodash'
import debounce from 'lodash/debounce'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

export class RecentViewPortRetriever implements vscode.Disposable, ContextRetriever {
public identifier = RetrieverIdentifier.RecentViewPortRetriever
private disposables: vscode.Disposable[] = []
private trackedViewPorts: Map<string, TrackedViewPort> = new Map()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: when using string keys, it can help with clarity to explain what the keys actually are either in the variable name or with a comment

Suggested change
private trackedViewPorts: Map<string, TrackedViewPort> = new Map()
private viewportsByDocumentUri: Map<string, TrackedViewPort> = new Map()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if (this.trackedViewPorts.size >= this.maxTrackedFiles) {
// Remove the least recently accessed viewport
const oldestKey = Array.from(this.trackedViewPorts.entries()).sort(
([, a], [, b]) => a.lastAccessTimestamp - b.lastAccessTimestamp
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be simpler to use LRUCache, which we are already using in the repo for tree-sitter trees.

Docs https://www.npmjs.com/package/lru-cache

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, this is much simpler now, changed to lru cache.

@hitesh-1997
Copy link
Contributor Author

LGTM 👍🏻 Only have nitpick comments on the coding style

Do I understand correctly that these context sources are disabled by default, and they can only be configured to run one at a time (not mixed)? Either way, not a blocking comment, just unclear to me how we will mix these

A great point Olaf.
I am logging the data collected for these context sources here in the PR: #5661
This will help us how to combine these context sources to improve the model.

@hitesh-1997 hitesh-1997 merged commit 588501f into main Sep 24, 2024
18 checks passed
@hitesh-1997 hitesh-1997 deleted the hitesh/autocomplete-copy-retriever branch September 24, 2024 01:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants