Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support chat modality in attempts #644

Merged
merged 30 commits into from
Jul 1, 2024
Merged

support chat modality in attempts #644

merged 30 commits into from
Jul 1, 2024

Conversation

leondz
Copy link
Collaborator

@leondz leondz commented May 1, 2024

new pattern: chat histories are supported and managed entirely in Attempt

Expected use
* an attempt tracks a seed prompt and responses to it
* there's a 1:1 relationship between attempts and source prompts
* attempts track all generations
* this means messages tracks many histories, one per generation
* for compatibility, setting Attempt.prompt will set just one turn, and this is unpacked later
when output is set; we don't know the # generations to expect until some output arrives
* to keep alignment, generators need to return aligned lists of length == #generations

Implementation is by selectively overriding access to Attempt.prompt and Attempt.outputs. The hope is to retain existing compatibility across garak, while also tracking message histories for plugins that use them.

Managing histories means output tracking, so we need a 1:1 mapping from prompt to output. Because there are typically multiple output generations to one seed prompt, this needs to managed. Two options:

  1. Attempt tracks many histories, one for each of the number of output generations requested. This requires being aware of / sensitive to the number of generations.
  2. Many Attempts, one per generation

Because 2. breaks the underlying data model, we prefer 1. This does mean being aware of the number of outputs, so a few consistency checks are built in to attempt, as well as silently upgrading from one to multiple histories once we know the number of outputs. Tests are included.

Patterns/expectations for Attempt getting:
.prompt - returns the first user prompt
.outputs - returns the most recent model outputs
.latest_prompts - returns a list of the latest user prompts
.all_outputs - returns all model outputs from all dialogue turns

Patterns/expectations for Attempt setting:
.prompt - sets the first prompt, or fails if this has already been done
.outputs - sets a new layer of model responses. silently handles expansion of prompt to multiple histories
.latest_prompts - adds a new set of user prompts

pending work: atkgen has a roll-your-own chat model, which is not compatible

resolves #109

@leondz leondz added the enhancement Architectural upgrades label May 1, 2024
@leondz
Copy link
Collaborator Author

leondz commented May 1, 2024

todo:

  • refactor getters for .latest_prompts and .outputs

@leondz
Copy link
Collaborator Author

leondz commented May 1, 2024

todo:

  • check the integration with buffs, and write a test or two

@leondz leondz marked this pull request as ready for review June 25, 2024 11:58
garak/attempt.py Show resolved Hide resolved
garak/detectors/promptinject.py Outdated Show resolved Hide resolved
garak/probes/base.py Outdated Show resolved Hide resolved
@leondz leondz mentioned this pull request Jun 27, 2024
@leondz leondz merged commit af626f1 into main Jul 1, 2024
6 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement Architectural upgrades
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support chat modality
3 participants