support chat modality in attempts #644

leondz · 2024-05-01T09:11:53Z

new pattern: chat histories are supported and managed entirely in Attempt

Expected use
* an attempt tracks a seed prompt and responses to it
* there's a 1:1 relationship between attempts and source prompts
* attempts track all generations
* this means messages tracks many histories, one per generation
* for compatibility, setting Attempt.prompt will set just one turn, and this is unpacked later
when output is set; we don't know the # generations to expect until some output arrives
* to keep alignment, generators need to return aligned lists of length == #generations

Implementation is by selectively overriding access to Attempt.prompt and Attempt.outputs. The hope is to retain existing compatibility across garak, while also tracking message histories for plugins that use them.

Managing histories means output tracking, so we need a 1:1 mapping from prompt to output. Because there are typically multiple output generations to one seed prompt, this needs to managed. Two options:

Attempt tracks many histories, one for each of the number of output generations requested. This requires being aware of / sensitive to the number of generations.
Many Attempts, one per generation

Because 2. breaks the underlying data model, we prefer 1. This does mean being aware of the number of outputs, so a few consistency checks are built in to attempt, as well as silently upgrading from one to multiple histories once we know the number of outputs. Tests are included.

Patterns/expectations for Attempt getting:
.prompt - returns the first user prompt
.outputs - returns the most recent model outputs
.latest_prompts - returns a list of the latest user prompts
.all_outputs - returns all model outputs from all dialogue turns

Patterns/expectations for Attempt setting:
.prompt - sets the first prompt, or fails if this has already been done
.outputs - sets a new layer of model responses. silently handles expansion of prompt to multiple histories
.latest_prompts - adds a new set of user prompts

pending work: atkgen has a roll-your-own chat model, which is not compatible

resolves #109

leondz · 2024-05-01T09:19:04Z

todo:

refactor getters for .latest_prompts and .outputs

leondz · 2024-05-01T10:00:08Z

todo:

check the integration with buffs, and write a test or two

…ast test to be clearly a 5x5 test

…e handling constructor vars

…attempts

…nversation has started

…hreads

…g and float-returning

…s to apply

garak/attempt.py

garak/detectors/promptinject.py

garak/probes/base.py

…ttempt

support chat modality in attempts

fe5b8d4

leondz added the enhancement Architectural upgrades label May 1, 2024

Merge branch 'main' into feature/dialogue

bb72070

leondz mentioned this pull request Jun 19, 2024

Refactor huggingface config support #742

Merged

leondz added 18 commits June 25, 2024 08:45

retain test_attempt

70f61e5

raise informative exception if outputs set before prompt set

6d965a8

comply w/ requirement to set attempt prompt before outputs; renamed l…

eeb25fa

…ast test to be clearly a 5x5 test

disable setting outputs, messages in constructor; init messages befor…

d4c8847

…e handling constructor vars

clarify access patterns for outputs, prompt

1a02a4d

ensure prompt is set before manipulating outputs

fe94bf5

stop HF Detector from manipulating attempt outputs

c17a85c

use correct constructor signature

a64b2c9

use typeerror for unsupported operation; rename add_turn + add docstring

0b11f40

test outputs behaviour; refactor and enable function-based access of …

c8068c3

…attempts

tolerate retrieving attempt.outputs even if there are none and the co…

a6a92ac

…nversation has started

update atkgen to use attempt-based chat modality

326e8e9

move to doing detection on all_outputs, which flattens conversation t…

ad0636a

…hreads

handle Nones in results

752b6e4

handle None as non-hit in TriggerListDetector

d5e8c57

'None' is not a continuation

096dda3

check detectors are using all_outputs; fix some detector None-handlin…

106cc36

…g and float-returning

relax constraint on setting prompt in order to allow single-turn buff…

c927036

…s to apply

leondz marked this pull request as ready for review June 25, 2024 11:58

leondz requested review from jmartin-tech and erickgalinkin June 25, 2024 11:59

erickgalinkin reviewed Jun 25, 2024

View reviewed changes

garak/attempt.py Show resolved Hide resolved

garak/detectors/promptinject.py Outdated Show resolved Hide resolved

garak/probes/base.py Outdated Show resolved Hide resolved

leondz added 2 commits June 26, 2024 07:49

Merge branch 'main' into feature/dialogue

f675e1d

rm print debug stmt

4cdc170

leondz added 5 commits June 26, 2024 08:44

setting attempt prompt to None is not a supported operation

2786cba

reduce number of operations in minting attempt

401dec9

black

9ddb38e

re-enable atkgen/chat test

61387e6

test type check on setting prompt to None

2c36272

leondz mentioned this pull request Jun 26, 2024

align detector output types & results #756

Open

Merge branch 'main' into feature/dialogue

0c96c59

leondz mentioned this pull request Jun 27, 2024

Add Skeleton Key probe #757

Open

leondz added 2 commits July 1, 2024 16:40

Merge branch 'main' into feature/dialogue

5f3fc12

_mint_attempt now takes a prompt param and leaves type gating up to a…

71c700f

…ttempt

jmartin-tech approved these changes Jul 1, 2024

View reviewed changes

leondz merged commit af626f1 into main Jul 1, 2024
6 checks passed

github-actions bot locked and limited conversation to collaborators Jul 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

support chat modality in attempts #644

support chat modality in attempts #644

leondz commented May 1, 2024 •

edited

Loading

leondz commented May 1, 2024 •

edited

Loading

leondz commented May 1, 2024 •

edited

Loading

support chat modality in attempts #644

support chat modality in attempts #644

Conversation

leondz commented May 1, 2024 • edited Loading

leondz commented May 1, 2024 • edited Loading

leondz commented May 1, 2024 • edited Loading

leondz commented May 1, 2024 •

edited

Loading

leondz commented May 1, 2024 •

edited

Loading

leondz commented May 1, 2024 •

edited

Loading