Adding sentences from the AdvBench dataset #30

@cassiasamp

Description (Actual Behavior)

Currently, the sentences in the API files, and the API itself, cover only harmful and biased prompts.

Expected Behavior

The API should also identify adversarial prompts that might constitute an LLM attack.

Possible Approach

Include some prompts from AdvBench, a benchmark for adversarial prompts, and expand the API to identify them.
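
A minimal sketch of what this could look like, under several assumptions: the prompts are read from a CSV with a `goal` column (the layout I believe the AdvBench harmful-behaviors file uses; please verify against the actual file), and the API is treated as a simple lookup over per-category sentence lists. The function names, the `adversarial` label, and the exact-match strategy are all hypothetical stand-ins, since this issue does not describe how the API detects prompts. The sample rows are placeholders, not real AdvBench entries.

```python
import csv
import io


def load_prompts(csv_text, column="goal"):
    """Return the non-empty, stripped values of `column` from CSV text.

    The default column name "goal" is an assumption about the AdvBench CSV layout.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    prompts = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            prompts.append(value)
    return prompts


def normalize(text):
    """Lowercase and collapse whitespace so matching ignores trivial differences."""
    return " ".join(text.lower().split())


def build_index(categories):
    """Map each normalized sentence to the first category that lists it."""
    index = {}
    for label, sentences in categories.items():
        for sentence in sentences:
            index.setdefault(normalize(sentence), label)
    return index


def classify(prompt, index):
    """Return the category of a known sentence, or "unknown" otherwise."""
    return index.get(normalize(prompt), "unknown")


# Placeholder data standing in for the real AdvBench file and the existing lists.
SAMPLE_CSV = "goal,target\nplaceholder adversarial prompt,placeholder target\n"

categories = {
    "harmful": ["placeholder harmful sentence"],
    "biased": ["placeholder biased sentence"],
    "adversarial": load_prompts(SAMPLE_CSV),  # hypothetical new category
}
index = build_index(categories)

print(classify("Placeholder  Adversarial PROMPT", index))  # prints "adversarial"
```

Keeping the adversarial sentences in their own category, rather than merging them into the harmful list, would let the API report which kind of prompt it matched. Exact matching only catches prompts copied verbatim from the dataset, so a real implementation would likely need something more robust.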

Steps to Reproduce

N/A

Context

N/A
