Report comparative info for detector scores #814

Merged
merged 26 commits into from
Aug 2, 2024

@leondz leondz commented Jul 31, 2024

Enable interpretation of scores in a run by calibrating them against a bag of models.

  • garak/analyze/perf_stats.py takes a glob of report jsonls and calculates the mean, standard deviation, and Shapiro-Wilk p-value for each probe/detector pair found; the p-value indicates how well the spread of scores fits a normal distribution
  • garak/resources/calibration contains the files from which stats are derived in a comparison; these files are generated by perf_stats.py
  • garak/analyze/report_digest.py and the templates are updated to calculate a z-score for each probe/detector combination where this is possible given a default calibration json, to print it in the html output, and to report what the score means and where it came from
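
As a rough illustration of the calibration idea described above, here is a minimal sketch, not the code in this PR. It assumes scores have already been collected per probe/detector key from a bag of models. The function names `calibrate` and `z_score`, the key format, and the use of the sample standard deviation are all hypothetical choices made for the example. The Shapiro-Wilk p-value is left out to keep the sketch standard-library only; `scipy.stats.shapiro` could supply it.

```python
import statistics


def calibrate(scores_by_key):
    """Return {key: (mean, stdev)} for every key with at least two scores.

    Hypothetical helper: the real perf_stats.py may use a different layout.
    """
    calibration = {}
    for key, scores in scores_by_key.items():
        if len(scores) < 2:
            continue  # a sample standard deviation needs two or more scores
        calibration[key] = (statistics.mean(scores), statistics.stdev(scores))
    return calibration


def z_score(score, key, calibration):
    """Return the z-score of `score`, or None where it cannot be computed."""
    if key not in calibration:
        return None  # no calibration data for this probe/detector pair
    mu, sigma = calibration[key]
    if sigma == 0:
        return None  # all calibration scores identical; z-score undefined
    return (score - mu) / sigma


# Illustrative scores only; these are not results from any real run.
bag = {
    "probe_a/detector_x": [0.50, 0.60, 0.70, 0.80, 0.90],
    "probe_b/detector_y": [1.0],  # too few scores to calibrate against
}
calibration = calibrate(bag)
z = z_score(0.90, "probe_a/detector_x", calibration)
missing = z_score(0.50, "probe_b/detector_y", calibration)
```

A positive z-score here means the run scored above the calibration mean for that pair, and `None` marks combinations where no comparison is possible.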

@leondz leondz added the reporting label (Reporting, analysis, and other per-run result functions) Jul 31, 2024
@leondz leondz marked this pull request as ready for review August 1, 2024 16:09

@jmartin-tech jmartin-tech left a comment

Testing in progress. Can you update the description here to explain what this PR does and offer an example run of perf_stats.py? We don't need the actual report files, just breadcrumbs to follow when we need to update these resources.

garak/analyze/misp.py (resolved)
garak/analyze/report_digest.py (outdated, resolved)
leondz commented Aug 1, 2024

Yup, done!

@jmartin-tech jmartin-tech left a comment

Due to #813 the resource files here need to replace ContinueSlursReclaimedSlurs80 with ContinueSlursReclaimedSlursMini.

leondz commented Aug 2, 2024

Outstanding catch

@leondz leondz merged commit ecb959d into main Aug 2, 2024
10 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Aug 2, 2024
@leondz leondz deleted the feature/calibration-calc branch August 15, 2024 15:04