refactor calibration / z-score code #847

Merged: 22 commits, Aug 31, 2024


@leondz (Collaborator) commented Aug 21, 2024

  • add tests
  • move calculation out of report_digest
  • include the explainer in the report HTML only if any z-scores are present
  • handle failed calibration loads gracefully

@leondz (Collaborator, Author) commented Aug 21, 2024

  • added option (in config) to show z-score and summary viz in CLI
  • fixed bug in ThresholdEvaluator
  • add various viz schemas to resources.theme

resolves #840

@leondz marked this pull request as ready for review August 21, 2024 09:41
@leondz linked an issue Aug 21, 2024 that may be closed by this pull request
@jmartin-tech (Collaborator) left a comment

The extraction makes meaningful testing much easier; however, the deferral of error checking seems a bit overzealous. More early validation seems in order and may enable more clarity in the reporting.

garak/analyze/calibration.py (two outdated threads, resolved)
Comment on lines 33 to 35
c = garak.analyze.calibration.Calibration("")
c.load_calibration()
assert isinstance(c._data, dict)
Collaborator

This seems odd: if the value passed is not a valid file path, should this raise an exception?

Collaborator Author

I guess an exception should be raised at some level, but nothing that stops execution. Added a check that load_calibration() returns None (vs. the number of records loaded).


def load_calibration(
    self, calibration_filename: Union[str, None] = None
) -> Union[None, int]:
Collaborator

Why is this method expected to return a value if the value is never checked?

Collaborator Author

It now is, in tests, to signal when an exception occurred. I can see hot-swapping of calibrations in the future, so I want load_calibration to be self-contained.
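
The contract described in this reply can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation: the class name `CalibrationSketch`, the JSON file format, and the choice of logged exceptions are all assumptions. The loader handles its own errors, logs them, and returns `None` on failure or the number of records loaded on success, so a caller (or a test) can tell the two cases apart without execution stopping.

```python
import json
import logging
from typing import Union


class CalibrationSketch:
    """Hypothetical stand-in for the class under review; names are illustrative."""

    def __init__(self, calibration_filename: Union[str, None] = None) -> None:
        self.calibration_filename = calibration_filename
        self._data: dict = {}

    def load_calibration(
        self, calibration_filename: Union[str, None] = None
    ) -> Union[None, int]:
        # Allow a different file to be swapped in at call time.
        if calibration_filename is not None:
            self.calibration_filename = calibration_filename
        try:
            with open(self.calibration_filename, "r", encoding="utf-8") as f:
                loaded = json.load(f)
        except (OSError, TypeError, ValueError) as e:
            # Missing file, None filename, or malformed JSON: log and carry on.
            logging.warning("calibration load failed: %s", e)
            self._data = {}
            return None
        if not isinstance(loaded, dict):
            logging.warning("calibration file did not contain a JSON object")
            self._data = {}
            return None
        self._data = loaded
        return len(self._data)
```

With this shape, the test quoted earlier in the thread still holds (`_data` is always a dict), and an added assertion that `load_calibration()` returns `None` distinguishes a failed load from a successful one.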

garak/analyze/calibration.py (resolved)
    / "calibration.json"
)
else:
    self.calibration_filename = calibration_path
Collaborator

Prefer early validation that the provided value, when not None, is an existing path. A lazy load of the data seems reasonable; however, verifying up front that the value will be consumable seems appropriate.

Collaborator Author

added validation (verification deferred to existing code invoked at load time)
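
One way to read the reviewer's suggestion in this thread is sketched below: check at construction time that a supplied path exists, while still deferring the actual read. The class name `LazyCalibrationSketch`, the default filename location, and the choice of `FileNotFoundError` are assumptions for illustration; the PR may validate differently.

```python
from pathlib import Path
from typing import Union


class LazyCalibrationSketch:
    """Hypothetical example: validate the path early, load the data lazily."""

    def __init__(self, calibration_path: Union[str, Path, None] = None) -> None:
        if calibration_path is None:
            # Assumed default; the real default location is elided in the diff.
            self.calibration_filename = Path("calibration.json")
        else:
            candidate = Path(calibration_path)
            # Early validation: fail fast if the caller's path is unusable.
            if not candidate.is_file():
                raise FileNotFoundError(
                    f"calibration file not found: {candidate}"
                )
            self.calibration_filename = candidate
        # Contents are not read here; they would be loaded on first use.
        self._data = None
```

The trade-off is the one debated in the review: an error surfaces when the object is created rather than on first access to a score, at the cost of the constructor being able to raise.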

@jmartin-tech (Collaborator) left a comment

This achieves the initial goal. I am not sold on the idea that the class should defer all evaluation of the file until a z_score is retrieved at runtime; however, this is not a critical requirement.

I would prefer to see consumers that instantiate the class guard for an exception and avoid further calls to the class if a valid calibration object is not created. I suspect I am simply not aware of the reason this provides an advantage, and as currently used, I am not seeing value in logging an error on first access of a score vs. on creation of the object.

@leondz (Collaborator, Author) commented Aug 27, 2024

> I would prefer to see consumers that instantiate the class guard for an exception and avoid further calls to the class if a valid calibration object is not created.

On reflection I think this might be a better way to go, will take a look
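
The consumer-side guard discussed here could look roughly like the sketch below. Everything in it is hypothetical: `StrictCalibrationSketch`, `build_calibration`, `report_row`, and the `mu`/`sigma` record layout are assumptions made for illustration, not names confirmed by the PR. The point is only that the constructor is allowed to raise, and the consumer catches that once and skips z-score work afterwards.

```python
import json
import logging
from typing import Optional


class StrictCalibrationSketch:
    """Hypothetical calibration class whose constructor raises on a bad file."""

    def __init__(self, path: str) -> None:
        with open(path, "r", encoding="utf-8") as f:
            self._data = json.load(f)  # may raise OSError or ValueError

    def get_z_score(self, key: str, score: float) -> Optional[float]:
        # Assumed record layout: {"mu": mean, "sigma": standard deviation}.
        entry = self._data.get(key)
        if entry is None or not entry.get("sigma"):
            return None
        return (score - entry["mu"]) / entry["sigma"]


def build_calibration(path: str) -> Optional[StrictCalibrationSketch]:
    """Guard instantiation once; return None if no valid object was created."""
    try:
        return StrictCalibrationSketch(path)
    except (OSError, ValueError) as e:
        logging.error("calibration unavailable, z-scores disabled: %s", e)
        return None


def report_row(
    calibration: Optional[StrictCalibrationSketch], key: str, score: float
) -> dict:
    row = {"key": key, "score": score}
    # Avoid any further calls to the class if no calibration was created.
    if calibration is not None:
        row["zscore"] = calibration.get_z_score(key, score)
    return row
```

Under this arrangement the error is logged when the object is created, and report code only needs a single `is not None` check.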

@leondz marked this pull request as draft August 27, 2024 19:05
@leondz marked this pull request as ready for review August 28, 2024 08:53
@erickgalinkin (Collaborator) left a comment

Some minor comments, at least one thing I'd like to refactor for cleanliness. However, overall, I'm sold.

garak/analyze/report_digest.py (outdated, resolved)
    5: "excellent",
}

ZSCORE_DEFCON_BOUNDS = [-1, -0.125, 0.125, 1]
Collaborator

DEFCON?

Collaborator Author

Natural name for a 5-point Likert scale indicating the recommended degree of panic, esp. if one was exposed to WarGames as a child. My man j-moss has my back on this.
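
For readers unfamiliar with the idea, four ascending bounds split the z-score axis into five buckets, which is what makes a 5-point scale a natural fit. The sketch below shows one possible mapping using `ZSCORE_DEFCON_BOUNDS` from the diff. The function name `zscore_to_defcon`, the direction of the scale (lowest z-score maps to 1, highest to 5, matching the `5: "excellent"` entry), and the handling of values that fall exactly on a bound are assumptions, not confirmed details of the PR.

```python
from bisect import bisect_right

# Bounds as shown in the diff under review.
ZSCORE_DEFCON_BOUNDS = [-1, -0.125, 0.125, 1]


def zscore_to_defcon(zscore: float) -> int:
    """Map a z-score to a level from 1 to 5 (hypothetical helper).

    bisect_right counts how many bounds are <= zscore, giving 0..4;
    adding 1 shifts that to the 1..5 scale. A value exactly on a bound
    lands in the higher bucket (an assumption).
    """
    return bisect_right(ZSCORE_DEFCON_BOUNDS, zscore) + 1
```

For example, a z-score of 0 falls between -0.125 and 0.125 and maps to level 3, the middle of the scale.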

garak/analyze/calibration.py (resolved)
garak/resources/theme/__init__.py (resolved)
@jmartin-tech merged commit 60a2108 into main Aug 31, 2024
10 checks passed
@jmartin-tech deleted the update/calibration_refactor branch August 31, 2024 19:23
@github-actions bot locked and limited conversation to collaborators Aug 31, 2024
Linked issue: factor out Z-score code
3 participants