-
Notifications
You must be signed in to change notification settings - Fork 227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor calibration / z-score code #847
Conversation
…ation load gracefully; only show z-score explainer if there are z-scores present
…ding & fail gracefully; factor out Z clacluation
…; fix bug in ThresholdEvaluator where parent constructor wasn't called
resolves #840 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extraction makes this much easier to have meaningful testing, however the deferral of error checking seems a bit overzealous. More early validation seems in order, and may enable more clarity in the reporting.
tests/analyze/test_calibration.py
Outdated
c = garak.analyze.calibration.Calibration("") | ||
c.load_calibration() | ||
assert isinstance(c._data, dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems odd, if the value passed is not a valid file path should this raise an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess an exception should be raised at some level, but nothing that stops execution. Added check that load_calibration()
returns None (vs. the number of records loaded)
|
||
def load_calibration( | ||
self, calibration_filename: Union[str, None] = None | ||
) -> Union[None, int]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this method expected to return a value if the value is never checked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it now is, in tests, to signal when an exception occurred. i can see hot-swapping of calibrations in the future, so I want load_calibration
to be self-contained.
/ "calibration.json" | ||
) | ||
else: | ||
self.calibration_filename = calibration_path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer early input validation of the provided value as an existing path when not None
. A lazy load of the data seems reasonable, however verification the value will be consumable seems appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added validation (verification deferred to existing code invoked at load time)
Co-authored-by: Jeffrey Martin <[email protected]> Signed-off-by: Leon Derczynski <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This achieves the initial goal. I am not sold on the idea that the class should defer all evaluation of the file until the runtime retrieval of a z_score
however not a critical requirement.
I would prefer to see consumers that instantiate the class guard for an exception and avoid further calls to the class if a valid calibration object is not created. I suspect I am simply not aware of the reason this provides an advantage, and as currently used, I am not seeing value in logging an error on first access of a score vs on create of the object.
On reflection I think this might be a better way to go, will take a look |
…on failed load to simplify logic elsewhere (this function will always have to be able to signal 'not found' anyway)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments, at least one thing I'd like to refactor for cleanliness. However, overall, I'm sold.
5: "excellent", | ||
} | ||
|
||
ZSCORE_DEFCON_BOUNDS = [-1, -0.125, 0.125, 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEFCON?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
natural name for a 5-point likert scale indicating recommended degree of panic, esp if one was exposed to War Games as a child. my man j-moss has my back on this
report_digest