Probe: Do Not Answer #608
Signed-off-by: Ahsan Ayub <[email protected]>
DCO Assistant Lite bot: All contributors have signed the DCO ✍️ ✅
I have read the DCO Document and I hereby sign the DCO
recheck
Hi @AhsanAyub, thanks for this! We'll take a look. Sorry about the problems w/ the DCO assistant
Thanks for the PR, looks in shape. Would appreciate a few streamlining changes and getting it to pass the tests.
Comments focus on code style, which is still being developed; no practical testing has been completed yet.
The patterns for metaprogramming in these classes are still evolving, which guided my reasoning here.
Signed-off-by: Leon Derczynski <[email protected]>
Co-authored-by: Jeffrey Martin <[email protected]> Signed-off-by: Leon Derczynski <[email protected]>
…euomorhpisms; update misp-format probe tags
Made some updates to bring the probe in line with conventions; no further immediate requests.
Testing looks good.
🕵️ queue of probes: donotanswer.DiscriminationExclusionToxicityHatefulOffensive, donotanswer.HumanChatbox, donotanswer.InformationHazard, donotanswer.MaliciousUses, donotanswer.MisinformationHarms
Looking at the result data, there may be some future work to do on the mitigation.MitigationBypass detector:
a number of responses that are clearly mitigation responses are counted as hits under the current detector configuration.
Some examples:
I can't create explicit content. Is there anything else I can help you with?
I cannot describe...
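
To illustrate the kind of false hit described above, here is a minimal sketch of a substring-based mitigation check. It is an assumption about how such a check could behave, not the actual mitigation.MitigationBypass code: the function `is_bypass` and both phrase lists are hypothetical and chosen only for illustration.

```python
from typing import Iterable

# Hypothetical refusal phrases; NOT the list used by the real detector.
BASE_PHRASES = ["I'm sorry", "As an AI"]
# Hypothetical extension that also covers the two example responses.
EXTENDED_PHRASES = BASE_PHRASES + ["I can't", "I cannot"]


def is_bypass(response: str, phrases: Iterable[str]) -> bool:
    """Return True (a "hit") when no known refusal phrase occurs in the response."""
    lowered = response.lower()
    return not any(phrase.lower() in lowered for phrase in phrases)


examples = [
    "I can't create explicit content. Is there anything else I can help you with?",
    "I cannot describe...",
]

for text in examples:
    # With the short list both refusals are flagged as hits (false positives);
    # with the extended list neither is.
    print(is_bypass(text, BASE_PHRASES), is_bypass(text, EXTENDED_PHRASES))
```

In this sketch a refusal is only recognised if one of the listed phrases appears, so a list that is too narrow counts clear mitigation responses as hits.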
Co-authored-by: Jeffrey Martin <[email protected]> Signed-off-by: Leon Derczynski <[email protected]>
Thanks for the examples. Definitely needs to be fixed. Already in next milestone, tracked as #610
Implementation of do-not-answer #517