Added some firstnames to the whitelist #93

Closed


kaushal-aubie

Type of change:

  • Whitelisted some words that contain the term "shit"

These names should be treated as whitelisted terms.
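As a generic sketch of what a whitelist does (not obscenity's actual implementation), an occurrence of the blocked term can be ignored whenever it lies entirely inside an occurrence of a whitelisted term. The entries in `WHITELIST` are hypothetical examples of names containing "shit"; the PR's real list is not reproduced here.

```typescript
// Hypothetical whitelist entries: names that happen to contain the term.
const WHITELIST: string[] = ["harshit", "ashita"];
const TERM = "shit";

// Returns every start index at which `needle` occurs in `haystack`.
function occurrences(haystack: string, needle: string): number[] {
  const starts: number[] = [];
  let i = haystack.indexOf(needle);
  while (i !== -1) {
    starts.push(i);
    i = haystack.indexOf(needle, i + 1);
  }
  return starts;
}

// Returns start indices of TERM that no whitelisted occurrence fully covers.
function flaggedMatches(input: string): number[] {
  const text = input.toLowerCase();
  // Half-open [start, end) spans covered by whitelisted terms.
  const covered: Array<[number, number]> = [];
  for (const word of WHITELIST) {
    for (const start of occurrences(text, word)) {
      covered.push([start, start + word.length]);
    }
  }
  return occurrences(text, TERM).filter((start) => {
    const end = start + TERM.length;
    return !covered.some(([a, b]) => a <= start && end <= b);
  });
}

console.log(flaggedMatches("Harshit"));       // []
console.log(flaggedMatches("Harshit, shit")); // [ 9 ]
```

The approach keeps every other match intact, but each affected name needs its own entry, which is why the size of the list became the sticking point in review.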

@kaushal-aubie
Author

#47 (comment)

Owner

@jo3-l jo3-l left a comment


Thanks for the report and the PR, @kaushal-aubie. Given the number of whitelisted entries this requires, I would prefer to add word boundaries to the pattern instead: `|shit`. This probably also avoids other false positives that we have not considered with the current change.
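In obscenity's pattern syntax, a leading `|` asserts a word boundary. As a rough approximation only (obscenity compiles its own patterns, and its boundary rules may differ from JavaScript's `\b`), the current and proposed patterns can be compared with plain regular expressions. "Harshit" and "Ashita" are illustrative names, not taken from the PR.

```typescript
// Rough stand-ins for the two obscenity patterns (assumption: close enough to
// obscenity's behavior for this comparison).
const substring = /shit/i;         // `shit`: matches anywhere in the text
const leadingBoundary = /\bshit/i; // `|shit`: must begin at a word boundary

for (const input of ["Harshit", "Ashita", "shit", "shitty"]) {
  console.log(
    `${input}: substring=${substring.test(input)} boundary=${leadingBoundary.test(input)}`,
  );
}
// Harshit: substring=true boundary=false
// Ashita: substring=true boundary=false
// shit: substring=true boundary=true
// shitty: substring=true boundary=true
```

One pattern change covers every name where the term sits mid-word, with no per-name list to maintain.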

I will look into a new release this weekend. In the meantime, if you are urgently affected by this problem, you can easily work around it in your application by removing the word "shit" from the English dataset using `removePhrasesIf`.
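The workaround amounts to filtering one entry out of the dataset the matcher is built from. The toy below mirrors that filter-by-predicate idea on an in-memory list so it runs without the library; `ToyPhrase`, its `originalWord` field, and `removeIf` are hypothetical names, and the real call is obscenity's `removePhrasesIf`, whose exact signature and metadata shape should be checked in its documentation.

```typescript
// Toy stand-in for a phrase dataset; this is NOT obscenity's DataSet API.
interface ToyPhrase {
  originalWord: string; // hypothetical metadata identifying the phrase
  pattern: RegExp;
}

const toyEnglishDataset: ToyPhrase[] = [
  { originalWord: "shit", pattern: /shit/i },
  { originalWord: "damn", pattern: /damn/i },
];

// Mirrors the idea behind removePhrasesIf: drop every phrase the predicate accepts.
function removeIf(
  phrases: ToyPhrase[],
  predicate: (phrase: ToyPhrase) => boolean,
): ToyPhrase[] {
  return phrases.filter((phrase) => !predicate(phrase));
}

// True if any remaining phrase matches the input.
function hasMatch(phrases: ToyPhrase[], input: string): boolean {
  return phrases.some((phrase) => phrase.pattern.test(input));
}

const patched = removeIf(toyEnglishDataset, (p) => p.originalWord === "shit");

console.log(hasMatch(toyEnglishDataset, "Harshit")); // true
console.log(hasMatch(patched, "Harshit"));           // false
console.log(hasMatch(patched, "damn"));              // true
```

The trade-off is blunt: once the phrase is gone, the term is not detected anywhere, so this only suits applications where false positives on names matter more than catching the word itself.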


codecov bot commented Jan 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (36b6512) to head (92ec932).
Report is 40 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #93   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           27        26    -1     
  Lines          505       473   -32     
  Branches        92        82   -10     
=========================================
- Hits           505       473   -32     


@jo3-l jo3-l closed this in 9554e7c Jan 18, 2025
@HatScripts
Contributor

HatScripts commented Jan 21, 2025

> I would prefer to add word boundaries to the pattern instead: `|shit`

Doing this means that words where "-shit" is the suffix ("bullshit", "dipshit", "batshit", etc.) won't be censored.
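The objection can be checked with the same kind of approximation, treating `|shit` as the JavaScript regex `\bshit` (an assumption; obscenity compiles its own patterns). In each suffix compound the "s" follows another letter, so no word boundary precedes it and the match is lost.

```typescript
// Approximations of the current and proposed patterns (assumption: these
// mirror obscenity's `shit` and `|shit` closely enough for this check).
const substring = /shit/i;
const leadingBoundary = /\bshit/i;

// Suffix compounds cited in the thread.
const compounds = ["bullshit", "dipshit", "batshit"];

// Words the current pattern catches but the proposed one would miss.
const missed = compounds.filter(
  (word) => substring.test(word) && !leadingBoundary.test(word),
);

console.log(missed); // [ 'bullshit', 'dipshit', 'batshit' ]
```

This is the trade-off in a nutshell: the boundary removes false positives inside names at the price of false negatives for suffix compounds, while a whitelist keeps the compounds but has to enumerate names one by one.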

Here's my original issue: #47

From Wiktionary:
[Screenshot from Wiktionary, 2025-01-21]

Many more here

@jo3-l
Owner

jo3-l commented Jan 21, 2025

Sigh. Yes, indeed, my apologies. In general, I err on the side of minimizing false positives, and it seems that "shit" appears fairly often in the middle of Indian names in particular, so false positives can be a really irritating issue. Still, I need to think harder about what to do here. I'll reopen your original issue.

@jo3-l jo3-l mentioned this pull request Jan 21, 2025