Bug: collapseDuplicatesTransformer does not collapse the last letter #77

Open
3 of 5 tasks
rion18 opened this issue Aug 28, 2024 · 0 comments
Labels
bug Something isn't working

Comments

Contributor

rion18 commented Aug 28, 2024

Expected behavior

I am using obscenity to censor a string containing repeated characters, such as pppiiittt, with a dataset that contains the word pit.

The blacklist matcher is configured with this transformer:

collapseDuplicatesTransformer({
  defaultThreshold: 1,
}),

I would expect the whole pppiiittt word to be matched.

Actual behavior

Instead, only the first t is included in the match, so the match covers pppiiit. The final two t characters are not treated as part of the profanity, although they should be.
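To make the expected span concrete, here is a minimal standalone sketch of the behavior I have in mind. It is not obscenity's implementation; `collapseRuns` and `expectedSpan` are hypothetical helper names used only for illustration, and the sketch assumes a threshold of 1 for every character. It collapses each run of identical characters and maps a hit in the collapsed text back to the full runs in the original string.

```javascript
// Illustrative only: NOT obscenity's implementation. Helper names are hypothetical.
// Collapse each run of identical characters to one character and remember
// which original index range every collapsed character came from.
function collapseRuns(input) {
  const chars = [];
  const ranges = []; // ranges[i] = [first, last] original indices of run i
  for (let i = 0; i < input.length; i++) {
    if (chars.length > 0 && chars[chars.length - 1] === input[i]) {
      ranges[ranges.length - 1][1] = i; // extend the current run
    } else {
      chars.push(input[i]);
      ranges.push([i, i]);
    }
  }
  return { collapsed: chars.join(''), ranges };
}

// Find `word` in the collapsed text and map the hit back to the original
// string, covering the whole first run and the whole last run.
function expectedSpan(input, word) {
  if (word.length === 0) return null;
  const { collapsed, ranges } = collapseRuns(input);
  const at = collapsed.indexOf(word);
  if (at === -1) return null;
  return {
    startIndex: ranges[at][0],
    endIndex: ranges[at + word.length - 1][1],
  };
}

const span = expectedSpan('pppiiittt', 'pit');
console.log(span); // { startIndex: 0, endIndex: 8 }
```

With this mapping the match for pppiiittt would end at index 8, whereas the behavior described above corresponds to a match ending at index 6.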

Minimal reproducible example

const {
  englishDataset,
  parseRawPattern,
  DataSet,
  RegExpMatcher,
  TextCensor,
  collapseDuplicatesTransformer,
} = require('obscenity');

// Extend the English dataset with a custom phrase for "pit".
const data = new DataSet()
  .addAll(englishDataset)
  .addPhrase(phrase =>
    phrase
      .setMetadata({ originalWord: 'pit' })
      .addPattern(parseRawPattern('pit'))
  )
  .build();

const transformers = {
  blacklistMatcherTransformers: [
    // Collapse every run of repeated characters down to a single character.
    collapseDuplicatesTransformer({
      defaultThreshold: 1,
    }),
  ],
  whitelistMatcherTransformers: [],
};

const matcher = new RegExpMatcher({
  ...data,
  ...transformers,
});

// Assumption: the original snippet used a default TextCensor here.
const textCensor = new TextCensor();

// Wrapper added so the snippet runs outside of a function body.
function censor(input) {
  if (matcher.hasMatch(input)) {
    const matches = matcher.getAllMatches(input, true);
    return textCensor.applyTo(input, matches);
  }
  return input;
}

const stringPit = 'ppiitt';
console.log(censor(stringPit));

Steps to reproduce

No response

Additional context

No response

Node.js version

18.17.1

Obscenity version

0.4.0

Priority

  • Low
  • Medium
  • High

Terms

  • I agree to follow the project's Code of Conduct.
  • I have searched existing issues for similar reports.
@rion18 rion18 added the bug Something isn't working label Aug 28, 2024