Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] remove_accents #479

Merged
merged 1 commit into from
Feb 20, 2023
Merged

[FIX] remove_accents #479

merged 1 commit into from
Feb 20, 2023

Conversation

bosd
Copy link
Collaborator

@bosd bosd commented Feb 19, 2023

Use an regex method to remove the combining diacritical marks

https://en.wikipedia.org/wiki/Combining_Diacritical_Marks

fixes #477

use the method to remove the combining diacritical marks

https://en.wikipedia.org/wiki/Combining_Diacritical_Marks

fixes invoice-x#477
@bosd
Copy link
Collaborator Author

bosd commented Feb 19, 2023

Some context:
To come up with this solution I tested a handfull of different methods to solve this.
Eventually I picked the one which did deliver the desired outcome and best performance.

I'm to humble to share the messy code of these tests. But here is the performance graph.

remove_accents_perf

strip_accent function was clearly the slowest performing, when the input string increases.
So eventually I went for the one with the re one (shown in red).
The orange plot is the current implementation of remove accents. It's the fastest. But removes too much characters.
So not an option.

The purple plot is the same function as the red one. Only the python .re module has been exchanged for the regex module.

Copy link
Collaborator

@rmilecki rmilecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quickly tested with my WIP tests coverage in #478, seems to work.

@bosd bosd merged commit 40f1f92 into invoice-x:master Feb 20, 2023
@bosd bosd deleted the fix-remove_accents branch February 20, 2023 06:42
@bosd
Copy link
Collaborator Author

bosd commented Feb 20, 2023

Should we hotfix this? by making another release on pypi.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

remove_accents option deletes € sign
2 participants