remove_accents option does not work properly anymore #444

alainv62 · 2023-01-08T21:14:43Z

Since the removal of the unidecode library and its replacement with the unicodedata module in commit 4d517d1, the remove_accents option does not work properly anymore.
E.g., in French, 'référence' is replaced with 'rfrence'.
Normal form KC (NFKC) seems to be responsible here: with normal form KD (NFKD) the same example works, and 'référence' is correctly replaced with 'reference'.
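
A minimal sketch of the difference, assuming the option boils down to normalizing the string and then dropping whatever is not ASCII. The helper name `strip_non_ascii` is made up for illustration and is not the project's code:

```python
import unicodedata


def strip_non_ascii(text: str, form: str) -> str:
    """Normalize `text` with the given Unicode form, then drop non-ASCII characters.

    Hypothetical helper for illustration only.
    """
    normalized = unicodedata.normalize(form, text)
    return normalized.encode("ascii", "ignore").decode("ascii")


# NFKC composes 'é' into a single non-ASCII code point, so the whole letter is dropped.
print(strip_non_ascii("référence", "NFKC"))  # rfrence

# NFKD splits 'é' into 'e' plus a combining accent; only the accent is dropped.
print(strip_non_ascii("référence", "NFKD"))  # reference
```

With a decomposed form the base letter survives the ASCII encode step, which would explain why KD works where KC does not.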


esmeraldas63 · 2023-01-12T12:48:24Z

I have encountered the same issue with Lithuanian and Latvian letters too.

bosd · 2023-01-13T06:20:53Z

We need to fix this! I'm currently travelling, so I don't have much time to attend to this.
Does anyone have a suggestion for an alternative implementation or library?

alainv62 · 2023-01-13T08:16:53Z

Why not just revert to the former solution optimized_str = unidecode(optimized_str) which was working fine?

bosd · 2023-01-14T07:17:11Z

> Why not just revert to the former solution?

Because of the license issue mentioned earlier.

There are some good alternatives, such as built-ins or other libraries (though adding extra dependencies is not preferred).

alainv62 · 2023-01-14T08:06:10Z

What about using normal form KD (NFKD)?

rmilecki · 2023-01-21T15:27:21Z

> Why not just revert to the former solution optimized_str = unidecode(optimized_str) which was working fine?

See the description in the pull request that introduced that change: #436

And for the original report: #435

bosd · 2023-02-05T20:03:39Z

> What about using the normal form KD ?

Yes, that seems to fix it.
A small test:

>>> import unicodedata
>>> str1 = "référence"
>>> unicodedata.normalize('NFKD', str1).encode('ascii', 'ignore').decode('ascii')
'reference'

>>> str2="ä"
>>> unicodedata.normalize('NFKD', str2).encode('ascii', 'ignore').decode('ascii')
'a'

bosd · 2023-02-05T20:40:37Z

It is probably a good idea to add a test for the remove accents function.
Is there an example invoice to add to the tests?
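
Even without an invoice, a plain unit test could cover the cases reported in this thread. A rough sketch follows, assuming a stand-in `remove_accents` helper built on the NFKD one-liner from the small test; the real function in the project may be named and wired differently, and the Lithuanian and Latvian sample strings are my own picks, not taken from a real invoice:

```python
import unicodedata


def remove_accents(text: str) -> str:
    # Hypothetical stand-in for the project's function: decompose with NFKD,
    # then drop every character that cannot be encoded as ASCII.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def test_french():
    assert remove_accents("référence") == "reference"


def test_lithuanian():
    # Assumed sample: the accented letters of the Lithuanian alphabet.
    assert remove_accents("ąčęėįšųūž") == "aceeisuuz"


def test_latvian():
    # Assumed sample: the accented letters of the Latvian alphabet.
    assert remove_accents("āčēģīķļņšūž") == "acegiklnsuz"


def test_plain_ascii_is_unchanged():
    assert remove_accents("Invoice 12345") == "Invoice 12345"
```

The functions follow the pytest naming convention, so they should be picked up if dropped into a test module. Note that letters without a Unicode decomposition, such as 'ł' or 'ø', are still dropped entirely by this approach.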

bosd mentioned this issue Feb 5, 2023

remove_accents: refactor to NFKD normalization #464

Merged

rmilecki closed this as completed in #464 Feb 6, 2023
