This repository has been archived by the owner on Jan 6, 2022. It is now read-only.

added mb_ str functions #5

Open: wants to merge 1 commit into base: master


@bariew commented Jan 27, 2017

This adds support for non-English strings stored in multibyte encodings such as UTF-8.

@hannesvdvreken

@bariew, see d4h/php-finediff; it already uses the mb_ functions.

@arvindpdmn

Sorry if this is a stupid question. I was able to diff text containing, for example, éλ with the current code. Why do we need the mb_ string functions?

@hannesvdvreken

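If you diff two strings of 4 bytes each (2 multibyte characters each), and only the first of those 4 bytes is the same in both strings, the byte-based diff trips: it reports that shared byte as unchanged, which cuts the first character in half and leaves invalid byte fragments in the output.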
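To make that failure mode concrete, here is a minimal Python sketch. It is not FineDiff's PHP code. It assumes, as a simplification, that a byte-based diff begins by keeping the longest common prefix. The helpers `common_prefix_len` and `is_valid_utf8` are hypothetical. The strings "ää" and "åλ" are 4 bytes each and share only the lead byte 0xC3. Their byte-level common prefix is therefore 1, while their character-level prefix is 0, and both byte fragments on either side of the split are invalid UTF-8.

```python
def common_prefix_len(a, b):
    """Number of leading elements shared by two sequences (str or bytes)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def is_valid_utf8(data):
    """True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

old, new = "ää", "åλ"                          # 2 characters, 4 bytes each
old_b, new_b = old.encode("utf-8"), new.encode("utf-8")

byte_prefix = common_prefix_len(old_b, new_b)  # both start with lead byte 0xC3
char_prefix = common_prefix_len(old, new)      # first characters already differ

# A byte-based diff keeps the shared prefix as "unchanged" and treats the
# rest as an edit, so the split point falls inside the first character.
kept, edited = old_b[:byte_prefix], old_b[byte_prefix:]
print(byte_prefix, char_prefix, is_valid_utf8(kept), is_valid_utf8(edited))
# 1 0 False False
```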

@arvindpdmn

I did a test with two strings: ùù changed to éλ, where the UTF-8 encodings are `\xc3\xb9\xc3\xb9` and `\xc3\xa9\xce\xbb` respectively.

@hannesvdvreken, you are right that the diff fails at the character level. If I do a word-level diff, it works. For my application, I'm using only word level diff. Can I therefore ignore mb_str? Thanks.
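The ùù → éλ test can be reproduced with Python's standard `difflib.SequenceMatcher`. This is a stand-in for FineDiff's engine, not its actual code. Diffing the raw bytes reports the shared lead byte 0xC3 as "equal" and replaces the remaining three bytes. Diffing the decoded strings reports a single two-character replacement. This sketch illustrates the character-level problem under that assumption.

```python
import difflib

old, new = "ùù", "éλ"
old_b, new_b = old.encode("utf-8"), new.encode("utf-8")

# Confirm the encodings quoted in the test.
assert old_b == b"\xc3\xb9\xc3\xb9"
assert new_b == b"\xc3\xa9\xce\xbb"

# Byte-granularity diff: the shared lead byte 0xC3 is reported as unchanged,
# so the "replace" chunk starts with a continuation byte (invalid UTF-8).
byte_ops = difflib.SequenceMatcher(None, old_b, new_b).get_opcodes()

# Character-granularity diff: the whole string is one replacement.
char_ops = difflib.SequenceMatcher(None, old, new).get_opcodes()

print(byte_ops)  # [('equal', 0, 1, 0, 1), ('replace', 1, 4, 1, 4)]
print(char_ops)  # [('replace', 0, 2, 0, 2)]
```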

@hannesvdvreken

> For my application, I'm using only word level diff. Can I therefore ignore mb_str?

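I guess so. Word boundaries are defined by spaces, and spaces are not multibyte: in UTF-8 the space byte (0x20) never occurs inside a multibyte character, so splitting on bytes yields the same words as splitting on characters. And if two words are equal as multibyte strings, they are also equal as byte strings.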
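The argument can be checked with a short Python sketch. It is not FineDiff's PHP code, and it assumes words are split on ASCII delimiters such as a space. First, the sketch verifies that every byte of every multibyte UTF-8 sequence is at least 0x80, so an ASCII delimiter can never match inside a character. Second, it shows that a word-level diff over raw bytes, using `difflib` as a stand-in and a hypothetical `word_opcodes` helper, gives the same opcodes as one over decoded strings.

```python
import difflib

def ascii_never_inside_multibyte():
    """True if every byte of every multibyte UTF-8 sequence is >= 0x80,
    so an ASCII delimiter such as b" " (0x20) can never match inside one."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:  # surrogates are not encodable in UTF-8
            continue
        if any(byte < 0x80 for byte in chr(cp).encode("utf-8")):
            return False
    return True

def word_opcodes(a, b, sep):
    """Word-level diff: split both inputs on sep, then diff the word lists."""
    return difflib.SequenceMatcher(None, a.split(sep), b.split(sep)).get_opcodes()

old, new = "déjà vu ùù", "déjà vu éλ"

char_ops = word_opcodes(old, new, " ")
byte_ops = word_opcodes(old.encode("utf-8"), new.encode("utf-8"), b" ")

# Splitting the raw bytes yields whole words that decode back cleanly.
byte_words = [w.decode("utf-8") for w in new.encode("utf-8").split(b" ")]

print(ascii_never_inside_multibyte())  # True
print(char_ops == byte_ops)            # True
print(byte_words == new.split(" "))    # True
```

Both diffs report the first two words as equal and the third as replaced. Only the granularity of the pieces differs, not where the boundaries fall.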
