Merge and simplify unescape #992

luukvbaal · 2021-05-08T22:03:37Z

While considering to implement the suggestion in #935 (comment) for my fork and looking at unescape() I thought the while loop to reduce the number of wide chars to max columns seemed unnecessary.

This merges unescape() for the default and NOLOCALE case and adjusts the number wide chars to maxcols in the same call to mbstowcs() by providing maxcols instead of NAME_MAX as the character limit, avoiding the extra while loop.

Also came across the issue mentioned in #989 while testing this. Without #989 applied wbuf[maxcols] = '\0'; results in a segfault with icons enabled and maxcols < 2.

jarun · 2021-05-09T01:50:00Z

I remember there was some issue with broken wide characters because of which we had to consider wcswidth().

The third param in mbstowcs() denotes max wide characters that are written to dest and not bytes. Some wide characters take more than one column (say, in Indian languages). I am not sure about this change.

luukvbaal · 2021-05-09T01:58:43Z

Ah I see. Yeah I hadn't considered that, my bad. Feel free to close as you see fit.

jarun · 2021-05-09T02:11:57Z

Yes, let's drop it.

luukvbaal · 2021-05-09T10:03:18Z

Yes, let's drop it.

@jarun Were you able to verify that this implementation doesn't work with any characters btw? I tested it with a wide glyph I have on my system and it seems to work fine:

Seems like a good improvement to me to merge the two unescape() functions and to drop the while loop for determining the widechar column width so if it works I'd say why not. If you have any other problems with the implementation of course nevermind.

jarun · 2021-05-09T11:05:45Z

It breaks. And please give me a break. Run mktest.sh and test with the korean chars.

I really keep busy with different stuff and I need people to stop harping on the same string.

jarun · 2021-05-09T11:09:09Z

OK. I understand you didn't know about the korean string we had for testing.

jarun · 2021-05-09T11:16:08Z

luukvbaal · 2021-05-09T12:03:26Z

I see, yeah I didn't know about the test string sorry.

jarun · 2021-05-09T12:59:19Z

No problem!

luukvbaal · 2021-05-09T13:09:33Z

BTW I found an implementation that works for the Korean strings while still merging and simplifying the two unescape() functions and still getting rid of the "Reduce number of wide chars to max columns" while loop: luukvbaal@1819922.

Let me know if I should reopen the PR or if you have no interest in merging the two nevermind.

jarun · 2021-05-09T13:35:26Z

I am thinking of the following detection logic:

diff --git a/src/nnn.c b/src/nnn.c
index dde27b6..9ba8028 100644
--- a/src/nnn.c
+++ b/src/nnn.c
@@ -3417,7 +3417,7 @@ static void resetdircolor(int flags)
  * Max supported str length: NAME_MAX;
  */
 #ifndef NOLOCALE
-static wchar_t *unescape(const char *str, uint_t maxcols)
+static wchar_t *unescape_w(const char *str, uint_t maxcols)
 {
        wchar_t * const wbuf = (wchar_t *)g_buf;
        wchar_t *buf = wbuf;
@@ -3457,19 +3457,23 @@ static wchar_t *unescape(const char *str, uint_t maxcols)
 
        return wbuf;
 }
-#else
+#endif
+
 static char *unescape(const char *str, uint_t maxcols)
 {
        ssize_t len = (ssize_t)xstrsncpy(g_buf, str, maxcols);
 
        --len;
-       while (--len >= 0)
+       while (--len >= 0) {
+#ifndef NOLOCALE
+               if (g_buf[len] & 0x8)
+                       return NULL;
+#endif
                if (g_buf[len] <= '\x1f' || g_buf[len] == '\x7f')
                        g_buf[len] = '\?';
-
+       }
        return g_buf;
 }
-#endif
 
 static off_t get_size(off_t size, off_t *pval, uint_t comp)
 {
@@ -3754,7 +3758,10 @@ static void printent(const struct entry *ent, uint_t namecols, bool sel)
                ++namecols;
 
 #ifndef NOLOCALE
-       addwstr(unescape(ent->name, namecols));
+       if (unescape(ent->name, MIN(namecols, ent->nlen) + 1))
+               addstr(g_buf);
+       else
+               addwstr(unescape_w(ent->name, namecols));
 #else
        addstr(unescape(ent->name, MIN(namecols, ent->nlen) + 1));
 #endif

luukvbaal · 2021-05-09T13:45:27Z

Hmm okay. luukvbaal@92e01d4 is my final implementation of how I would merge the two and I see no problems with it.

Seems more elegant to me and ~~should be more performant due to dropping a while loop~~ but do whatever you feel is best.

Edit: the function for clarity

#ifdef NOLOCALE
static char *unescape(const char *str, uint_t maxcols)
{
	char * const wbuf = g_buf;
	char *buf = wbuf;

	xstrsncpy(wbuf, str, maxcols);
#else
static wchar_t *unescape(const char *str, uint_t maxcols)
{
	wchar_t * const wbuf = (wchar_t *)g_buf;
	wchar_t *buf = wbuf;
	size_t len, lencount = maxcols + 1;

	/* Convert multi-byte to wide char */
	mbstowcs(wbuf, str, maxcols);
	len = wcswidth(wbuf, --lencount);

	if (len >= maxcols) {
		/* Reduce wide chars one by one till it fits */
		while (len > maxcols)
			len = wcswidth(wbuf, --lencount);

		wbuf[lencount] = L'\0';
	}
#endif

	while (*buf) {
		if (*buf <= '\x1f' || *buf == '\x7f')
			*buf = '\?';

		++buf;
	}

	return wbuf;
}

jarun · 2021-05-09T14:07:38Z

The detection (plus on-the-fly escaping) logic performs better on English-native systems but not on non-English-native systems.

jarun · 2021-05-09T14:27:24Z

I think your final version is optimized enough given we consider maxcols already. So let's have it in. The detection will fail in every case on a German or Japanese system.

luukvbaal · 2021-05-09T14:31:31Z

Sure, made the PR #993.

Merge and simplify unescape

afd2b7e

jarun closed this May 9, 2021

luukvbaal mentioned this pull request May 9, 2021

Merge and simplify unescape #993

Merged

github-actions bot locked and limited conversation to collaborators Jun 9, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge and simplify unescape #992

Merge and simplify unescape #992

luukvbaal commented May 8, 2021 •

edited

Loading

jarun commented May 9, 2021

luukvbaal commented May 9, 2021

jarun commented May 9, 2021

luukvbaal commented May 9, 2021

jarun commented May 9, 2021

jarun commented May 9, 2021

jarun commented May 9, 2021

luukvbaal commented May 9, 2021

jarun commented May 9, 2021

luukvbaal commented May 9, 2021 •

edited

Loading

jarun commented May 9, 2021 •

edited

Loading

luukvbaal commented May 9, 2021 •

edited by jarun

Loading

jarun commented May 9, 2021 •

edited

Loading

jarun commented May 9, 2021

luukvbaal commented May 9, 2021

Merge and simplify unescape #992

Merge and simplify unescape #992

Conversation

luukvbaal commented May 8, 2021 • edited Loading

jarun commented May 9, 2021

luukvbaal commented May 9, 2021

jarun commented May 9, 2021

luukvbaal commented May 9, 2021

jarun commented May 9, 2021

jarun commented May 9, 2021

jarun commented May 9, 2021

luukvbaal commented May 9, 2021

jarun commented May 9, 2021

luukvbaal commented May 9, 2021 • edited Loading

jarun commented May 9, 2021 • edited Loading

luukvbaal commented May 9, 2021 • edited by jarun Loading

jarun commented May 9, 2021 • edited Loading

jarun commented May 9, 2021

luukvbaal commented May 9, 2021

luukvbaal commented May 8, 2021 •

edited

Loading

luukvbaal commented May 9, 2021 •

edited

Loading

jarun commented May 9, 2021 •

edited

Loading

luukvbaal commented May 9, 2021 •

edited by jarun

Loading

jarun commented May 9, 2021 •

edited

Loading