Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge and simplify unescape #992

Closed
wants to merge 1 commit into from
Closed

Merge and simplify unescape #992

wants to merge 1 commit into from

Conversation

luukvbaal
Copy link
Collaborator

@luukvbaal luukvbaal commented May 8, 2021

While considering to implement the suggestion in #935 (comment) for my fork and looking at unescape() I thought the while loop to reduce the number of wide chars to max columns seemed unnecessary.

This merges unescape() for the default and NOLOCALE case and adjusts the number wide chars to maxcols in the same call to mbstowcs() by providing maxcols instead of NAME_MAX as the character limit, avoiding the extra while loop.

Also came across the issue mentioned in #989 while testing this. Without #989 applied wbuf[maxcols] = '\0'; results in a segfault with icons enabled and maxcols < 2.

@jarun
Copy link
Owner

jarun commented May 9, 2021

I remember there was some issue with broken wide characters because of which we had to consider wcswidth().

The third param in mbstowcs() denotes max wide characters that are written to dest and not bytes. Some wide characters take more than one column (say, in Indian languages). I am not sure about this change.

@luukvbaal
Copy link
Collaborator Author

Ah I see. Yeah I hadn't considered that, my bad. Feel free to close as you see fit.

@jarun
Copy link
Owner

jarun commented May 9, 2021

Yes, let's drop it.

@jarun jarun closed this May 9, 2021
@luukvbaal
Copy link
Collaborator Author

Yes, let's drop it.

@jarun Were you able to verify that this implementation doesn't work with any characters btw? I tested it with a wide glyph I have on my system and it seems to work fine:
image
image

Seems like a good improvement to me to merge the two unescape() functions and to drop the while loop for determining the widechar column width so if it works I'd say why not. If you have any other problems with the implementation of course nevermind.

@jarun
Copy link
Owner

jarun commented May 9, 2021

It breaks. And please give me a break. Run mktest.sh and test with the korean chars.

I really keep busy with different stuff and I need people to stop harping on the same string.

@jarun
Copy link
Owner

jarun commented May 9, 2021

OK. I understand you didn't know about the korean string we had for testing.

@jarun
Copy link
Owner

jarun commented May 9, 2021

@luukvbaal
Copy link
Collaborator Author

I see, yeah I didn't know about the test string sorry.

@jarun
Copy link
Owner

jarun commented May 9, 2021

No problem!

@luukvbaal
Copy link
Collaborator Author

luukvbaal commented May 9, 2021

BTW I found an implementation that works for the Korean strings while still merging and simplifying the two unescape() functions and still getting rid of the "Reduce number of wide chars to max columns" while loop: luukvbaal@1819922.

Let me know if I should reopen the PR or if you have no interest in merging the two nevermind.

@jarun
Copy link
Owner

jarun commented May 9, 2021

I am thinking of the following detection logic:

diff --git a/src/nnn.c b/src/nnn.c
index dde27b6..9ba8028 100644
--- a/src/nnn.c
+++ b/src/nnn.c
@@ -3417,7 +3417,7 @@ static void resetdircolor(int flags)
  * Max supported str length: NAME_MAX;
  */
 #ifndef NOLOCALE
-static wchar_t *unescape(const char *str, uint_t maxcols)
+static wchar_t *unescape_w(const char *str, uint_t maxcols)
 {
        wchar_t * const wbuf = (wchar_t *)g_buf;
        wchar_t *buf = wbuf;
@@ -3457,19 +3457,23 @@ static wchar_t *unescape(const char *str, uint_t maxcols)
 
        return wbuf;
 }
-#else
+#endif
+
 static char *unescape(const char *str, uint_t maxcols)
 {
        ssize_t len = (ssize_t)xstrsncpy(g_buf, str, maxcols);
 
        --len;
-       while (--len >= 0)
+       while (--len >= 0) {
+#ifndef NOLOCALE
+               if (g_buf[len] & 0x8)
+                       return NULL;
+#endif
                if (g_buf[len] <= '\x1f' || g_buf[len] == '\x7f')
                        g_buf[len] = '\?';
-
+       }
        return g_buf;
 }
-#endif
 
 static off_t get_size(off_t size, off_t *pval, uint_t comp)
 {
@@ -3754,7 +3758,10 @@ static void printent(const struct entry *ent, uint_t namecols, bool sel)
                ++namecols;
 
 #ifndef NOLOCALE
-       addwstr(unescape(ent->name, namecols));
+       if (unescape(ent->name, MIN(namecols, ent->nlen) + 1))
+               addstr(g_buf);
+       else
+               addwstr(unescape_w(ent->name, namecols));
 #else
        addstr(unescape(ent->name, MIN(namecols, ent->nlen) + 1));
 #endif

@luukvbaal
Copy link
Collaborator Author

luukvbaal commented May 9, 2021

Hmm okay. luukvbaal@92e01d4 is my final implementation of how I would merge the two and I see no problems with it.

Seems more elegant to me and should be more performant due to dropping a while loop but do whatever you feel is best.

Edit: the function for clarity

#ifdef NOLOCALE
static char *unescape(const char *str, uint_t maxcols)
{
	char * const wbuf = g_buf;
	char *buf = wbuf;

	xstrsncpy(wbuf, str, maxcols);
#else
static wchar_t *unescape(const char *str, uint_t maxcols)
{
	wchar_t * const wbuf = (wchar_t *)g_buf;
	wchar_t *buf = wbuf;
	size_t len, lencount = maxcols + 1;

	/* Convert multi-byte to wide char */
	mbstowcs(wbuf, str, maxcols);
	len = wcswidth(wbuf, --lencount);

	if (len >= maxcols) {
		/* Reduce wide chars one by one till it fits */
		while (len > maxcols)
			len = wcswidth(wbuf, --lencount);

		wbuf[lencount] = L'\0';
	}
#endif

	while (*buf) {
		if (*buf <= '\x1f' || *buf == '\x7f')
			*buf = '\?';

		++buf;
	}

	return wbuf;
}

@jarun
Copy link
Owner

jarun commented May 9, 2021

The detection (plus on-the-fly escaping) logic performs better on English-native systems but not on non-English-native systems.

@jarun
Copy link
Owner

jarun commented May 9, 2021

I think your final version is optimized enough given we consider maxcols already. So let's have it in. The detection will fail in every case on a German or Japanese system.

@luukvbaal
Copy link
Collaborator Author

Sure, made the PR #993.

@github-actions github-actions bot locked and limited conversation to collaborators Jun 9, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants