fix(common/models): fixes quote-adjacent pred-text suggestions #7205

jahorton · 2022-09-05T04:42:00Z

Fixes #6900. The actual internal issue was technically a bit more general, theoretically capable of happening in more than just "quote-adjacent" scenarios, but the behavior has been most evident there.

The predictive-text engine previously assumed that wordbreaking should only occur after whitespace characters (like a space or a newline). However, here's one common case where this assumption is wrong when using the default Unicode wordbreaker: the text "Hello" gets word-broken into ['"', 'Hello', '"']. (That is, quotes on the side of a word like that, rather than inside of it, are treated as a different "word" - without whitespace.)
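
To make the tokenization above concrete, here is a minimal sketch of a toy wordbreaker that produces the same token boundaries for these inputs. It is an assumption for illustration only: `toyWordbreak` is a hypothetical name, it handles only ASCII letters and digits, and it is not the default Unicode wordbreaker.

```typescript
// Hypothetical toy tokenizer, NOT the default Unicode wordbreaker.
// Runs of ASCII letters/digits (optionally joined by an inner apostrophe) form a word;
// every other non-whitespace character becomes its own token.
function toyWordbreak(text: string): string[] {
  const tokenPattern = /[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*|[^\sA-Za-z0-9]/g;
  return text.match(tokenPattern) || [];
}

console.log(toyWordbreak('"Hello"')); // three tokens: the quote, Hello, the quote
console.log(toyWordbreak("'o"));      // two tokens with no whitespace between them
console.log(toyWordbreak('you op'));  // two tokens separated by whitespace
```

The second call is the case that matters for this PR: a token boundary exists even though no whitespace was typed.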

To distill what caused the problem: internally, the predictive-text engine expected the leading " character to be whitespace whenever this occurred and treated it as such, regardless of what it actually was. That caused a one-character context desync with the main Web engine, which led to the reported erroneous behavior. (With a couple of other nits here and there on the side, to boot.)
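
As a rough illustration of how a one-character desync can surface to the user, the sketch below applies a suggestion as a transform. The `Transform` shape (`insert`, `deleteLeft`) follows the examples used later in this thread; `applyTransform` and the specific `deleteLeft` values are assumptions made for the example, not the engine's actual code path.

```typescript
// Minimal Transform shape, as used in the isWhitespace examples in this thread.
interface Transform {
  insert: string;
  deleteLeft: number;
}

// Hypothetical helper: applies a transform to the text left of the caret.
function applyTransform(context: string, t: Transform): string {
  return context.slice(0, context.length - t.deleteLeft) + t.insert;
}

const context = "'o";

// Intended behavior: only the one-character token "o" is replaced.
const inSync: Transform = { insert: 'open', deleteLeft: 1 };

// Assumed failure mode: the engine's view of the context is off by one character,
// so the deletion reaches back into the quote mark.
const desynced: Transform = { insert: 'open', deleteLeft: 2 };

console.log(applyTransform(context, inSync));   // 'open
console.log(applyTransform(context, desynced)); // open
```

The second result matches the failure condition in the user tests below: the ' mark is replaced when the suggestion is applied.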

User Testing

GROUP_ANDROID: Test using any Android device (or simulated device) supporting Keyman.
GROUP_WEB_HOST: Test using the "Predictive Text: robust testing" Web test page

TEST_QUOTE_O: Using the default English keyboard for your test environment, type ' followed immediately by o.

If most of the displayed suggestions do not begin with an 'o', FAIL this test.
- Do not fail if "I" is one of them - it's a super-common word, and 'o' does neighbor 'i'.
Apply any of the displayed suggestions. If the ' mark is replaced, FAIL this test.

TEST_QUOTE_OP: Using the default English keyboard for your test environment, type ' followed immediately by o, then p.

If most of the displayed suggestions do not begin with 'op', FAIL this test.
- Do not fail if "OP" is suggested, despite its capitalization.
Apply any of the displayed suggestions. If the ' mark is replaced, FAIL this test.

TEST_OP: Using the default English keyboard for your test environment, type you, add a space, then type o, then p.

If most of the displayed suggestions do not begin with 'op', FAIL this test.
- Do not fail if "OP" is suggested, despite its capitalization.
Apply any of the displayed suggestions. If unwanted effects occur, FAIL this test.
- No prior whitespace should be deleted and the suggestion should be cleanly applied.
- Example: applying open to you op should result in you open for the full context.


keymanapp-test-bot · 2022-09-05T04:42:14Z

User Test Results

Test specification and instructions

⬜ GROUP_ANDROID: Test using any Android device (or simulated device) supporting Keyman.
- ⬜ TEST_QUOTE_O (OPEN): retest
- ⬜ TEST_QUOTE_OP (OPEN): retest
- ⬜ TEST_OP (OPEN): retest
⬜ GROUP_WEB_HOST: Test using the "Predictive Text: robust testing" Web test page
- ⬜ TEST_QUOTE_O (OPEN): retest
- ⬜ TEST_QUOTE_OP (OPEN): retest
- ⬜ TEST_OP (OPEN): retest

Results Template

# Test Results

### GROUP_ANDROID: Test using any Android device (or simulated device) supporting Keyman.

* **TEST_QUOTE_O (OPEN):** notes
* **TEST_QUOTE_OP (OPEN):** notes
* **TEST_OP (OPEN):** notes

### GROUP_WEB_HOST: Test using the "Predictive Text: robust testing" Web test page

* **TEST_QUOTE_O (OPEN):** notes
* **TEST_QUOTE_OP (OPEN):** notes
* **TEST_OP (OPEN):** notes

jahorton · 2022-09-05T04:54:39Z

... fully posted it, then realized part of the first drafted unit test still fails. Still a lot better than before; just the one last nit to get, I guess.

bharanidharanj · 2022-09-05T12:11:26Z

GROUP_ANDROID: Test using any Android device (or simulated device) supporting Keyman.

TEST_QUOTE_O (PASSED): Tested this with the attached PR build (keyman 16.0.56-alpha-test-7205) on an Android mobile device (Ver 11.0). Most of the displayed suggestions began with an 'o'. Applying a suggestion did not replace the ' mark.
TEST_QUOTE_OP (PASSED): Tested this with the attached PR build on an Android mobile device (Ver 11.0). Most of the displayed suggestions began with 'op'. Applying a suggestion did not replace the ' mark.
TEST_OP (PASSED): Tested this as per the instructions; most of the displayed suggestions began with 'op'. Applying 'open' to 'you op' results in 'you open' for the full context.

bharanidharanj · 2022-09-05T12:22:34Z

GROUP_WEB_HOST: Test using the "Predictive Text: robust testing" Web test page

TEST_QUOTE_O (PASSED): Tested as per the instructions using the "Predictive Text: robust testing" Web test page and it is working as expected.
TEST_QUOTE_OP (PASSED): Tested as per the instructions using the "Predictive Text: robust testing" Web test page and it is working as expected.
TEST_OP (PASSED): Tested as per the instructions using the "Predictive Text: robust testing" Web test page and it is working as expected.

common/web/lm-worker/src/transformUtils.ts

mcdurdin · 2022-09-07T22:05:01Z

I notice both web and lm test builds failed. Not sure if that is new to this PR or longstanding instability. We need to get on top of these failures (even if it means disabling tests because we can't get them stable).

mcdurdin

So I am not entirely clear on the rationale behind some of the algorithm changes. It seems to make sense but I don't have a deep enough understanding of the whole module to really be 100% on it. Code itself looks okay, apart from the isWhitespace function which seems weird. Appreciate the refactoring of the three related functions into new TransformUtils class.

common/web/lm-worker/src/correction/context-tracker.ts

common/web/lm-worker/src/model-compositor.ts

mcdurdin · 2022-09-07T22:17:58Z

common/web/lm-worker/src/transformUtils.ts

+  static isWhitespace(transform: Transform): boolean {
+    // Matches prefixed text + any instance of a character with Unicode general property Z* or the following: CR, LF, and Tab.
+    let whitespaceRemover = /.*[\u0009\u000A\u000D\u0020\u00a0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u202f\u205f\u3000]/i;
+
+    // Filter out null-inserts; their high probability can cause issues.
+    if(transform.insert == '') { // Can actually register as 'whitespace'.
+      return false;
+    }
+
+    let insert = transform.insert;
+
+    insert = insert.replace(whitespaceRemover, '');
+
+    return insert == '';
+  }


This seems a little convoluted.

I'm really not sure exactly what we are testing here. It seems like you are looking for insert.match(/[\u0009...]$/)? Is it intended to match having a single whitespace character on the end of the text? If so, then the function name is not very clear. If not, then AFAICT the function is not correct.

i modifier should not be necessary on the regex -- there's no case to be insensitive about 🤣.

Can we use a character class for the regex match rather than a set of characters (e.g. insert.match(/[\p{Z}\r\n]$/u))? Note Chrome 50+ though (so we could have the shorter regex as a comment if we can't use it for back-compat reasons).

Finally, can we have a javadoc comment rather than the in-function comment?

Keep in mind that I did not edit this function at all. I simply extracted it from its prior home. From master:

keyman/common/web/lm-worker/src/model-compositor.ts

Lines 19 to 33 in 80d7ce7

protected isWhitespace(transform: Transform): boolean {
  // Matches prefixed text + any instance of a character with Unicode general property Z* or the following: CR, LF, and Tab.
  let whitespaceRemover = /.*[\u0009\u000A\u000D\u0020\u00a0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u202f\u205f\u3000]/i;

  // Filter out null-inserts; their high probability can cause issues.
  if(transform.insert == '') { // Can actually register as 'whitespace'.
    return false;
  }

  let insert = transform.insert;

  insert = insert.replace(whitespaceRemover, '');

  return insert == '';
}

This all dates back to #1851, and @eddieantonio was the one responsible for that regex: #1851 (comment)

Can't speak to why he didn't use a character class here, but I figure that there was some reason or other. For what it's worth, following that character class link and drilling down to the actual classes: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes

I don't see a compatibility chart there. Now, there are other ways to search for this...

https://caniuse.com/mdn-javascript_builtins_regexp_property_escapes

Note the minimum Chrome version listed there - if desktop Chrome didn't support these property escapes before version 64, I doubt that mobile had it implemented at that time. This, to me, implies a strong chance that using them would break predictive text outright on non-updated Android 5.0 devices, at the least.
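
One way to reconcile the shorter character class with older engines is to feature-detect property escapes at runtime and fall back to the explicit list. The following is only a sketch under that assumption: `buildWhitespaceRegex` is a hypothetical name, and nothing in this PR adopts this approach.

```typescript
// Explicit list copied from the existing regex, written as a string for the RegExp constructor.
const EXPLICIT_WHITESPACE =
  '\\u0009\\u000A\\u000D\\u0020\\u00a0\\u1680\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005' +
  '\\u2006\\u2007\\u2008\\u2009\\u200a\\u200b\\u2028\\u2029\\u202f\\u205f\\u3000';

// Hypothetical helper: prefers \p{Z} where the engine supports it, otherwise uses the explicit list.
function buildWhitespaceRegex(): RegExp {
  try {
    // Engines without Unicode property escapes reject this pattern with a SyntaxError.
    // Tab, CR, LF and U+200B are added explicitly to mirror the characters in the explicit list.
    return new RegExp('^[\\p{Z}\\u0009\\u000A\\u000D\\u200b]+$', 'u');
  } catch (e) {
    return new RegExp('^[' + EXPLICIT_WHITESPACE + ']+$');
  }
}

const whitespaceOnly = buildWhitespaceRegex();
console.log(whitespaceOnly.test('  '));  // true
console.log(whitespaceOnly.test('a '));  // false
```

Building the pattern through the `RegExp` constructor matters here: a regex literal with `\p{Z}` would fail when the script is parsed, before any fallback could run.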

Where do you see a $ in the regex string here? Is it implied with the flags on the pattern? It's certainly not present in plain-text.

apart from the isWhitespace function which seems weird

So... feel free to make an issue about the function's implementation seeming weird to you, but since there are no edits to this function within this PR, any potential changes should be considered out of scope here. It's been in place since 12.0, I think.

(Using git blame to double-check...)

(I'm pretty sure that the more-recent 'change' on the empty line was lingering whitespace removal.)

Yep, happy to postpone fixing it. Wonder if this function may be causing issues though because it seems weird...

Where do you see a $ in the regex string here? Is it implied with the flags on the pattern? It's certainly not present in plain-text.

It's implied by return insert == ''; -- so this will only be true if the regex deletes everything in the string, which means it must be the last char in the string.

Keep in mind that we're dealing with incoming transforms - in essence, this is used on incoming keystrokes, not existing context. (Hence the class name TransformUtils, with emphasis on Transform.)

isWhitespace({ insert: 'a ', deleteLeft: 0 })

will yield false - after all, that 'a' will remain.

Meanwhile, this one:

isWhitespace({
  insert: '\u{0020}\u{0020}\u{0020}\u{0020}', // four spaces in a row; doesn't format well
  deleteLeft: 0                               // on web without the \u stuff
})

will yield true. The design is meant to check if any prediction-triggering keystroke is fully whitespace or not.

Its uses do imply a few assumptions, though - we're not doing any hasWhitespace checks, which might be more valid for niche, likely-unintuitive scenarios.

Are you sure? I just ran this (note: stripped down to just pass a string in -- after all, that's the only property of Transform that is used there):

let foo = [ 'a ', ' ', ' ', ' x x ', 'xxx ', 'xxx' ];
for(let f of foo) {
  console.log(`"${f}": `, isWhitespace(f));
}

and the result was:

$ node foo.js
"a ": true
" ": true
" ": true
" x x ": true
"xxx ": true
"xxx": false
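
The results above can be reproduced with a self-contained version of the function. This sketch keeps the original regex and logic but takes the insert string directly; the name `originalIsWhitespace` is made up for the example.

```typescript
// Same logic as the function under review, reduced to operate on the insert string.
function originalIsWhitespace(insert: string): boolean {
  const whitespaceRemover = /.*[\u0009\u000A\u000D\u0020\u00a0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u202f\u205f\u3000]/i;

  if (insert == '') {
    return false;
  }

  // The greedy `.*` swallows any leading text, so everything up to and including
  // the LAST whitespace character is removed, not just whitespace.
  return insert.replace(whitespaceRemover, '') == '';
}

for (const s of ['a ', ' ', ' x x ', 'xxx ', 'xxx', 'a b']) {
  console.log(JSON.stringify(s), originalIsWhitespace(s));
}
// "a " true, " " true, " x x " true, "xxx " true, "xxx" false, "a b" false
```

In effect the function reports whether a non-empty insert ends in whitespace, which is why 'a ' passes and 'a b' does not.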

If it is supposed to check if a given transform is 100% whitespace and not empty, then:

let whitespace = /^[\u0009\u000A\u000D\u0020\u00a0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u202f\u205f\u3000]+$/;
return transform.insert.match(whitespace) !== null;

Oh right... regex matching by default gets the portion, rather than checking against the whole. I tend to forget that sometimes.

Sorry on the delay. Just got back to this.

Though, I did go ahead and add unit testing for the function while I was at it.

common/web/lm-worker/src/transformUtils.ts

jahorton · 2022-09-08T01:02:16Z

I notice both web and lm test builds failed. Not sure if that is new to this PR or longstanding instability. We need to get on top of these failures (even if it means disabling tests because we can't get them stable).

For the LMLayer, it's the test failure addressed by #7037 again. (It happens on lmlayer worker + hosted-in-browser integration.)
Web's is similarly unrelated.

mcdurdin · 2022-09-08T01:50:22Z

For the LMLayer, it's the test failure addressed by #7037 again. (It happens on lmlayer worker + hosted-in-browser integration.)
Web's is similarly unrelated.

OK. Think we need to get #7037 merged. I'm not excited about it because it's hard to 100% verify, being timing-based, but the noise is causing trouble in dev processes. And I think we need another round on fixing the web failing tests because again, it's really hard to tell the relationship in failed tests.


jahorton · 2022-09-08T03:49:32Z

Just swapped the approach for something clearer (and less fiddly, to boot!). The change is quite significant, so...

@keymanapp-test-bot retest all

mcdurdin · 2022-09-14T06:04:07Z

common/web/lm-worker/src/transformUtils.ts

+    if(transform.insert == '') {
      return false;
    }


This shouldn't be needed as the regex match will exclude it.
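
A short check of that claim, assuming the anchored `^[...]+$` pattern proposed earlier in this review is the one in use: because `+` requires at least one character, the empty string cannot match, so the explicit guard is redundant. `isWhitespaceOnly` is a hypothetical stand-in that takes the insert string directly.

```typescript
// Anchored pattern from the earlier review suggestion: one or more whitespace characters, nothing else.
const WHITESPACE_ONLY = /^[\u0009\u000A\u000D\u0020\u00a0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u202f\u205f\u3000]+$/;

// Hypothetical stand-in for the revised check; no separate empty-string guard.
function isWhitespaceOnly(insert: string): boolean {
  return WHITESPACE_ONLY.test(insert);
}

console.log(isWhitespaceOnly(''));      // false: `+` needs at least one character
console.log(isWhitespaceOnly('    '));  // true
console.log(isWhitespaceOnly('a '));    // false
console.log(isWhitespaceOnly(' x x ')); // false
```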


keyman-server · 2022-09-15T18:01:55Z

Changes in this pull request will be available for download in Keyman version 16.0.64-alpha

jahorton added 5 commits September 5, 2022 10:26

fix(web): pred-text context tracking when wordbreak not caused by whitespace

ad8c30c

fix(web): missed method references

be562b3

fix(web): adds unit test targeting issue, handler for edge case

b0184ec

chore(web): test name tweak

0a237f8

docs(web): fixes missed doc update

d33a9f0

jahorton added this to the A16S10 milestone Sep 5, 2022

jahorton requested review from sgschantz and mcdurdin as code owners September 5, 2022 04:42

keymanapp-test-bot bot added the user-test-missing User tests have not yet been defined for the PR label Sep 5, 2022

github-actions bot added common/ common/web/ fix labels Sep 5, 2022

keymanapp-test-bot bot added has-user-test user-test-required User tests have not been completed and removed user-test-missing User tests have not yet been defined for the PR labels Sep 5, 2022

jahorton marked this pull request as draft September 5, 2022 04:55

jahorton added 2 commits September 5, 2022 12:17

fix(web): post-suggestion-apply error

57ce30e

fix(common/models): context token .isNew maintenance

b47596a

jahorton marked this pull request as ready for review September 5, 2022 05:35

fix(web): context-tracker newFlag management for new contexts

26c5150

keymanapp-test-bot bot removed the user-test-required User tests have not been completed label Sep 5, 2022

darcywong00 reviewed Sep 6, 2022


chore(common/models): suggested tweak from review

9634c76

jahorton mentioned this pull request Sep 6, 2022

fix(common/models): reconnects unit tests for worker-internal submodules 🎡 #7215

Merged

jahorton added 2 commits September 6, 2022 10:47

fix(common/models): undefined != 0

9be8839

fix(common/models): backspacing shouldn't make 'new' tokens

fb0cc50

mcdurdin requested changes Sep 7, 2022


mcdurdin mentioned this pull request Sep 8, 2022

chore(web): TransformUtils.insertWhitespace seems convoluted #7232

Closed

jahorton added 2 commits September 8, 2022 10:48

change(common/models): drops .isNew, replaces with tokenized context contrast logic

52cb9aa

feat(common/models): also, unit tests

3aed0bf

keymanapp-test-bot bot added the user-test-required User tests have not been completed label Sep 8, 2022

jahorton added 2 commits September 8, 2022 10:54

change(web): pushWhitespaceToTail tweak

75e0b27

change(web): conciser version of last commit

6c50de9

jahorton mentioned this pull request Sep 9, 2022

feat(android): accepting a suggestion then adding punctuation should delete automatically added space #7163

Open

jahorton added 3 commits September 12, 2022 15:03

fix(common/models): isWhitespace, adds related unit tests

415201b

fix(common/models): needed export for unit tests

665b149

chore(web): Merge branch 'master' into fix/web/non-whitespace-wordbreaks

2f044c4

jahorton requested a review from mcdurdin September 13, 2022 07:54

This was referenced Sep 13, 2022

bug(common/models): LM replaces left quote and character(s) with the selected option #6900

Closed

bug(model): the prediction picked omits the opening single and double quote #6024

Closed

mcdurdin reviewed Sep 14, 2022


mcdurdin approved these changes Sep 14, 2022


jahorton and others added 2 commits September 14, 2022 13:32

chore(common/models): Apply suggestions from code review

a390c2f

Co-authored-by: Marc Durdin <[email protected]>

chore(common/models): final requested tweak

917fce0

jahorton merged commit ec53662 into master Sep 15, 2022

jahorton deleted the fix/web/non-whitespace-wordbreaks branch September 15, 2022 01:23

jahorton mentioned this pull request Sep 19, 2022

fix(common/models): fixes reference dropped by git merge #7313

Merged

MakaraSok mentioned this pull request May 11, 2023

chore(ios): Changes required for XCode 14.3 #8746

Merged

3 tasks
