[LMLayer, Web] Context Management + Suggestion UX #1851

jahorton · 2019-07-01T09:10:02Z

This PR performs the following enhancements:

Adds a wordbreaking API to the LMLayer for use by KMW.
LMLayer now generates 'keep' Suggestions and prevents Suggestion duplication.
KMW now relies on the new 'keep' Suggestion from LMLayer; no more <keep> text!
The LMLayer now properly processes BKSP keystrokes. (The fixed 'bug'.)
KMW now requests wordbreaking to get display text for reversion operations.
On BKSP, KMW displays a reversion option instead of auto-reverting.
BKSP is no longer blocked post-acceptance, while still allowing reversion.

keyman-server · 2019-07-01T09:40:37Z

eddieantonio

This looks like good progress! These improvements and fixes are highly anticipated. There are still a few things to clean up, though; please refer to my inline comments.

Additionally, are we still maintaining this document: https://github.com/keymanapp/keyman/blob/master/common/predictive-text/docs/worker-communication-protocol.md?

If so, this pull request should also add the wordbreak and word message to that document. If we're not maintaining that document anymore, then we should delete it altogether.

Also, I make a number of suggestions to solve the second most difficult problem in Computer Science. Please have a go at it! 😄

common/predictive-text/message.d.ts

eddieantonio · 2019-07-02T15:15:23Z

common/predictive-text/unit_tests/headless/default-word-breaker.js

+
+    broken = model.wordbreak(context);
+
+    assert.strictEqual(broken, 'jumped');  // Current result:  'jum'.    


Ah! So you're proposing "current word" semantics! That's great! I think I know how to implement that! This is a big step up from SwiftKey, which strictly considers only the text to the left of the cursor.

It's something quite natural to do when a user places the caret within a word, so I figured that'd be a usability thing we'd want to see happen eventually.

Also, I figured that's why we have LMLayer requesting the text to the right-hand side of the caret as part of the Context spec.
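
As a rough illustration of the "current word" semantics discussed above, here is a minimal sketch that joins the partial word on each side of the caret. The CaretContext type and the currentWord function are hypothetical names used only for this example, and splitting on whitespace is a simplifying assumption; it is not the word breaker this PR ships.

```typescript
// Hypothetical, simplified stand-in for the LMLayer's Context type.
interface CaretContext {
  left: string;   // text before the caret
  right?: string; // text after the caret, if the host supplies it
}

// Naive whitespace-based breaker; real word breaking is language-dependent.
function currentWord(context: CaretContext): string {
  // Trailing run of non-whitespace characters before the caret.
  const before = /\S*$/.exec(context.left)![0];
  // Leading run of non-whitespace characters after the caret.
  const after = /^\S*/.exec(context.right || "")![0];
  return before + after;
}
```

With the caret placed inside a word, the right-hand text completes it; without any right-hand text, the result falls back to the left-only behaviour.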

common/predictive-text/unit_tests/headless/default-word-breaker.js

common/predictive-text/unit_tests/in_browser/cases/worker-dummy-integration.js

common/predictive-text/worker/models/trie-model.ts

common/predictive-text/worker/worker-interfaces.ts

common/predictive-text/message.d.ts

eddieantonio · 2019-07-02T15:46:08Z

common/predictive-text/message.d.ts

+   * Indicates if this option represents preservation of the user's 
+   * originally-input text (pre-prediction).
+   */
+  isKeep?: boolean;


Shall we implement the generalized suggestion tag here? e.g.,

Suggested change

-  isKeep?: boolean;
+  tag?: 'keep';

This is to accommodate more suggestion types such as emoji and correction. Recall this document. I also think verbatim or original may be clearer names than keep.

Hmm. That's a pretty reasonable idea, though I'd want to be sure we've thought everything through clearly regarding tags first. Admittedly, I forgot about the specific document since it's seemed like major back-burner stuff for a future version, so thanks for bringing it up.

Can a Suggestion have more than one tag?

Suppose the text smile.

smile itself is the obvious default 'keep' option.

However, 😄 would be a pretty reasonable suggestion, and semantically might be considered a 'keep.' But yeah, that's probably a 'prediction'.

Now consider the text smilf.

smile would be an obvious 'correction'.

But what about 😄 here? It's both an 'emoji' and a 'correction' - do I choose 'emoji' and drop the 'correction' metadata component?

If we block 'correction' but allow emojis, we risk the situation where smilf does not suggest smile (the 'correction') but does suggest 😄 (the 'emoji')... that doesn't seem right.

I'd prefer to implement the typing here once, so the above point is to ask this - should it be a single string, or would we want an array of tags?

A single string. This will make implementation easier, both for us and for developers of custom models. In this specific example, I think the loss of information (that 😄 is a correction of "smilf") is okay. I assume that emoji would always be the rightmost suggestion, regardless of whether it is a "correction" or not. I think the "keep" tag should only be emitted when every suggestion the model(s) make suggest something other than the text that should be kept.

> I think the "keep" tag should only be emitted when every suggestion the model(s) make suggest something other than the text that should be kept.

I disagree with this part, simply to ensure we can reliably detect keep and preserve it as the first option. Otherwise, there's a slim chance 'keep' could be an obscure and unlikely word and become buried past the first three suggestions.

Otherwise, 👍 .
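
To make the single-string tag idea concrete, here is a minimal sketch of a tagged suggestion type plus a helper that always emits the 'keep' option first and drops predictions that duplicate it. TaggedSuggestion, SuggestionTag, and withKeepFirst are hypothetical names for illustration, and the Suggestion shape is simplified; only the tag values come from the discussion above.

```typescript
// Hypothetical, simplified suggestion shape; one optional tag per suggestion.
type SuggestionTag = 'keep' | 'emoji' | 'correction';

interface TaggedSuggestion {
  displayAs: string;
  tag?: SuggestionTag;
}

// Puts a 'keep' entry first and drops any prediction that duplicates it.
function withKeepFirst(typed: string, predictions: TaggedSuggestion[]): TaggedSuggestion[] {
  const keep: TaggedSuggestion = { displayAs: typed, tag: 'keep' };
  const others = predictions.filter(s => s.displayAs !== typed);
  return [keep, ...others];
}
```

Emitting 'keep' unconditionally means the UI can find it by tag alone, even when the typed text is an unlikely word that the model would otherwise rank low.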

jahorton · 2019-07-03T02:51:18Z

> Additionally, are we still maintaining this document: https://github.com/keymanapp/keyman/blob/master/common/predictive-text/docs/worker-communication-protocol.md?
>
> If so, this pull request should also add the wordbreak and word message to that document. If we're not maintaining that document anymore, then we should delete it altogether.

👍

Totally forgot about that before; thanks for the reminder.

eddieantonio

LGTM!

I think we might want to address my remaining comments in future PRs:

1. using a generalized "tag" mechanism for suggestions and nailing down exactly what those semantics are
2. Spacing in the model compositor

common/predictive-text/message.d.ts

common/predictive-text/unit_tests/headless/default-word-breaker.js

eddieantonio · 2019-07-03T15:30:57Z

common/predictive-text/worker/model-compositor.ts

+    })[0].sample;
+
+    // Only allow new-word suggestions if space was the most likely keypress.
+    let allowSpace = inputTransform.insert == " " || inputTransform.insert == "\n";


Maybe I don't understand the intent of this code. It's checking for a space or newline because it assumes that...?

If you're looking specifically for whitespace, then match all characters in the Unicode general categories Z* (Zs, Zl, and Zp):

/[\u0020\u00a0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]/

These DO NOT include CR, LF, and Tab. So you could use this regex to include them:

/[\u0009\u000A\u000D\u0020\u00a0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]/

We can address this in a future PR, though! 😅
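
For reference, the difference between the two character classes above can be sketched as follows. The constant and function names are hypothetical, and the \u2000-\u200a range is just a shorthand for the eleven code points listed individually in the comment.

```typescript
// Unicode general categories Zs, Zl and Zp only.
const SEPARATORS = /[\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]/;
// The same class plus Tab, LF and CR.
const WHITESPACE = /[\u0009\u000a\u000d\u0020\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]/;

// True if the text contains at least one whitespace character.
function hasWhitespace(text: string): boolean {
  return WHITESPACE.test(text);
}
```

A tab matches only the second class, while a no-break space (U+00A0) matches both.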

jahorton · 2019-07-04T01:47:14Z

> LGTM!
>
> I think we might want to address my remaining comments in future PRs:
>
> 1. using a generalized "tag" mechanism for suggestions and nailing down _exactly_ what those semantics are
> 2. Spacing in the model compositor

I went ahead and did something basic for them now, though it's quite possible we'd want to address those points further at a later point.

Re "spacing in the model compositor" and the following related concern:

> Maybe I don't understand the intent of this code. It's checking for a space or newline because it assumes that...?

If a user didn't (intentionally) input a whitespace keystroke, we shouldn't give them predictions for entirely new words, which is what happens if we allow a 'possible alternate' whitespace transform to be the root of possible 'corrections'. Allowing them didn't "feel right" during my development testing after fixing the old backspace-prediction issue. Of course, they should be allowed when they're actually intended, so if our 'most likely' input transform is whitespace, we don't block them.

Example case: With "total" + "l", you'd see the prediction "totally"... and then "the", "and", "of" because there were no other predictions rooted on "total" that didn't start new words. The intuitive interpretation would likely be "why would I want to replace 'totall' with 'and'?" - something that'd be technically wrong and result in a bad impression for the user.
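
The filtering described above can be sketched roughly like this. KeyTransform, Candidate, and filterNewWords are hypothetical names for illustration, and the assumption that each candidate records the alternate keystroke it was rooted on is made up for this example; the actual ModelCompositor logic may be organised differently.

```typescript
// Hypothetical shapes for illustration; not the actual LMLayer types.
interface KeyTransform { insert: string; }
interface Candidate { text: string; rootedOn: KeyTransform; }

// Same check as the allowSpace line in the diff: space or newline only.
const isSpace = (t: KeyTransform): boolean => t.insert === " " || t.insert === "\n";

// Keeps whitespace-rooted (new-word) candidates only when the most likely
// keystroke was itself whitespace.
function filterNewWords(mostLikely: KeyTransform, candidates: Candidate[]): Candidate[] {
  const allowSpace = isSpace(mostLikely);
  return candidates.filter(c => allowSpace || !isSpace(c.rootedOn));
}
```

In the "total" + "l" case, a candidate such as "the" that is rooted on an alternate space keystroke is dropped, while "totally" survives; if the user actually typed a space, nothing is filtered.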

Similarly, predictions get a little weird if you allow 'correcting' non-backspace keystrokes to a possible backspace keystroke. It actually does improve the range of correction and prediction, but there was a little bug that was happening for those BKSP-rooted suggestions and the behavior of suggestions didn't quite feel consistent, so leaving it on didn't quite "feel right" either.

If you want to experience the difference, go to /web/testing/prediction-ui/index.html in Chrome (for emulated mobile testing) after disabling the related filtering checks.

Update: may have to do the whitespace update later; looks like IE may not 'appreciate' something about that regex.

jahorton · 2019-07-04T02:25:22Z

common/predictive-text/worker/model-compositor.ts

@@ -6,15 +6,53 @@ class ModelCompositor {
    this.lexicalModel = lexicalModel;
  }

+  protected isWhitespace(transform: Transform): boolean {
+    // Matches prefixed text + any instance of a character with Unicode general property Z* or the following: CR, LF, and Tab.
+    let whitespaceRemover = /.*[\u0009\u000A\u000D\u0020\u00a0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]/iu;


https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicode

The RegExp 'u' (unicode) flag is not supported in IE. :(
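
One possible way around the flag problem is sketched below, under the assumption that the goal is simply to avoid a parse-time failure in engines without 'u' support. The names are hypothetical and this is not necessarily how the follow-up commit patched it.

```typescript
// Character class from the diff above, kept as a string so it is compiled at runtime.
const WHITESPACE_CLASS =
  "[\\u0009\\u000A\\u000D\\u0020\\u00a0\\u1680\\u2000-\\u200a\\u2028\\u2029\\u202f\\u205f\\u3000]";

// A regex literal with an unsupported flag fails when the script is parsed,
// so the flag is probed through the RegExp constructor, which throws instead.
function supportsUnicodeFlag(): boolean {
  try {
    new RegExp(".", "u");
    return true;
  } catch (e) {
    return false;
  }
}

// Every code point in the class is in the Basic Multilingual Plane, so the
// pattern matches the same strings with or without 'u'.
const whitespace = new RegExp(WHITESPACE_CLASS, supportsUnicodeFlag() ? "u" : "");
```

Because nothing in this character class needs the 'u' flag, simply omitting the flag from the literal would also work and is the smaller change.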

jahorton added 7 commits July 1, 2019 09:50

Initial implementation of direct wordbreak API

7f64230

Adds integrated wordbreaking test case

0a0cfd0

Fixes backspace handling and duplicated suggestions.

bb3a83b

Detection of the 'keep' option.

db6f59f

LMLayer generation of 'keep' Suggestions

8725428

KMW now uses LMLayer's new 'keep' suggestion

96225f1

KMW wordbreak request for reversion option

549dcb3

jahorton added bug enhancement web/ common/models/ labels Jul 1, 2019

jahorton added this to the P7S1 milestone Jul 1, 2019

eddieantonio self-requested a review July 1, 2019 22:27

jahorton added 4 commits July 2, 2019 08:13

Bypasses IE unit test failure for wordbreaks.

7b1c036

Initial implementation of revert suggestion option

bbb1301

Removes post-accept BSKP swallowing

2e4d42e

Corrects reversion transform post-bksp change

9c769a5

jahorton marked this pull request as ready for review July 2, 2019 03:17

eddieantonio suggested changes Jul 2, 2019

mcdurdin modified the milestones: P7S1, P7S2 Jul 2, 2019

jahorton and others added 2 commits July 3, 2019 09:41

Addresses most concerns of recent PR review.

aec9634

Update worker-communication-protocol.md

ea83a07

eddieantonio approved these changes Jul 3, 2019

jahorton added 2 commits July 4, 2019 08:09

Moves .isKeep prop to .tag == 'keep'.

2e3e205

Better whitespace detection in compositor.

3a77285

jahorton commented Jul 4, 2019

Patches out IE-incompatible whitespace regex.

5c084c5

jahorton merged commit 715729f into master Jul 4, 2019

jahorton deleted the lmlayer-context-management branch July 4, 2019 04:08

jahorton mentioned this pull request Jul 4, 2019

ModelCompositor enhancements #1860

Open

jahorton mentioned this pull request Sep 8, 2022

fix(common/models): fixes quote-adjacent pred-text suggestions #7205

Merged

mcdurdin mentioned this pull request Sep 8, 2022

chore(web): TransformUtils.insertWhitespace seems convoluted #7232

Closed

jahorton mentioned this pull request Sep 16, 2022

fix(web): fixes unintended auto-acceptance of suggestion after reverting #7305

Merged
