Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LMLayer, Web] Context Management + Suggestion UX #1851

Merged
merged 16 commits into from
Jul 4, 2019

Conversation

jahorton
Copy link
Contributor

@jahorton jahorton commented Jul 1, 2019

Fixes #1849.

This PR performs the following enhancements:

  • Adds a wordbreaking API to the LMLayer for use by KMW.
  • LMLayer now generates 'keep' Suggestions and prevents Suggestion duplication.
  • KMW now relies on the new 'keep' Suggestion from LMLayer; no more <keep> text!
  • The LMLayer now properly processes BKSP keystrokes. (The fixed 'bug'.)
  • KMW now requests wordbreaking to get display text for reversion operations.
  • On BKSP, KMW displays a reversion option instead of auto-reverting.
  • BKSP is no longer blocked post-acceptance, while still allowing reversion.

@keyman-server
Copy link
Collaborator

OSK renders for Windows

@eddieantonio eddieantonio self-requested a review July 1, 2019 22:27
@jahorton jahorton marked this pull request as ready for review July 2, 2019 03:17
Copy link
Contributor

@eddieantonio eddieantonio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like good progress! These improvements/fixes are definitely highly-anticipated! There are definitely a few things to clean up at this point so far though; please refer to my inline comments.

Additionally, are we still maintaining this document: https://github.com/keymanapp/keyman/blob/master/common/predictive-text/docs/worker-communication-protocol.md?

If so, this pull request should also add the wordbreak and word message to that document. If we're not maintaining that document anymore, then we should delete it altogether.

Also, I make a number of suggestions to solve the second most difficult problem in Computer Science. Please have a go at it! 😄

common/predictive-text/message.d.ts Outdated Show resolved Hide resolved
common/predictive-text/message.d.ts Outdated Show resolved Hide resolved

broken = model.wordbreak(context);

assert.strictEqual(broken, 'jumped'); // Current result: 'jum'.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! So you're proposing "current word" semantics! That's great! I think I know how to implement that! This is a big step up from SwiftKey which strictly considers only what's left of the cursor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's something quite natural to do when a user places the caret within a word, so I figured that'd be a usability thing we'd want to see happen eventually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I figured that's why we have LMLayer requesting text to the right-hand side of the caret for part of the Context spec.

common/predictive-text/message.d.ts Outdated Show resolved Hide resolved
common/predictive-text/message.d.ts Outdated Show resolved Hide resolved
* Indicates if this option represents preservation of the user's
* originally-input text (pre-prediction).
*/
isKeep?: boolean;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we implement the generalized suggestion tag here? e.g.,

Suggested change
isKeep?: boolean;
tag?: 'keep';

This is to accommodate more suggestion types such as emoji and correction. Recall this document. I also think clearer names than keep may be verbatim or original

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. That's a pretty reasonable idea, though I'd want to be sure we've thought everything through clearly regarding tags first. Admittedly, I forgot about the specific document since it's seemed like major back-burner stuff for a future version, so thanks for bringing it up.

  • Can a Suggestion have more than one tag?
    • Suppose the text smile.
      • smile itself is the obvious default 'keep' option.
      • However, 😄 would be a pretty reasonable suggestion, and semantically might be considered a 'keep.' But yeah, that's probably a 'prediction'.
    • Now consider the text smilf.
      • smile would be an obvious 'correction'.
      • But what about 😄 here? It's both an 'emoji' and a 'correction' - do I choose 'emoji' and drop the 'correction' metadata component?
      • If we block 'correction' but allow emojis, we risk the situation where smilf does not suggest smile (the 'correction') but does suggest 😄 (the 'emoji')... that doesn't seem right.
  • I'd prefer to implement the typing here once, so the above point is to ask this - should it be a single string, or would we want an array of tags?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single string. This will make implementation, for us and for developers of custom models, easier. In this specific example, I think the loss of information— that 😄 is a correction of "smilr"— is okay. I assume that emoji would always be the rightmost suggestion, regardless if it is a "correction" or not. I think the "keep" tag should only be emitted when every suggestion the model(s) make suggest something other than the text that should be kept.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the "keep" tag should only be emitted when every suggestion the model(s) make suggest something other than the text that should be kept.

I disagree with this part, simply to ensure we can reliably detect keep and preserve it as the first option. Otherwise, there's a slim chance 'keep' could be an obscure and unlikely word and become buried past the first three suggestions.

Otherwise, 👍 .

@mcdurdin mcdurdin modified the milestones: P7S1, P7S2 Jul 2, 2019
@jahorton
Copy link
Contributor Author

jahorton commented Jul 3, 2019

Additionally, are we still maintaining this document: https://github.com/keymanapp/keyman/blob/master/common/predictive-text/docs/worker-communication-protocol.md?

If so, this pull request should also add the wordbreak and word message to that document. If we're not maintaining that document anymore, then we should delete it altogether.

👍

Totally forgot about that before; thanks for the reminder.

Copy link
Contributor

@eddieantonio eddieantonio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I think we might want to address my remaining comments in future PRs:

  1. using a generalized "tag" mechanism for suggestions and nailing down exactly what those semantics are
  2. Spacing in the model compositor

common/predictive-text/message.d.ts Show resolved Hide resolved
})[0].sample;

// Only allow new-word suggestions if space was the most likely keypress.
let allowSpace = inputTransform.insert == " " || inputTransform.insert == "\n";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I don't understand the intent of this code. It's checking for a space or newline because it assumes that...?

If you're looking specifically for whitespace, then use all characters with Unicode general property Z*:

/[\u0020\u00a0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]/

These DO NOT include CR, LF, and Tab. So you could use this regex to include them:

/[\u0009\u000A\u000D\u0020\u00a0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]/

We can address this in a future PR, though! 😅

@jahorton
Copy link
Contributor Author

jahorton commented Jul 4, 2019

LGTM!

I think we might want to address my remaining comments in future PRs:

1. using a generalized "tag" mechanism for suggestions and nailing down _exactly_ what those semantics are

2. Spacing in the model compositor

I went ahead and did something basic for them now, though it's quite possible we'd want to address those points further at a later point.

Re "spacing in the model compositor" and the following related concern:

Maybe I don't understand the intent of this code. It's checking for a space or newline because it assumes that...?

If a user didn't (intentionally) input a whitespace keystroke, we shouldn't give them predictions for entirely new words - predictions that would result if we allow a 'possible alternate' whitespace transform to be the root of possible 'corrections'. It didn't "feel right" during my development testing after fixing the old backspace-prediction issue to allow them. Of course, they should be allowed when they're actually intended - so if our 'most likely' input transform is whitespace, we then don't block them.

Example case: With "total" + "l", you'd see the prediction "totally"... and then "the", "and", "of" because there were no other predictions rooted on "total" that didn't start new words. The intuitive interpretation would likely be "why would I want to replace 'totall' with 'and'?" - something that'd be technically wrong and result in a bad impression for the user.

Similarly, predictions get a little weird if you allow 'correcting' non-backspace keystrokes to a possible backspace keystroke. It actually does improve the range of correction and prediction, but there was a little bug that was happening for those BKSP-rooted suggestions and the behavior of suggestions didn't quite feel consistent, so leaving it on didn't quite "feel right" either.

If you want to experience the difference, go to /web/testing/prediction-ui/index.html in Chrome (for emulated mobile testing) after disabling the related filtering checks.


Update: may have to do the whitespace update later; looks like IE may not 'appreciate' something about that regex.

@@ -6,15 +6,53 @@ class ModelCompositor {
this.lexicalModel = lexicalModel;
}

protected isWhitespace(transform: Transform): boolean {
// Matches prefixed text + any instance of a character with Unicode general property Z* or the following: CR, LF, and Tab.
let whitespaceRemover = /.*[\u0009\u000A\u000D\u0020\u00a0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]/iu;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Web] Various tweaks to predictive text
4 participants