-
-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LMLayer, Web] Context Management + Suggestion UX #1851
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like good progress! These improvements/fixes are definitely highly-anticipated! There are definitely a few things to clean up at this point so far though; please refer to my inline comments.
Additionally, are we still maintaining this document: https://github.com/keymanapp/keyman/blob/master/common/predictive-text/docs/worker-communication-protocol.md?
If so, this pull request should also add the wordbreak
and word
message to that document. If we're not maintaining that document anymore, then we should delete it altogether.
Also, I make a number of suggestions to solve the second most difficult problem in Computer Science. Please have a go at it! 😄
|
||
broken = model.wordbreak(context); | ||
|
||
assert.strictEqual(broken, 'jumped'); // Current result: 'jum'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! So you're proposing "current word" semantics! That's great! I think I know how to implement that! This is a big step up from SwiftKey which strictly considers only what's left of the cursor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's something quite natural to do when a user places the caret within a word, so I figured that'd be a usability thing we'd want to see happen eventually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I figured that's why we have LMLayer requesting text to the right-hand side of the caret for part of the Context
spec.
common/predictive-text/unit_tests/in_browser/cases/worker-dummy-integration.js
Show resolved
Hide resolved
common/predictive-text/message.d.ts
Outdated
* Indicates if this option represents preservation of the user's | ||
* originally-input text (pre-prediction). | ||
*/ | ||
isKeep?: boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we implement the generalized suggestion tag here? e.g.,
isKeep?: boolean; | |
tag?: 'keep'; |
This is to accommodate more suggestion types such as emoji
and correction
. Recall this document. I also think clearer names than keep
may be verbatim
or original
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. That's a pretty reasonable idea, though I'd want to be sure we've thought everything through clearly regarding tags first. Admittedly, I forgot about the specific document since it's seemed like major back-burner stuff for a future version, so thanks for bringing it up.
- Can a Suggestion have more than one tag?
- Suppose the text
smile
.smile
itself is the obvious default 'keep' option.- However, 😄 would be a pretty reasonable suggestion, and semantically might be considered a 'keep.' But yeah, that's probably a 'prediction'.
- Now consider the text
smilf
.smile
would be an obvious 'correction'.- But what about 😄 here? It's both an 'emoji' and a 'correction' - do I choose 'emoji' and drop the 'correction' metadata component?
- If we block 'correction' but allow emojis, we risk the situation where
smilf
does not suggestsmile
(the 'correction') but does suggest 😄 (the 'emoji')... that doesn't seem right.
- Suppose the text
- I'd prefer to implement the typing here once, so the above point is to ask this - should it be a single string, or would we want an array of tags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A single string. This will make implementation, for us and for developers of custom models, easier. In this specific example, I think the loss of information— that 😄 is a correction of "smilr"— is okay. I assume that emoji would always be the rightmost suggestion, regardless if it is a "correction" or not. I think the "keep" tag should only be emitted when every suggestion the model(s) make suggest something other than the text that should be kept.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the "keep" tag should only be emitted when every suggestion the model(s) make suggest something other than the text that should be kept.
I disagree with this part, simply to ensure we can reliably detect keep and preserve it as the first option. Otherwise, there's a slim chance 'keep' could be an obscure and unlikely word and become buried past the first three suggestions.
Otherwise, 👍 .
👍 Totally forgot about that before; thanks for the reminder. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I think we might want to address my remaining comments in future PRs:
- using a generalized "tag" mechanism for suggestions and nailing down exactly what those semantics are
- Spacing in the model compositor
})[0].sample; | ||
|
||
// Only allow new-word suggestions if space was the most likely keypress. | ||
let allowSpace = inputTransform.insert == " " || inputTransform.insert == "\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I don't understand the intent of this code. It's checking for a space or newline because it assumes that...?
If you're looking specifically for whitespace, then use all characters with Unicode general property Z*:
/[\u0020\u00a0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]/
These DO NOT include CR, LF, and Tab. So you could use this regex to include them:
/[\u0009\u000A\u000D\u0020\u00a0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]/
We can address this in a future PR, though! 😅
I went ahead and did something basic for them now, though it's quite possible we'd want to address those points further at a later point. Re "spacing in the model compositor" and the following related concern:
If a user didn't (intentionally) input a whitespace keystroke, we shouldn't give them predictions for entirely new words - predictions that would result if we allow a 'possible alternate' whitespace transform to be the root of possible 'corrections'. It didn't "feel right" during my development testing after fixing the old backspace-prediction issue to allow them. Of course, they should be allowed when they're actually intended - so if our 'most likely' input transform is whitespace, we then don't block them. Example case: With "total" + "l", you'd see the prediction "totally"... and then "the", "and", "of" because there were no other predictions rooted on "total" that didn't start new words. The intuitive interpretation would likely be "why would I want to replace 'totall' with 'and'?" - something that'd be technically wrong and result in a bad impression for the user. Similarly, predictions get a little weird if you allow 'correcting' non-backspace keystrokes to a possible backspace keystroke. It actually does improve the range of correction and prediction, but there was a little bug that was happening for those BKSP-rooted suggestions and the behavior of suggestions didn't quite feel consistent, so leaving it on didn't quite "feel right" either. If you want to experience the difference, go to /web/testing/prediction-ui/index.html in Chrome (for emulated mobile testing) after disabling the related filtering checks. Update: may have to do the whitespace update later; looks like IE may not 'appreciate' something about that regex. |
@@ -6,15 +6,53 @@ class ModelCompositor { | |||
this.lexicalModel = lexicalModel; | |||
} | |||
|
|||
protected isWhitespace(transform: Transform): boolean { | |||
// Matches prefixed text + any instance of a character with Unicode general property Z* or the following: CR, LF, and Tab. | |||
let whitespaceRemover = /.*[\u0009\u000A\u000D\u0020\u00a0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]/iu; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixes #1849.
This PR performs the following enhancements:
Suggestion
s and preventsSuggestion
duplication.Suggestion
from LMLayer; no more<keep>
text!