@shiny-comic
Contributor

This closes #5503

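The previous code counted characters in the UTF-8 string (one per codepoint), but YouTube's index offsets are in UTF-16 code units, and an emoji outside the Basic Multilingual Plane takes two UTF-16 code units (a surrogate pair).
This difference misaligns the index offsets, so the last character(s) of the text are dropped.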
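The mismatch is easier to see with numbers. This small Python illustration is for demonstration only and is not the project's code. A single emoji counts as one character, as four UTF-8 bytes and as two UTF-16 code units. Any offset YouTube reports after that emoji is therefore one higher than a per-character index.

```python
text = "Nice video 👍!"

codepoints = len(text)                            # per-character (codepoint) count
utf8_bytes = len(text.encode("utf-8"))            # 👍 takes 4 bytes in UTF-8
utf16_units = len(text.encode("utf-16-le")) // 2  # 👍 is a surrogate pair: 2 units

print(codepoints, utf8_bytes, utf16_units)  # 13 16 14
```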

Copilot AI left a comment


Pull request overview

Fixes a text truncation bug where the final character(s) of YouTube-provided text (notably comments) could be dropped when the text contains emoji/SMP codepoints, by aligning string-length accounting with YouTube’s UTF-16 indexing semantics.

Changes:

  • Added a helper to compute UTF-16 code unit length for a String.
  • Updated parse_description to use UTF-16 code unit length instead of String#size when copying/escaping content.
  • Updated remaining-length calculation to be consistent with UTF-16 indexing.
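Below is a minimal Python sketch of the idea behind these changes. It assumes a simplified model: YouTube's run offsets are UTF-16 code unit indices, and the text after the last run is copied using a remaining length. The names `utf16_length`, `utf16_slice` and `copy_tail` are hypothetical and do not correspond to the PR's actual code.

```python
def utf16_length(s: str) -> int:
    """Count UTF-16 code units: codepoints above U+FFFF need a surrogate pair."""
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in s)


def utf16_slice(s: str, start: int, length: int) -> str:
    """Return `length` UTF-16 code units of `s` starting at unit offset `start`."""
    units = s.encode("utf-16-le")  # 2 bytes per code unit, no BOM
    return units[2 * start : 2 * (start + length)].decode("utf-16-le")


def copy_tail(content: str, index: int, fixed: bool = True) -> str:
    """Copy the text after the last run, which ended at UTF-16 offset `index`.

    With fixed=False the total length is a codepoint count (the old behaviour
    in this model), which undercounts by one per surrogate pair.
    """
    total = utf16_length(content) if fixed else len(content)
    remaining = total - index
    return utf16_slice(content, index, remaining)


content = "👍 hi"  # a run covering the emoji ends at UTF-16 offset 2
print(repr(copy_tail(content, 2, fixed=False)))  # ' h'  (last character lost)
print(repr(copy_tail(content, 2)))               # ' hi'
```

Mixing the two units of measure loses exactly one trailing code unit per emoji. That matches the symptom reported in the linked issue. Counting both sides in UTF-16 keeps the arithmetic consistent.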

Successfully merging this pull request may close these issues.

[Bug] The last character of comments disappear if emoji are included.