Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix whitespace replacement bug #24

Merged
merged 1 commit into from
Mar 29, 2024

Conversation

FanhaiLu1
Copy link
Collaborator

This PR fix whitespace replacement bug in current code.

Current code replace both "▁" (U+2581) and "_" (under score) as whitespace. The Sentencepiece scapes the whitespace with a meta symbol "▁" (U+2581), we should only replace "▁" (U+2581) with whitespace.

Here is the example of the output before vs after change for token id 21326:

  • '
  • `__

@FanhaiLu1 FanhaiLu1 merged commit 970b529 into AI-Hypercomputer:main Mar 29, 2024
3 checks passed
@FanhaiLu1 FanhaiLu1 deleted the space_fix branch April 3, 2024 16:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants