@qijiaxing
Collaborator

The byte \xa0 can be part of the UTF-8 encoding of a Chinese character, so replacing it with a space corrupts the text of a Chinese dataset.
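
A minimal sketch of the failure, assuming the replacement runs on raw UTF-8 bytes (or a Python 2 byte string) rather than on decoded text. The sample strings and variable names are illustrative and are not taken from the NeMo code.

```python
# Illustrative sample: "学习" ("study"). Its second character, U+4E60,
# is encoded in UTF-8 as the three bytes E4 B9 A0, so it ends in 0xA0.
text = "学习"
raw = text.encode("utf-8")
contains_a0 = b"\xa0" in raw  # True: 0xA0 is a continuation byte here

# Assumed failure mode: replacing the byte 0xA0 with an ASCII space
# breaks the multi-byte sequence, so the data is no longer valid UTF-8.
corrupted = raw.replace(b"\xa0", b" ")
try:
    corrupted.decode("utf-8")
    decodes_cleanly = True
except UnicodeDecodeError:
    decodes_cleanly = False

# Operating on decoded text is safe: "\u00a0" matches only the real
# no-break space character, never part of another character.
cleaned = "学习\u00a0NeMo".replace("\u00a0", " ")
```

If no-break spaces still need to be normalized, doing so after decoding to `str` avoids touching the bytes of other characters.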

Signed-off-by: Jiaxing Qi <[email protected]>
@chiphuyen chiphuyen merged commit 3b78350 into NVIDIA-NeMo:master Oct 25, 2019
@chiphuyen
Contributor

Thank you!

dcurran90 pushed a commit to dcurran90/NeMo that referenced this pull request Oct 15, 2024
feat: Use diff for lab list and generate
blisc pushed a commit that referenced this pull request May 5, 2025
…codes (#66)

* [magpie][wandb] add loggings for pad ratios for text tokens and audio codes.

Signed-off-by: Xuesong Yang <[email protected]>

* [magpie][wandb] fix pad ratio calculation

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
Co-authored-by: Xuesong Yang <[email protected]>