Output HTML contains NULL characters in at least CJK languages #9985
Comments
Have you checked if it's an MDX issue? Hard to believe Docusaurus has anything to do here. I can also test later. |
I will check other CJK sites built with other software (e.g. Astro & Nextra). |
When I'm debugging this, I usually isolate an MDX compiler with the same setup as Docusaurus, and invoke it programmatically. |
None of the Astro and Nextra sites seem to be affected.
Rspress, which also uses MDX (maybe via mdxjs-rs or markdown-rs instead), is not affected. However, the Ant Design documentation is affected. (They do not use Docusaurus or MDX, only remark. Also, the demo of
|
Hey To be honest I'm not super familiar with any of those concepts and won't have the bandwidth to investigate much 😅 I was just wondering, couldn't this be a Crowdin translation issue? I'm not super skilled in |
No NULL characters are found in html, md, mdx, json, or css files in your ZIP archive.
I found this issue on my (our) site, where i18n is not applied, so I am convinced that Crowdin is not involved. |
Thanks for investigating. It's also worth trying this env variable on your site when building: |
Neither of |
https://typescriptbook.jp/ (https://github.com/yytypescript/book) This site uses Docusaurus 2.4.1, and NULL chars are not found there. |
I will check this afternoon. There's a chance that there's something environment specific. |
I found both Docusaurus and Ant Design website have And looks like https://ant.design/docs/blog/line-ellipsis-cn doesn't contain NULL now. |
In the pnpm Japanese documents, (only) the following pages contain NULL:
Blog and older versions have not been checked. Some pages contain NULL characters but some don't. |
I found the top page of the Docusaurus homepage in some languages has NULL: |
Have you read the Contributing Guidelines on issues?
Prerequisites
npm run clear or yarn clear command.
rm -rf node_modules yarn.lock package-lock.json and re-installing packages.
Description
Docusaurus sometimes contaminates output HTML files with NULL characters.
NULL characters confuse some HTML parsers used in document scrapers such as https://github.com/meilisearch/docs-scraper. (It uses lxml, a Python library.)
They also prevent Windows' copy-and-paste feature from copying the complete source code.
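As a stopgap for scrapers that choke on these bytes, the HTML can be cleaned before it is parsed. The following is only a sketch and not part of Docusaurus or docs-scraper; the helper name `strip_nul` is made up for illustration. Removing 0x00 bytes is safe for UTF-8 text because a NUL byte never occurs inside a multi-byte sequence.

```python
def strip_nul(html: bytes) -> bytes:
    """Return html with every NUL (0x00) byte removed.

    Hypothetical helper for illustration; it works on raw bytes so it
    does not depend on the page decoding cleanly.
    """
    return html.replace(b"\x00", b"")


# A small CJK sample with one stray NUL byte in the middle.
sample = "<p>日本\x00語</p>".encode("utf-8")
cleaned = strip_nul(sample)
print(cleaned.decode("utf-8"))  # prints <p>日本語</p>
```

This only hides the symptom for downstream tools; the NUL bytes are still produced by the build.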
Reproducible demo
No response
Steps to reproduce
Note
rg is ripgrep.
For your own documents
Write your documents in CJK or possibly other non-latin languages and then do:
Note
Built JS files do not seem to be affected. (no NULs are found there)
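For anyone without ripgrep, the check in the steps above can be approximated in Python. This is a minimal sketch, assuming the site was built into a directory named `build`; the directory name and the helper names `find_nul_offsets` and `scan_build_dir` are assumptions for illustration.

```python
from pathlib import Path


def find_nul_offsets(data: bytes) -> list[int]:
    """Return the byte offset of every NUL (0x00) byte in data."""
    return [i for i, byte in enumerate(data) if byte == 0]


def scan_build_dir(root: str, pattern: str = "*.html") -> dict[str, int]:
    """Map each matching file under root to its NUL count.

    Files without any NUL byte are left out of the result.
    """
    hits: dict[str, int] = {}
    for path in sorted(Path(root).rglob(pattern)):
        count = path.read_bytes().count(b"\x00")
        if count:
            hits[str(path)] = count
    return hits


if __name__ == "__main__":
    # "build" is an assumed output directory; adjust to your setup.
    for name, count in scan_build_dir("build").items():
        print(f"{name}: {count} NUL byte(s)")
```

Passing `pattern="*.js"` runs the same check against the built JS files, and `find_nul_offsets` helps locate where in a page the bytes appear.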
Expected behavior
No outputs (NULL characters are not found)
Actual behavior
🇨🇳
🇯🇵
🇰🇷
Note
Your environment
First found on a private documentation site written in Japanese:
The above commands were run in Ubuntu 22.04 on WSL on Windows 11.
Self-service