Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autogenerated sidebar fails to parse category indeces when categories have non-latin characters #8124

Closed
6 of 7 tasks
birjj opened this issue Sep 22, 2022 · 6 comments · Fixed by #8137
Closed
6 of 7 tasks
Labels
bug An error in the Docusaurus core causing instability or issues with its execution

Comments

@birjj
Copy link
Contributor

birjj commented Sep 22, 2022

Have you read the Contributing Guidelines on issues?

Prerequisites

  • I'm using the latest version of Docusaurus.
  • I have tried the npm run clear or yarn clear command.
  • I have tried rm -rf node_modules yarn.lock package-lock.json and re-installing packages.
  • I have tried creating a repro with https://new.docusaurus.io.
  • I have read the console error message carefully (if applicable).

Description

The sidebar autogenerator normally checks for category indeces, converting the category into a link if there is nothing but an index in it. It does this by checking if the file is named "index", "readme", or the name of the category itself.

This process fails if the category has a name that contains non-latin characters, e.g. "æ". In that case the file will not be seen as an index (regardless of its name) [edit: if running on Windows; the issue does not reproduce on Unix]

Reproducible demo

https://github.com/birjj/docusaurus-repro-8124

Steps to reproduce

  1. Create a new folder in docs, name it something using only latin characters (e.g. docs/abc).
  2. Inside of this folder, create an index.md (e.g. docs/abc/index.md). Docusaurus should now show the category as a simple link, since index.md is seen as the category index, and no other doc is contained in the category.
  3. Duplicate the folder, name it something containing non-latin characters (e.g. docs/æøå). Make sure the index file is also copied over (e.g. docs/æå/index). This folder will not be collapsed, even though the only difference from before is its name.

Your docs file structure should look like:

docs
 ├─ abc
 │  └─ index.md
 └─ æøå
    └─ index.md

Expected behavior

The category index should be detected regardless of the category name.

Actual behavior

The category index is not detected for the category with non-latin characters in its name. The category therefore isn't show as a simple link:

Screenshot of the Docusaurus sidebar, showing a collapsible 'æøå' category and a link named 'ABC'

Your environment

Self-service

  • I'd be willing to fix this bug myself.
@birjj birjj added bug An error in the Docusaurus core causing instability or issues with its execution status: needs triage This issue has not been triaged by maintainers labels Sep 22, 2022
@birjj
Copy link
Contributor Author

birjj commented Sep 22, 2022

This appears to only be an issue on Windows. When building on a Unix server (e.g. the public URL) this issue does not reproduce.
I have confirmed it to reproduce in both dev and build on Windows.

@birjj
Copy link
Contributor Author

birjj commented Sep 22, 2022

After providing my own isCategoryIndex, and logging the arguments it is given, I see a difference in the fileName value:

presets: [
  [
    "classic",
    /** @type {import('@docusaurus/preset-classic').Options} */
    ({
      docs: {
        /* ... */
        async sidebarItemsGenerator({ defaultSidebarItemsGenerator, isCategoryIndex, ...args }) {
          const newIsCategoryIndex = (args) => {
            const outp = isCategoryIndex(args);
            console.log(args, outp);
            return outp;
          };
          const sidebar = defaultSidebarItemsGenerator({
            ...args,
            isCategoryIndex: newIsCategoryIndex,
          });
          return sidebar;
        },
      }
    }),
  ],
],
{
  fileName: 'index',
  extension: '.md',
  directories: [ 'abc' ]
} true
{
  fileName: 'docs\\æøå\\index',
  extension: '.md',
  directories: [ 'æøå' ]
} false

It looks like the fileName is not correctly split when the path contains non-latin characters

@Josh-Cena
Copy link
Collaborator

Thanks for the detailed report and taking time to look into this! We have this weird logic:

/**
* Convert Windows backslash paths to posix style paths.
* E.g: endi\lie -> endi/lie
*
* Returns original path if the posix counterpart is not valid Windows path.
* This makes the legacy code that uses posixPath safe; but also makes it less
* useful when you actually want a path with forward slashes (e.g. for URL)
*
* Adopted from https://github.com/sindresorhus/slash/blob/main/index.js
*/
export function posixPath(str: string): string {
const isExtendedLengthPath = str.startsWith('\\\\?\\');
// Forward slashes are only valid Windows paths when they don't contain non-
// ascii characters.
// eslint-disable-next-line no-control-regex
const hasNonAscii = /[^\u0000-\u0080]+/.test(str);
if (isExtendedLengthPath || hasNonAscii) {
return str;
}
return str.replace(/\\/g, '/');
}

I don't have access to a Windows device so I can't tell if that logic is still relevant today. Do you have sources on what kind of characters are accepted by Windows if the path is POSIX?

@birjj
Copy link
Contributor Author

birjj commented Sep 22, 2022

I do not have any official information, and have been unable to find any. The closest information I've gotten was from the .NET documentation for Path.AltDirectorySeparatorChar which mentions that

Windows supports either the forward slash (which is returned by the AltDirectorySeparatorChar field) or the backslash (which is returned by the DirectorySeparatorChar field) as path separator characters, while Unix-based systems support only the forward slash

Another interesting resource is from IBM, stating that

In Windows the forward slashes and back slashes are equivalent, but many programs will accept only backward slashes

On an anecdotal basis I can confirm that both C:/Users/me/Documents/dev/test/abc and C:/Users/me/Documents/dev/test/æøå work in Windows File Explorer and in Node.js to access the folder at the given location:

import fs from "fs";
console.log(fs.statSync("C:/Users/me/Documents/dev/test/abc/test.txt")); // { ... }
console.log(fs.statSync("C:/Users/me/Documents/dev/test/æøå/test.txt")); // { ... }

@Josh-Cena
Copy link
Collaborator

Okay... hmmm... we may be better off normalizing all backslashes. Can simplify a lot of logic. We can see if someone uses arcane characters and complains.

@Josh-Cena Josh-Cena removed the status: needs triage This issue has not been triaged by maintainers label Sep 22, 2022
@birjj
Copy link
Contributor Author

birjj commented Sep 22, 2022

I have done a bit of digging and found:

  1. The code in Docusaurus is adapted from sindresorhus/slash, which contains the ASCII limitation. That repository links to this SuperUser answer to justify the limitation, but that answer doesn't mention any ASCII limitations.

  2. Another answer to the SuperUser question links to Microsoft documentation they claim says that Windows supports both backslash and forwards slash as path separator. That specific documentation is only found on a previous version of the documentation page, but has since been moved to the documentation page for the Maximum Path Length Limitation. It says:

    Note: File I/O functions in the Windows API convert "/" to "\" as part of converting the name to an NT-style name, except when using the "\\?\" prefix as detailed in the following sections.

    Assuming that Node.js uses the native File I/O functions to interact with the file system, this should apply to JS code too.

That is to say: all Windows paths should support forward slash as path separator, except if it's an extended-length path (identified by the prefix \\?\).

I have asked for the justification of the ASCII limitation in the original project (see sindresorhus/slash#19).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An error in the Docusaurus core causing instability or issues with its execution
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants