Add US English spell checker/linter for markdown #12522

Closed
hamishwillee opened this issue Jan 30, 2022 · 16 comments · Fixed by #35224
Labels
infra Infrastructure issues (npm, GitHub Actions, linting) for this repo needs decision The task needs consensus through discussion

Comments

@hamishwillee
Collaborator

We should have a US English spell checker.

#373 changed all known UK English spellings to US English, as per MDN policy. However, this is likely to rot. As discussed in #10787, we should also have an automated spell checker system in place for English content.

@hamishwillee hamishwillee added needs triage Triage needed by staff and/or partners. Automatically applied when an issue is opened. infra Infrastructure issues (npm, GitHub Actions, linting) for this repo and removed needs triage Triage needed by staff and/or partners. Automatically applied when an issue is opened. labels Jan 30, 2022
@nschonni
Contributor

FYI, here are the config files I've been using with cspell to look at the diffs on the custom dictionary to find new misspellings https://gist.github.com/nschonni/4d271853c2af1612bcf9f5c739620ce5

@bsmth
Member

bsmth commented Aug 10, 2023

We could consider checking in a project dictionary under .vscode that can be updated incrementally using CSpell. This would work well with https://github.com/streetsidesoftware/cspell-action which can also run incrementally (on changed files only).

In my experience, content is looking okay (not overwhelmed with errors) with config dictionaries like this when working on PRs that edit a few markdown files only:

{
  "$schema": "https://raw.githubusercontent.com/streetsidesoftware/cspell/main/cspell.schema.json",
  "version": "0.2",
  "language": "en-US",
  "languageId": "*",
  "dictionaries": [
    "bash",
    "css",
    "cpp",
    "django",
    "filetypes",
    "fonts",
    "fullstack",
    "html",
    "latex",
    "lorem-ipsum",
    "markdown",
    "node",
    "npm",
    "project-words",
    "python",
    "softwareTerms",
    "svelte",
    "typescript"
  ],
  "ignorePaths": [
    ".vscode/cspell.json",
    ".vscode/extensions.json",
    ".markdownlint-cli2.jsonc"
  ],
  "allowCompoundWords": true,
  "dictionaryDefinitions": [
    {
      "name": "project-words",
      "path": "./project-words.txt",
      "addWords": true
    }
  ]
}

If we extend using the ignorelist from @nschonni, I think we're in a good place.

@nschonni
Contributor

I believe there is/was a performance difference with a large custom dictionary inside the config file vs the external TXT

@nschonni
Contributor

I normally use cspell --no-progress --words-only --unique '**' > project-words.txt || sort project-words.txt -o project-words.txt to generate the missing words dictionary. I would previously stage this file in git, then review the diffs to find new misspellings too.
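
The "review the diffs" step above could be sketched roughly as follows. This is only an illustration, not part of anyone's actual tooling: the function name `new_words` is hypothetical, and it assumes the word list is one word per line with optional `#` comment lines.

```python
def new_words(previous: str, current: str) -> list[str]:
    """Return words that appear in `current` but not in `previous`.

    Both arguments are the text of a word list (for example two
    revisions of project-words.txt), one word per line.
    """
    def parse(text: str) -> set[str]:
        # Ignore blank lines and lines starting with '#' (assumed comments).
        return {
            line.strip()
            for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")
        }

    added = parse(current) - parse(previous)
    # Sort case-insensitively so the review order is stable.
    return sorted(added, key=lambda w: (w.casefold(), w))


if __name__ == "__main__":
    previous = "HDCP\nunzoomed\n"
    current = "HDCP\ndiabled\nunzoomed\n"
    # Words listed here are the candidates to review as possible misspellings.
    print(new_words(previous, current))
```

Anything the function returns is either a legitimate new term to keep in the dictionary or a typo to fix in the content.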

@bsmth
Member

bsmth commented Aug 10, 2023

I believe there is/was a performance difference with a large custom dictionary inside the config file vs the external TXT

I see, so having a large external text file (project-words.txt) incurs a performance knock, does it?

@nschonni
Contributor

nschonni commented Aug 11, 2023

No, I believe the external txt is more performant, as it gets compiled once, where the JSON file gets repeatedly parsed, so smaller is better for that file
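
If the config already carries a large inline word list, moving it out to the external file might look something like the sketch below. It assumes the inline list lives under the `words` key of the JSON config; the helper name `split_inline_words` is made up for this example, and it makes no claim about how much faster the result is.

```python
import json


def split_inline_words(config: dict) -> tuple[dict, list[str]]:
    """Separate an inline `words` list from the rest of a cspell config.

    Returns the config without `words`, plus the de-duplicated words
    sorted case-insensitively, ready to be written one per line.
    """
    slim = {key: value for key, value in config.items() if key != "words"}
    words = sorted(set(config.get("words", [])), key=lambda w: (w.casefold(), w))
    return slim, words


if __name__ == "__main__":
    config = json.loads(
        '{"version": "0.2", "language": "en-US", "words": ["unzoomed", "HDCP"]}'
    )
    slim, words = split_inline_words(config)
    print(json.dumps(slim, indent=2))  # the smaller JSON config
    print("\n".join(words))            # contents for project-words.txt
```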

@bsmth
Member

bsmth commented Aug 11, 2023

No, I believe the external txt is more performant, as it gets compiled once, where the JSON file gets repeatedly parsed

Very good, I'm all for trying this out in parallel with our usual day-to-day and evaluating how it fits into our workflow after some time.

@bsmth
Member

bsmth commented Aug 17, 2023

Minor update that the config is checked in for local use:

@OnkarRuikar
Contributor

OnkarRuikar commented Aug 28, 2023

I've gathered a list of 5424 words for this repo so far. If you wish, we can merge this or use it personally in a local environment. Here is the config I used: https://github.com/OnkarRuikar/temp/blob/main/cspell.json.

I think we can start using cspell in our automations.

@Josh-Cena
Member

Is everyone on board with doing this in CI? Do we need a discussion?

@Josh-Cena Josh-Cena added the needs decision The task needs consensus through discussion label Jul 18, 2024
@OnkarRuikar
Contributor

Is everyone on board with doing this in CI? Do we need a discussion?

We can't fail a PR if someone adds a new acronym (like HDCP) or uses a word like unzoomed. We can make it a nightly/weekly check and file an issue if new or wrong words are found.
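
As a rough sketch of the "file an issue if new/wrong words are found" idea: a scheduled job could compare the words the checker reports against the known list and only produce an issue body when something is left over. The function name `build_issue_body` and the checklist format are assumptions for illustration; actually creating the issue is left out.

```python
from typing import Optional


def build_issue_body(found: list[str], known: set[str]) -> Optional[str]:
    """Build an issue body listing words that are not in the known list.

    Returns None when there is nothing to report, so the scheduled job
    can skip filing an issue.
    """
    known_folded = {word.casefold() for word in known}
    unknown = sorted(
        {word for word in found if word.casefold() not in known_folded},
        key=lambda w: (w.casefold(), w),
    )
    if not unknown:
        return None
    lines = [f"The scheduled spell check found {len(unknown)} unknown word(s):", ""]
    # One checkbox per word: either add it to the dictionary or fix the typo.
    lines += [f"- [ ] `{word}`" for word in unknown]
    return "\n".join(lines)


if __name__ == "__main__":
    body = build_issue_body(["HDCP", "diabled", "unzoomed"], {"HDCP", "unzoomed"})
    print(body)
```

This keeps PRs unblocked: a new acronym only shows up in the periodic report, where a maintainer decides whether it belongs in the word list.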

@Josh-Cena
Member

Sure, that also sounds fine to me, like how you are regularly sending typo-fix PRs.

@OnkarRuikar
Contributor

I can simply move the automation from my temp repo to mdn/content. This won't affect anything but let's discuss this in the next weekly meeting.

@bsmth
Member

bsmth commented Jul 19, 2024

I've been using this action on different repositories and it's been quite good, something to consider:

jobs:
  spellcheck:
    name: "Check spelling"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: streetsidesoftware/cspell-action@v6
        with:
          inline: error
          config: ".vscode/cspell.json"
          verbose: true
          # Limit the files checked to the ones in the pull request or push.
          incremental_files_only: true

@OnkarRuikar
Contributor

I've been using this action on different repositories and it's been quite good, something to consider:

Could you share an example of it catching errors? How does it handle new words?

@bsmth
Member

bsmth commented Aug 2, 2024

I've been using this action on different repositories and it's been quite good, something to consider:

Could you share an example of it catching errors? How does it handle new words?

It leaves annotations for spelling errors on the job, like:

spellcheck: content/posts/thing#L25
Unknown word (argaman)
spellcheck: files/thing.css#L115
Unknown word (diabled)
spellcheck
2 spelling issues found in 2 of the 44 files checked.

All of the failure logs from this workflow are expired, so I don't have an example available, but I can recommend trying it out as there's a lot of configuration possible that we won't have to re-roll: https://github.com/streetsidesoftware/cspell-action
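
For anyone who wants to post-process annotations like the ones quoted above, here is a minimal sketch that pulls out the file, line, and word. The pattern is written against the sample text in this comment only; the real output format of the action may differ, and the names `ANNOTATION` and `parse_annotations` are invented for the example.

```python
import re

# Matches a location line followed by an "Unknown word (...)" line,
# as they appear in the sample annotations quoted in this thread.
ANNOTATION = re.compile(
    r"^spellcheck: (?P<path>\S+)#L(?P<line>\d+)\nUnknown word \((?P<word>[^)]+)\)",
    re.MULTILINE,
)


def parse_annotations(log: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, unknown word) for each annotation found."""
    return [
        (match.group("path"), int(match.group("line")), match.group("word"))
        for match in ANNOTATION.finditer(log)
    ]


if __name__ == "__main__":
    sample = (
        "spellcheck: content/posts/thing#L25\n"
        "Unknown word (argaman)\n"
        "spellcheck: files/thing.css#L115\n"
        "Unknown word (diabled)\n"
        "spellcheck\n"
        "2 spelling issues found in 2 of the 44 files checked.\n"
    )
    for path, line, word in parse_annotations(sample):
        print(f"{path}:{line}: {word}")
```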

5 participants