
New N-API bindings (stable ABI) to the Windows and macOS native spell-checker APIs #595

Open
rotemdan opened this issue Nov 1, 2024 · 2 comments


rotemdan commented Nov 1, 2024

Earlier today I looked at the Windows code in the deprecated atom/node-spellchecker package that's used in the extension.

Since I'm a Windows user, have a working example in atom/node-spellchecker, and have some recent experience with N-API development, I thought I could write a new binding and make it simpler and more maintainable, since the current one uses outdated and unstable APIs.

It's now published as the windows-spellchecker npm package (MIT license). The repository is here.

The core C++ addon is this single .cpp file.

Main differences from the approach in node-spellchecker:

  • Uses the newer napi.h C++ API (restricted, as much as possible, to basic features), with NAPI_VERSION = 8, which is ABI-stable across Node.js versions starting at Node v12.0.0, now and in the future (Edit: the headers currently used support Node v18.0.0 and upwards). It should also work across different Electron.js versions (not tested yet - I'll make any changes needed for that)
  • No C++ threads. They add way too much complexity and are really not necessary, especially with caching and single-word lookups. Multithreading can be done with worker threads on the JavaScript side
  • Only basic, single-word operations (testSpelling and getSpellingSuggestions - see the usage sketch after this list). addWord and removeWord (Win10+) are implemented, but note they add to and remove from the user-level system dictionary!
  • No fallback to Hunspell. There's the WASM version now.
  • No node-gyp, make, CMake or anything of that sort. It's built using plain cl.exe (the MSVC compiler) and dlltool (part of the binutils package in MSYS2) to produce node_api.lib. Edit: although this basic approach worked fine in multiple Node.js versions, it was very hard to get it working in Electron.js without node-gyp, so I eventually switched to node-gyp
  • No validation of JavaScript arguments in C++; it's all done in the JavaScript wrappers
  • Includes support for removeWord, which is only available on Windows 10 or newer
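
Here's a minimal usage sketch of what calling the binding looks like. The factory function name and exact signatures are illustrative assumptions; only the operation names (testSpelling, getSpellingSuggestions) are taken from the list above:

```typescript
// Hypothetical usage sketch for the windows-spellchecker package.
// createWindowsSpellChecker is an assumed factory name; the actual
// exported API may differ. Operation names follow the list above.
import { createWindowsSpellChecker } from 'windows-spellchecker'

const checker = createWindowsSpellChecker('en-US')

if (!checker.testSpelling('recieve')) {
  // Suggestions come back as an array of candidate corrections,
  // e.g. ['receive', 'relieve', ...]
  console.log(checker.getSpellingSuggestions('recieve'))
}
```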

I'll integrate this one as well, and test it along with the WebAssembly one (which I'm currently continuously testing).

Once it's integrated and working, it'll reduce the need to rebuild the binding over and over again.

The macOS binding can also be rewritten this way, but that's for the future (I don't have a macOS machine, so I can only test in a VM, by repeatedly syncing source files over SSH - trying to use a development UI in the VM is a completely unusable experience).

@rotemdan rotemdan changed the title New minimalist N-API binding (stable ABI) to the Windows native spell-checker API New N-API binding (stable ABI) to the Windows native spell-checker API Nov 3, 2024

rotemdan commented Nov 3, 2024

Some updates

  • I renamed the npm package to windows-spellchecker, since I realized that, given the work already done, it wouldn't be difficult to cover more features in the future - in particular, checking the spelling of entire text segments, not just single words (that's likely more efficient and better reflects the native API)
  • The initial build worked in multiple Node.js versions, but had trouble in Electron.js. I switched to node-gyp because, on Windows, there's some complexity in adding a particular DLL loading hook that makes the addon work in Electron.js and other alternative runtimes (very hard to do manually, since it apparently requires precise compiler arguments). Once I made the switch, the same build worked in Electron.js without any problems
  • I've verified the single published .node addon works in multiple versions of Node.js (18, 20, 22, 23) and Electron.js (32, 33)
  • I've integrated it into vscode-spellright and am continuously testing it there
  • The call rate to testSpelling in the Windows binding is roughly 7 to 10 times slower than hunspell-wasm (I'll run more accurate benchmarks in the future). The approaches I see to make it faster are, of course, caching (for already-seen words), which I've already implemented in the VSCode extension; passing "batches" of words at once, separated by line breaks, like word1 \n word2 \n word3 \n word4 \n ...; or passing an entire segment of the text (like a paragraph) and extracting the word ranges from the results (that's the way the Windows API processes it). That should reduce the overhead of the individual calls for each word, which is now likely a bit high (see the caching sketch after this list)
  • Some of the knowledge I gained from working on this is now being applied back in my audio-io package. I'm getting the 3 addons there (Windows, macOS, Linux) working in Electron.js, which requires some modifications to the C++ code due to Electron.js (20+) N-API restrictions on allocating ArrayBuffers with external memory, which I didn't know about before
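
Here's a minimal sketch of the word-level cache mentioned above, assuming a checker object that exposes the native testSpelling call:

```typescript
// Sketch of the word-level caching idea: only fall through to the
// native binding on a cache miss. `checker` is assumed to expose the
// native testSpelling operation.
declare const checker: { testSpelling(word: string): boolean }

const spellingCache = new Map<string, boolean>()

function testSpellingCached(word: string): boolean {
  let result = spellingCache.get(word)

  if (result === undefined) {
    result = checker.testSpelling(word) // native call, cache miss only
    spellingCache.set(word, result)
  }

  return result
}
```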

@rotemdan rotemdan changed the title New N-API binding (stable ABI) to the Windows native spell-checker API New N-API bindings (stable ABI) to the Windows and macOS native spell-checker APIs Nov 4, 2024

rotemdan commented Nov 4, 2024

More updates

  • Got the macOS bindings also redone using N-API! The npm package is published as macos-spellchecker and the repository is here. It includes addons for both x64 and arm64.
  • Writing the macOS addon turned out to be simpler than the Windows one (mostly because Windows COM is not very nice to work with). It took about 4 hours in total, and although I don't know any Objective-C, I still got it to work, mostly with the help of Gemini 1.5 Pro and the example code from Atom's addons (LLMs can sometimes do a good job if you give them working code examples)
  • The new macOS addon is actually safer, since it creates a private instance of NSSpellChecker rather than using the instance shared for the current process (which the Atom addon uses). This prevents potential collisions between different instances running in the same process - for example, when each one selects a different language
  • Edit: Added comprehensive error checking to the N-API macOS addon, improved over the one in the Atom addon (it also catches Objective-C exceptions). The changes were assisted by the Claude 3.5 Sonnet (New) LLM, which is likely the current best LLM for programming
  • Got an arm64 addon built for the Windows spell-checker (for Windows 11 arm64 versions), which turned out not to be that difficult (it just required installing the arm64 build tools through the Visual Studio installer)

Next steps: a decoupled, reusable backend module

Now that we've got the addons redone, I think the best way to use them is not to integrate them directly into the VSCode extension, but to write a kind of "backend" library that acts as a "spellcheck provider" or "spellcheck server" (though it wouldn't necessarily run as an actual server).

My general idea is that this provider would be a reusable npm package that creates a background worker, like a new Node.js worker thread (or a WebWorker), and loads the addons (and the WASM one as well) in the worker, in the background (it could also potentially be multithreaded and create multiple workers, but that's for the future). The backend would be fully independent of any UI, and could also be tested and used as a plain library, or from a CLI app.
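
A rough sketch of that worker setup, using Node.js worker threads (the worker file name and the message shapes are illustrative assumptions, not a finalized protocol):

```typescript
// Main thread: spawn a background worker that hosts the spell-check
// engines. The worker file and message protocol here are illustrative.
import { Worker } from 'node:worker_threads'

const worker = new Worker('./spellcheckWorker.js')

// The worker streams back results as it finds them.
worker.on('message', (msg) => {
  // e.g. { type: 'errors', documentId: 'doc-1', errors: [...] }
  console.log('spelling results:', msg)
})

// Send a full document snapshot for a given document context.
worker.postMessage({
  type: 'checkDocument',
  documentId: 'doc-1',
  language: 'en-US',
  text: 'Teh quick brown fox',
})
```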

The backend would provide higher-level tools. I'm imagining a kind of "diff-based" approach (inspired by React), where:

  • You create a "document context"
  • You send a full document to the backend, for that context
  • The backend asynchronously streams the spelling errors it finds (engine can be configured)
  • Every time the document is updated, you drop another full version of the document
  • The backend is able to cache segments (not necessarily just words) of the previous version of the document. For example, it can compute hashes of individual lines, and if a line's hash matches a previous one (not necessarily at the same location), it loads the errors found in it from the cache (see the sketch after this list). In this way it can be made even faster than caching individual words, and scale to longer documents
  • The backend would also be able to incrementally return only errors that weren't known before (using a kind of "diff"). That is a bit more difficult to implement, but I've worked with diff-like algorithms before
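
Here's a sketch of the line-hash cache described above. The error shape and the choice of hash (SHA-1 via node:crypto) are illustrative assumptions:

```typescript
// Sketch of line-level result caching: re-check only lines whose
// content hash hasn't been seen before. Error shape is illustrative.
import { createHash } from 'node:crypto'

interface SpellingError { start: number; end: number; word: string }

const lineCache = new Map<string, SpellingError[]>()

function checkDocument(
  text: string,
  checkLine: (line: string) => SpellingError[],
) {
  return text.split('\n').map((line, lineIndex) => {
    const hash = createHash('sha1').update(line).digest('hex')

    let errors = lineCache.get(hash)

    if (errors === undefined) {
      errors = checkLine(line) // spell-check only unseen lines
      lineCache.set(hash, errors)
    }

    return { lineIndex, errors }
  })
}
```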

In the backend I can also perform more accurate word segmentation, for example with the cldr-segmentation library, which I use heavily in Echogarden (a speech toolset I'm developing, which also includes several natural language processing features). cldr-segmentation has language-dependent datasets of abbreviations that can more accurately distinguish between . characters used as part of a word and ones that signify sentence endings or other punctuation, as well as various numeric patterns that may include ..

Better word segmentation can also help reduce the number of false positives caused by things like special characters, separators, etc., as seen in the "Alice in Wonderland" example with Hunspell (it likely happens because Hunspell doesn't properly trim the non-word characters sent to it - it can be fixed by using more accurate word segmentation, which would naturally "trim" the unwanted characters from the word). A simplified sketch of that trimming follows.
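
The sketch below is only a simplified stand-in: it extracts word ranges with a Unicode-aware regular expression, so punctuation and separators never reach the spell-checker. A real implementation would use cldr-segmentation's language-aware rules instead:

```typescript
// Simplified illustration of segmentation-based trimming. A real
// implementation would use CLDR segmentation rules, not this regex.
function extractWordRanges(text: string) {
  const ranges: { word: string; start: number; end: number }[] = []

  // Letters, combining marks and apostrophes count as word characters;
  // everything else (quotes, '!', '.', separators) is trimmed away.
  for (const match of text.matchAll(/[\p{L}\p{M}']+/gu)) {
    const start = match.index!
    ranges.push({ word: match[0], start, end: start + match[0].length })
  }

  return ranges
}

// '"Curiouser' yields the bare word 'Curiouser', quotes and '!' trimmed.
console.log(extractWordRanges('"Curiouser and curiouser!" cried Alice.'))
```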

The word segmentation itself can also be cached in the backend. We'll see.

I don't expect the amount of work needed to implement this to be very large. I think redoing the native addons was actually the more difficult part.

Edit: I forgot to mention - very soon I'll be publishing a new language detection library called echo-ld (Echogarden Language Detection library). It is very accurate (for example, it can differentiate between different Italian dialects, and even between Norwegian Nynorsk and Bokmål, to some limited degree). I'll likely integrate that into the backend as well - it could technically be used to detect spelling errors even in a document that contains multiple languages (by detecting the language of different segments independently and applying the correct dictionary for each one).
