Spell Check Your Docusaurus Site with Unified Engine and Retext

Chris Whong
3 min read · Aug 27, 2024

Unified.js is “a friendly interface backed by an ecosystem of plugins built for creating and manipulating content”.

It converts text into structured data, allowing you to check for patterns, enforce rules, manipulate the content, transform it to other formats, etc. To work with it, you must assemble a “process”: a series of plugins that parse, transform, and then stringify input text.
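To make the parse, transform, stringify idea concrete, here is a toy sketch in plain JavaScript. It does not use unified at all: the `parse`, `transform`, and `stringify` functions and the node shapes are made up for illustration, and only loosely echo how a real process turns text into a tree, changes the tree, and writes text back out.

```javascript
// Toy illustration only: these functions are hypothetical and are NOT unified's API.

// parse: text -> structured data (one node per line)
function parse(text) {
  return {
    type: 'root',
    children: text.split('\n').map((line) =>
      line.startsWith('# ')
        ? { type: 'heading', value: line.slice(2) }
        : { type: 'paragraph', value: line }
    )
  };
}

// transform: change the tree (here, upper-case every heading)
function transform(tree) {
  return {
    ...tree,
    children: tree.children.map((node) =>
      node.type === 'heading' ? { ...node, value: node.value.toUpperCase() } : node
    )
  };
}

// stringify: structured data -> text
function stringify(tree) {
  return tree.children
    .map((node) => (node.type === 'heading' ? `# ${node.value}` : node.value))
    .join('\n');
}

const output = stringify(transform(parse('# hello\nsome text')));
console.log(output);
```

In unified, each of these stages is a plugin, which is why you can swap the parser or add more transforms without rewriting the rest.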

Below is a minimal Node.js script to spell check content on a Docusaurus website using unified. To run it, create a file scripts/spell-check.mjs and install the dependencies:

npm i --save-dev unified-engine unified remark-parse remark-retext retext-english retext-spell dictionary-en

Run it with:

node scripts/spell-check.mjs

Note that it runs independently and isn’t doing anything Docusaurus-specific, so it should also work with any collection of markdown files.

import { engine } from 'unified-engine';
import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkRetext from 'remark-retext';
import retextEnglish from 'retext-english';
import retextSpell from 'retext-spell';
import dictionary from 'dictionary-en';

engine(
  {
    processor: unified(),
    files: ['./docs/**/*.md'],
    color: true,
    defaultConfig: {
      plugins: [ // define the process
        remarkParse, // parse markdown
        [
          remarkRetext, // extract text from the parsed markdown
          unified().use({ // pass the text into a retext process
            plugins: [
              retextEnglish, // parse English text
              [
                retextSpell, // spell check the text against the dictionary
                {
                  dictionary,
                  personal: [ // define a personal dictionary
                    'docusaurus',
                    'dropdown'
                  ].join('\n')
                }
              ]
            ]
          })
        ]
      ]
    }
  },
  (error, code) => {
    if (error) console.error(error);
    process.exit(error ? 1 : code); // a fatal error should not exit with 0
  }
);

What’s happening here?

unified lets you run a process on a single file, while unified-engine lets you run the same process on many files. Here we specify a glob pattern for the Docusaurus docs files as the files option when calling unified-engine, so it will run the process on all of the .md files in ./docs.

The top-level process uses remark plugins, which handle markdown syntax. remarkParse parses the markdown and passes it as structured data to remarkRetext, which has a second process (a `retext` process) defined in its options. retext is all about quality control for written text: its plugins attach a warning to the file whenever one of their rules is violated, and unified-engine prints those warnings to the console.

In the retext process, we parse English text with retextEnglish and then check it against a dictionary using retextSpell. We also add a personal dictionary, passed as a newline-separated string, for words we don’t want retextSpell to warn on.
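Once the personal dictionary grows past a handful of words, you may prefer to keep it in its own file. Here is a minimal sketch of that idea, assuming a plain text file with one word per line. The helper names, the file path in the usage note, and the convention of skipping blank lines and `#` lines are my own assumptions, not something retext-spell provides.

```javascript
import { readFileSync } from 'node:fs';

// Turn the contents of a word-list file into the newline-separated string
// that the script above passes as the `personal` option.
// Blank lines and lines starting with `#` are dropped (a convention assumed
// by this helper, not one defined by retext-spell).
function parsePersonalDictionary(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== '' && !line.startsWith('#'))
    .join('\n');
}

// Hypothetical helper: read the word list from disk and normalize it.
function loadPersonalDictionary(path) {
  return parsePersonalDictionary(readFileSync(path, 'utf8'));
}

console.log(parsePersonalDictionary('docusaurus\n\n# UI terms\ndropdown\n'));
```

In the script you would then replace the inline array with something like personal: loadPersonalDictionary('./scripts/personal-dictionary.txt'), where that path is only an example.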

The script does its thing and spits out a bunch of warnings in the console. Here’s what it finds on a clean install of Docusaurus.

To show it catching actual misspelled words, I’ll add the following snippet to translate-your-site.md:

“Thes quik brawn focx jums oover tha la-z dawg”

If you add each offending word to the personal dictionary, or otherwise fix/remove it by editing the text in the markdown files, you’ll see satisfying green messages in the console after running the script.

You could opt for errors instead of warnings and run a script like this as part of your CI checks to make sure no misspelled words make it to your production site.
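One way to get that CI-failing behavior is sketched below, assuming unified-engine’s `frail` option, which makes the engine call back with exit code 1 when there are warnings and not only on fatal errors. `quiet` is optional and hides files that have no messages. The `ciOptions` name is just mine for illustration.

```javascript
// Extra unified-engine options for a CI run (the `ciOptions` name is illustrative).
const ciOptions = {
  frail: true, // treat warnings as failures: the callback receives code 1
  quiet: true // only report files that have messages
};

// In scripts/spell-check.mjs these would be spread into the existing options,
// next to processor, files, color, and defaultConfig:
//   engine({ processor: unified(), files: ['./docs/**/*.md'], ...ciOptions, defaultConfig }, callback);
console.log(ciOptions);
```

Because the script already passes the code it receives to process.exit, a misspelled word would then make the CI step fail.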

You can also add other retext plugins to check for things like:

…and lots more.

My predecessors at Mapbox created sophisticated tooling based on unified, remark, and retext to help maintain high standards in our product documentation. In addition to spell checking, we also add custom MDX linting and prose style rules to enforce consistency in a complex static site documentation platform that may receive dozens of updates daily from contributors spread across the organization.

I hope this snippet helps you improve your docs with unified!

Chris Whong

Urbanist, Technologist, Mapmaker. Developer Relations @Mapbox