Better word counts and reading time in Eleventy (11ty)
I recently switched to Eleventy to generate my blog. As part of the move, I needed word counts for each of my posts. I wrote my own plugin because I could not find an existing one that met my requirements: they either used regex to parse HTML or included non-text content such as scripts and code snippets in the count.
The code below should work with any template engine, including Liquid and Nunjucks.
What exactly is a word? #
First of all, we need to work out exactly how we are going to count words.
A word is a unit of prose or writing. We need to make sure we exclude code snippets and scripts from the word count.
A naive approach would be to split the text on spaces and count the parts. However, this misses words that are joined together, like “Dog/Cat”. Another approach would be to split on all punctuation, but this would count words like “self-hosting” as two words.
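To make the trade-off concrete, here is a small sketch comparing the two naive strategies with a middle ground that splits on whitespace plus a few separators. The sample sentence and the variable names are made up for illustration and are not part of the plugin.

```javascript
// Illustrative comparison of three splitting strategies.
// The sample text and helper names are assumptions for this example.
const sample = "Dog/Cat owners love self-hosting";

// Keep only parts that contain at least one letter or digit.
const hasWordChar = part => /[a-z0-9]/i.test(part);

// 1. Split on spaces only: "Dog/Cat" is counted as one word.
const bySpaces = sample.split(" ").filter(hasWordChar);

// 2. Split on every non-alphanumeric character: "self-hosting" becomes two words.
const byPunctuation = sample.split(/[^a-z0-9]+/i).filter(hasWordChar);

// 3. Split on whitespace plus a few separators (; / \), leaving hyphens alone.
const byDelimiters = sample.split(/[\s;/\\]/).filter(hasWordChar);

console.log(bySpaces.length);      // 4
console.log(byPunctuation.length); // 6
console.log(byDelimiters.length);  // 5
```

Only the third strategy treats “Dog/Cat” as two words while keeping “self-hosting” as one.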
To validate my word counter, I made a page where I dumped all the detected words from a post. I used this to refine the list of delimiters.
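A helper along these lines could produce such a dump. This is only a sketch: `listWords` is a hypothetical name, it uses the same delimiters as the plugin later in this post, and it expects plain text with the HTML already stripped.

```javascript
// Hypothetical debugging helper: returns the detected words instead of
// just counting them, so they can be printed on a test page.
// It expects plain text (HTML already stripped).
function listWords(text) {
    return text
        .split(/[\s;/\\]/)
        .map(part => part.trim())
        // Keep parts with at least one letter or number
        .filter(part => /[a-z0-9]/i.test(part));
}

console.log(listWords("Hello world! You are 26.0 today - or so"));
// [ 'Hello', 'world!', 'You', 'are', '26.0', 'today', 'or', 'so' ]
```

Seeing the actual list makes it easy to spot delimiters that are missing (two words glued together) or too aggressive (one word split in two).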
Counting words #
Dependencies #
You need to install JSDom:
npm install --save jsdom
.eleventy.js #
In the Eleventy config, you need to add the new plugin:
const pluginWordcount = require("./plugins/wordcount.js");

module.exports = function(eleventyConfig) {
    eleventyConfig.addPlugin(pluginWordcount);
    // You can only have one module.exports in a configuration file,
    // so make sure you add the above line to your existing one.
};
wordcount.js plugin #
This is the file for the plugin. It contains extractText to get all the text from HTML, and countWords to count the words in a piece of text.
const { UserConfig } = require("@11ty/eleventy");
const { JSDOM } = require("jsdom");

// Selectors for elements whose text should not be counted
const TO_STRIP = [
    "code",
    "pre code",
    "script",
    ".header-anchor",
];

function extractText(html) {
    const dom = new JSDOM(html);
    const document = dom.window.document;
    // Remove non-text elements
    document.querySelectorAll(TO_STRIP.join(", ")).forEach(child => child.remove());
    return document.body.textContent;
}

// Cache counts by input HTML so repeated calls for the same content are cheap
const cache = new Map();

function countWords(value) {
    // Use has() so that a cached count of 0 still counts as a hit
    if (cache.has(value)) {
        return cache.get(value);
    }
    const result = extractText(value)
        // Split on whitespace, semicolons, and forward/back slashes
        .split(/[\s;/\\]/)
        .map(x => x.trim())
        // Word is non-empty with at least one letter or number
        .filter(x => x.match(/.*[a-z0-9].*/i))
        .length;
    cache.set(value, result);
    return result;
}

module.exports = eleventyConfig => {
    eleventyConfig.addFilter("wordcount", countWords);
};

// Also expose the counter under the name the unit tests import
module.exports.count = countWords;
Inside post layout #
Here’s how you might use the wordcount filter inside a post layout that uses Liquid:
{% assign wordcount = content | wordcount %}
{{ wordcount | divided_by: 238 | round }} min read
({{ wordcount }} words)
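The same arithmetic can be done in JavaScript if you would rather keep it out of the template. A minimal sketch, assuming the same 238 words per minute as the layout above; `readingTime` is a hypothetical helper, and unlike the Liquid snippet it never reports less than one minute.

```javascript
// Hypothetical helper: convert a word count into whole minutes of reading.
const WORDS_PER_MINUTE = 238; // same figure as the Liquid layout

function readingTime(wordCount, wordsPerMinute = WORDS_PER_MINUTE) {
    // Round to the nearest minute, but never show "0 min read"
    return Math.max(1, Math.round(wordCount / wordsPerMinute));
}

console.log(readingTime(1000)); // 4
console.log(readingTime(50));   // 1
```

It could be registered next to the word counter with something like eleventyConfig.addFilter("readingtime", readingTime), where the filter name is again just an example.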
Unit tests #
As a bonus, here are the unit tests I used when writing the word counter:
const { describe, it } = require("mocha");
const { expect } = require("chai");
const { count } = require("./wordcount");

// Each entry maps a test name to an HTML input and the expected word count
const parameterisedTests = {
    "empty": {
        html: "",
        expected: 0,
    },
    "just symbols": {
        html: ". . -//!\"$%^&*()\\`",
        expected: 0,
    },
    "single word paragraph": {
        html: "<p> Hey! </p>",
        expected: 1,
    },
    "punctuation": {
        html: "<p>Hello world! This is a test, of the word/counter</p>",
        expected: 10,
    },
    "strips scripts": {
        html: `
            <p>Hello world! This is a test, of the word/counter</p>
            <script>
                alert("Hello world!")
            </script>
        `,
        expected: 10,
    },
    "strips code blocks": {
        html: `
            <p>Hello world! This is a test, of the word/counter</p>
            <pre>
                <code>
                    alert("Hello world!")
                </code>
            </pre>
        `,
        expected: 10,
    },
    "strips inline code": {
        html: `
            <p>Hello world! This is a <code>test</code>, of the word/counter</p>
        `,
        expected: 9,
    },
    "strips heading anchors": {
        html: `
            <h2>A heading</h2>
            <a class="header-anchor">1</a>
            <p>Hello world! One two</p>
        `,
        expected: 6,
    },
    "counts numbers but not symbols": {
        html: `
            <p>Hello world! You are 26.0 today - or so</p>
        `,
        expected: 8,
    },
    "words can contain hyphens": {
        html: `
            <p>Hello world! One-two three</p>
        `,
        expected: 4,
    },
};

describe("countWords", () => {
    Object.entries(parameterisedTests).forEach(([key, data]) => {
        it(key, () => {
            expect(count(data.html)).to.equal(data.expected);
        });
    });
});
Comments
Great post! I recommend using linkedom for parsing/processing HTML instead of JSDom though. Lightweight and fast!