
Better word counts and reading time in Eleventy (11ty)


I recently [switched to using Eleventy](/2023/10/27/switched-to-eleventy/) to generate my blog. As part of this process, I needed to implement word counts for each of my posts. I made my own plugin for this as I was unable to find an existing one that met my requirements - they either used regex to parse HTML or included non-text like scripts and code snippets in the count.


The code below should work with any template engine, including Liquid and Nunjucks.

What exactly is a word? #

First of all, we need to work out exactly how we are going to count words.

A word is a unit of prose or writing. We need to make sure we exclude code snippets and scripts from the word count.

A naive approach would be to split text by spaces and count all the parts. However, this miscounts words that are joined together like “Dog/Cat”, treating them as one. Another approach would be to split text by all punctuation, but this would count words like “self-hosting” as two words.
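To make the trade-off concrete, here is a minimal sketch comparing the two approaches with a middle ground that splits only on a small set of delimiters. The sample sentence is made up for illustration, and the delimiter set mirrors the one used in the plugin further down.

```javascript
// Sample text containing a joined pair ("Dog/Cat") and a hyphenated word.
const sample = "Dog/Cat owners love self-hosting; really!";

// Naive: split on whitespace only. "Dog/Cat" is counted as one word.
const bySpaces = sample.split(/\s+/).filter(Boolean);

// Aggressive: split on anything that is not a letter or digit.
// "self-hosting" is counted as two words.
const byPunctuation = sample.split(/[^a-z0-9]+/i).filter(Boolean);

// Middle ground: split on whitespace, semicolons and slashes, then keep
// only the parts that contain at least one letter or digit.
const byDelimiters = sample
    .split(/[\s;/\\]/)
    .filter(part => /[a-z0-9]/i.test(part));

console.log(bySpaces.length);      // 5
console.log(byPunctuation.length); // 7
console.log(byDelimiters.length);  // 6
```

Only the last variant counts “Dog” and “Cat” separately while keeping “self-hosting” as a single word.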

To validate my word counter, I made a page where I dumped all the detected words from a post. I used this to refine the list of delimiters.
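A debugging helper along these lines could look like the sketch below. The name `listWords` is hypothetical and not part of the plugin. It assumes the text has already been extracted from the HTML, and it returns the detected words rather than their count so that they can be checked by eye.

```javascript
// Hypothetical helper: returns the detected words instead of their count,
// so they can be printed on a debug page and inspected manually.
function listWords(text) {
    return text
        // Same delimiters as the word counter: whitespace, ";", "/" and "\"
        .split(/[\s;/\\]/)
        // Keep only parts that contain at least one letter or digit
        .filter(part => /[a-z0-9]/i.test(part));
}

const words = listWords("Hello world! You are 26.0 today - or so");
console.log(words.join("\n"));
```

Here the lone “-” is dropped and “26.0” is kept as a single word, which is the kind of detail that is easy to spot in a dump like this.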

Counting words #

Dependencies #

You need to install jsdom:

npm install --save jsdom

.eleventy.js #

In the Eleventy config, you need to register the new plugin:

const pluginWordcount = require("./plugins/wordcount.js");

module.exports = function(eleventyConfig) {
    eleventyConfig.addPlugin(pluginWordcount);
    // You can only have one module.exports in a configuration file,
    // so make sure you add the above line to your existing one.
}

wordcount.js plugin #

This is the file for the plugin. It contains extractText, which gets the text from a piece of HTML after removing non-prose elements, and countWords, which counts the words in that text.

const { UserConfig } = require("@11ty/eleventy");
const { JSDOM } = require("jsdom");

const TO_STRIP = [
    "code",
    "pre code",
    "script",
    ".header-anchor",
];

function extractText(html) {
    const dom = new JSDOM(html);
    const document = dom.window.document;

    // Remove non-text elements
    document.querySelectorAll(TO_STRIP.join(", ")).forEach(child => child.remove());

    return document.body.textContent;
}

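// Cache results by input HTML so repeated calls don't re-parse the document
const cache = new Map();

function countWords(value) {
    // Check with has() so that a cached count of 0 is still a cache hit
    if (cache.has(value)) {
        return cache.get(value);
    }

    const result = extractText(value)
        // Split on whitespace, semicolons, and slashes
        .split(/[\s;/\\]/)
        .map(x => x.trim())
        // Word is non-empty with at least one letter or number
        .filter(x => x.match(/.*[a-z0-9].*/i))
        .length;

    cache.set(value, result);
    return result;
}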

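module.exports = eleventyConfig => {
    eleventyConfig.addFilter("wordcount", countWords);
};

// Expose the counter so the unit tests can import it as `count`
module.exports.count = countWords;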

Inside post layout #

Here’s how you might use the wordcount filter inside a post layout that uses Liquid:

{% assign wordcount = content | wordcount %}

{{ wordcount | divided_by: 238 | round }} min read
({{ wordcount }} words)
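
Because the template rounds to the nearest minute, very short posts can show “0 min read”. If that matters, the arithmetic could be moved into its own helper, as in the sketch below. The `readingTime` function and the minimum of one minute are assumptions for illustration and not part of the plugin above. The 238 words per minute figure is the one used in the template.

```javascript
// Assumed average reading speed, matching the 238 used in the template.
const WORDS_PER_MINUTE = 238;

// Hypothetical helper: converts a word count into whole minutes,
// never reporting less than one minute.
function readingTime(wordCount) {
    return Math.max(1, Math.round(wordCount / WORDS_PER_MINUTE));
}

console.log(readingTime(50));   // 1
console.log(readingTime(1000)); // 4
```

A helper like this could be registered next to the wordcount filter with eleventyConfig.addFilter, under whatever filter name suits your templates.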

Unit tests #

As a bonus, here are the unit tests I used when writing the word counter:

const { describe } = require("mocha");
const { expect } = require("chai");
const { count } = require("./wordcount");

const parameterisedTests = {
    "empty": {
        html: "",
        expected: 0,
    },

    "just symbols": {
        html: ". . -//!\"$%^&*()\\`",
        expected: 0,
    },

    "single word paragraph": {
        html: "<p> Hey! </p>",
        expected: 1,
    },

    "punctuation": {
        html: "<p>Hello world! This is a test, of the word/counter</p>",
        expected: 10,
    },

    "strips scripts": {
        html: `
            <p>Hello world! This is a test, of the word/counter</p>
            <script>
                alert("Hello world!")
            </script>
        `,
        expected: 10,
    },

    "strips code blocks": {
        html: `
            <p>Hello world! This is a test, of the word/counter</p>
            <pre>
                <code>
                    alert("Hello world!")
                </code>
            </pre>
        `,
        expected: 10,
    },

    "strips inline code": {
        html: `
            <p>Hello world! This is a <code>test</code>, of the word/counter</p>
        `,
        expected: 9,
    },

    "strips heading anchors": {
        html: `
            <h2>A heading</h2>
            <a class="header-anchor">1</a>
            <p>Hello world! One two</p>
        `,
        expected: 6,
    },

    "counts numbers but not symbols": {
        html: `
            <p>Hello world! You are 26.0 today - or so</p>
        `,
        expected: 8,
    },

    "words can contain hyphens": {
        html: `
            <p>Hello world! One-two three</p>
        `,
        expected: 4,
    },
}

describe("countWords", () => {
    Object.entries(parameterisedTests).forEach(([key, data]) => {
        it(key, () => {
            expect(count(data.html)).to.equal(data.expected);
        });
    });
});
Hi, I'm Andrew Ward. I'm a software developer, an open source maintainer, and a graduate from the University of Bristol. I’m a core developer for Luanti, an open source voxel game engine.
