Better word counts and reading time in Eleventy (11ty)

29 October 2023 1 min read (313 words)


I recently switched to using Eleventy to generate my blog. As part of this process, I needed to implement word counts for each of my posts. I made my own plugin for this as I was unable to find an existing one that met my requirements.

The code below should work with any template engine, including Liquid and Nunjucks.

What’s wrong with other implementations? #

Other implementations either used regex to parse HTML or included non-text like scripts and code snippets in the count. Both of these result in inaccurate word counts.
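
As a rough illustration of the problem, here is a minimal sketch of a regex-based counter. The naiveWordCount function and the sample HTML are hypothetical and not taken from any particular plugin:

```javascript
// Hypothetical naive counter: strips tags with a regex, then splits on whitespace.
function naiveWordCount(html) {
    return html
        .replace(/<[^>]*>/g, " ")
        .split(/\s+/)
        .filter(x => x.length > 0)
        .length;
}

const html = `<p>Hello world</p><script>alert("not prose")</script>`;

// The paragraph contains two words of prose, but the script body is counted too
console.log(naiveWordCount(html)); // 4
```

The regex removes the tags but keeps whatever sits between them, so script and code content ends up in the count.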

What exactly is a word? #

First of all, we need to work out exactly how we are going to count words.

A word is a unit of prose or writing. We need to make sure we exclude code snippets and scripts from the word count.

A naive approach would be to split text by spaces and count all the parts. However, this counts words that are joined by a slash, like “Dog/Cat”, as a single word. Another approach would be to split text by all punctuation, but this would count words like “self-hosting” as two words.
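
The difference between these approaches can be sketched as follows. The sample sentence is made up for illustration, and the third regex is the delimiter set used by the countWords function later in this post, with a + added so that runs of delimiters do not produce empty strings:

```javascript
const text = "Dog/Cat owners love self-hosting";

// Split on whitespace only: "Dog/Cat" stays as one part
const bySpaces = text.split(/\s+/);
console.log(bySpaces.length); // 4

// Split on anything that is not a letter or number: "self-hosting" becomes two parts
const byPunctuation = text.split(/[^a-z0-9]+/i);
console.log(byPunctuation.length); // 6

// Split on whitespace, semicolons, and slashes only
const byDelimiters = text.split(/[\s;/\\]+/);
console.log(byDelimiters.length); // 5
```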

To validate my word counter, I made a page where I dumped all the detected words from a post. I used this to refine the list of delimiters.
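
A debugging helper along these lines might look like the sketch below. The listWords name is hypothetical. It assumes the text has already been extracted from the HTML, and it applies the same split and filter steps as countWords but returns the words instead of their count:

```javascript
// Hypothetical helper: same split and filter as countWords, but returns the words
function listWords(text) {
    return text
        .split(/[\s;/\\]/)
        .map(x => x.trim())
        // Keep only parts with at least one letter or number
        .filter(x => /[a-z0-9]/i.test(x));
}

console.log(listWords("Hello world! You are 26.0 today - or so"));
// [ 'Hello', 'world!', 'You', 'are', '26.0', 'today', 'or', 'so' ]
```

Trailing punctuation stays attached to the words, which does not matter for counting. The lone hyphen is dropped because it contains no letter or number.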

Counting words #

Dependencies #

You need to install jsdom:

npm install --save jsdom

.eleventy.js #

In the Eleventy config, you need to register the new plugin:

const pluginWordcount = require("./plugins/wordcount.js");

module.exports = function(eleventyConfig) {
    eleventyConfig.addPlugin(pluginWordcount);
    // You can only have one module.exports in a configuration file,
    // so make sure you add the above line to your existing one.
}

wordcount.js plugin #

This is the file for the plugin. It contains extractText to get all the text from HTML, and countWords to count the words in a piece of text.

const { JSDOM } = require("jsdom");

const TO_STRIP = [
    "code",
    "pre code",
    "script",
    ".header-anchor",
];

function extractText(html) {
    const dom = new JSDOM(html);
    const document = dom.window.document;

    // Remove non-text elements
    document.querySelectorAll(TO_STRIP.join(", ")).forEach(child => child.remove());

    return document.body.textContent;
}

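// Cache results, keyed by the input HTML
const cache = new Map();

function countWords(value) {
    // Use has() rather than a truthiness check, so that counts of 0 are cached too
    if (cache.has(value)) {
        return cache.get(value);
    }

    const result = extractText(value)
        // Split on whitespace, semicolons, and slashes
        .split(/[\s;/\\]/)
        .map(x => x.trim())
        // Word is non-empty with at least one letter or number
        .filter(x => /[a-z0-9]/i.test(x))
        .length;

    cache.set(value, result);
    return result;
}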

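module.exports = eleventyConfig => {
    eleventyConfig.addFilter("wordcount", countWords);
};

// Also expose the counter so that the unit tests can import it as `count`
module.exports.count = countWords;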

Inside post layout #

Here’s how you might use the wordcount filter inside a post layout that uses Liquid:

{% assign wordcount = content | wordcount %}

{{ wordcount | divided_by: 238 | round }} min read
({{ wordcount }} words)
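
If you would rather do the arithmetic in JavaScript, the same calculation can be sketched as below. The readingTime helper is hypothetical and not part of the plugin above. The 238 words per minute figure comes from the layout, while the minimum of one minute is an added assumption to avoid showing “0 min read” on very short posts:

```javascript
const WORDS_PER_MINUTE = 238; // same figure as the Liquid layout

// Hypothetical helper, not part of the wordcount plugin
function readingTime(wordCount) {
    // Math.max is an addition: it avoids "0 min read" for very short posts
    return Math.max(1, Math.round(wordCount / WORDS_PER_MINUTE));
}

console.log(readingTime(313));  // 1
console.log(readingTime(1000)); // 4
```

A function like this could be registered with eleventyConfig.addFilter in the same way as the wordcount filter.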

Unit tests #

As a bonus, here are the unit tests I used when writing the word counter:

const { describe } = require("mocha");
const { expect } = require("chai");
const { count } = require("./wordcount");

const parameterisedTests = {
    "empty": {
        html: "",
        expected: 0,
    },

    "just symbols": {
        html: ". . -//!\"$%^&*()\\`",
        expected: 0,
    },

    "single word paragraph": {
        html: "<p> Hey! </p>",
        expected: 1,
    },

    "punctuation": {
        html: "<p>Hello world! This is a test, of the word/counter</p>",
        expected: 10,
    },

    "strips scripts": {
        html: `
            <p>Hello world! This is a test, of the word/counter</p>
            <script>
                alert("Hello world!")
            </script>
        `,
        expected: 10,
    },

    "strips code blocks": {
        html: `
            <p>Hello world! This is a test, of the word/counter</p>
            <pre>
                <code>
                    alert("Hello world!")
                </code>
            </pre>
        `,
        expected: 10,
    },

    "strips inline code": {
        html: `
            <p>Hello world! This is a <code>test</code>, of the word/counter</p>
        `,
        expected: 9,
    },

    "strips heading anchors": {
        html: `
            <h2>A heading</h2>
            <a class="header-anchor">1</a>
            <p>Hello world! One two</p>
        `,
        expected: 6,
    },

    "counts numbers but not symbols": {
        html: `
            <p>Hello world! You are 26.0 today - or so</p>
        `,
        expected: 8,
    },

    "words can contain hyphens": {
        html: `
            <p>Hello world! One-two three</p>
        `,
        expected: 4,
    },
}

describe("countWords", () => {
    Object.entries(parameterisedTests).forEach(([key, data]) => {
        it(key, () => {
            expect(count(data.html)).to.equal(data.expected);
        });
    });
});

Full example #

You can see a full working example in my eleventy writing stats example repo: https://gitlab.com/rubenwardy/eleventy-stats-example.

Comments


29 October 2023 at 13:20 UTC

uncenter

Great post! I recommend using linkedom for parsing/processing HTML instead of JSDom though. Lightweight and fast!