Building the mother of all content inventories

(Originally written in July 2012 for the EU’s Waltzing Matilda blog and updated in November 2014.)

What is a content inventory?

A list of every page on your site, usually kept as one enormous spreadsheet.

Why it is important

A content inventory is essential for all sites. Here’s what it can unearth:

  • Content gaps: by knowing what you have, you’ll see what you need
  • Overlap: are two pages about the same topic?
  • Content that is no longer needed
  • Out of date or inaccurate content
  • Broken links: are you sending people to pages that have changed their emphasis or no longer exist?
  • Navigation and structural inconsistencies
  • Bias and balance issues: is there too much content about certain aspects or topics?
  • The content eco-system: how is content made? Is the structure straining or broken? What are the problems and issues staff face? What are the main content problems?
  • All kinds of oddities such as unfinished pages, content in the wrong place, content that hasn’t been written for the web.

In her book, Content Strategy for the Web, Kristina Halvorson says:

“If you don’t know what content you have now, you can’t make smart decisions about what needs to happen next.”

It’s an arduous, fiddly and time-consuming but valuable process to go through. I’m currently building a content inventory template that could be used at all stages of the content life-cycle: review, creation and maintenance.

Here’s the content inventory template. It is a work in progress, so I’d be keen to know what readers think (leave your comments below). Are there any fields missing? Any fields that need further explanation?

How to use the template

Have a look at the notes sheet and the example first. Click through to every page of your website. Record what you find on the blank sheet.

How to select pages to add to your inventory:

  • use your website navigation: keep drilling down to each level
  • check against the site map and A-Z index
  • click links within the pages themselves, in case they lead to buried pages

What to record:

The amount of detail for each page is up to you, your role on the website and the reason for the inventory. Here are the essential fields you need to fill in to capture quantitative information about the size and shape of your website.

  • An ID number for each page. This shows where in the structure each page currently lives.
  • Navigation title: The name of the page as it is displayed in main navigation (usually from a navigation bar). Each level is indented to more easily see the hierarchy. The navigation title may not be the same as the page title
  • Page title: This column shows the page title – the text shown at the top of the browser and in search results
  • Link current URL: The page filename. Shown in the browser. You could use this column to provide hyperlinks to the individual pages
  • Meta description: These provide a concise explanation of the content of a web page. They are often shown by Google and Bing on their search results pages
  • What is this page about? A short description of the information on the page
  • Files and links: What pages or files does this page link to?
  • Last updated: a date
  • Expiry or ‘review by’ date: When this page needs reviewing
  • Content creator: Who created this content?
  • Page owner: Who maintains this page?
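
If you would rather start from a script than a blank sheet, the fields above can be sketched as one row per page. This is only a rough illustration in Python, not the template itself: the column names, the dotted ID style ("1.1") and the sample pages are all my assumptions.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class InventoryRow:
    """One page in the inventory. Field names are hypothetical."""
    page_id: str              # assumed dotted style, e.g. "1.1" sits under "1"
    navigation_title: str     # name shown in the main navigation
    page_title: str           # text shown in the browser and search results
    url: str                  # current URL of the page
    meta_description: str = ""
    about: str = ""           # what is this page about?
    files_and_links: str = ""
    last_updated: str = ""    # a date, kept as text here for simplicity
    review_by: str = ""       # expiry or 'review by' date
    content_creator: str = ""
    page_owner: str = ""


def write_inventory(rows, handle):
    """Write the rows as CSV, with one column per field."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(InventoryRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))


# Made-up sample pages, for illustration only.
rows = [
    InventoryRow("1", "Home", "Example Council home page", "/index.html", page_owner="Web team"),
    InventoryRow("1.1", "Services", "Our services", "/services/index.html", page_owner="Web team"),
]

buffer = io.StringIO()
write_inventory(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The resulting CSV opens in Excel, so you can carry on filling in the remaining columns by hand.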

The example shows useful Excel features:

  • Group & outline: Show and hide the deeper pages in each section.
  • Sort: You could sort by page owner to see whether common problems occur. You could sort by quality to see which sections have the best content.
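
If your inventory lives in a CSV file rather than in Excel, the same sort-by-owner idea can be sketched in a few lines of Python. The rows, column names and owners below are made up for illustration.

```python
from collections import defaultdict

# Made-up inventory rows; the keys are hypothetical column names.
pages = [
    {"page_id": "1.2", "page_title": "Contact us", "page_owner": "Comms"},
    {"page_id": "1.1", "page_title": "Our services", "page_owner": "Web team"},
    {"page_id": "1", "page_title": "Home", "page_owner": "Web team"},
]

# Sort by page owner first, then by page ID within each owner.
by_owner = sorted(pages, key=lambda page: (page["page_owner"], page["page_id"]))

# Collect the page IDs each owner is responsible for.
pages_per_owner = defaultdict(list)
for page in by_owner:
    pages_per_owner[page["page_owner"]].append(page["page_id"])

for owner, ids in pages_per_owner.items():
    print(owner, ids)
```

One caveat: the IDs are compared as text here, so "1.10" would sort before "1.2". A real inventory with more than nine pages per level would need a smarter sort key.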

From ‘inventory’ to ‘audit’ to ‘tracker’

The template starts with columns for quantitative information (outlined above). The further right you go, the more the columns capture qualitative information: how up to date the page is, any overlap with other pages, and other issues such as web writing quality and navigational inconsistencies. This turns the tool into more of a content audit.

On the far right are columns that turn this into a working document – a content tracker – for when you’ve decided which pages should be deleted, which kept and which revised.

This document could then form the basis for an ongoing process. As pages are created, edited, deleted, or moved, you can adjust your spreadsheet accordingly.

Automated tools

You could kick-start the process by using automated tools. Perhaps your content management system can output a list of pages to Excel. Xenu’s Link Sleuth or the CAT Tool could do a similar thing. These will get all the URLs onto a spreadsheet, so no pages are missed out. But a human reviewer spots things that a tool cannot, and you may miss them if you automate the first stages.
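
To give a flavour of what such tools do under the bonnet, here is a minimal sketch that pulls the page title, meta description and links out of one page’s HTML, using only Python’s standard library. It is not how Xenu’s Link Sleuth or the CAT Tool actually work; the `PageSummary` class and the sample page are my own inventions, and a real crawler would also need to fetch each page and follow its links.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collects the title, meta description and link targets of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears inside the <title> element.
        if self._in_title:
            self.title += data


# A made-up page, standing in for HTML fetched from a real site.
sample_html = """<html><head><title>Bin collections</title>
<meta name="description" content="Find your bin collection day.">
</head><body><a href="/recycling">Recycling</a> <a href="/contact">Contact</a></body></html>"""

summary = PageSummary()
summary.feed(sample_html)
print(summary.title, summary.meta_description, summary.links)
```

That fills three of the inventory columns automatically. The judgement calls, such as whether the page is any good, still need a person.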

Thanks to experienced website writer and editor Nancy Duin for user testing the inventory and help text.

Additional reading

The Content Inventory is Your Friend by Kristina Halvorson

The Rolling Content Inventory by Louis Rosenfeld

A Checklist for Content Work by Erin Kissane

Content Audits and Inventories, a book by Paula Ladenburg Land

Government jargon hit-list

When the Local Government Association put up a list of public sector jargon and their alternatives a few years ago I jumped for joy. Unfortunately they have now hidden most of their web content behind a paywall. Thankfully I found the old pages on

Here is their list. My favourites are ‘predictors of beaconicity’ and ‘mainstreaming’.

EU’s Waltzing Matilda blog

I’ve been a bit quiet over on this blog for a while as I’ve been writing a series of blog posts for the European Union on content strategy.

Here are my first five posts:

I’m currently writing posts on government and content strategy, usability testing and content, archiving, plus key content strategy processes. I’ll link to them from here when they are live.