Building a Style Guide Checker in Custom GPTs

In my current freelance editing role, I’m working for a large site that’s rapidly expanding its topical authority. 

I've always worked on established affiliate or brand sites. This is a whole new challenge for me. And I absolutely love it.

We have a bunch of new writers joining the team, and lots of interesting changes all going on in parallel.

One of the challenges for an editor is keeping everyone on the same page with style guides. If your style guides are being built in real time, you have a moving target and a lot of changes to juggle.

Checking content against a style guide feels like something that AI can help with.

I’ve spent the last month testing that theory, with varying degrees of success!

Goal For 2024: Quick & Dirty Style Guide Checking

I wanted to create automated checks for the content that writers had completed.

While I love nothing more than maintaining a high level of content quality, I am aware that some parts of my job can be automated. And if I can automate them, that's a good thing, because I can do something more useful with that time.

(Those who know me will understand that I don’t write with AI, but I like to use it to automate away the boring, repetitive parts of content creation.)

To meet our content production goals, I wanted to get an AI workflow in place during November and December. 

We’re producing content at a rapid pace.

  • I’ve had a lot of editing to do. And I don't have the option to involve developers (yet).
  • I wrote a style guide, which is as short and simple as it can possibly be.
  • I have the green light to use any AI tools I want as long as internal information isn't compromised.

I’ve worked with content teams on very large AI projects before, so this is all good news. How hard can it be?

Hold that thought!

Style Guide Checking: the Requirements

In the first instance, I needed:

  • A somewhat automated workflow, not a prompt-sharing doc that would be easy to miss
  • A tool with reasonable accuracy that writers can rely on (with some spot-checks)
  • And something that was free to access (for the writers, not necessarily me)

The last point unfortunately rules out Claude, since its Projects feature requires a paid plan, and I didn’t want to incur costs for anyone on the team.

I can't use an API at this stage because I don't have developers to help. 

So it seemed like a custom GPT would be the best way to test out a new workflow to automate some style guide checks. 

ChatGPT vs Claude: Claude Wins Again

In testing, ChatGPT consistently got the style checks wrong.

The more I fought with it, the more wrong it would be.

At the same time, a Claude Project with the same instructions would get most of my checks right. (At least, it did when it wasn’t limited to Concise responses. But that’s a discussion for another day.)

I scratched my head for a long time trying to figure out why ChatGPT was so bad at doing what I was asking it to do. 

  • I tried putting the style guide in the Knowledge. This kinda worked, but ChatGPT kept repeating prohibited terms from the style guide in its results.
  • I put a simplified version in the Knowledge. This didn't make any difference.
  • I put an even simpler version in the instructions, leaving the Knowledge blank. At this point, the style guide was just a checklist, so it was starting to lose its impact.
  • And, of course, I did what we all resort to when AI doesn't work. I tried adjusting the instructions and the prompt to beg, plead, coerce, and bargain with ChatGPT to stick to its tasks.

During all of this, I tried adjusting the custom GPT both manually (by rewriting the instructions) and with the builder wizard. Neither approach ever worked as well as I'd have liked, but adjusting the instructions manually worked better.

In fact, I got Claude to rewrite the custom GPT instructions several times. Claude did a much better job at formatting custom GPT prompts than the custom GPT builder. When I asked Claude to make the instructions more concise, the output from the custom GPT instantly improved.

But it still wasn't accurate enough.
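
As an aside, the banned-terms part of a style guide is the one check that arguably never needed an LLM at all. If you can run a script, a plain-text scan is deterministic and can't hallucinate. Here's a minimal sketch, assuming Python and a hypothetical list of prohibited terms:

```python
# prohibited_terms.py - deterministic banned-term check.
# The term list is hypothetical; swap in your own style guide terms.
import re
import sys

PROHIBITED = ["utilize", "leverage", "in order to"]  # example terms only

def find_violations(text: str) -> list[tuple[int, str]]:
    """Return (line_number, term) pairs for every prohibited term found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for term in PROHIBITED:
            # Word boundaries so "leverage" doesn't flag "leveraged"
            if re.search(rf"\b{re.escape(term)}\b", line, re.IGNORECASE):
                hits.append((lineno, term))
    return hits

if __name__ == "__main__":
    with open(sys.argv[1], encoding="utf-8") as f:
        for lineno, term in find_violations(f.read()):
            print(f"line {lineno}: prohibited term '{term}'")
```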

Context Windows Really Matter

During this process, I learned a lot about the strengths and limitations of custom GPTs.

In the past, most of the teams I worked with used the Anthropic and Gemini APIs. ChatGPT didn't feature in my day-to-day content management work. I had only used custom GPTs for one-shot, simple tasks.

Comparing the quality of the results, I came to the realization that the issue was not with my prompt or my rules. It was the context window. I was expecting too much of custom GPTs.

  • GPT-4o has a 128,000-token window (and ChatGPT itself reportedly exposes a smaller window than the API does)
  • Claude 3.5 Sonnet’s window is 200,000 tokens
  • Gemini 1.5 Pro is now up to 2 million

This likely explains why I’ve had such a hard time getting consistent results from ChatGPT. It just loses track too quickly.
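
If you want to sanity-check whether an article plus a style guide even fits in a model's window, a rough token count is enough. A minimal sketch, assuming Python with the tiktoken library (o200k_base is the encoding GPT-4o uses; the file names are hypothetical):

```python
# Rough token budget check before pasting files into a chat or Knowledge.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # GPT-4o's encoding

def count_tokens(path: str) -> int:
    with open(path, encoding="utf-8") as f:
        return len(enc.encode(f.read()))

BUDGET = 128_000  # nominal GPT-4o window
used = count_tokens("style_guide.md") + count_tokens("article.txt")
print(f"{used} tokens used, {BUDGET - used} left for instructions and output")
```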

As an aside, I recently spotted MemoryPlugin and I'll be checking it out soon to see if it helps.

Building a Suite of Style Guide GPTs

I managed to get somewhat accurate results in custom GPTs after a lot of trial and error. For transparency, I should also say that we are looking into better options because hallucinations still creep in.

But for now, I broke everything down into chunks to get around the small context window. I have 3 GPTs doing something that Gemini could probably do in one prompt:

GPT #1 checks content for basic style elements: flow, typos, placeholder text, weird grammatical quirks, and so on.

It does not catch everything, but it does a reasonably good job of picking up on the biggest mistakes.

By giving this GPT the text of the article and not the HTML, I can avoid excessive hallucinations. That limits what it can do (it can't check heading tags, for example).

However, it has occasionally surprised me by finding issues I didn't spot, or picking out "TK" in content when I had overlooked it.
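
On the text-not-HTML point: if there's a script step available, stripping an HTML export down to plain text takes a few lines. A minimal sketch, assuming Python with BeautifulSoup installed (the file name is hypothetical):

```python
# Strip an HTML export to plain text before pasting it into the GPT.
from bs4 import BeautifulSoup  # pip install beautifulsoup4

def html_to_text(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")
    # separator/strip put each text block on its own line, with no stray whitespace
    return soup.get_text(separator="\n", strip=True)

print(html_to_text("article.html"))
```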

This GPT ends every conversation by linking to the other two GPTs in the workflow.

GPT #2 handles my biggest bugbear: UTM parameter checks.

I need to ensure that all links to a particular domain are formatted the same way so that we get credit for revenue. But I hate checking UTMs in Google Docs because editing links is so awkward.

GPT #2 does a good job of finding the links in an HTML file. It's accurate most of the time.

  • If it catches incorrect UTMs, it will rewrite them in its answer.
  • It will also catch links that are malformed.
  • It can generate links with the correct UTMs on demand.
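
For what it's worth, this is also the easiest check to script deterministically once developers are available, because a UTM either matches the required format or it doesn't. A minimal sketch, assuming Python with BeautifulSoup, and with the domain and required parameters as hypothetical placeholders:

```python
# Flag (and rewrite) links to a target domain that are missing required UTMs.
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse
from bs4 import BeautifulSoup  # pip install beautifulsoup4

TARGET_DOMAIN = "example.com"  # hypothetical
REQUIRED_UTMS = {"utm_source": "oursite", "utm_medium": "referral"}  # hypothetical

def fix_url(url: str) -> str | None:
    """Return a corrected URL, or None if the link doesn't need changing."""
    parts = urlparse(url)
    if not parts.netloc.endswith(TARGET_DOMAIN):
        return None  # not a link we need to tag
    params = {k: v[0] for k, v in parse_qs(parts.query).items()}
    if all(params.get(k) == v for k, v in REQUIRED_UTMS.items()):
        return None  # already correct
    params.update(REQUIRED_UTMS)
    return urlunparse(parts._replace(query=urlencode(params)))

with open("article.html", encoding="utf-8") as f:  # hypothetical file
    soup = BeautifulSoup(f.read(), "html.parser")

for a in soup.find_all("a", href=True):
    fixed = fix_url(a["href"])
    if fixed:
        print(f"BAD: {a['href']}\nFIX: {fixed}\n")
```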

I have noticed that this GPT doesn't like to review the same file multiple times, or a different file with the same name. If I try to repeat a check, it starts to confuse itself. As a workaround, renaming the file helps to improve the accuracy of the output.

GPT #3 looks for internal linking opportunities.

This one is my favorite of the three.

Internal linking is always tricky. It's especially hard for writers who are brand new to a team. They don't have any historical context to know what to link to, or which posts are part of a topical cluster. In my current team, many don't have direct access to the CMS.

In Claude, I can paste a sitemap into a Project or a conversation and it will do a fantastic job of inferring the topic from the URLs. If I give it content at the same time, it will suggest good links and anchor text, and tell me exactly where it thinks the links should go.

In contrast, ChatGPT is really bad at inferring context. In the first few iterations, it was suggesting very weird links and writing nonsense to get them to fit in pretty much anywhere. I think this is down to the context window being small, but it's also an example of ChatGPT just being worse at content-related tasks in general.

So this part of the checking was the most difficult to 'convert' to a custom GPT.

After a lot of experimentation, I was able to improve the accuracy by:

  • Giving my custom GPT the URL and the title tag for each page, not just a list of URLs on their own
  • Using a CSV file with clear headings, not a sitemap file in XML format (there's a sketch of how to build that CSV after this list)
  • Cutting the sitemap down by about one-third so that it had fewer URLs to choose from. Realistically, I don't need to link blog posts about B2B marketing to a random topic about cheese, so there's no point in even giving it that option.
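
Building that CSV is a one-off script rather than a manual job. A minimal sketch, assuming Python with the requests library, a standard sitemap.xml, and a hypothetical keyword filter for off-topic URLs:

```python
# Turn a sitemap.xml into a url,title CSV for the custom GPT's Knowledge.
import csv
import re
import xml.etree.ElementTree as ET
import requests

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
SKIP = ("cheese",)  # hypothetical: slugs for topics we never link to

root = ET.fromstring(requests.get("https://example.com/sitemap.xml").text)
urls = [loc.text for loc in root.findall(".//sm:loc", NS)
        if not any(term in loc.text for term in SKIP)]

with open("sitemap.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["url", "title"])
    for url in urls:
        html = requests.get(url, timeout=10).text
        match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
        writer.writerow([url, match.group(1).strip() if match else ""])
```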

I'm going to keep working on GPT #3 to improve it, but I know I'll be limited in what I can do with this one.

Next Test For the Team: Lex

My GPT workflow is doing the job for now. But I still find myself going back to Claude when I’m running checks on my own.

It consistently gets things right more often than ChatGPT. I know there is something better out there.

In the next few weeks, I’ll be looking at Lex. I know a few writers have recommended Lex as an alternative to Google Docs. The fact that we can share prompts would get me part-way to the result I was originally looking for.

Its support for the Anthropic API sounds promising as well.

To close, I have to ask:

  • Have you tried Lex? I’d be interested to know! I’m especially keen to hear if you’ve replaced Google Docs with Lex, and whether you get good suggestions when writing.
  • Are you interested to learn more about the AI workflows I’m developing for the content team? Sign up to get updates! You can subscribe to this site to get my next post via email, and it's free.

More Custom GPTs to Try

I feel like I should close on a high note! Here are some custom GPTs that work well. I use and recommend them all the time.

  • Content Helpfulness and Quality SEO Analyzer by Aleyda Solis. I built my own Helpful Content Checker about a year ago, but Aleyda's version is better and easier to use.
  • Which Pages Impacted? Analyzes two periods from Google Search Console so you can see reasons for traffic drops. From Marie Haynes.
  • Reputation Builder Brainstormer. Another custom GPT from Marie Haynes. This one helps you get ideas to increase your topical authority (I have this one pinned!).
  • Next Question?! All of Steve Toth's GPTs are essential for SEO experimentation. This one looks at keyword intent and generates TOFU, MOFU, and BOFU questions.