How to build SEO agent skills that actually work

I’ve built 10+ SEO agent skills in 34 days. Six worked on the first try. The other four taught me everything I’m about to show you about the folder structure that most LinkedIn posts about AI SEO skills gloss over.

What makes these agents reliable isn’t better prompts. It’s the architecture behind them. Here’s how to build an agent from scratch, test it, fix it, and ship it with confidence.

Why most AI SEO skills fail

Here’s what a typical “AI SEO prompt” looks like on LinkedIn:

You’re an SEO expert. Analyze the following website and provide a comprehensive audit with recommendations.

That’s it. One prompt. Maybe some formatting instructions. The person posts a screenshot of the output, gets 500 likes, and moves on. The output looks professional. It reads well. It’s also 40% wrong.

I know because I tried this exact approach. Early in the build, I pointed an agent at a website and said, “find SEO issues.” It came back with 20 findings. Eight didn’t exist. The agent had never visited some of the URLs it was reporting on.

Three things kill single-prompt skills:

  • No tools: The agent has no way to actually inspect the website. It’s working from training data and guessing. When you ask, “Does this site have canonical tags?” the agent imagines what the site probably looks like rather than fetching the HTML and parsing it.
  • No verification: Nobody checks whether the output is true. The agent says, “missing meta descriptions on 15 pages.” Which 15? Are those pages even indexed? Are they noindexed on purpose? Nobody asks. Nobody verifies.
  • No memory: Run the same skill twice and you get different output. Different structure. Different severity labels. Sometimes entirely different findings. There’s no consistency because there’s no template, no schema, no record of past runs.

If your skill is a prompt in a single file, you don’t have a skill. You have a coin flip.

Build SEO agent skills as workspaces

Every agent in our system has a workspace. Think of it like a new hire’s desk, stocked with everything they need. Here’s what the workspace looks like for the agent that crawls websites and maps their architecture:

agent-workspace/
  AGENTS.md          instructions, rules, output format
  SOUL.md            persona, principles, quality bar
  scripts/
    crawl_site.js    tool the agent calls to crawl
    parse_sitemap.sh tool to read XML sitemaps
  references/
    standards.md     what counts as an issue vs. noise
    gotchas.md       known false positives to watch for
  memory/
    runs.log         past execution history
  templates/
    output.md        expected output structure

Six components. One prompt file would cover maybe 20% of this.

AGENTS.md is the instruction manual

I wrote thousands of words of methodology into AGENTS.md. Instead of “crawl the site,” I laid out the steps: “Start with the sitemap. If no sitemap exists, check /sitemap.xml, /sitemap_index.xml, and robots.txt for sitemap references. Respect crawl-delay. Use a browser user-agent string, never a bare request. If you get 403s, note the pattern and test with different headers before reporting it as a block.”

Scripts are the agent’s tools

The agent calls node crawl_site.js --url to analyze site data. It doesn’t write curl commands from scratch every time. That’s the difference between giving someone a toolbox and telling them to forge their own wrench.

References are the judgment calls

This contains the standards for what counts as an issue. Known false positives to watch for. Edge cases that took me 20 years to learn. The agent reads these when it encounters something ambiguous.

Memory is institutional knowledge

Here I keep a log of past runs:

  • What it found last time.
  • How long the crawl took.
  • What broke.

The next execution benefits from the last.

Templates enforce consistency

This is where I get specific about the output I want: “Use this exact structure. These exact fields. This severity scale.” Output templates are the difference between getting the same quality in run 14 as you did in run 1.

Walkthrough: Building the crawler from scratch

Let me show you exactly how I built the crawler. It maps a site’s architecture, discovers every page, and reports what it finds.

Version 1: The naive approach

I gave the instruction: “Crawl this website and list all pages.”

The agent wrote its own HTTP requests, used bare curl, and got blocked by the first website it touched. Every modern CDN blocks requests without a browser user-agent string, so it was dead on arrival.

Version 2: Added a script

I built crawl_site.js using Playwright. This version used a headless browser and a real user-agent. The agent calls the script instead of writing its own requests.

This worked on small sites, but it crashed on anything over 200 pages. Because there was no rate limiting and no resume capability, it hammered servers until they blocked us.
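
Here’s roughly what that v2 script looked like. This is a minimal sketch, not the production crawl_site.js (which isn’t published); the names and the user-agent string are illustrative:

  // crawl_site.js (v2-era sketch): headless browser, real user-agent,
  // but still no throttling and no resume
  const { chromium } = require('playwright');

  const UA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';

  async function crawl(startUrl) {
    const origin = new URL(startUrl).origin;
    const browser = await chromium.launch();
    const page = await browser.newPage({ userAgent: UA });
    const seen = new Set([startUrl]);
    const queue = [startUrl];

    while (queue.length) {
      const url = queue.shift();
      await page.goto(url, { waitUntil: 'domcontentloaded' });
      // Collect same-origin links and queue the ones we haven't seen
      const links = await page.$$eval('a[href]', as => as.map(a => a.href));
      for (const link of links) {
        if (link.startsWith(origin) && !seen.has(link)) {
          seen.add(link);
          queue.push(link);
        }
      }
    }
    await browser.close();
    return [...seen];
  }

  crawl(process.argv[2]).then(urls => console.log(urls.join('\n')));

No delays, no checkpoints, no error handling: exactly why it fell over past a couple hundred pages.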

Version 3: Introducing rate limiting and resume

I added throttling: two requests per second by default, and as slow as one request every two seconds for CDN-protected sites. The agent reads robots.txt and adjusts its speed without asking permission. I also added checkpoint files so a crashed crawl can resume from where it stopped.
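
A minimal sketch of the throttle-and-resume idea, with hypothetical file and function names and a delay that matches the two-requests-per-second default:

  const fs = require('fs');

  const CHECKPOINT = 'memory/checkpoint.json';
  const sleep = ms => new Promise(res => setTimeout(res, ms));

  // Resume from the last checkpoint if a previous crawl crashed mid-run
  function loadCheckpoint() {
    return fs.existsSync(CHECKPOINT)
      ? JSON.parse(fs.readFileSync(CHECKPOINT, 'utf8'))
      : null;
  }

  async function throttledCrawl(fetchPage, queue, seen, delayMs = 500) {
    while (queue.length) {
      const url = queue.shift();
      await fetchPage(url);  // delegate the actual fetch
      // Persist progress after every page so a crash loses nothing
      fs.writeFileSync(CHECKPOINT, JSON.stringify({ queue, seen: [...seen] }));
      await sleep(delayMs);  // 500ms = 2 requests per second
    }
  }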

This worked on most sites, but it failed on sites that require JavaScript rendering.

Version 4: JavaScript rendering

This time, I added a browser rendering mode. The agent detects whether a site is a single-page app (React, Next.js, Angular) and automatically switches to full browser rendering.

It also compares rendered HTML against source HTML, and I found real issues this way: sites where the source HTML was an empty shell but the rendered page was full of content. Google might or might not render it properly. Now we check both.
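
The comparison itself can be simple. A sketch, assuming Node 18+ for global fetch, a Playwright page, and a ua string defined elsewhere:

  // Compare what the server sends against what the browser renders.
  // A large gap usually means a JS-rendered shell.
  async function compareSourceVsRendered(page, url, ua) {
    const res = await fetch(url, { headers: { 'User-Agent': ua } });
    const sourceHtml = await res.text();

    await page.goto(url, { waitUntil: 'networkidle' });
    const renderedHtml = await page.content();

    return {
      sourceBytes: sourceHtml.length,
      renderedBytes: renderedHtml.length,
      // Threshold is illustrative, not a tuned value
      likelySpaShell: sourceHtml.length / Math.max(renderedHtml.length, 1) < 0.2,
    };
  }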

This version worked on everything, but the output was inconsistent between runs.

Version 5: Time for templates and memory

For this version, I added templates/output.md with exact fields: URL count, sitemap coverage, blocked paths, response code distribution, render mode used, and issues found. This way every run produces the same structure.
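
A simplified excerpt of what such a template might look like; the real templates/output.md isn’t published, so the exact wording below is illustrative:

  # Crawl report: {site}

  - url_count: {number}
  - sitemap_coverage: {percent of crawled URLs present in the sitemap}
  - blocked_paths: {paths disallowed by robots.txt}
  - response_codes: {200: n, 301: n, 404: n, ...}
  - render_mode: {static | browser}

  ## Issues
  | url | issue | severity | evidence |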

I also added memory/runs.log. The agent appends a summary after every execution. Next time it runs, it reads the log and can compare results, like “Last crawl found 485 pages. This crawl found 487. Two new pages added.”
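
A minimal sketch of that append-and-compare step, with an assumed one-JSON-line-per-run log format:

  const fs = require('fs');
  const LOG = 'memory/runs.log';

  // Append one JSON line per run
  function logRun(site, pageCount) {
    fs.appendFileSync(LOG, JSON.stringify({ ts: Date.now(), site, pageCount }) + '\n');
  }

  // Compare this run against the most recent run for the same site
  function compareToLastRun(site, pageCount) {
    if (!fs.existsSync(LOG)) return 'First recorded crawl.';
    const runs = fs.readFileSync(LOG, 'utf8').trim().split('\n')
      .map(line => JSON.parse(line))
      .filter(r => r.site === site);
    if (!runs.length) return 'First recorded crawl for this site.';
    const last = runs[runs.length - 1];
    return `Last crawl found ${last.pageCount} pages. This crawl found ${pageCount}.`;
  }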

Version 5 is what we run today. Five iterations in one day of building.

THE CRAWLER'S EVOLUTION

  v1: Raw curl           → blocked everywhere
  v2: Playwright script  → crashed on large sites
  v3: Rate limiting      → couldn't handle JS sites
  v4: Browser rendering  → inconsistent output
  v5: Templates + memory → stable, consistent, reliable

  Time: 1 day. Lesson: the first version never works.

The pattern is always the same: Start small, hit a wall, fix the wall, hit the next wall.

Five versions in one day doesn’t mean five failures. It means five lessons that are now permanently encoded. I’ve rebuilt delivery systems four times over 20 years. The process doesn’t change. You start with what’s elegant, then reality hits, and you end up with what works.

Tip: Don’t try to build the perfect skill on the first attempt. Build the simplest thing that could possibly work. Run it on real data and watch it fail. The failures tell you exactly what to add next. Every version of our crawler was a direct response to a specific failure. Not a feature we imagined. A problem we hit.

Give agents tools, not instructions

This is the most important architectural decision I made.

When you write “use curl to fetch the sitemap” in your instructions, the agent generates a curl command from scratch every time. Sometimes it adds the right headers. Sometimes it doesn’t. Sometimes it follows redirects. Sometimes it forgets.

When you give the agent a script called parse_sitemap.sh, it calls the script. The script always has the right headers, always follows redirects, and always handles edge cases. The agent’s judgment goes into WHEN to call the tool and WHAT to do with the results. The tool handles HOW.
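
For illustration, here’s the kind of logic such a script encodes, sketched in Node rather than shell; the function name and the recursion into nested indexes are assumptions based on the tool descriptions below:

  // Fetch a sitemap with a browser UA; Node's fetch follows redirects by default.
  // Handles both <urlset> (page URLs) and <sitemapindex> (nested sitemaps).
  async function fetchSitemapUrls(sitemapUrl, ua) {
    const res = await fetch(sitemapUrl, { headers: { 'User-Agent': ua } });
    if (!res.ok) throw new Error(`Sitemap fetch failed: ${res.status}`);
    const xml = await res.text();

    const locs = [...xml.matchAll(/<loc>([^<]+)<\/loc>/g)].map(m => m[1].trim());

    if (xml.includes('<sitemapindex')) {
      // A sitemap index points at child sitemaps; recurse into each one
      const nested = await Promise.all(locs.map(u => fetchSitemapUrls(u, ua)));
      return nested.flat();
    }
    return locs;
  }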

Our agents have tools for everything:

  • crawl_site.js: Playwright-based crawler with rate limiting, resume, and rendering
  • parse_sitemap.sh: Fetches and parses XML sitemaps, counts URLs, detects nested indexes
  • check_status.sh: Tests HTTP response codes with proper user-agent strings
  • extract_links.sh: Pulls internal and external links from page HTML

The agent decides which tools to use and what parameters to set. The crawler chooses its own crawl speed based on what it encounters. It reads robots.txt and adjusts. It has judgment within guardrails.

Think of it this way: You give a new hire a CRM, not instructions on how to build a database. The tools are the CRM. The instructions are the process for using them.

Progressive disclosure: Don’t dump everything at once

Here’s a mistake I made early: I put everything in AGENTS.md. Every rule. Every edge case. Every gotcha. Thousands of words.

The agent got confused. It had too much context, and it started prioritizing obscure edge cases over common tasks. It would spend time checking for hash routing issues on a WordPress blog.

The fix: progressive disclosure.

Core rules that affect the 80% case go in AGENTS.md. This is what the agent needs to know for every single run.

Edge cases go in references/gotchas.md. The agent reads this file when it encounters something ambiguous. Not before every task. Only when it needs it.

Standards for severity scoring go in references/standards.md. The agent checks this when it finds an issue and needs to decide how bad it is. Not upfront.

This is the same way a skilled employee operates. They know the core process by heart. They check the manual when something weird comes up. They don’t re-read the entire manual before answering every email.

If your agent output is inconsistent but your instructions are detailed, the problem is usually too much context. Agents, like new hires, perform better with clear priorities and a reference shelf than with a 50-page manual they have to digest before every task.

The 10 gotchas: Failure modes that will burn you

Every one of these lessons cost me hours. They’re now encoded in our agents’ references/gotchas.md files so they can’t happen again.

Agents hallucinate data they can’t verify

I asked the research agent to find law firms and count their attorneys. It made every number up. It had never visited any of their websites.

Only ask agents to provide data they can actually fetch and verify. Separate what they know (training data) from what they can prove (fetched data).

Knowledge doesn’t transfer between agents

A fix I learned on day one (use a browser user-agent string to avoid CDN blocks) had to be re-taught to every new agent. On day 34, a brand-new agent hit the exact same problem.

Agents don’t share memories. Encode shared lessons in a common gotchas file that multiple agents can reference.

Output format drifts between runs

The same prompt can produce different field names: “note” vs. “analysis.” “lead_score” vs. “qualification_rating.” Run it twice and you get two different schemas.

The fix: Create strict output templates with exact field names. Not “write a report.” “Use this exact template with these exact fields.”
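
For example, a locked finding might look like this; the field names and severity scale here are assumptions, not the team’s actual schema:

  // Every finding uses exactly these fields, every run
  const exampleFinding = {
    url: 'https://example.com/pricing',  // the exact page, never a pattern
    issue: 'missing_canonical',
    severity: 'high',                    // fixed scale: low | medium | high
    evidence: 'Rendered HTML contains no <link rel="canonical">',
  };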

Agents confidently report issues that don’t exist

The first three audits delivered false positives with total confidence.

The fix wasn’t a better prompt. It was a better boss: a dedicated reviewer agent whose only job is to verify everyone else’s work. The same reason code review exists for human developers.

Bare HTTP requests get blocked everywhere

Every modern CDN blocks requests without a browser user-agent string. The crawler learned this on audit number two when an entire site returned 403s.

All it required was a one-line fix, and now it’s in the gotchas file. Every new agent reads it on day one.
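
In Playwright terms, the fix really is one line; the UA string below is an example:

  // Identify as a real browser instead of a bare HTTP client
  const context = await browser.newContext({
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
  });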

Don’t guess URL paths

Agents love to construct URLs they think should exist: /about-us, /blog, /contact. Half the time, those URLs 404.

My rule: Fetch the homepage first, read the navigation, follow real links. Never guess.
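
The rule in code, as a sketch: discover URLs from the homepage’s actual links instead of guessing paths (assumes a Playwright page):

  // Never construct /about-us by hand; read what's really linked
  async function discoverFromHomepage(page, origin) {
    await page.goto(origin, { waitUntil: 'domcontentloaded' });
    const hrefs = await page.$$eval('a[href]', as => as.map(a => a.href));
    return [...new Set(hrefs.filter(h => h.startsWith(origin)))];
  }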

‘Done’ vs. ‘in review’ matters

Agents marked tasks as “done” when posting their findings. Wrong. “Done” means approved. “In review” means waiting for human verification.

This small distinction has a big effect on workflow clarity when you have 10 agents posting work concurrently.

Categories must be hyper-specific

“Fintech” is useless for prospecting because it’s too broad. “PI law firms in Houston” works. Every company in a category should directly compete with every other company.

My first attempt at sales categories was “Personal finance & fintech.” A crypto exchange doesn’t compete with a budgeting app. Lesson learned in 20 minutes.

Never ask an LLM to compile data

Unless you want fabricated results. I asked an agent to summarize findings from five separate reports into one document. It invented findings that weren’t in any of the source reports.

Always build data compilations programmatically. Script it. Never prompt it.
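
Scripting it can be as simple as concatenating the source reports verbatim, so nothing can be invented. A minimal sketch, assuming the reports are JSON files with a findings array:

  const fs = require('fs');

  // Merge findings from N reports with no LLM in the loop;
  // every finding in the output exists verbatim in a source file
  function compileReports(paths) {
    return paths.flatMap(p =>
      JSON.parse(fs.readFileSync(p, 'utf8')).findings.map(f => ({ ...f, source: p }))
    );
  }

  const combined = compileReports(['reports/crawl.json', 'reports/links.json']); // example paths
  fs.writeFileSync('reports/combined.json', JSON.stringify(combined, null, 2));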

Agents will try things you never planned

The research agent tried to call an API we never set up. It assumed we had access because it knew the API existed.

The fix: Be explicit about what tools are available. If a script doesn’t exist in the scripts folder, the agent can’t use it. Boundaries prevent creative failures.

Build the reviewer first

This is counterintuitive. When you’re excited about building, you want to build the workers. The crawler. The analyzers. The fun parts.

Build the reviewer first. Without a review layer, you have no way to measure quality. You ship the first audit and it looks great. But 40% of the findings are wrong. You don’t know that until a client or a colleague spots it.

Our review agent reads every finding from every specialist agent. It checks:

  • Does the evidence support the claim? (For snippet evidence, this check is sketched below.)
  • Is the severity appropriate for the actual impact?
  • Are there duplicates across different specialists?
  • Did the agent check what it says it checked?
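
For findings whose evidence is a quoted snippet, the evidence check is mechanical enough to sketch (names assumed; the real reviewer isn’t published):

  // A finding only survives review if its evidence can be independently verified
  async function reviewFinding(finding, fetchHtml) {
    const html = await fetchHtml(finding.url);  // refetch, don't trust the specialist
    return { ...finding, verified: html.includes(finding.evidence) };
  }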

That single agent was the biggest quality improvement I made. Bigger than any prompt tweak. Bigger than any new tool.

The human approval rate across 270 internal linking recommendations: 99.6%. That number exists because a reviewer verifies every single one.

I’ve seen the same pattern with human SEO teams for 20 years. The teams that produce great work aren’t the ones with the best analysts. They’re the ones with the best review process. The analysis is table stakes. The review is the product.

BUILD ORDER (WHAT I LEARNED THE HARD WAY)

  What I did first:        Build workers → Ship output → Discover quality problems → Build reviewer
  What I should have done: Build reviewer → Build workers → Ship reviewed output → Iterate both

  The reviewer defines quality. Build it first. Everything else gets measured against it.

Tip: If you’re building multiple agents, the reviewer should be the first agent you build. Define what “good output” looks like before you build the thing that produces output. Otherwise, you’re shipping hallucinations with formatting. I learned this across three audits that were embarrassing in hindsight.

The validation standard (Our unfair advantage)

The reviewer catches technical errors. But there’s a higher bar than “technically correct.”

We have a real SEO agency with real clients and a team with 50 years of combined experience. Every agent finding gets validated against one question: “Would we stake our reputation on this?”

Would we actually send this to a client, put our name on the report, and tell the developer to build it?

Below are four tests we use for every finding:

  • The Google engineer test: If this client’s cousin works at Google, would they read this finding and nod? Would they say, “Yes, this is a real issue, this makes sense”? If the answer is no, it doesn’t ship.
  • The developer test: Can a developer reproduce this without asking a single follow-up question? “Fix your canonicals” fails. “Change CANONICAL_BASE_URL from http to https in your production .env” passes.
  • The agency reputation test: Would we defend this finding in a client meeting? If I’d be embarrassed explaining it to a technical CMO, it gets cut.
  • The implementation test: Is this specific enough to actually fix? Not “improve your page speed” but “your hero video is 3.4MB, which is 72% of total page weight. Serve a compressed version to mobile. Here’s the file.”

This is our unfair advantage. We’re not building agents in a vacuum. Most people building AI SEO tools have never run a real audit. They don’t know what “good” looks like. We do. We’ve been delivering it for 20 years with real clients. That’s why our approval rate is 99.6%.

Sandbox testing: Train on planted bugs

You don’t train an agent on real client sites. You build a test environment where you KNOW the answers. We built two sandbox websites with SEO issues we planted on purpose:

  • A WordPress-style site with 27+ planted issues: missing canonicals, redirect chains, orphan pages, duplicate content, broken schema markup.
  • A Node.js site simulating React/Next.js/Angular patterns with ~90 planted issues: empty SPA shells, hash routing, stale cached pages, hydration mismatches, cloaking.

The training loop:

  • Run the agent against the sandbox.
  • Compare the agent’s findings to the known planted issues (a scoring sketch follows this list).
  • Agent missed something? Fix the instructions.
  • Agent reported a false positive? Add it to gotchas.md.
  • Re-run. Compare again.
  • Only when it passes the sandbox consistently does it touch real data.
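
The comparison step is set arithmetic over issue IDs. A sketch, assuming each planted issue has a stable ID:

  // Score a sandbox run: misses mean the instructions need fixing,
  // extras mean a new entry for gotchas.md
  function scoreRun(plantedIds, foundIds) {
    const planted = new Set(plantedIds);
    const found = new Set(foundIds);
    return {
      missed: [...planted].filter(id => !found.has(id)),
      falsePositives: [...found].filter(id => !planted.has(id)),
      recall: [...planted].filter(id => found.has(id)).length / planted.size,
    };
  }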

Think of it like a driving test course. Every accident on real roads becomes a new obstacle on the course. New drivers face every known challenge before they hit the highway.

The sandbox is a living test suite. Every verified issue from a real audit gets baked back in. It only gets harder. The agents only get better.

Consistency: The unsexy secret

Nobody writes about this because it’s boring. But consistency is what separates a demo from a product.

Three things that make output consistent:

  • Templates: Every agent has an output template in templates/output.md: exact fields, structure, and severity scale. If the output looks different every run, you don’t need a better prompt. You need a template file.
  • Run logs: After every execution, the agent appends a summary to memory/runs.log. Timestamp, site, pages crawled, issues found, duration. The next run reads this log, knows what happened last time, and can compare: “Found 14 issues last run. Found 16 this run. 2 new issues identified.”
  • Schema enforcement: Field names are locked: “severity” not “priority,” “url” not “page_url,” “description” not “summary.” When you let field names drift, downstream tooling breaks. Templates solve this permanently (a validation sketch follows this list).
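
A minimal sketch of that schema check, assuming findings arrive as JSON objects:

  // Reject any output whose field names drift from the locked schema
  const LOCKED_FIELDS = ['url', 'issue', 'severity', 'description'].sort().join(',');

  function enforceSchema(findings) {
    const drifted = findings.filter(
      f => Object.keys(f).sort().join(',') !== LOCKED_FIELDS
    );
    if (drifted.length) {
      throw new Error(`Schema drift in ${drifted.length} finding(s); rerun with the template.`);
    }
    return findings;
  }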

If your agent output looks different every run, you need a template file, not a better prompt. I can’t stress this enough. The single fastest way to improve quality for any agent is a strict output template.

The stack that makes it work

A quick note on infrastructure, because the tools matter.

Our agents run on OpenClaw. It’s the runtime that handles wake-ups, sessions, memory, and tool routing. Think of it as the operating system the agents run on. When an agent finishes one task and needs to pick up the next, OpenClaw handles that transition. When an agent needs to remember what it did last session, OpenClaw provides that memory.

Paperclip is the company OS. Org charts, goals, issue tracking, task assignments. It’s where agents coordinate. When the crawler finishes mapping a site and needs to hand off to the specialist agents, Paperclip manages that handoff through its issue system. Agents create tasks for each other and auto-wake on assignment.

Claude Code is the builder. Every script, every agent instruction file, every tool was built with Claude Code running Opus 4.6. I’m a vibe coder with 20 years of SEO expertise and zero traditional programming training. Claude Code turns domain knowledge into working software.

The combination: OpenClaw runs the agents. Paperclip coordinates them. Claude Code builds everything.

The result

This process resulted in 14+ completed audits with 12 to 20 developer-ready tickets per audit, including exact URLs and fix instructions. All produced in hours, not weeks.

We have a 99.6% approval rate on internal linking recommendations across 270 links on two sites, verified by a dedicated review process.

We completed more than 80 SEO checks mapped across seven specialist agents. Each check has expected results, evidence requirements, and false-positive rules. Every finding is specific (e.g., “the main app JavaScript bundle is 78% unused. Here are the exact files to fix”).

That level of specificity comes from the skill architecture. The folder structure. The tools. The references. The templates. The review layer. Not the prompt.

If you want to build SEO agent skills that actually work, stop writing prompts and start building workspaces. Give your agents tools, not instructions. Test on sandboxes, not clients.

Build the reviewer first. Enforce templates. Log everything. The first version will fail. The fifth version will surprise you.

This is how you turn agent output into something repeatable. The same system produces the same quality, whether it’s the first audit or the 14th, because every step is structured, verified, and encoded.

Not because the AI is smarter. Because the architecture is.

Contributing authors are invited to create content for Search Engine Land and are chosen for their expertise and contribution to the search community. Our contributors work under the oversight of the editorial staff, and contributions are checked for quality and relevance to our readers. Search Engine Land is owned by Semrush. The contributor was not asked to make any direct or indirect mentions of Semrush. The opinions they express are their own.

