The Whole Point Was The Mess

Semrush put out an infographic last week. The kind built to be screenshotted into LinkedIn carousels and pasted into webinar decks. Four pillars. The fourth one called “Technical GEO”: schema, structured data, clean structure. The line that justifies it: “Ensures AI engines can parse and connect your content.”

Ensures.

See it live on X/Twitter. Image Credit: Pedro Dias

That’s the entire piece in a single word. The architecture of large language models is, by design, the opposite of ensured. And schema has nothing to do with whether an LLM can parse text. LLMs parse text by reading text.

Semrush is far from alone. Every SaaS vendor with skin in this game is running variations of the same play. SEO-era controllability, repackaged under a new acronym. The same percentages, pillars, and pyramids. All dressed up for a system that was built specifically not to work this way.

I’ve made the strategic version of this case before, in “Your AI Strategy Isn’t a Strategy.” This piece is the technical floor beneath it.

Built To Read Whatever’s There

Language models exist because the web is a mess. Forums, Wikipedia stubs, blog posts written at 2 A.M., scraped product copy, machine-translated junk, code comments, half-formed sentences, typos, contradictions, every register from journal article to subreddit shitpost. Pre-training data is the public web, and the public web has never been structured.

The transformer architecture handles this by treating language as sequences of tokens. There is no parser inside the model looking for tags. There is no preference for FAQ markup. The model reads the words. That’s the mechanism.

At inference time, the model generates more tokens conditioned on the input. None of that pipeline is reading microdata.
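A toy sketch makes the point. This is not any engine's real tokenizer (production models use learned BPE vocabularies, and the example strings are invented), but the principle is the same: markup and prose go down one identical path, as text.

```python
import re

def toy_tokenize(text: str) -> list[str]:
    # Crude stand-in for a subword tokenizer: words and individual
    # symbols become tokens. Real vocabularies differ, but there is
    # only one input path, and it reads text.
    return re.findall(r"\w+|[^\w\s]", text)

plain = "Acme widgets cost $20 and ship in two days."
marked_up = '{"@type": "Product", "name": "Acme widgets"}'

# Both go through the same path. The JSON-LD is not parsed as
# schema; it is just a longer sequence of tokens.
print(toy_tokenize(plain))
print(toy_tokenize(marked_up))
```

There is no branch in that function for structured data, and there is no branch for it in the model either.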

Schema.org has real jobs. It feeds rich results in classical search. It helps entity disambiguation in the knowledge graph. It helps voice assistants pull structured fields. These are well-defined functions within specific systems. They are not the mechanism by which an LLM understands a sentence.

So when a vendor claims structured data “ensures AI engines can parse and connect your content,” there is nothing to ensure. The parsing layer they’re imagining is not there. The model already parsed your sentence. It did so by reading the sentence.

One Trick, Three Brand Colors

Look at the biggest GEO and AEO explainers on the market right now, and you find the same SEO-era playbook with the acronym swapped.

Semrush is already covered. The fourth pillar of its “Technical GEO” presents schema and structured data as ensuring something the architecture cannot ensure.

AirOps published a graphic titled “15 Ways to Get Cited by ChatGPT, Perplexity, & Google.” It’s the most numbers-heavy specimen of the genre I’ve seen this year. Schema markup increases citation likelihood by 13%. Sequential H2 to H4 tags double your chances. Short paragraphs make content 49% more likely to appear in AI answers. Perplexity cites UGC in 91% of answers, versus Gemini’s 7%. Read the source notes and the methodology trail comes home. The numbers in the graphic trace back to AirOps’s own “2026 State of AI Search Report.” AirOps is citing AirOps on the question of whether AirOps’s prescriptions work.

Peec AI does a more honest job in places. Its full guide to GEO acknowledges the probabilistic nature of the system and concedes that foundation models are already trained, so optimization focuses on the retrieval layer. Then it lands on the same prescriptions: heading hierarchy, bullet lists, FAQ markup, multiple schema types layered on every page, summaries at the top of sections – all built on the chunking claim that long paragraphs lose out because the engine extracts fragments rather than full articles.

Profound, citing Aleyda Solis’s checklist, is the most explicit in its piece: “Optimize for Chunk-Level Retrieval.” Every section, a standalone snippet. Every page, a buffet from which the engine takes what it wants. The engine, in this telling, is a polite guest who only takes what’s been laid out.

Three vendors. Same operating assumption: a controllable, prescriptive technical discipline sits between a publisher and a citation, and it occupies roughly the same shape as classical SEO. Schema, headings, structure, freshness, machine-readable formats. Familiar. Billable. Reportable up to a chief marketing officer.

What Schema Actually Does

Schema is not the target here. Schema has real, well-defined uses. Classical Google search uses it for rich results: prices, ratings, event times, the structured fields that drive search engine results page features. The knowledge graph uses it for entity disambiguation. Voice assistants pull structured fields out of it.

None of that goes away. If you’re responsible for technical SEO, keep implementing schema where it earns its keep.
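For concreteness, this is the kind of JSON-LD that actually earns its keep in classical search (all values invented for illustration). Google's rich result pipeline reads these fields; the transformer does not.

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Example Widget",
  "offers": {
    "@type": "Offer",
    "price": "19.99",
    "priceCurrency": "USD"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.4",
    "reviewCount": "89"
  }
}
```

That snippet can win you a price and a star rating on a results page. That is its job, and its whole job.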

Schema cannot reach into a transformer and improve its comprehension of your prose. The model isn’t architected to read schema as schema. It receives whatever text the engine fetched and chose to include, and processes that text as language tokens. The entire GEO/AEO marketing layer rests on conflating two distinct claims: that schema is useful in classical search, and that schema feeds the LLM. The first is true. The second is a category error.

Chunking Is Not Yours To Optimize

Image Credit: Pedro Dias

The chunking advice keeps reappearing because it sounds technical, sits neatly inside a flowchart, and gives a content team something concrete to do on Monday morning. It is also incoherent.

Chunking happens at retrieval time. Perplexity, ChatGPT, and Gemini each run a retriever over candidate documents, split them according to their own configurations (length, overlap, embedding model, sometimes semantic boundaries), and feed the top-k chunks into the model’s context. These configurations belong to the engine. They get tuned differently across systems and retuned on schedules no publisher knows. The publisher’s view of the chunker is the publisher’s view of the model: black box, outputs only.
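To see why "optimizing for" this is incoherent, here is a minimal sketch of the kind of fixed-size chunker a retrieval pipeline might run. Every parameter in the signature is the engine's knob, not the publisher's, and the real engines' values are unknown; the numbers below are invented.

```python
def chunk(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    # Fixed-size character chunking with overlap. `size`, `overlap`,
    # the embedding model, and top-k are all engine-side settings
    # that publishers can neither observe nor set.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 1000
print(len(chunk(doc)))            # -> 4 chunks under one configuration
print(len(chunk(doc, size=200)))  # -> 7 chunks after a silent retune
```

Same document, different boundaries, and the publisher never learns which configuration ran. Whatever you "optimized" your paragraphs against yesterday may not be what splits them today.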

So when a vendor says “optimize for chunk-level retrieval,” what is actually being recommended is good writing. Short, self-contained paragraphs. Clear definitions near the top of sections. Internal logical structure. These are recognizable disciplines: information architecture, technical writing, readability. They have been recognizable disciplines since long before the transformer was invented. They are not a new technical layer.

A more honest version of the pitch would be: Hire someone competent at writing for the web. That sentence doesn’t fit on a pricing page.

The Paper They Don’t Read

There is a real academic paper called “GEO.” Aggarwal and co-authors, KDD 2024. It is the closest thing to a citable source the SaaS layer has when it sells generative engine optimization as a discipline. It is also, as papers go, easy to skim. Nine “optimization methods” are tested on a 10,000-query benchmark, with results.

What did the paper find worked?

Adding citations from credible sources. Adding quotations from relevant sources. Adding statistics. Improving fluency. Making prose easier to understand. The methods that produced the largest visibility lifts were essentially: write content with more evidence in cleaner prose.

What did the paper test and find didn’t work?

Keyword stuffing, the closest analogue in the paper to the SEO-era playbook that current GEO and AEO vendors have repackaged. Result: below baseline. The paper’s authors note in plain terms that methods effective in search engines “may not translate to success in this new paradigm.”

Notice what is not in the list of nine methods. Schema. Structured data. FAQ markup. Heading hierarchy. Machine-readable formats. None of these are tested in the paper, because none of them are the optimization surface the paper studies. The paper is studying content-level interventions: what you put in the words, not metadata layered around the words.

The SaaS layer borrowed the acronym. The findings stayed in the paper. “Technical GEO” is the SEO playbook with different stickers on the same boxes, sold against research that points the other way.

The Assumption Smuggled In

The SaaS pitch only makes sense if you smuggle in one assumption: that the system you’re optimizing for has the same shape as the one that’s been billing SEO clients for a quarter-century. Inputs you control. Outputs that respond. A retrievable causal chain between the two.

That model was always a simplification of how search worked. It was close enough to keep the industry running, and close enough to keep the invoices going out.

None of that simplification survives contact with generative systems. The same prompt produces different answers across sessions, users, temperatures, model versions, and days. Observed behavior across the major engines, not a clean property of any single one. The retrieval layer in front of the model also moves: candidate sources shift, ranking shifts, freshness windows shift. No causal chain runs between “I added FAQ schema” and “the model cited my page.” What runs between them is a probability distribution, and the things you control affect that distribution in ways nobody can cleanly attribute. Not even the people who created these systems.
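The non-determinism isn't mystical; it is built in at the sampling step. A toy version of softmax-with-temperature sampling (the logits and outcome labels are invented for illustration, not anyone's real model):

```python
import math
import random
from collections import Counter

def sample(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    # Softmax with temperature, then one weighted draw. Higher
    # temperature flattens the distribution; every call can differ.
    weights = {tok: math.exp(v / temperature) for tok, v in logits.items()}
    return rng.choices(list(weights), weights=list(weights.values()))[0]

# Invented scores for three possible continuations of an answer.
logits = {"cites you": 2.0, "cites a rival": 1.5, "cites neither": 1.0}
rng = random.Random(0)
draws = Counter(sample(logits, temperature=1.0, rng=rng) for _ in range(1000))
print(draws)  # all three outcomes appear: proportions, not guarantees
```

Even with the inputs held perfectly fixed, the output is a draw from a distribution. Everything a publisher does nudges the scores; nothing a publisher does picks the token.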

This is the established line on AI visibility tools, repeated here because it applies to the whole prescriptive layer. Statistically unverifiable data drawn from non-deterministic systems. A 13% citation lift, measured how, against what counterfactual, with what reproducibility? The methodological questions aren’t what these numbers are designed to answer. The numbers are the answer. They land in a graphic, get rendered as ROI in a board deck, and the conversation moves on.

Something To Say In The Meeting

Here is the part that the architecture argument and the methodology argument don’t, on their own, explain. Why does the entire SaaS layer keep successfully selling this stuff to people who are not stupid?

The honest version of the answer goes something like: We are operating with reduced visibility into a system that doesn’t expose its mechanics, that returns different outputs to different people for the same query, that’s changing month by month, and that has folded a substantial chunk of the funnel into a black box. We can keep doing the work that has always been the work: writing well, being useful, building authority, maintaining the site. We can monitor what shows up where. The deterministic dashboard we used to have is not coming back.

That sentence is unsayable in a marketing meeting. It admits the lever is not connected. It tells leadership that the budget line they approved doesn’t have a corresponding action. It gives the team nothing to put in next quarter’s plan.

So the SaaS layer fills the gap. It manufactures levers. Pillars, frameworks, percentage lifts, schema audits, chunking optimization, machine-readable formats. Reportable activity. Defensible expenditure. Something to say in the meeting. None of this gets you visibility. The engine decides that. What’s on offer is the appearance of control, sold to people who would rather pay than concede that control left the room.

Once the lever is bought, it has to be operated. Schema audits get scheduled. Chunking checklists get reviewed. Citation likelihoods get tracked, refreshed, and compared. The dashboard the team paid for becomes the dashboard the team optimizes against, and the dashboard quietly replaces the actual problem with the part of the problem it can see. By the time anyone notices, the SaaS layer is writing the brief.

None of this is a moral failure on the buyer’s side. What you’re watching is what happens when an industry has been organized for a quarter-century around the premise that you can pull a lever and watch the meter move, and the meter quietly disconnects from the lever. The vendors aren’t running a con. They’re filling demand for the one thing the buyer cannot afford to do without: an answer that fits in a slide.

Rank And Tank, All Over Again

I keep coming back to a phrase that fits this whole moment: dancing to the rank-and-tank tunes (I borrowed it from David McSweeney). The cycle goes: a vendor sells the controllable-discipline frame, agencies adopt it, content teams scale production around the prescriptions, AI-generated articles get pumped out at volume because the prescriptions are easy to template. Some of it ranks for a while. Most of it eventually tanks because the prescriptions were never the mechanism, and the engine adjusts, or the freshness window closes, or the system simply moves on.

The SEO industry has done this before. Spinning. Mass programmatic pages. Doorway content. Each cycle followed the same shape: a controllable input dressed as a discipline, sold at scale, briefly effective, eventually punished by the engine, replaced by the next controllable input dressed as a discipline.

GEO and AEO are the current cycle. The pillars and percentages and pyramids are this cycle’s templates. Beneath them, the tactics bifurcate.

One path is brand presence exploitation. Plant your name where the engines look. Reddit threads, top-X listicles, the same citation surfaces over and over. The cycle feeds itself: engines cite the surfaces, brands work the surfaces, surfaces feed the engines. I’ve written about this loop before; I called it the Ouroboros pattern. The short version is that the loop is less stable than the strategy assumes.

The other path is content at scale. Produce variations, pump out volume, treat the templated output as content that might earn a citation. I’ve written about this approach before, in the “Scaling Disappointment” piece. The short version is that uniqueness is not value, and at the pace these prescriptions enable, qualitative evaluation stops being possible. The volume of AI-generated copy produced under this path is this cycle’s externality.

The next cycle will sell the cleanup.

Forget for a second whether your “Technical GEO” is set up correctly. Ask whether the thing you’re putting on the page is worth reading. Large language models were designed to read whatever is there. If what’s there is good, it will be read. If what’s there is templated, low-utility content optimized against a chunking heuristic that doesn’t exist, it will eventually be filtered out: by the engine, by the user, or by the next academic paper showing that retrieval quality is degraded by exactly this kind of slop.

The advantage, when it accrues, will accrue to the people who don’t get distracted. Who don’t subscribe to the dashboard. Who keep working on product-driven SEO and the fundamentals that have always connected content to people. There are early signs of this on the timelines I read. Practitioners openly questioning whether optimizing against a non-deterministic surface makes sense at all, and asking whether their attention belongs back on classical search; which, at the end of the chain, is what feeds these systems anyway.

The mess was always the point. The architecture handles it. The industry just needs to stop pretending the mess is the problem.

This post was originally published on The Inference.


Featured Image: Roman Samborskyi/Shutterstock

