Google is working toward a future where it understands what you want before you ever type a search.
Now Google is pushing that thinking onto the device itself, using small AI models that perform nearly as well as much larger ones.
What’s happening. In a research paper presented at EMNLP 2025, Google researchers show that a simple shift makes this possible: break “intent understanding” into smaller steps. When they do, small multimodal LLMs (MLLMs) become powerful enough to match systems like Gemini 1.5 Pro — while running faster, costing less, and keeping data on the device.
The future is intent extraction. Large AI models can already infer intent from user behavior, but they usually run in the cloud. That creates three problems. They’re slower. They’re more expensive. And they raise privacy concerns, because user actions can be sensitive.
Google’s solution is to split the task into two simple steps that small, on-device models can handle well.
- Step one: Each screen interaction is summarized separately. The system records what was on the screen, what the user did, and a tentative guess about why they did it.
- Step two: Another small model reviews only the factual parts of those summaries. It ignores the guesses and produces one short statement that explains the user’s overall goal for the session.
By keeping each step focused, the system avoids a common failure mode of small models: breaking down when asked to reason over long, messy histories all at once.
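To make the two-step split concrete, here is a minimal sketch in Python. The data structure, prompt wording, and the `call_on_device_model` stub are illustrative assumptions, not code or prompts from the paper.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    screen: str   # what was on the screen
    action: str   # what the user did

def call_on_device_model(prompt: str) -> str:
    """Stand-in for a small on-device (M)LLM call (hypothetical, not a real API)."""
    return "(model output)"

def summarize_interaction(step: Interaction) -> dict:
    """Step one: summarize a single interaction, keeping facts and the guess separate."""
    facts = call_on_device_model(
        f"Screen: {step.screen}\nAction: {step.action}\n"
        "In one sentence, describe only what is directly observable."
    )
    guess = call_on_device_model(
        f"Screen: {step.screen}\nAction: {step.action}\n"
        "Offer one tentative guess about why the user did this."
    )
    return {"facts": facts, "guess": guess}

def extract_session_intent(history: list[Interaction]) -> str:
    """Step two: aggregate only the factual parts into one short intent statement."""
    summaries = [summarize_interaction(step) for step in history]
    factual_history = "\n".join(s["facts"] for s in summaries)  # guesses are dropped here
    return call_on_device_model(
        "Here is what the user did during this session:\n"
        f"{factual_history}\n"
        "State the user's overall goal in one short sentence."
    )
```

Because each call sees only one interaction (step one) or a short list of factual summaries (step two), a small model never has to reason over the full raw history in a single pass.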
How the researchers measure success. Instead of asking whether an intent summary merely “looks similar” to the right answer, they use a method called Bi-Fact. On its main quality metric, a fact-level F1 score, small models using the step-by-step approach consistently outperform other small-model methods:
- Gemini 1.5 Flash, an 8B model, matches the performance of Gemini 1.5 Pro on mobile behavior data.
- Hallucinations drop because speculative guesses are stripped out before the final intent is written.
- Even with extra steps, the system runs faster and cheaper than cloud-based large models.
How it works. Bi-Fact breaks an intent statement into small pieces of information, or facts, then measures which facts are missing and which ones were invented (see the toy example after this list). This:
- Shows how intent understanding fails, not just that it fails.
- Reveals where systems tend to hallucinate meaning versus where they drop important details.
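As a rough illustration, here is a toy version of fact-level scoring in Python. It assumes the facts have already been extracted as short strings and uses exact string matching as a stand-in for the paper’s fact-matching step, so it is a simplification rather than the actual Bi-Fact implementation.

```python
def fact_scores(predicted: set[str], reference: set[str]) -> dict:
    """Compare a predicted intent to the reference, fact by fact."""
    matched = predicted & reference
    invented = predicted - reference    # hallucinated meaning
    missing = reference - predicted     # dropped details
    precision = len(matched) / len(predicted) if predicted else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "invented": invented, "missing": missing}

# Toy example: the system invents "wants to compare prices" and misses "for a weekend trip".
print(fact_scores(
    predicted={"user is booking a flight", "wants to compare prices"},
    reference={"user is booking a flight", "for a weekend trip"},
))
```

Reporting the invented and missing facts alongside the F1 score is what lets the researchers say how a system failed, not just that it failed.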
The paper also shows that messy training data hurts large, end-to-end models more than it hurts this step-by-step approach. When labels are noisy — which is common with real user behavior — the decomposed system holds up better.
Why we care. If Google wants agents that suggest actions or answers before people search, it needs to understand intent from user behavior (how people move through apps, browsers, and screens). This research moves that idea closer to reality. Keywords will still matter, but the query will be just one signal. In that future, you’ll have to optimize for clear, logical user journeys, not just the words typed at the end.
The Google Research blog post. Small models, big results: Achieving superior intent extraction through decomposition

