Last week we ran the same question through two retrieval shapes.
It was a docs review. Same source data: a week of commits across two repos, one backend (mirror-harness), one frontend (myai-ui-client). We asked two systems what mattered.
Codex got the commits. Page-level retrieval over git history. Here's a slice of what it returned:
v1.11.6: hotfix Redis getdel regression
v1.11.5: MCP OAuth onboarding (DCR + PKCE)
v1.11.4: related_to as agent catch-all edge
v1.11.3: invoke_tool redirect to send_message
v1.11.2: deployed-pod fix for invoke_tool
v1.11.1: deployed-pod fix for get_context
v1.11.0: external session API + MCP bridge
v1.10.17: REVERT email plus-alias send-from
Useful. Accurate. A list.
We also asked myai. We pointed it at the website's content/docs/ inventory and asked what was missing. myai pulled from the artifact graph: synthesized retrieval against work orders, design decisions, prior insights, and dimension-scoped lenses on the ontology. Here's a slice of what it returned:
P0 — User-facing capability gaps shipped this week, no docs
- MCP OAuth onboarding end-to-end docs page. Backend:
/api/oauth/{register,authorize,login-callback,token} +
WWW-Authenticate discovery on /api/mcp + DCR + PKCE
(mirror-harness v1.11.5, hotfix v1.11.6). Frontend: the
new /oauth-consent consent screen with MFA verify
(OAuthConsentPage.tsx, 341 lines, commit 316f0a1 on
myai-ui-client). Reader needs the full client-onboarding
flow, not just the endpoints.
Same week. Same input. Very different output.
The Codex view is what RAG over a vector store gives you on a good day: a summary of the most-relevant chunks ranked by similarity. The myai view is what you get when retrieval has been pre-thinking for you: a punch list with provenance, with the backend release tied to the frontend commit, with the docs reader's actual question sitting at the center of the answer.
What the contrast shows
Codex did fine work. Its summary of the commits was accurate, the version numbers were right, the one-line summaries captured what the commit messages said. If you needed to know what got merged on mirror-harness last week, that list does the job.
But the output shape is bounded by the input shape. When you flat-feed an LLM a list of changes, it can summarize them. It can't tell you that the v1.11.5 OAuth flow only matters because of the v1.11.0 external session API two weeks earlier. It can't tell you the corresponding frontend commit landed on myai-ui-client the same day. It can't tell you that the website docs are missing the entire end-to-end story.
myai can do this, not because it's smarter, but because its retrieval surface is different. The artifact graph contains synthesized prior decisions linked to current changes, not just the raw artifacts. When we asked "what's missing from the docs," what came back wasn't a list of related chunks ranked by similarity. It was a structured punch list, drafted by an agent that had already navigated a graph of named relationships.
The retrieval is sparse: you don't get 30 chunks of "related" text. What you do get is deep: already linked to its provenance, already named, already explained.
The mechanism
Here's the one thing that makes the most difference.
We summarize on write, then search the summaries.
When an artifact lands in myai (a meeting transcript, a design doc, a customer email, a work order), we don't chunk-and-embed it and call it done. We synthesize it into a structured artifact: a name, a stated position, named relationships to prior artifacts. The synthesis happens at write time, not at search time.
When you query, you search the summaries. You get back synthesized artifacts, not raw text chunks. The agent has a story to tell, not a haystack to sift through.
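Here is a minimal sketch of that write-path/read-path split. Every name in it (Artifact, synthesize, link, search, draft) is hypothetical shorthand for the roles described above, not myai's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Artifact:
    """What synthesis produces at write time: named, positioned, linked."""
    title: str                                   # a human-readable name, not a chunk id
    kind: str                                    # e.g. "design decision", "work order", "insight"
    position: str                                # the artifact's stated position, in a sentence or two
    relations: dict[str, list[str]] = field(default_factory=dict)  # named edges to prior artifacts

def ingest(raw_text: str, llm, graph) -> Artifact:
    """Write path: synthesize once, when the artifact lands, and store the summary."""
    artifact = llm.synthesize(raw_text)          # extract title, kind, stated position
    artifact.relations = graph.link(artifact)    # name the edges to prior artifacts
    graph.add(artifact)                          # the graph stores the synthesis, not raw chunks
    return artifact

def answer(question: str, llm, graph) -> str:
    """Read path: search the summaries, then reason over linked artifacts."""
    hits = graph.search(question)                      # matches titles and stances, not paragraphs
    context = [(a, graph.neighbors(a)) for a in hits]  # each hit arrives with its named relatives
    return llm.draft(question, context)                # sparse input, every piece has provenance
```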
That's the whole bet. It's more expensive at write time. We pay it because the alternative is what happens when you point a vector store at every doc your team has ever written and hope: the embedding ranks things by surface similarity, the retrieval grabs the top ten chunks, the model reads them, and the answer it gives is a polite reconstruction of whatever was statistically closest to the question. Not what was relevant. Not what the team decided. Just what looked like it.
Two design decisions follow from "synthesize on write":
- Naming things makes them findable. A chunk is a chunk. An artifact has a title, a kind, a stated position, and known relatives. When the agent searches, it searches names and stances, not paragraphs.
- Linking things makes them explainable. A vector match is opaque: you got this chunk because it was 0.87-similar. A graph hop is legible: you got this artifact because it's the design decision that constrained the change you're asking about. The agent can show its work; a sketch of that hop follows below.
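To make the legibility difference concrete, here is a toy comparison of what each retrieval path can say for itself. It reuses the hypothetical Artifact shape from the sketch above, and the OAuth/external-session pair from earlier in the post as the example edge; the edge name and chunk id are illustrative, not real identifiers.

```python
def explain_vector_hit(chunk_id: str, score: float) -> str:
    """All a vector match can report: this chunk was close to the query."""
    return f"retrieved chunk {chunk_id} (cosine similarity {score:.2f})"

def explain_graph_hop(artifact: Artifact, edge_name: str, neighbor: Artifact) -> str:
    """A graph hop carries its own reason: the edge has a name a reader can check."""
    return f"retrieved '{neighbor.title}' because it is linked to '{artifact.title}' via '{edge_name}'"

# Illustrative output, using the releases discussed above:
#   retrieved chunk c_4187 (cosine similarity 0.87)
#   retrieved 'MCP OAuth onboarding (v1.11.5)' because it is linked to
#   'external session API (v1.11.0)' via 'builds on'
```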
That's the mechanism. The rest is implementation detail.
Why this matters more as agents scale
Single-shot prompts are forgiving. You ask once, you get an answer, you move on. If the answer is 70% relevant, you get the gist and re-prompt for the rest. The noise rounds away.
Agentic workflows compound noise instead of rounding it away. An agent making 30 tool calls in a session calls retrieval 30 times. Every call pulls "the most relevant 10 chunks," and every call stacks more marginally relevant text into the context. By the end of a long session, a page-RAG-backed agent is drowning: its context window is full of text that looked relevant to each individual query but doesn't add up to a coherent picture of the work it's doing.
A sparse-but-deep agent is still navigating a real graph of meaning at turn 30. The artifacts it pulled at turn 5 are still legible at turn 25, because they were named and linked, not just embedded. When it needs to explain why it took a particular action eight steps ago, the trail is intact.
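As a toy illustration of that compounding (not myai's real control flow; retrieve and act are stand-ins for either retrieval shape):

```python
def run_agent(turns: int, retrieve, act) -> list:
    """Toy agent loop: every turn retrieves and appends to context."""
    context = []
    for turn in range(turns):
        results = retrieve(f"query for turn {turn}")   # stand-in per-turn query
        context.extend(results)
        act(context)
    return context

# Page RAG: retrieve returns ~10 anonymous chunks per call, so by turn 30 the context
# is hundreds of loose paragraphs with no names and no edges between them.
# Sparse-but-deep: retrieve returns a few named, linked artifacts per call, so what
# landed at turn 5 is still legible, and citable, at turn 25.
```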
The longer the agent runs, the bigger the gap.
The Kinaxis story, revisited
Last week's post had a customer story: a consulting firm rolling out Kinaxis Maestro, with myai building the user stories from the artifact graph. One detail in that story was that myai "knew which stakeholder to ping" when a story needed clarification.
That's sparse-but-deep working in production. myai didn't surface ten possible stakeholders ranked by similarity to the story's keywords. It surfaced one (the right one) with the named relationship that made the answer obvious. This person owns this design decision; you ping them.
You only get that if synthesis happens at write time and relationships are explicit. A vector store wouldn't have done it. A vector store would have ranked stakeholders by how often their name appeared near the story's text.
The next post in this series digs into something adjacent: why state matters more than context window length. Models keep getting bigger context windows. What they actually need is durable state. More on that next.
Try it
The endpoint pattern is https://api-[instance].makeyourself.ai/api/mcp. The Connect via MCP docs page walks through plugging it into Claude, ChatGPT, and Cursor. The platform page covers what's in the graph.
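If you would rather poke at the endpoint from code than from a chat client, a minimal sketch with the MCP Python SDK looks roughly like this. The instance name is a placeholder, and the OAuth handshake the docs page covers is omitted, so expect an auth challenge against a live instance.

```python
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

MCP_URL = "https://api-<instance>.makeyourself.ai/api/mcp"  # replace <instance> with yours

async def main() -> None:
    # Open a streamable-HTTP transport, then an MCP session over it.
    # Auth is omitted here; the live endpoint advertises OAuth via WWW-Authenticate.
    async with streamablehttp_client(MCP_URL) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())
```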
If you haven't read the first post in this series, start there: "myai is the context. Your agent is the hands." It's the doctrine; this post is the proof.