Use cases ship. Operating models work.

May 4, 2026 · William VanBuskirk

Most AI conference decks have a slide called "use cases." Meeting summarizer. Support copilot. Sales-research bot. Contract-review assistant. Outbound personalization. The slide is a list of small, well-shaped tasks an agent can do, each one a tractable demo, each one easy to buy.

The slide is also the wrong category.

A use case is a function call. Summarize this meeting. Draft a follow-up. Check this contract against the standard. It's a discrete unit of work an agent performs on request: a request-response shape with a clean input and a clean output. Most agent products on the market are stacks of these.

The shape that actually changes how a business operates is bigger and stranger. It looks less like a function call and more like a person on the org chart, with a beat it owns, peers it talks to, things it escalates and things it doesn't, surfaces it produces, work it picks up without being asked. We call this an operating model. The unit isn't the task; the unit is the role.

The reason most agent rollouts feel disappointing six months in (even when the demos worked) is that they shipped use cases when they needed to ship operating models. The agent performed the task, but it didn't close the loop.

The use-case lens makes agents into features

When you ship an agent as a use case, you make a series of implicit assumptions:

  • The agent waits to be asked.
  • The output is the deliverable.
  • The conversation is the unit.
  • The user provides the context.
  • Success looks like the user being satisfied with the response.

Each of these is reasonable on its own. Stack them and you've built a brain in a jar. The agent is competent at the moment of the request and invisible the rest of the time. It doesn't know what's happening in the business between requests. It doesn't notice the things that should have triggered it. It doesn't pick up the ball when it falls.

We have a name for this internally: orphaned agents. An orphaned agent can analyze a sales call beautifully and have no idea when a sales call happens, no instinct to update the CRM after, no awareness that the VP should hear about the competitor mention, no read on whether the customer success team needs the insight before the next QBR. The use case succeeded; the broader job still broke.

This isn't an integration problem you can solve by adding webhooks. It's a category problem. The agent was designed as a feature. Features wait to be invoked; jobs don't.

The operating model lens puts agents on the org chart

The reframe: stop designing agents as features, start designing them as roles.

A role on an org chart has a known shape. It has a beat, a scope of concern that's its responsibility. It has peers, other roles it coordinates with as a normal part of doing the job. It has a manager who calibrates it, signs off on its work, takes the call when the role hits something it doesn't have authority to decide. It has a way it picks up work, partly assigned, partly self-directed. It has artifacts it owns, things that exist because this role exists.

Agents need the same shape, in architectural terms. An operating model for an agent contains:

  1. Dimensions of concern — the scopes the agent is responsible for. Define these by the concerns the agent owns, not by the tools it can call. A maintenance triage agent isn't the agent that calls the work-order API; it's the agent that worries about machine health on Lines 3 through 7. The verbs are different. The first is reactive; the second is proactive.

  2. The mirror pair — the human expert this agent reflects and reports to. Every role on an org chart has a manager. Every operating-model agent has the same. The mirror pair is the calibration loop: the agent acts, the expert reviews, the patterns get refined, the agent's judgment improves. Without this, the agent's judgment freezes at the moment of deployment.

  3. Proactive triggers — events in the world that cause the agent to act without being asked. A new transcript landing in the discovery folder. A margin coming in under plan. A schedule slipping past the safety buffer. Triggers turn the dimension from a passive scope into an active one. This is what separates an operating model from a chat box.

  4. Escalation paths — the named conditions under which the agent stops and hands work to a human. Not "if confused, say I don't know," but a specific named escalation to a specific named human, with the work staged for them in a specific named place. The escalation is part of the design, not a fallback the model improvises.

  5. Peer communication — the other roles this agent coordinates with as a normal part of its work. When the maintenance triage agent decides Line 4 is going down for three hours, it doesn't only log the ticket. It tells the logistics agent so the 4 PM pickup gets re-routed. The peer connection is in the design.

  6. Artifact ownership — the durable outputs this role authors. Not chat replies. Things that live in the system of record. A user-story bundle. A variance canvas. A retrospective draft. A risk register entry. The role is real because the role produces.

A use case has none of these. It has an input and an output; everything else on that list is what's missing.
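
To make the contrast concrete, here's a minimal sketch of a role definition as a data structure. Every name in it is an assumption for illustration (RoleDefinition, Escalation, the field names), not our platform's actual API. The point is the shape: each of the six components above becomes a required field, and the use case is the single function at the bottom.

    from dataclasses import dataclass

    @dataclass
    class Escalation:
        """A named Andon: condition, owner, and where the work gets staged."""
        condition: str         # e.g. "pricing exception above the negotiated band"
        owner: str             # a named human, not "the user"
        staged_at: str         # a named place where the work waits, context attached

    @dataclass
    class RoleDefinition:
        """An operating-model agent: a position on the org chart, not a feature."""
        name: str                      # e.g. "discovery triage"
        dimensions: list[str]          # scopes of concern the role owns
        mirror_pair: str               # the human expert who calibrates its judgment
        triggers: list[str]            # events that cause action without being asked
        escalations: list[Escalation]  # named Andons, designed up front
        peers: list[str]               # roles it coordinates with as normal work
        artifacts: list[str]           # durable outputs in the system of record

    # A use case, by contrast, needs only this:
    def use_case(request: str) -> str:
        ...                            # clean input, clean output, nothing else

Making escalations a required field is the design move: in this sketch, a role without a named Andon won't even construct.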

A worked example: the user-story role

One of our customers, Trillium, runs implementations of Kinaxis Maestro, a planning and orchestration platform their customers roll out over twelve to twenty-four months. Every project starts with weeks of discovery meetings producing user stories, the requirements format these implementations standardize on. The team that maintains the user-story library spends a lot of its week reading transcripts, deciding which conversations contained requirements, drafting candidate stories, deduplicating against the existing library, and emailing the curator a clean proposal.

The use-case version of "agent for this": summarize the meeting and extract action items. You can buy that today from any number of vendors. It will do the use case. It will not do the job.

The operating-model version is a different shape. We named it: the discovery triage role. The role's beat is the user-story pipeline. Its responsibilities, written as the role description we'd give a new hire:

  • Watch every meeting that lands in the discovery folder. For each one, decide whether the conversation produced requirements. If yes, extract candidate quotes. If no, drop it.
  • For each quote that survives, draft a candidate user story in the project's standard format, with provenance attached: which meeting, which speaker, which prior decision the story builds on.
  • Check the candidate against the existing library. If a duplicate exists, suggest a merge. If a conflict exists with a prior decision, flag it.
  • Bundle the surviving candidates into a curator email, ranked by confidence, with the conflicts at the top.
  • Wait for the curator to accept, reject, or ask a question. Apply the answers back into the library.

Every bullet is a verb the agent owns. The whole thing reads as a role description, not a feature spec.

The implementation is the workflow we use for multi-stage agent flows: a meeting indexer dimension, a discovery classifier dimension that gates further work, a user-story ingestor dimension that drafts the candidates against the project's design decisions, all dispatched as separate work orders so each step gets a clean context window. The peer connections are the project graph itself: the ingestor knows which prior design decisions a candidate touches because the graph names them. The escalation path is the curator email; the agent stops there, by design, because the call on whether a story enters the library is the curator's.
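
A rough sketch of that dispatch pattern, under loud assumptions: WorkOrder, dispatch, and the placeholder functions are illustrative names, not our actual workflow engine. What it shows is the structure: each stage is its own work order with its own context, the classifier gates everything downstream, and the flow ends at the curator on purpose.

    from dataclasses import dataclass

    @dataclass
    class WorkOrder:
        """One pipeline stage, dispatched with its own clean context window."""
        dimension: str   # which scope of concern is doing the work
        payload: dict    # only the context this stage needs, nothing inherited

    def dispatch(order: WorkOrder) -> dict:
        """Placeholder: hand the order to the agent runtime, return its result."""
        raise NotImplementedError

    def prior_decisions_touching(meeting_id: str) -> list[str]:
        """Placeholder: the project graph names which design decisions apply."""
        raise NotImplementedError

    def stage_for_curator(candidates: dict) -> None:
        """Placeholder: the escalation path. The agent stops here, by design."""
        raise NotImplementedError

    def on_new_meeting(meeting_id: str) -> None:
        """Proactive trigger: a transcript landed in the discovery folder."""
        # Stage 1: index the meeting (speakers, topics, timestamps).
        index = dispatch(WorkOrder("meeting-indexer", {"meeting": meeting_id}))

        # Stage 2: the gate. Did this conversation produce requirements?
        verdict = dispatch(WorkOrder("discovery-classifier", {"index": index}))
        if not verdict["has_requirements"]:
            return  # drop it; no further stages run, no context wasted

        # Stage 3: draft candidates against the project's design decisions.
        candidates = dispatch(WorkOrder("user-story-ingestor", {
            "quotes": verdict["quotes"],
            "prior_decisions": prior_decisions_touching(meeting_id),
        }))
        stage_for_curator(candidates)  # escalation: the curator owns the call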

The use-case version of this would have been a chat box that the team copies meeting transcripts into. It would have summarized fine. It would not have noticed the meeting it should have triaged. It would not have known about the conflict with last month's design decision. It would not have ranked candidates by which ones the curator was about to ask about anyway.

The operating-model version watches, classifies, drafts, checks, bundles, waits, and learns. It picks up the ball when it falls. It hands off when it should. It knows what it owns and what it doesn't. The whole loop is the job. The use case was one slice of it.

The Andon pattern: escalation as a first-class design move

One specific piece worth pulling out, because most agent products fumble it.

In a Toyota plant, an Andon is a cord any operator can pull to halt the line. The whole line stops. A team converges. The problem gets named, fixed, and the line restarts. It's a powerful idea because it makes the boundary between I can handle this and I need help a structural part of the work, not a vibe an operator has to summon courage about.

Operating-model agents need the same thing. We call it an Andon: the moment an agent hits the boundary of its judgment. Not "I'm uncertain" (the model's confidence score is mostly noise) but "this is the kind of decision I am specifically not authorized to make." Examples: a pricing exception above the negotiated band, a user-story conflict that needs a product decision, a maintenance fault that affects a customer's contracted ship date. The Andon doesn't apologize. It hands work to a named human, in a named place, with the context staged.

The shape: the agent stages the work, names the decision needed, lists the relevant context (here's the prior decision, here's the conflict, here's what the customer was promised), and stops. The human acts. The decision flows back into the graph and updates the agent's context. Next time, the agent's judgment is calibrated by the answer.
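
A minimal sketch of that shape. The names here (Andon, raise_andon, the placeholders) are hypothetical, for illustration only:

    from dataclasses import dataclass

    @dataclass
    class Andon:
        """Work staged at the boundary of the agent's authority."""
        decision_needed: str   # the call the agent is not authorized to make
        owner: str             # a named human
        staged_at: str         # a named place where the work waits
        context: list[str]     # the prior decision, the conflict, what was promised

    def park(location: str, andon: Andon) -> None:
        """Placeholder: stage the work order, context attached, in a named place."""
        raise NotImplementedError

    def notify(owner: str, summary: str) -> None:
        """Placeholder: ping the named human wherever they actually work."""
        raise NotImplementedError

    def raise_andon(andon: Andon) -> None:
        """Stage the work, name the decision, notify the owner, and stop."""
        park(andon.staged_at, andon)
        notify(andon.owner, andon.decision_needed)
        # Nothing else happens until the human decides. The answer then flows
        # back into the graph, and the agent's judgment is calibrated by it.

The useful property is that the work never silently stalls: it's parked in a named place with a named owner until the decision comes back.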

This is operating-model machinery. A use-case agent doesn't have an Andon; what it has is I'm sorry, I can't help with that, the worst possible escalation pattern: the work doesn't move forward and no human knows it stopped.

Why use-case rollouts fail the operating-model test

You can usually predict whether an agent rollout is going to disappoint by asking five questions about its design:

  1. What does it own? (If the answer is "it answers questions," it's a use case.)
  2. What triggers it without a human asking? (If the answer is "nothing," it's a use case.)
  3. Who does it talk to besides the user? (If the answer is "no one," it's a use case.)
  4. Where does its work live when it's done? (If the answer is "in the chat history," it's a use case.)
  5. What does it escalate, and to whom? (If the answer is "it apologizes," it's a use case.)

Five no's. Five times you've designed a feature when the business needed a peer. Six months later the rollout is "good but not transformative" and someone calls it AI fatigue. The fatigue is real, but the diagnosis is wrong: the agent did the use case it was designed for. Nobody asked it to do the job.

A drafting note

The "brain in a jar" framing came out of an MCP session against myai while writing this. The platform's own answer to what's the difference between a use case and an operating model:

"A use case is a function call. An operating model is a position on the org chart. When we build, we aren't building 'features'; we are building mirror pairs."

And on why most rollouts fall flat:

"Most rollouts fail because they create orphaned agents. You have an agent that can analyze a sales call but isn't integrated into the flow — it doesn't know to update the CRM, notify the VP about a competitor mention, draft a follow-up email. It's a brain in a jar."

Brain in a jar was the phrase I needed. The use-case lens produces real intelligence with nowhere for that intelligence to put its hands.

The platform also surfaced a Mark line we'd been circling for weeks:

"We aren't trying to build the 20% of features people use; we're trying to build the 100% of judgment the business requires."

Use cases are features. Judgment is a role. The first one ships. The second one works.

What to do Monday morning

If you're rolling out an agent and it feels like a feature instead of a peer, the move is to design backwards from the role.

  1. Write the role description, not the system prompt. The kind of thing you'd hand a new hire on day one: you watch X, produce Y, escalate Z to person W, coordinate with role V on event U. If you can't write that, you don't have an operating model. You have a use case.
  2. Name the dimensions. Three to seven scopes the agent owns. Each one should be the kind of thing a human in that role would point at and say: that's mine.
  3. Wire the triggers. What in the business should cause this agent to act without an ask? Triggers are the move from "agent that responds" to "agent that owns." (A sketch of trigger wiring follows this list.)
  4. Wire the peers. Other agents, other people, other systems the role coordinates with as a normal part of its work. Peer connections live in the design, not in a webhook bolted on later.
  5. Wire the Andon. Named escalations to named humans, with the work staged. The Andon is a feature, not a fallback.
  6. Wire the artifacts. What does this role produce that lives in the system of record? Chat replies don't count.
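
For step 3, here's what wiring a trigger might look like. The on decorator, the event names, and run_role are all hypothetical, a sketch of the pattern rather than a real event bus:

    from typing import Callable

    _handlers: dict[str, Callable[[dict], None]] = {}

    def on(event_name: str):
        """Register a handler the runtime fires when the event occurs."""
        def register(fn: Callable[[dict], None]) -> Callable[[dict], None]:
            _handlers[event_name] = fn
            return fn
        return register

    def run_role(role: str, **context) -> None:
        """Placeholder: dispatch the named role with the event's context."""
        raise NotImplementedError

    @on("discovery_folder.transcript_added")
    def wake_discovery_triage(event: dict) -> None:
        # Nobody typed a prompt. The event is the ask.
        run_role("discovery-triage", meeting=event["file_id"])

    @on("finance.margin_under_plan")
    def wake_variance_role(event: dict) -> None:
        run_role("variance-analysis", line_item=event["line_item"])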

When you're done, you have a job description rather than a feature list. The agent reads as a role you'd recognize on the org chart, with peers it talks to and a manager it reports to. The use case is what that role does on Tuesday at 2 PM. The operating model is the role itself.

The industry will keep selling the use case. Buy the role.


This is what we're building at Make Yourself AI. Mirrored agents with beats, peers, and escalation paths, designed to take a position on your org chart instead of sitting in a feature list.