Architecture

The agent writes the SQL. The system around it does the hard work.

Composable schema documentation, deterministic retries, the right way to stream tool-use traces. A field guide to the supporting cast around every production agent, and why getting it right matters more than which model you choose.

Conformal Engineering · 28 Mar 2026 · 11 min read

The demo version of an analytics agent is simple. The user asks a question, the model writes SQL, the database returns rows, and a chart appears. The production version is not simple. The production version survives bad column names, ambiguous business terms, stale extracts, partial permissions, slow warehouses, malformed dates, retries, timeouts, and the CFO asking the same question three different ways in the same meeting.

The model is visible, so it gets the credit. The system around it does most of the work. Good agent architecture is the discipline of making the model's job small, observable, and recoverable. When that discipline is missing, even a frontier model produces expensive nonsense.

Schema documentation is an interface

Most companies treat schema documentation as a compliance artifact. For an agent, it is executable context. The agent does not need a data dictionary that says `cust_cd` means customer code. It needs a composable description of which table represents active customers, which date defines revenue recognition, why one sales table excludes returns, and when the finance team prefers management geography over statutory geography.

We write schema documentation like product copy for a very literal user. Each table gets purpose, grain, joins, traps, examples, and allowed questions. Each metric gets a business definition and a few worked examples that show the expected shape of the query. This is not glamorous work. It is the difference between a system that sounds confident and a system that can be trusted.
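One way to make that documentation composable is to store it as structured records and render only the tables a question needs. The sketch below is a minimal illustration, not a description of any particular system: `TableDoc`, `compose_context`, and every field name are hypothetical, chosen to mirror the list above (purpose, grain, joins, traps, examples, allowed questions).

```python
from dataclasses import dataclass, field


@dataclass
class TableDoc:
    """Agent-facing documentation for one table. All field names are illustrative."""
    name: str
    purpose: str
    grain: str
    joins: list[str] = field(default_factory=list)
    traps: list[str] = field(default_factory=list)
    examples: list[tuple[str, str]] = field(default_factory=list)  # (question, SQL)
    allowed_questions: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the record as a plain-text block for the model's context."""
        lines = [
            f"TABLE {self.name}",
            f"Purpose: {self.purpose}",
            f"Grain: {self.grain}",
        ]
        lines += [f"Join: {j}" for j in self.joins]
        lines += [f"Trap: {t}" for t in self.traps]
        lines += [f"Allowed: {q}" for q in self.allowed_questions]
        for question, sql in self.examples:
            lines.append(f"Example Q: {question}")
            lines.append(f"Example SQL: {sql}")
        return "\n".join(lines)


def compose_context(docs: list[TableDoc], selected: list[str]) -> str:
    """Compose context from only the selected tables; unknown names are skipped."""
    by_name = {d.name: d for d in docs}
    return "\n\n".join(by_name[n].render() for n in selected if n in by_name)
```

Keeping the records structured means the same source can feed the prompt, a human-readable catalog, and the eval suite, and a trap such as "excludes returns" is written once rather than repeated in every prompt.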

Retries should be boring

Agents fail in predictable ways. A generated query references a column that exists in a different environment. A join multiplies revenue by customer count. A date parser treats fiscal year as calendar year. A warehouse times out because the model forgot a filter. The wrong response is to ask the model to try again with no structure. That turns failures into improvisation.

Production agents need deterministic repair loops. Validate SQL before execution. Parse database errors into typed failure reasons. Give the model the smallest useful correction, not the entire transcript. Cap retries. Preserve failed attempts in the trace. If a query returns an impossible row count or a metric outside an expected range, force a second pass before showing the answer. The model should be creative in composition, not in error handling.
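A repair loop along these lines might look like the following sketch. It assumes two caller-supplied functions, `generate(question, hint)` and `execute(sql)`, and the error patterns and hint texts are invented for illustration; real warehouse error messages differ by vendor. Pre-execution validation and result sanity checks are left out to keep the example short.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional


class Failure(Enum):
    UNKNOWN_COLUMN = "unknown_column"
    TIMEOUT = "timeout"
    OTHER = "other"


# Hypothetical patterns; adapt them to the messages your warehouse emits.
_PATTERNS = [
    (re.compile(r"column .* does not exist|no such column", re.I), Failure.UNKNOWN_COLUMN),
    (re.compile(r"timeout|timed out", re.I), Failure.TIMEOUT),
]

# The smallest useful correction per failure type, instead of the full transcript.
_HINTS = {
    Failure.UNKNOWN_COLUMN: "Use only columns listed in the schema context.",
    Failure.TIMEOUT: "Add a date filter or narrow the scanned range.",
    Failure.OTHER: "The query failed; simplify it and try again.",
}


def classify(error_message: str) -> Failure:
    """Map a raw database error message to a typed failure reason."""
    for pattern, reason in _PATTERNS:
        if pattern.search(error_message):
            return reason
    return Failure.OTHER


@dataclass
class Attempt:
    sql: str
    error: Optional[str]
    reason: Optional[Failure]


def run_with_repair(
    question: str,
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], Any],
    max_retries: int = 2,
) -> tuple[Any, list[Attempt]]:
    """Try once plus at most max_retries repairs; keep every attempt for the trace."""
    attempts: list[Attempt] = []
    hint: Optional[str] = None
    for _ in range(max_retries + 1):
        sql = generate(question, hint)
        try:
            rows = execute(sql)
        except Exception as exc:  # a real system would catch driver-specific errors
            reason = classify(str(exc))
            attempts.append(Attempt(sql, str(exc), reason))
            hint = _HINTS[reason]
            continue
        attempts.append(Attempt(sql, None, None))
        return rows, attempts
    return None, attempts  # retries exhausted; the caller decides how to report it
```

The loop itself never consults the model about what went wrong. The classification and the hint are plain lookups, so the same error always yields the same correction, and the list of attempts can be attached to the trace unchanged.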

The trace is the product

Executives do not need to read every query, but they need to know the query exists. The system should stream what it is doing: reading schema, selecting tables, writing SQL, executing, validating, summarizing, rendering. Each step should have time, cost, inputs, and outputs. A user should be able to expand the trace after a surprising answer and see the path from question to result.
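As a rough sketch of what streaming a trace can mean in code, the generator below yields one event per pipeline step as soon as that step finishes. `TraceEvent`, `run_traced`, and the convention that each step returns an `(output, cost_usd)` pair are assumptions made for this example; the transport (server-sent events, websockets, a log) is deliberately left out.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Any, Callable, Iterator


@dataclass
class TraceEvent:
    step: str
    seconds: float
    cost_usd: float
    inputs: Any
    outputs: Any


# Each step takes the previous output and returns (output, cost_usd).
Step = tuple[str, Callable[[Any], tuple[Any, float]]]


def run_traced(steps: list[Step], initial: Any) -> Iterator[TraceEvent]:
    """Run steps in order, yielding a TraceEvent as each one completes."""
    value = initial
    for name, fn in steps:
        started = time.perf_counter()
        output, cost = fn(value)
        yield TraceEvent(name, time.perf_counter() - started, cost, value, output)
        value = output


def to_json_line(event: TraceEvent) -> str:
    """Serialize an event as one JSON line, suitable for a streaming response."""
    return json.dumps(asdict(event), default=str)
```

Because `run_traced` is a generator, a consumer can forward each event to the user interface while later steps are still running, and the same events can be stored so the trace can be expanded after the answer is shown.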

This changes the psychology of the product. The answer is no longer a black box presented with theatrical confidence. It is a working object with a trail. The user can disagree with the SQL, correct the business definition, or pin the output to a board pack with evidence attached. Trust comes from auditability, not from anthropomorphic polish.

Model choice matters less than the contract

There are real differences between model providers. Some reason better over long schemas. Some are cheaper for high-volume background work. Some fit a company's security posture more cleanly. But the largest quality gains usually come before model selection: narrowing the tool contract, shaping the schema context, adding evals, streaming traces, and defining what the agent must refuse to answer.

A good architecture lets the model be replaced. The prompts, schemas, evals, tools, and traces should survive a provider switch. That is what model neutrality means in practice. You are not neutral because you put four logos on a slide. You are neutral because the system's center of gravity is the business contract, not the vendor API.
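In code, that usually comes down to a narrow interface that every provider adapter must satisfy, with the evals written against the interface rather than against any vendor SDK. The following is a hedged sketch: `SqlModel`, the two stub providers, and `run_evals` are hypothetical names, and the stubs return canned SQL instead of calling a real API.

```python
from typing import Callable, Protocol


class SqlModel(Protocol):
    """The only surface the rest of the system may depend on."""

    def write_sql(self, question: str, schema_context: str) -> str: ...


class StubProviderA:
    """Stand-in for one vendor adapter; returns canned SQL."""

    def write_sql(self, question: str, schema_context: str) -> str:
        return "SELECT COUNT(*) FROM customers WHERE is_active"


class StubProviderB:
    """Stand-in for a second vendor adapter with different behavior."""

    def write_sql(self, question: str, schema_context: str) -> str:
        return "SELECT COUNT(*) FROM customers"


# An eval case pairs a question with a check on the generated SQL.
EvalCase = tuple[str, Callable[[str], bool]]


def run_evals(model: SqlModel, cases: list[EvalCase], schema_context: str) -> float:
    """Return the fraction of eval cases the model passes."""
    if not cases:
        raise ValueError("at least one eval case is required")
    passed = sum(1 for q, check in cases if check(model.write_sql(q, schema_context)))
    return passed / len(cases)
```

Swapping providers then means writing one new adapter and rerunning the same cases. The schema context and the eval suite do not change, which is the practical test of whether the system is neutral.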

Build the boring parts first

The safest order is backward from the boardroom. Start with the answer format and audit trail a senior reader would accept. Then define the eval cases that would make the answer defensible. Then shape the tools and schema context that can produce that answer. Only then ask the model to write SQL.

This makes the agent less magical and more useful. It also makes it easier to operate. When something goes wrong, the team can see whether the failure came from intent parsing, table selection, query generation, execution, validation, or summarization. Production AI is not won by making the model feel smarter. It is won by making the surrounding system hard to fool casually.