Hiring

We do live engineering interviews. Here's the prompt.

No take-homes, no whiteboard puzzles. The 90-minute exercise we run with every engineering candidate, and what we're actually looking for inside it.

Conformal Engineering · 22 Jan 2026 · 5 min read

We do not use take-home projects. They select for free time, tolerance for ambiguity without feedback, and willingness to perform unpaid labor. We do not use whiteboard puzzles either. They select for a kind of rehearsed cleverness that has very little to do with building enterprise AI products under messy constraints.

Our engineering interview is live, practical, and intentionally small. The candidate gets 90 minutes, a tiny codebase, a failing agent workflow, and a senior engineer in the room. The goal is not to finish everything. The goal is to see how the person thinks when the problem is real enough to have texture.

The prompt

The exercise starts with a simple product: a user asks a business question, an agent chooses a tool, a query runs against a local dataset, and the UI shows an answer with a trace. One part is broken. The schema docs are incomplete, the query fails on a fiscal-year edge case, the trace hides the useful error, or the UI makes an unsafe answer look definitive.

We ask the candidate to make the workflow production-friendlier. They can improve the prompt, change the tool contract, add validation, adjust the UI, or write a test. We deliberately avoid a single correct path. In real work, the hard part is not knowing that a bug exists. It is deciding which layer should own the fix.

What we look for

The strongest candidates slow down for the right reasons. They inspect the trace before editing. They ask what the user is trying to decide. They make the smallest change that improves the system. They name the residual risk. They do not hide uncertainty behind confident code. They can explain why a fix belongs in schema context rather than in the final answer prompt.

We also watch how they use the person in the room. Good pair engineers narrate trade-offs without turning the interview into a monologue. They are comfortable saying, "I think the product risk is here," or "I can patch this, but I would rather move the boundary." That judgment matters more than typing speed.

What we do not reward

We do not reward framework trivia. We do not reward building a large abstraction in a small exercise. We do not reward silently rewriting the app because the existing code is not how the candidate would have started. Client work often begins inside systems you would not have designed. Taste includes knowing when to leave a working thing alone.

We also do not reward fake certainty about AI. A candidate who says the model will handle it is usually less useful than one who adds a guardrail. Production agents need people who respect failure modes. The model is part of the system, not an excuse to stop engineering.

Why live is fairer

A live interview lets us help when the setup is confusing, clarify the business context, and see how someone responds to new information. It is closer to the work. Our projects happen in small senior teams, with clients asking sharper questions every week. The interview should resemble that environment enough to be predictive.

The best hires leave the exercise with code that is not perfect and a conversation that is excellent. They have improved the product, explained the next step, and shown that they can operate in ambiguity without making the system more complex than the problem deserves. That is the job.

We tell candidates the same thing we tell ourselves on client work: the goal is not to look smart in isolation. The goal is to make the system better while other people can still understand it. That means naming trade-offs, leaving a useful test or trace, and resisting the urge to make a local fix that creates a global mess. In ninety minutes, you can see whether someone has that instinct. You can also see whether they enjoy the kind of precise, unglamorous work that makes agents reliable.

The prompt keeps us honest as interviewers too. If we cannot explain why a change matters to a user, it is not a good interview signal. If the exercise rewards theatrics, the prompt is wrong. Hiring for this work should look like the work itself.

That standard makes the interview calmer, harder, and more predictive than performance alone in practice.