The closing brace
Every generation finds a synchronization point everyone accepts as physics.
Remember the full page reload before Ajax, when a single changing number required a complete HTML document? Or the synchronous React render before Fiber, when one state change meant one blocking pass through the tree? In both cases, the boundary felt natural because the surrounding architecture had grown around it. Then the boundary moved, and whole categories of machinery stopped being necessary.
Structured output has one too: the closing brace.
Today's APIs are prompt and JSON Schema in, complete JSON object out. Everyone accepts this as the natural interface.
At .txt, we recently built something small that completely changed how we work with structured outputs: an API that streams JSON Patch operations instead of waiting for the whole object to be generated. Rather than a finished blob or a series of raw tokens, you receive a stream of facts as the model produces them:
event: patch
data: {"op":"add","path":"/intent","value":"refund"}
event: patch
data: {"op":"add","path":"/account_id","value":"ACC-8821"}
event: patch
data: {"op":"add","path":"/reply","value":"Hi Jane, I've processed..."}
event: done
data: {}
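A consumer for this stream can be quite small. The sketch below assumes the framing shown above (an `event:` line followed by a `data:` line) and handles only JSON Patch `add` operations on object paths; `parse_sse`, `apply_add`, and `consume` are hypothetical names, not part of the .txt API.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, data) pairs from SSE-style lines (simplified framing)."""
    event = "message"  # SSE default event type when no "event:" line is given
    for raw in lines:
        line = raw.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            yield event, json.loads(line[len("data:"):].strip())
            event = "message"


def unescape(token: str) -> str:
    # RFC 6901 pointer escapes: decode "~1" to "/" first, then "~0" to "~".
    return token.replace("~1", "/").replace("~0", "~")


def apply_add(doc: dict, path: str, value) -> None:
    """Apply a JSON Patch "add" whose path points into nested objects (no array indices)."""
    *parents, last = [unescape(t) for t in path.split("/")[1:]]  # root path "" not handled
    target = doc
    for key in parents:
        target = target[key]  # the parent must already exist, as RFC 6902 requires
    target[last] = value


def consume(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a snapshot of the partial object after every patch."""
    doc: dict = {}
    for event, data in parse_sse(lines):
        if event == "done":
            break
        if event == "patch" and data["op"] == "add":
            apply_add(doc, data["path"], data["value"])
            yield dict(doc)


STREAM = """event: patch
data: {"op":"add","path":"/intent","value":"refund"}
event: patch
data: {"op":"add","path":"/account_id","value":"ACC-8821"}
event: patch
data: {"op":"add","path":"/reply","value":"Hi Jane, I've processed..."}
event: done
data: {}"""

for snapshot in consume(STREAM.splitlines()):
    print(snapshot)
```

Each snapshot is usable immediately; nothing waits for the `done` event.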
The most obvious application is UI generated by an LLM. With this interface, the form fills in field by field as each fact arrives, instead of sitting on a spinner until the closing brace:
t=50ms  ┌─────────────────────┐
        │ Intent: refund      │
        │ Account: ___        │
        │ Reply: ___          │
        └─────────────────────┘
t=120ms ┌─────────────────────┐
        │ Intent: refund      │
        │ Account: ACC-8821   │
        │ Reply: ___          │
        └─────────────────────┘
t=800ms ┌─────────────────────┐
        │ Intent: refund      │
        │ Account: ACC-8821   │
        │ Reply: Hi Jane...   │
        └─────────────────────┘
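On the client, this amounts to one repaint per arriving fact. A minimal sketch, assuming the partial object is a flat dict keyed by the paths above; `render_form`, `FIELDS`, and `FACTS` are hypothetical names for illustration.

```python
# Labels and keys for the form in the diagram above.
FIELDS = [("Intent", "intent"), ("Account", "account_id"), ("Reply", "reply")]


def render_form(state):
    """Render a partial object as form lines; fields not yet received show as ___."""
    return "\n".join(f"{label}: {state.get(key, '___')}" for label, key in FIELDS)


FACTS = [("intent", "refund"), ("account_id", "ACC-8821"), ("reply", "Hi Jane...")]

state = {}
frames = []
for key, value in FACTS:  # one repaint per arriving fact
    state[key] = value
    frames.append(render_form(state))

print(frames[0])
```

The first frame already shows the intent, with the other fields still blank.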
The model was always generating those facts in sequence. It knew /intent before it knew /reply. Information was arriving incrementally, with meaning accumulating as generation proceeded. We were just hiding that process behind a buffering layer, then pretending the complete object had appeared all at once.
The closing brace turns out to be just a synchronization point we got used to waiting for. Routing decisions wait for it. Tool calls wait for it. Agent handoffs wait for it. Database writes wait for it. Entire orchestration layers are built around it. And yet nothing in the model requires it.
The closing brace was a collective hallucination.
Collapsing latencies
Once facts arrive one at a time, latency becomes a schedule.
Consider a support agent that classifies intent, checks urgency, and writes a reply. In a conventional structured-output framework, those three fields are extracted from the complete object after generation finishes, so all downstream actions (routing, escalation, sending the reply) begin at that point.
Framework:
(t=800ms) |-- close }
          |-- route
          |-- page oncall
(t=850ms) |-- send reply
Fact stream:
(t=50ms)  |-- /intent ---> route
(t=120ms) |-- /urgency --> page oncall
(t=800ms) |-- /reply ----> send reply
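The two timelines can be reproduced with a simulated model. This sketch assumes the arrival times from the diagram (run 100x faster), a hypothetical urgency value of `"high"`, and handlers that simply log what they would do; nothing here is the .txt API.

```python
import asyncio

# Arrival times (ms) from the timeline above; the urgency value is hypothetical.
SCHEDULE = [("intent", "refund", 50), ("urgency", "high", 120), ("reply", "Hi Jane...", 800)]


async def fake_model(log):
    """Simulated model: yields each fact at its arrival time, 100x faster than real time."""
    elapsed = 0
    for field, value, at_ms in SCHEDULE:
        await asyncio.sleep((at_ms - elapsed) / 100_000)
        elapsed = at_ms
        log.append(f"model:{field}")
        yield field, value
    log.append("model:}")  # the closing brace


async def framework(log):
    """Buffer every fact, then act once the object is complete."""
    obj = {}
    async for field, value in fake_model(log):
        obj[field] = value
    log.append(f"route:{obj['intent']}")
    if obj["urgency"] == "high":
        log.append("page_oncall")
    log.append("send_reply")


async def fact_stream(log):
    """Act on each fact the moment it arrives."""
    async for field, value in fake_model(log):
        if field == "intent":
            log.append(f"route:{value}")
        elif field == "urgency" and value == "high":
            log.append("page_oncall")
        elif field == "reply":
            log.append("send_reply")


for consumer in (framework, fact_stream):
    log = []
    asyncio.run(consumer(log))
    print(consumer.__name__, log)
```

In the `framework` log every action comes after `model:}`; in the `fact_stream` log, routing and paging are interleaved with generation and happen before the reply is produced.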
The routing decision fires at 50ms instead of 800ms. The escalation fires at 120ms. The reply still takes 800ms to generate, but the work that does not depend on it is already done.
Removing the closing brace lets the system do useful work during time it previously spent waiting. In multi-step workflows the savings compound: a researcher can start on step one while steps two through five are still being generated.
In a framework, every field arrives at the same time, after the closing brace. There is no way to say "I need this fact before that one." The object is atomic. With a fact stream, the field order in your schema decides when each fact becomes available. Put intent first because routing should fire early. Put urgency second because escalation should not wait for the reply. Put reply last because it is the longest field and should not block anything else. Schema design becomes priority specification.
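The scheduling effect of field order can be checked with simple arithmetic. This sketch assumes fields are generated sequentially in schema property order, and uses per-field costs back-derived from the timeline above (50, 120, and 800 ms); the costs ignore key tokens and other overhead, and `availability` is a hypothetical helper.

```python
from itertools import accumulate

# Per-field generation cost (ms), back-derived from the timeline above
# (intent done at 50ms, urgency at 120ms, reply at 800ms). Illustrative only.
COST_MS = {"intent": 50, "urgency": 70, "reply": 680}

SCHEMA = {
    "type": "object",
    "properties": {  # property order = emission order (assumed)
        "intent": {"type": "string"},
        "urgency": {"type": "string"},
        "reply": {"type": "string"},
    },
}


def availability(order, cost_ms):
    """Time at which each field is complete if fields are generated one after another."""
    return dict(zip(order, accumulate(cost_ms[f] for f in order)))


print(availability(list(SCHEMA["properties"]), COST_MS))
# {'intent': 50, 'urgency': 120, 'reply': 800}
print(availability(["reply", "urgency", "intent"], COST_MS))
# {'reply': 680, 'urgency': 750, 'intent': 800}
```

Same fields, same total time, but reversing the order delays routing from 50 ms to 800 ms.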
The schema is a schedule.
What collapses
Once facts stream individually, each one becomes a point where your code can act: route, log, validate, spawn work, or decide not to proceed. The object was hiding all of those decision points behind a single synchronization barrier. What used to need a supervisor agent, a context object, and a session manager now fits in one Python function:
import asyncio

async def chat(messages, agent, handlers):
    task = None
    async for event in agent(messages):
        # Only react to leaf values (complete values at a path).
        if not event.is_leaf:
            continue
        match event.field:
            case "department":  # routing: dispatch on the value as soon as it arrives
                task = asyncio.create_task(handlers[event.value](messages))
            case "confidence" if event.value < 0.5:  # early cancellation
                if task:
                    task.cancel()
            case "reply":  # memory is just the message list
                messages.append({"role": "assistant", "content": event.value})
                await send(event.value)  # send() delivers the reply; defined elsewhere
Routing collapses to match. Today: supervisor LLMs that pick which sub-agent runs, or conditional edges wired between graph nodes. With a fact stream, the classification field arrives early and the consumer dispatches by value.
Early cancellation becomes possible. With a closed object you cannot act on early information because you do not have it yet. Here /department at 50ms spawns a research task; /confidence at 200ms decides whether to keep it. The schedule lets you race the model and bail when the bet was wrong.
Conversation memory is a list. Today: thread IDs paired with checkpointers, message_history= params, session-state objects. With a fact stream, you append to a list. The list is the memory.
Dependency injection is just function arguments. Today: RunContext[DepsT] generics, signature inspection, ctx.deps.x. With a fact stream, dependencies are closure variables or function parameters.
The agent stops being an object. Today: a class with a dozen kwargs (output_type, tools, deps_type, retries, message_history, model, …). With a fact stream, an agent is just a function that turns messages into a stream of events, like agent above, and the code that reacts to that stream is another function, like chat. The class collapses.
Plenty still remains (partial-truth semantics, durability, retries, observability), but it lives at the runtime layer instead of the parsing layer.
Underneath the format
The stream users see is really just carrying assertions:
("intent", "refund")
("account_id", "ACC-8821")
("confidence", 0.94)
("reply", "Hi Jane...")
These assertions become available one by one. The object is what you get if you collect them and freeze them. The stream is the record of the model's incremental commitments, closer to a transaction log over evolving structured state than to a serialized object.
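The log-versus-object distinction fits in a few lines. In this sketch (helper names are hypothetical), replaying any prefix of the assertions gives a valid partial state, and "the object" is simply the final state made read-only with `types.MappingProxyType`.

```python
from types import MappingProxyType

ASSERTIONS = [("intent", "refund"), ("account_id", "ACC-8821"), ("confidence", 0.94), ("reply", "Hi Jane...")]


def replay(log):
    """Fold assertions into state, yielding each intermediate state like a log replay."""
    state = {}
    for path, value in log:
        state[path] = value
        yield dict(state)


def freeze(log):
    """The object: every assertion collected (last write wins), then made read-only."""
    return MappingProxyType(dict(log))


for state in replay(ASSERTIONS):
    print(state)
print(freeze(ASSERTIONS))
```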
The wire format is incidental. JSON Patch over SSE is one encoding (the .txt API also exposes the same stream over NDJSON); XML works too; any protocol that lets a model emit typed values at known paths works. And different models are good at different formats: some are reliable JSON generators, others do better with pseudo-Python, custom DSLs, or YAML. Pick whatever the model in front of you generates most accurately; the consumer never sees the wire anyway, since it iterates over events with .field and .value.
For example, the same content could be emitted as XML:
<intent>refund</intent>
<account_id>ACC-8821</account_id>
<reply>Hi Jane, I've processed...</reply>
Or as YAML:
intent: refund
account_id: ACC-8821
reply: Hi Jane, I've processed...
The grammar engine constrains different tokens. The consumer still receives the same events, with event.field = "intent" and event.value = "refund", and so on. The schema, the routing, the cancellation logic: none of it changes.
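To make "the same events" concrete, here are three toy decoders that map JSON Patch, XML, and YAML-style text onto one hypothetical `Event` type. Assumptions: fields are flat (top-level paths only), the YAML decoder only splits `key: value` lines and is not a YAML parser, and for brevity each decoder works on complete text, whereas a real streaming consumer would decode incrementally.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical consumer-facing event: only .field and .value, as in the text."""
    field: str
    value: object


def from_json_patch(lines):
    """One JSON Patch op per line; only top-level "add" paths like "/intent"."""
    for line in lines:
        op = json.loads(line)
        if op["op"] == "add":
            yield Event(op["path"][1:], op["value"])


def from_xml(text):
    """Flat sibling elements, wrapped in a synthetic root so they parse as one document."""
    for child in ET.fromstring(f"<root>{text}</root>"):
        yield Event(child.tag, child.text)


def from_yaml_lines(text):
    """Toy decoder for flat "key: value" lines; not a real YAML parser."""
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(": ")
            yield Event(key.strip(), value.strip())


PATCH = [
    '{"op":"add","path":"/intent","value":"refund"}',
    '{"op":"add","path":"/account_id","value":"ACC-8821"}',
    '{"op":"add","path":"/reply","value":"Hi Jane, I\'ve processed..."}',
]
XML = "<intent>refund</intent><account_id>ACC-8821</account_id><reply>Hi Jane, I've processed...</reply>"
YAML = "intent: refund\naccount_id: ACC-8821\nreply: Hi Jane, I've processed..."

print(list(from_json_patch(PATCH)) == list(from_xml(XML)) == list(from_yaml_lines(YAML)))
```

Only the decoder depends on the wire format; everything downstream sees identical `Event` values.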
The schema starts doing most of the work. It defines what facts the agent can emit; the field order defines when each one becomes available; the types constrain what values are valid; and JSON Schema's composition primitives ($ref, allOf, oneOf, if/then/else) turn into orchestration primitives. Change the schema and you change the agent. The schema is the program; the model runs it.
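The claim that composition primitives become orchestration primitives can be illustrated with `oneOf`. In this hypothetical schema, each branch starts with a `department` constant; assuming fields are emitted in property order, the first fact tells the consumer which branch the rest of the generation will follow and which fields are still to come. The schema, branch values, and helpers are illustrative, and the lookup only understands `const` discriminators.

```python
SCHEMA = {
    "oneOf": [
        {"type": "object",
         "properties": {"department": {"const": "billing"},
                        "account_id": {"type": "string"},
                        "reply": {"type": "string"}}},
        {"type": "object",
         "properties": {"department": {"const": "technical"},
                        "error_code": {"type": "string"},
                        "reply": {"type": "string"}}},
    ]
}


def branch_for(schema, field, value):
    """Pick the oneOf branch whose `const` for `field` matches the streamed value."""
    for branch in schema["oneOf"]:
        if branch["properties"].get(field, {}).get("const") == value:
            return branch
    raise ValueError(f"no branch for {field}={value!r}")


def plan(schema, first_field, first_value):
    """Fields still to come, in schema order, once the first fact picks a branch."""
    branch = branch_for(schema, first_field, first_value)
    return [f for f in branch["properties"] if f != first_field]


print(plan(SCHEMA, "department", "billing"))    # ['account_id', 'reply']
print(plan(SCHEMA, "department", "technical"))  # ['error_code', 'reply']
```

Adding a branch to the schema adds a path through the agent, without touching the consumer loop.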