Your enum has 10,000 values. The model needs 6.
Structured generation ensures model output conforms to a schema. But most implementations treat that schema as fixed: every valid value must be known before the first token is generated. In practice, many schemas reference value sets with thousands or hundreds of thousands of entries (product taxonomies, medical terminologies, knowledge graph ontologies), where the valid subset depends on runtime context.
The structure of the output is stable. The valid values are not. The schema says "pick a category", but which categories are valid depends on what you're extracting from, what you already know, or what a retrieval step just returned.
Large value sets are everywhere
This comes up any time your schema references a value set that's too large to enumerate statically but too important to leave unconstrained:
- Knowledge graphs and GraphRAG. A complex ontology might define millions of entity types. The valid subset depends on context. A retrieval step already identifies the relevant types, but the structured generation layer has no way to use that data to constrain the output.
- Product catalogs. An e-commerce taxonomy has 10,000+ categories. The valid subset depends on the product type. The context is obvious to a human but invisible to a static schema.
- Clinical terminologies. The SNOMED-CT medical ontology has 350,000+ concepts. For any given patient encounter or data extraction task, only a tiny subset of these values is relevant.
- Geographic and administrative codes. ISO 3166 lists 249 country codes. The valid subset depends on business rules. A generation task for EU invoices needs the 27 member states, a US shipping form needs 50 states plus territories. Same field, different constraints per deployment.
Two bad options
Structured generation as offered by most providers treats the schema as a static, fully-specified artifact. That forces you into one of two compromises.
Embed the full value set.
Enumerate every valid value in the schema's enum. This works for small sets (countries, currencies). For anything larger, the schema becomes impractical: thousands of mostly irrelevant options overfill the context window and slow down schema compilation. A product extraction schema with 10,000 categories is technically correct and practically useless.
Leave it unconstrained.
Define the field as "type": "string" and hope the model picks a valid value. For well-known domains, it often does. For specialized codes, identifiers, or long-tail categories, it doesn't. You get plausible-looking but invalid values: a SNOMED code that doesn't exist, a product category that's almost right but not in your taxonomy. You catch these in post-hoc validation and retry, burning tokens and latency.
Both approaches collapse what should be a two-step process — define the structure, then narrow the values — into a single static artifact. The schema can't adapt to context.
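The two compromises can be sketched as data. This is an illustration, not a real provider API: the schemas are plain JSON Schema fragments, and the 10,000-entry taxonomy is a stand-in.

```python
# Compromise 1: embed the full value set in the schema's enum.
# The taxonomy here is a synthetic stand-in for a real catalog.
full_taxonomy = [f"category-{i}" for i in range(10_000)]
embedded_schema = {
    "type": "object",
    "properties": {"category": {"type": "string", "enum": full_taxonomy}},
    "required": ["category"],
}

# Compromise 2: leave the field unconstrained and hope the model
# picks a value that actually exists in your taxonomy.
unconstrained_schema = {
    "type": "object",
    "properties": {"category": {"type": "string"}},
    "required": ["category"],
}

# The first schema carries 10,000 mostly irrelevant options on every
# request; the second enforces nothing about the value at all.
print(len(embedded_schema["properties"]["category"]["enum"]))
```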
External constraints: the reliable option
We built a different primitive into our structured generation engine. The schema defines the shape of the output. A separate external constraint, supplied at request time, narrows specific fields to the values that are valid for this particular generation.
The schema stays generic and reusable. The constraint is specific to the task: the document being processed, the retrieval results, the user's context. And because the constraint is enforced during generation, the model never produces an invalid value. No retries. No post-hoc validation.
Constraints can come from anywhere: a GraphRAG retrieval step that identifies relevant entity types, a database lookup that returns valid category IDs, or a static configuration for a known domain. They can be defined upfront or computed dynamically during inference.
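Concretely, a constraint can be computed right before the request. A minimal sketch, where `retrieve_relevant_categories` is a hypothetical stand-in for whatever retrieval step or database lookup you actually run, and the `{"path": ..., "enum": ...}` shape matches the constraint object shown later in this post:

```python
def retrieve_relevant_categories(product_description: str) -> list[str]:
    # Stand-in: a real pipeline would query a catalog, vector index,
    # or knowledge graph here.
    if "audio" in product_description.lower():
        return ["Headphones", "Earbuds", "Speakers",
                "Soundbars", "Microphones", "Turntables"]
    return []

# Build the constraint at request time from the retrieval result.
constraint = {
    "path": "$.category",
    "enum": retrieve_relevant_categories("Bluetooth audio earbuds"),
}
print(len(constraint["enum"]))  # 6
```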
An example: a product extraction schema defines category as a string. The full taxonomy has 10,000+ values. At runtime, a retrieval step identifies the relevant product domain and supplies a constraint limiting the field to 6 valid categories.
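The pairing looks like this as data: one generic schema you version and deploy, plus a per-request constraint. The six category names are hypothetical; the constraint shape follows the example later in this post.

```python
# Generic, reusable schema: `category` is just a string here, even
# though the full taxonomy behind it has 10,000+ values.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "category": {"type": "string"},
    },
    "required": ["name", "category"],
}

# Per-request constraint: narrows `category` to the six values the
# retrieval step identified for this product domain.
category_constraint = {
    "path": "$.category",
    "enum": ["Headphones", "Earbuds", "Speakers",
             "Soundbars", "Microphones", "Turntables"],
}
```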
A clinical example: patient allergies
Healthcare is one of the sharpest versions of this problem. Standards like FHIR represent coded values as a system + code + display triple, where the code comes from a terminology like RxNorm (hundreds of thousands of drug terms).
Say you're extracting allergy records from clinical notes. The substance field references RxNorm, but a given note only mentions a few drugs. An external constraint narrows the field to the relevant substances, so the model picks from "Penicillin" and "Amoxicillin" instead of the entire pharmacopeia.
Figure: the substance field references RxNorm (hundreds of thousands of entries); an external constraint narrows it to the substances mentioned in, or relevant to, a specific clinical note.
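The allergy setup, sketched as data. The system/code/display structure follows FHIR's coding convention; the constraint targets the display field per the example above. Everything else (field names, the reaction field) is illustrative, and no real RxNorm codes are assumed.

```python
# Schema for one extracted allergy record. The coded substance is a
# FHIR-style system + code + display triple.
allergy_schema = {
    "type": "object",
    "properties": {
        "substance": {
            "type": "object",
            "properties": {
                "system": {"type": "string"},   # e.g. an RxNorm system URI
                "code": {"type": "string"},
                "display": {"type": "string"},
            },
            "required": ["system", "code", "display"],
        },
        "reaction": {"type": "string"},
    },
    "required": ["substance"],
}

# Per-note constraint: only the substances this note actually mentions.
substance_constraint = {
    "path": "$.substance.display",
    "enum": ["Penicillin", "Amoxicillin"],
}
```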
How it works
External constraints are applied at the structured generation layer, before token sampling begins.
- Schema definition. You define your schema with the full structure: object shapes, required fields, types. Fields that reference large value sets use "type": "string" or a broad placeholder.
- Constraint binding. At request time, you supply constraints that target specific schema paths and narrow their allowed values. The constraint is a simple object: {"path": "$.category", "enum": ["Headphones", "Earbuds", ...]}
- Generation-time enforcement. The engine merges constraints into the schema before building the token mask. The model can only produce tokens that lead to valid values. No retries, no post-hoc validation, no rejection sampling.
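The merge step can be sketched in a few lines. This toy version handles only simple `$.a.b` paths into object properties; a production engine would support richer paths and validation, so treat this as an illustration of the idea, not the actual implementation.

```python
import copy

def merge_constraint(schema: dict, constraint: dict) -> dict:
    """Return a copy of `schema` with the constraint's enum applied
    at the targeted path. Supports only dotted paths like $.a.b."""
    merged = copy.deepcopy(schema)
    node = merged
    for part in constraint["path"].lstrip("$.").split("."):
        node = node["properties"][part]
    node["enum"] = constraint["enum"]
    return merged

# One generic schema, narrowed per request.
schema = {
    "type": "object",
    "properties": {"category": {"type": "string"}},
}
constrained = merge_constraint(
    schema, {"path": "$.category", "enum": ["Headphones", "Earbuds"]}
)
print(constrained["properties"]["category"]["enum"])  # ['Headphones', 'Earbuds']
```

The deep copy matters: the deployed schema stays untouched, so the same artifact can be narrowed differently on every request.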
The schema is an artifact you version and deploy. The constraints are parameters you pass per request. One schema, many contexts.
Try it
If you're building extraction pipelines where the valid value set is large, context-dependent, or determined at runtime — whether from a knowledge graph, a product catalog, a clinical terminology, or any other reference system — get in touch or send us your schemas. We'll show you how external constraints can replace your post-hoc validation loops with generation-time enforcement.