Your enum has 10,000 values. The model needs 6.
Structured generation ensures model output conforms to a schema. But most implementations treat that schema as fixed: every valid value must be known before the first token is generated. In practice, many schemas reference value sets with thousands or hundreds of thousands of entries (product taxonomies, medical terminologies, knowledge graph ontologies), where the valid subset depends on runtime context.
The structure of the output is stable. The valid values are not. The schema says "pick a category", but which categories are valid depends on what you're extracting from, what you already know, or what a retrieval step just returned.
Large value sets are everywhere
This comes up any time your schema references a value set that's too large to enumerate statically but too important to leave unconstrained:
- Knowledge graphs and GraphRAG. A complex ontology might define millions of entity types. The valid subset depends on context. A retrieval step already identifies the relevant types, but the structured generation layer has no way to use that data to constrain the output.
- Product catalogs. An e-commerce taxonomy has 10,000+ categories. The valid subset depends on the product type. The context is obvious to a human but invisible to a static schema.
- Clinical terminologies. The SNOMED-CT medical ontology has 350,000+ concepts. For any given patient encounter or data extraction task, only a tiny subset of these values is relevant.
- Geographic and administrative codes. ISO 3166 lists 249 country codes. The valid subset depends on business rules. A generation task for EU invoices needs the 27 member states, a US shipping form needs 50 states plus territories. Same field, different constraints per deployment.
Two bad options
Structured generation as offered by most providers treats the schema as a static, fully-specified artifact. That forces you into one of two compromises.
Embed the full value set.
Enumerate every valid value in the schema's enum. This works for small sets (countries, currencies). For anything larger, the schema becomes impractical: thousands of mostly irrelevant options overfill the context window and slow down schema compilation. A product extraction schema with 10,000 categories is technically correct and practically useless.
Leave it unconstrained.
Define the field as "type": "string" and hope the model picks a valid value. For well-known domains, it often does. For specialized codes, identifiers, or long-tail categories, it doesn't. You get plausible-looking but invalid values: a SNOMED code that doesn't exist, a product category that's almost right but not in your taxonomy. You catch these in post-hoc validation and retry, burning tokens and latency.
Both approaches collapse what should be a two-step process — define the structure, then narrow the values — into a single static artifact. The schema can't adapt to context.
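The two compromises can be sketched as data. This is an illustration, not a real provider API: the schemas are plain JSON Schema fragments, and the 10,000-entry taxonomy is a stand-in.

```python
# Compromise 1: embed the full value set in the schema's enum.
# The taxonomy here is a synthetic stand-in for a real catalog.
full_taxonomy = [f"category-{i}" for i in range(10_000)]
embedded_schema = {
    "type": "object",
    "properties": {"category": {"type": "string", "enum": full_taxonomy}},
    "required": ["category"],
}

# Compromise 2: leave the field unconstrained and hope the model
# picks a value that actually exists in your taxonomy.
unconstrained_schema = {
    "type": "object",
    "properties": {"category": {"type": "string"}},
    "required": ["category"],
}

# The first schema carries 10,000 mostly irrelevant options on every
# request; the second enforces nothing about the value at all.
print(len(embedded_schema["properties"]["category"]["enum"]))
```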
External constraints: the reliable option
We built a different primitive into our structured generation engine. The schema defines the shape of the output. A separate external constraint, supplied at request time, narrows specific fields to the values that are valid for this particular generation.
The schema stays generic and reusable. The constraint is specific to the task: the document being processed, the retrieval results, the user's context. And because the constraint is enforced during generation, the model never produces an invalid value. No retries. No post-hoc validation.
Constraints can come from anywhere: a GraphRAG retrieval step that identifies relevant entity types, a database lookup that returns valid category IDs, or a static configuration for a known domain. They can be defined upfront or computed dynamically during inference.
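Concretely, a constraint can be computed right before the request. A minimal sketch, where `retrieve_relevant_categories` is a hypothetical stand-in for whatever retrieval step or database lookup you actually run, and the `{"path": ..., "enum": ...}` shape matches the constraint object shown later in this post:

```python
def retrieve_relevant_categories(product_description: str) -> list[str]:
    # Stand-in: a real pipeline would query a catalog, vector index,
    # or knowledge graph here.
    if "audio" in product_description.lower():
        return ["Headphones", "Earbuds", "Speakers",
                "Soundbars", "Microphones", "Turntables"]
    return []

# Build the constraint at request time from the retrieval result.
constraint = {
    "path": "$.category",
    "enum": retrieve_relevant_categories("Bluetooth audio earbuds"),
}
print(len(constraint["enum"]))  # 6
```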
An example: a product extraction schema defines category as a string. The full taxonomy has 10,000+ values. At runtime, a retrieval step identifies the relevant product domain and supplies a constraint limiting the field to 6 valid categories.
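The pairing looks like this as data: one generic schema you version and deploy, plus a per-request constraint. The six category names are hypothetical; the constraint shape follows the example later in this post.

```python
# Generic, reusable schema: `category` is just a string here, even
# though the full taxonomy behind it has 10,000+ values.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "category": {"type": "string"},
    },
    "required": ["name", "category"],
}

# Per-request constraint: narrows `category` to the six values the
# retrieval step identified for this product domain.
category_constraint = {
    "path": "$.category",
    "enum": ["Headphones", "Earbuds", "Speakers",
             "Soundbars", "Microphones", "Turntables"],
}
```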
A clinical example: patient allergies
Healthcare is one of the sharpest versions of this problem. Standards like FHIR represent coded values as a system + code + display triple, where the code comes from a terminology like RxNorm (hundreds of thousands of drug terms).
Say you're extracting allergy records from clinical notes. The substance field references RxNorm, but a given note only mentions a few drugs. An external constraint narrows the field to the relevant substances, so the model picks from "Penicillin" and "Amoxicillin" instead of the entire pharmacopeia.
Figure: the substance field references RxNorm (hundreds of thousands of entries); an external constraint narrows it to the substances mentioned in, or relevant to, a specific clinical note.
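The allergy setup, sketched as data. The system/code/display structure follows FHIR's coding convention; the constraint targets the display field per the example above. Everything else (field names, the reaction field) is illustrative, and no real RxNorm codes are assumed.

```python
# Schema for one extracted allergy record. The coded substance is a
# FHIR-style system + code + display triple.
allergy_schema = {
    "type": "object",
    "properties": {
        "substance": {
            "type": "object",
            "properties": {
                "system": {"type": "string"},   # e.g. an RxNorm system URI
                "code": {"type": "string"},
                "display": {"type": "string"},
            },
            "required": ["system", "code", "display"],
        },
        "reaction": {"type": "string"},
    },
    "required": ["substance"],
}

# Per-note constraint: only the substances this note actually mentions.
substance_constraint = {
    "path": "$.substance.display",
    "enum": ["Penicillin", "Amoxicillin"],
}
```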
How it works
External constraints are applied at the structured generation layer, before token sampling begins.
- Schema definition. You define your schema with the full structure: object shapes, required fields, types. Fields that reference large value sets use "type": "string" or a broad placeholder.
- Constraint binding. At request time, you supply constraints that target specific schema paths and narrow their allowed values. The constraint is a simple object: {"path": "$.category", "enum": ["Headphones", "Earbuds", ...]}
- Generation-time enforcement. The engine merges constraints into the schema before building the token mask. The model can only produce tokens that lead to valid values. No retries, no post-hoc validation, no rejection sampling.
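The merge step can be sketched in a few lines. This toy version handles only simple `$.a.b` paths into object properties; a production engine would support richer paths and validation, so treat this as an illustration of the idea, not the actual implementation.

```python
import copy

def merge_constraint(schema: dict, constraint: dict) -> dict:
    """Return a copy of `schema` with the constraint's enum applied
    at the targeted path. Supports only dotted paths like $.a.b."""
    merged = copy.deepcopy(schema)
    node = merged
    for part in constraint["path"].lstrip("$.").split("."):
        node = node["properties"][part]
    node["enum"] = constraint["enum"]
    return merged

# One generic schema, narrowed per request.
schema = {
    "type": "object",
    "properties": {"category": {"type": "string"}},
}
constrained = merge_constraint(
    schema, {"path": "$.category", "enum": ["Headphones", "Earbuds"]}
)
print(constrained["properties"]["category"]["enum"])  # ['Headphones', 'Earbuds']
```

The deep copy matters: the deployed schema stays untouched, so the same artifact can be narrowed differently on every request.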
The schema is an artifact you version and deploy. The constraints are parameters you pass per request. One schema, many contexts.
Try it
If you're building extraction pipelines where the valid value set is large, context-dependent, or determined at runtime — whether from a knowledge graph, a product catalog, a clinical terminology, or any other reference system — get in touch or send us your schemas. We'll show you how external constraints can replace your post-hoc validation loops with generation-time enforcement.