# Generative Taxonomies with Claude Code
This article was written based on Claude Code v2.1.17. The full example is available in the accompanying GitHub repo.
In one project, image candidates (short descriptions like "Tokyo skyline") needed to be classified by whether they were viable for image generation or should instead be fetched via image search. Cheap image generation models like z-image/turbo struggle with certain subjects, but on the other hand, image generation avoids the licensing and quality issues that often come with stock images.
Classification examples already existed from historical image candidates:

| Generatable | Searchable | Discarded |
|---|---|---|
| Finnish person portrait | The Mona Lisa | Medieval warrior |
| Swiss chocolate bar | Tokyo skyline | Programming language |
| Japanese samurai warrior | Pikachu | Electric vehicle |
| ... | ... | ... |
Generatable image candidates are ones for which cheap image models produce relatively good results. Searchable candidates can be found via image search, but image generation models tend not to produce great results for them (e.g. well-known characters, famous pieces of art, celebrities). Discarded candidates are too vague or abstract for either approach and should be rethought.
But terms like "generatable" are ambiguous, especially for agents, so an unambiguous taxonomy was needed. Maintaining such documentation by hand wasn't viable, because new edge cases keep appearing daily, which would mean constant review and updates.
## Generative Taxonomy
Providing tons of historical classification results as few-shot examples is an option, but it would still be ambiguous and wasteful token-wise. Instead, the taxonomy should be fully disposable and re-generatable from the classification examples. When new classification examples are added, the taxonomy can simply be regenerated — ready to use without manual maintenance.
## Iterative Generation
One-shotting a whole taxonomy is a pipe dream, so the process of generating it should be iterative. In the Evaluator-Optimizer pattern, the Optimizer creates and adjusts some system, and the Evaluator gives the Optimizer feedback on it. In our case, the taxonomy is the system, and the feedback consists of misclassifications the Evaluator made using said taxonomy.
In addition, an Orchestrator agent runs the iteration loop and keeps the two agents at arm's length from each other, so they can't cheat by communicating directly outside the loop.
## Imperative Execution
A plain (declarative) taxonomy would probably be too vague to be actionable for agents, so it should be materialized as simple imperative logic, like a decision tree that can be interpreted from top to bottom:
- _if_ {condition}, _then_ choose category {A/B/C/...}.
- _elif_ {condition}, _then_ choose category {A/B/C/...}.
- ...
- _else_ choose category {A/B/C/...}.
This way, the entire classification logic is also fully present locally, in one decision tree, rather than spread across cross-referencing descriptions. That helps with the fact that agents tend to remember best whatever was appended to their context last.
Also, the number of branches should be limited (say, <9) to avoid overfitting to the examples (i.e. every single example handled by its very own branch).
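Purely for illustration, here is what that structure maps to in ordinary code. The predicate helpers below are hypothetical stand-ins for the generated conditions; in the actual setup the tree stays in markdown and is interpreted by an agent, not executed as a program:

```python
# Illustration only: structurally, the generated decision tree is plain
# top-to-bottom branching. The predicates are hypothetical stand-ins.

def is_abstract_concept(candidate: str) -> bool:
    # stand-in condition, e.g. "Programming language", "Constitutional monarchy"
    return candidate.lower() in {"programming language", "constitutional monarchy"}

def is_named_entity(candidate: str) -> bool:
    # stand-in condition, e.g. "The Mona Lisa", "Pikachu"
    return candidate in {"The Mona Lisa", "Pikachu", "Tokyo skyline"}

def classify(candidate: str) -> str:
    if is_abstract_concept(candidate):   # _if_ {condition} -> category X
        return "X"
    elif is_named_entity(candidate):     # _elif_ {condition} -> category B
        return "B"
    else:                                # _else_ -> category A
        return "A"

print(classify("Swiss chocolate bar"))   # -> A
```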
## Claude-Driven Clauding
Opus is probably the best candidate for the Orchestrator and the Optimizer, because it's able to follow instructions without getting confused. But, at least in our case, Sonnet is used as the Evaluator. This ensures that the generated decision tree is unambiguous and actionable enough for Sonnet to handle it in production as well, making operations a bit more affordable.
## Implementation
### Optimizer's Assignment: Taxonomy
The Optimizer receives the categorized examples and, based on those, must generate the taxonomy and the decision tree that follows from it. In the assignment, the categories are labeled only as "A", "B", and "X" — not "Generatable", "Searchable", and "Discarded" — because those names, despite being somewhat intuitive to humans, are far too ambiguous and misleading for agents. Instead, the Optimizer must come up with proper, unambiguous naming along with the rest of the taxonomy (see work/t/taxonomy.md):
````md
<!-- work/t/taxonomy.md -->
# Assignment
...
The decision tree should be like:
```md
- _if_ {some condition}, _then_ choose category {A/B/X}.
- _elif_ {some condition}, _then_ choose category {A/B/X}.
- _elif_ {some condition}, _then_ choose category {A/B/X}.
- ...
- _elif_ {some condition}, _then_ choose category {A/B/X}.
- _else_ choose category {A/B/X}.
```
## Category examples
### Category A
- Finnish person portrait
- Swiss chocolate bar
- Japanese samurai warrior
...
### Category B
- The Mona Lisa
- Tokyo skyline
- Pikachu
...
### Category X
- Medieval warrior
- Programming language
- Electric vehicle
...
````
### Evaluator's Assignment: Classification
The Evaluator receives the same examples, but without the ready-made classifications and in randomized order (see work/c/classification.md). It then tries to reproduce the correct classifications using only the decision tree:
```md
<!-- work/c/classification.md -->
Items:
- Medieval European fortress
- Photo from Swiss Alps
- Helsinki
- Constitutional monarchy
- Nokia phone
...
```
Also, the Evaluator reads the decision tree from a separate file rather than from work/t/taxonomy.md: it must not see all the reasoning behind the taxonomy, because that wouldn't be available in production either.
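Whether the Orchestrator agent prepares this file itself or a helper script does it is an implementation detail. As a sketch of the step, assuming a hypothetical tab-separated file of expected classifications (not the repo's actual format):

```python
# Sketch: strip the expected labels and shuffle the order before handing the
# items to the Evaluator. The "item<TAB>category" input format is an assumption.
import random
from pathlib import Path

def write_classification_assignment(expected_file: str, out_file: str) -> None:
    lines = [line for line in Path(expected_file).read_text().splitlines() if line.strip()]
    items = [line.split("\t")[0] for line in lines]   # drop the ready-made categories
    random.shuffle(items)                             # randomized order
    Path(out_file).write_text("Items:\n" + "\n".join(f"- {item}" for item in items) + "\n")

write_classification_assignment("work/expected.tsv", "work/c/classification.md")
```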
### Orchestration Loop
The Orchestrator follows a simple loop (see .claude/commands/create-taxonomy.md):
1. Spawn the Optimizer to create the initial taxonomy and decision tree
2. Spawn a fresh Evaluator to classify all examples using the decision tree
3. Compare the Evaluator's actual classifications against the expected ones (the examples)
4. If discrepancies exist, resume the Optimizer with feedback and go to step 2; if there are none, stop (this is the ending condition)
The Optimizer is resumed across iterations (keeping its context) so it can iteratively improve the taxonomy, whereas the Evaluator is re-spawned fresh each time (no memory of previous attempts) to match the production scenario.
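As a rough sketch of the same loop in code (the agent calls are placeholders for real Optimizer/Evaluator sessions; the actual implementation is a Claude Code command prompt, not Python):

```python
# Rough sketch of the orchestration loop. The two agent calls below are
# placeholders for spawning/resuming real agent sessions.
from dataclasses import dataclass

@dataclass
class Example:
    text: str
    category: str  # expected label: "A", "B", or "X"

def optimizer_revise(tree: str | None, feedback: list[str]) -> str:
    """Placeholder: resume the Optimizer session with feedback, get a revised tree."""
    raise NotImplementedError

def evaluator_classify(tree: str, items: list[str]) -> dict[str, str]:
    """Placeholder: spawn a fresh Evaluator session, return item -> category."""
    raise NotImplementedError

def orchestrate(examples: list[Example], max_rounds: int = 10) -> str:
    tree = optimizer_revise(None, [])                      # step 1: initial taxonomy + tree
    for _ in range(max_rounds):
        items = [e.text for e in examples]
        predicted = evaluator_classify(tree, items)        # step 2: fresh Evaluator each round
        misses = [e.text for e in examples                 # step 3: compare to expected labels
                  if predicted.get(e.text) != e.category]
        if not misses:
            return tree                                    # ending condition: no discrepancies
        tree = optimizer_revise(tree, misses)              # step 4: feedback, back to step 2
    raise RuntimeError("taxonomy did not converge within the round budget")
```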
## Results
After running the orchestration, the Orchestrator agent reported that it took 5 iterations to create a taxonomy with a decision tree that classifies all the example image candidates correctly:
```
Iteration 1: 10 discrepancies (Finnish Government official, Roman caesar,
Ancient Greek philosopher, Egyptian pharaoh, The prime minister of Finland,
Marcus Aurelius, Aristotle, Urho Kekkonen, Albert Einstein, The current CEO
of Apple)
Iteration 2: 4 discrepancies (Space telescope flying through space, Photo
from Swiss Alps, Renaissance style piece of art, Medieval warrior)
Iteration 3: 4 discrepancies (Photo from Swiss Alps, European capital city,
Baroque style cathedral, Asian cuisine)
Iteration 4: 3 discrepancies (Electric vehicle, Helsinki cityscape, Tokyo
skyline)
Iteration 5: 0 discrepancies ✓
```
The main challenges were distinguishing:
- Temporal vs cultural/regional qualifiers
- Specific named cities vs generic regional descriptions
- Broad category labels vs specific object types
- Style labels alone vs style labels with specific objects
The generated taxonomy defines the categories A/B/X as follows (see work/t/taxonomy.md):
- **Category A: "Depictable Generic Subjects"** — Visually concrete subjects described by type, archetype, style, or scene rather than by specific named identity. Examples: "Japanese samurai warrior", "Baroque style cathedral", "Swiss chocolate bar"
- **Category B: "Searchable Named Entities"** — Specific named entities with visual context that makes them searchable. Examples: "The Mona Lisa", "Tokyo skyline", "Pikachu"
- **Category X: "Abstract or Underspecified Concepts"** — Abstract concepts, bare proper nouns without visual specification, or generic labels lacking sufficient specificity. Examples: "Programming language", "Medieval warrior", "Helsinki"
And here is the generated decision tree that correctly classifies all the example image candidates (see work/c/category-decision-tree.md):
```md
- _if_ the subject is an abstract concept without inherent visual form,
  _then_ choose category **X**.
- _elif_ the subject is a specific named individual (a person's actual name),
  OR uses "The" to point to a unique role/position/entity, OR references a
  specific named entity (brand, trademark, copyrighted character, titled artwork,
  specific named landmark), OR is a specific named city combined with visual
  specification (skyline, cityscape), _then_ choose category **B**.
- _elif_ the subject is a bare proper noun (place name, entity name) without
  visual specification, _then_ choose category **X**.
- _elif_ the subject is a broad category label without specificity, OR is a
  regional qualifier combined with a broad category label, OR is a role/archetype
  with ONLY temporal qualifiers (Medieval, Ancient, Modern, Renaissance, Romantic
  era) and NO cultural/regional qualifier, OR is a generic style label without a
  specific object type, _then_ choose category **X**.
- _elif_ the subject is a generic regional landscape description (NOT a specific
  named city), OR is a style label combined with a specific object type, OR
  includes cultural/regional qualifiers combined with a specific object type or
  role, OR is a specific object type with scene context, _then_ choose category **A**.
- _else_ choose category **A**.
```
## Evaluating the Results of the Process to Generate a Mechanism for Classification
A taxonomy and an actionable decision tree were created, and, more importantly, so was a process that automatically regenerates them from classification examples gathered during normal production work.
In this article, we used the same set of examples for optimization and evaluation. In practice, you'd want to separate these sets to avoid overfitting.
We required 0 discrepancies as the ending condition, but a reasonable threshold (say, 95% accuracy) might be more practical.
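A minimal sketch of both adjustments, with arbitrary numbers and a made-up 80/20 split:

```python
# Sketch: hold out part of the examples for evaluation only, and accept the
# taxonomy once accuracy on the holdout crosses a threshold.
import random

def split_examples(examples: list, holdout_ratio: float = 0.2) -> tuple[list, list]:
    shuffled = examples[:]
    random.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_ratio))
    return shuffled[:cut], shuffled[cut:]   # (for the Optimizer, held out for evaluation)

def good_enough(misses: int, total: int, threshold: float = 0.95) -> bool:
    return (total - misses) / total >= threshold

optimize_set, holdout_set = split_examples(list(range(100)))
print(good_enough(misses=3, total=len(holdout_set)))   # 17/20 = 85% accuracy -> False
```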
LLM output is inherently stochastic (temperature), so any generated taxonomy isn't guaranteed to be reproducible. But the goal isn't perfection but efficiency: if the process saves 90% of the manual classification work and a quick review is enough, it's still a nice bonus.