GFX-301a · Module 2
Agent Roles
6 min read
The lead agent asks 10 clarifying questions, routes to the right agents, presents ranked results, and never generates images itself.
The lead is the project manager. First, it asks you targeted questions: What style are you going for? What topic? What aspect ratio? What color palette preference? What reference images should I study? It won't kick off the pipeline until it has clear answers. Then it creates a summary brief for the team, invokes each specialist in order, and presents the critic's findings at the end. Critically, the lead never touches the image generation API — that's the generator's job.
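To make the "no answers, no pipeline" rule concrete, here is a minimal sketch of how a lead might gate the run. The field names, the `Brief` class, and `run_lead` are hypothetical names made up for illustration, and the specialists are modeled as plain functions called in order. This is not how any particular agent framework does it.

```python
from dataclasses import dataclass, field

# Hypothetical question keys, taken from the example questions in the text.
REQUIRED_FIELDS = ("style", "topic", "aspect_ratio", "color_palette", "reference_images")


@dataclass
class Brief:
    """Summary brief the lead hands to the specialists (illustrative shape)."""
    answers: dict = field(default_factory=dict)

    def missing(self) -> list:
        # A field counts as unanswered if it is absent or blank.
        return [k for k in REQUIRED_FIELDS if not str(self.answers.get(k, "")).strip()]


def run_lead(brief, specialists):
    """Invoke each specialist in order, but only once the brief is complete."""
    gaps = brief.missing()
    if gaps:
        # The lead refuses to start the pipeline without clear answers.
        raise ValueError(f"Need answers for: {', '.join(gaps)}")
    payload = brief.answers
    for specialist in specialists:
        payload = specialist(payload)  # each stage consumes the previous output
    return payload
```

Note that `run_lead` only routes work between specialists. Nothing in it calls an image API, which mirrors the rule that generation belongs to the generator.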
The research agent (retriever) scans your reference images folder and outputs a structured style brief covering color, composition, typography, and mood.
When given an image generation request, the research agent scans the reference images folder — organized by style, composition, subject, brand, and output examples. It analyzes what makes each reference effective: the color palette, information hierarchy, visual metaphors, layout grid, typography choices. The output is a structured style brief that the prompt architect consumes. Think of it as the "creative director's notes" that guide all downstream work.
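One way to picture the style brief is as a small record with a slot for each area it covers, plus an index of the references it was built from. The sketch below assumes reference paths are given relative to the references folder (for example `style/flat.png`). `StyleBrief`, `group_references`, and `draft_brief` are hypothetical names. The analysis itself is left to the agent, so the descriptive fields start empty.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


@dataclass
class StyleBrief:
    """Structured notes the prompt architect consumes (illustrative shape)."""
    color: str = ""
    composition: str = ""
    typography: str = ""
    mood: str = ""
    references: dict = field(default_factory=dict)  # category -> image names


def group_references(paths):
    """Group image paths by their top-level category folder."""
    grouped = {}
    for raw in sorted(paths):
        path = PurePosixPath(raw)
        # Skip non-images and files that sit outside any category folder.
        if path.suffix.lower() not in IMAGE_SUFFIXES or len(path.parts) < 2:
            continue
        grouped.setdefault(path.parts[0], []).append(path.name)
    return grouped


def draft_brief(paths):
    """Start a brief with the reference index filled in and the analysis blank."""
    return StyleBrief(references=group_references(paths))
```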
The prompt architect transforms the style brief into five descriptive narrative prompts — never keyword lists like old-school Midjourney prompting.
The key insight: modern image generation APIs (NanoBanana/Gemini) respond better to rich narrative descriptions than comma-separated keyword lists. Instead of "coffee cups, national flags, warm lighting, overhead angle," write: "An overhead view of artisanal coffee cups arranged on a dark wooden table, each cup decorated with the distinctive patterns and colors of national flags, bathed in warm golden morning light that creates soft shadows..." The prompt architect writes five variations, each with a different take on the concept.
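The difference between the two prompt styles can be sketched in a few lines. The check below is a rough heuristic invented for illustration (three or more short comma-separated fragments), not a rule from any API, and `narrative_prompts` is a hypothetical helper that varies only the framing. A real prompt architect would vary far more than that.

```python
def looks_like_keyword_list(prompt, max_words=3):
    """Heuristic: three or more comma-separated fragments, all very short."""
    fragments = [f.strip() for f in prompt.split(",") if f.strip()]
    return len(fragments) >= 3 and all(len(f.split()) <= max_words for f in fragments)


def narrative_prompts(subject, setting, lighting, framings):
    """Return one full-sentence prompt per framing."""
    prompts = []
    for framing in framings:
        sentence = f"{framing} of {subject} {setting}, bathed in {lighting}."
        prompts.append(sentence[0].upper() + sentence[1:])
    return prompts


variations = narrative_prompts(
    subject="artisanal coffee cups",
    setting="arranged on a dark wooden table",
    lighting="warm golden morning light",
    framings=["an overhead view", "a close-up", "a wide shot",
              "a low-angle view", "a side view"],
)
```

Here `variations` holds five sentences, and the keyword-style example from the text is the kind of input the heuristic flags.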
The generator agent takes each prompt, calls the image generation API (NanoBanana/Gemini), and saves five images to a structured output folder.
The generator is the hands of the operation. It reads the API guide (a copy-pasted markdown version of the official docs), constructs the API calls with the prompt architect's narrative descriptions, and saves results to the outputs folder. It generates five images, one per prompt variation, because variety matters. Some will be better than others, which is why the critic exists. The generator is also responsible for reading the Gemini API key from configuration and for retrying failed API calls.
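The retry behavior might look something like the sketch below. The real API call is deliberately left out: `call_api` is a hypothetical stand-in for whatever function wraps the image endpoint and returns image bytes, and the file naming is an assumption. Exponential backoff is one common retry policy and is used here only as an example.

```python
import time


def call_with_retries(fn, *args, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(*args); on failure wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return fn(*args)
        except Exception:
            # A production version would catch only transient errors
            # (timeouts, rate limits) rather than every exception.
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


def generate_all(prompts, call_api, **retry_options):
    """Return {filename: image_bytes}, one entry per prompt variation."""
    return {
        f"variation_{i}.png": call_with_retries(call_api, prompt, **retry_options)
        for i, prompt in enumerate(prompts, start=1)
    }
```

Writing each entry to the outputs folder is then a short loop over the returned mapping. The API key would typically be read once from an environment variable or config file and passed into whatever `call_api` wraps.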
The critic reviews each generated image on four dimensions: faithfulness, conciseness, readability, and beauty — then ranks them 1-5.
Faithfulness: how well does it match the original request? Conciseness: does it focus on core information without visual clutter? Readability: is the layout clear, text legible, composition clean? Beauty: does it look professional and visually appealing? Each dimension is scored out of 10. The critic ranks all five images and recommends the best one with a written justification. If no image scores above the pipeline's quality threshold, the critic sends specific improvement instructions back to the generator for another round.
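The scoring and ranking step can be sketched as follows. The text does not say how the four scores combine or what the threshold is, so the unweighted sum and the default threshold of 28 out of 40 are assumptions made for illustration, as are the `Review`, `rank`, and `verdict` names.

```python
from dataclasses import dataclass

DIMENSIONS = ("faithfulness", "conciseness", "readability", "beauty")


@dataclass
class Review:
    image: str
    scores: dict  # dimension -> score out of 10
    justification: str = ""

    @property
    def total(self):
        # Assumption: the four dimensions are weighted equally.
        return sum(self.scores[d] for d in DIMENSIONS)


def rank(reviews):
    """Best image first."""
    return sorted(reviews, key=lambda r: r.total, reverse=True)


def verdict(reviews, threshold=28):
    """Recommend the top image, or ask for another round if none clears the bar."""
    ranked = rank(reviews)
    best = ranked[0]
    ranking = [r.image for r in ranked]
    if best.total <= threshold:
        # Point the generator at the weakest dimension of the best attempt.
        focus = min(DIMENSIONS, key=lambda d: best.scores[d])
        return {"action": "regenerate", "ranking": ranking, "focus": focus}
    return {"action": "recommend", "best": best.image, "ranking": ranking}
```

In the `regenerate` case the `focus` field is only a starting point. The critic's written improvement instructions would carry the actual detail.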