GFX-201a · Module 2

Four-Dimension Evaluation

4 min read

Most people evaluate AI-generated images with their gut. "That looks good." "That feels off." "I like this one better." Gut reactions are fine for personal projects, but they are useless for production work. When you are generating images for a brand, a client, or a team, you need a shared language for what "good" means. The four-dimension framework gives you that language: Composition, Technical Quality, Narrative Clarity, and Brand Alignment. Every AI-generated image can be scored across these four axes, and the scores tell you exactly what needs to change.

  1. Dimension 1: Composition. Is the image visually balanced? Does the eye flow naturally through the frame? Are the key elements placed with intention — rule of thirds, leading lines, negative space? Composition problems are the most common in AI-generated images because most prompts describe subjects without describing spatial relationships.
  2. Dimension 2: Technical Quality. Sharpness, color accuracy, lighting consistency, absence of artifacts. Does skin look like skin? Do edges resolve cleanly? Are there telltale AI artifacts — extra fingers, melted text, impossible reflections? Technical quality is binary for most use cases: either it passes inspection or it does not.
  3. Dimension 3: Narrative Clarity. Does the image tell the story you intended? Can a viewer understand the subject, context, and mood within two seconds? If the image needs a caption to make sense, the narrative is unclear. Great AI images communicate their purpose without explanation.
  4. Dimension 4: Brand Alignment. Does this image feel like it belongs to your brand? Does it match the color language, mood, and style documented in your style guide? An image can score perfectly on the other three dimensions and still fail brand alignment — technically excellent but visually off-brand.

The scoring is simple: each dimension gets a pass, partial, or fail. Four passes means the image ships. Any fail means the image goes back for revision with a clear diagnosis — you know exactly which dimension broke and can adjust your prompt accordingly. Partials are judgment calls: if composition is partial but everything else passes, you might ship it with a slight crop. If brand alignment is partial, you probably go back to the reference images and try again.
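The shipping rule above is mechanical enough to sketch in code. This is a minimal illustration, not part of the course materials: the `Score` enum, the `verdict` function, and the dimension names are all hypothetical choices for how a team might record the four scores.

```python
from enum import Enum

class Score(Enum):
    PASS = "pass"
    PARTIAL = "partial"
    FAIL = "fail"

# The four dimensions from the framework, in evaluation order.
DIMENSIONS = (
    "composition",
    "technical_quality",
    "narrative_clarity",
    "brand_alignment",
)

def verdict(scores: dict[str, Score]) -> str:
    """Apply the shipping rule: any fail sends the image back for
    revision with a diagnosis, four passes ship, and any mix of
    passes and partials is flagged for a judgment call."""
    failed = [d for d in DIMENSIONS if scores[d] is Score.FAIL]
    if failed:
        return "revise: " + ", ".join(failed)
    if all(scores[d] is Score.PASS for d in DIMENSIONS):
        return "ship"
    partial = [d for d in DIMENSIONS if scores[d] is Score.PARTIAL]
    return "judgment call: " + ", ".join(partial)
```

Note that the function returns the failing dimensions by name, matching the point above: a fail is only useful if it comes with a diagnosis you can act on in the next prompt.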

The framework's real power is not individual image evaluation — it is pattern recognition across dozens of evaluations. After a week of scoring images, you will notice that your prompts consistently underperform on one dimension. Maybe your compositions are always centered and static. Maybe technical quality passes but narrative clarity is weak because you describe subjects without context. The pattern tells you what to fix in your prompt templates, not just in individual images.
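The pattern-recognition step can be sketched the same way: tally every non-passing score across a batch of evaluations and see which dimension misses most often. Again, `weakest_dimension` and the string score values are illustrative assumptions, not a prescribed tool.

```python
from collections import Counter

def weakest_dimension(evaluations: list[dict[str, str]]) -> Counter:
    """Count partial and fail scores per dimension across many
    evaluations. The most common key is the dimension your prompt
    templates consistently underperform on."""
    misses: Counter = Counter()
    for scores in evaluations:
        for dim, score in scores.items():
            if score != "pass":  # count both "partial" and "fail"
                misses[dim] += 1
    return misses
```

After a week of scoring, `weakest_dimension(history).most_common(1)` points at the template-level fix rather than another one-off image revision.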