CDX-301e · Module 2
Best-of-N & Voting Patterns
Best-of-N execution runs the same task N times independently and selects the best result. Runs differ because model output is sampled: with a nonzero temperature, token choices diverge early and the reasoning paths that follow diverge with them. The value of best-of-N increases with task ambiguity. For a near-deterministic task like "rename variable X to Y," all N runs produce essentially the same output and you have wasted N-1 runs. For an open-ended task like "design a caching architecture," N runs produce N different designs, and you can select the best one or synthesize elements from several candidates.
Voting patterns extend best-of-N with automated selection. Instead of a human reviewing the candidates, a final task receives all N outputs as input, evaluates them against a predefined rubric, and either selects a winner or synthesizes the best elements of several candidates into a composite solution. A typical rubric covers correctness (do all tests pass?), complexity (lines of code, cyclomatic complexity), performance (benchmark results), and style (adherence to project conventions). This automates the selection step, but the selection is only as good as the criteria: they must be specific enough to be checked.
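One way to make a rubric like this mechanical is to reduce each candidate to a few numbers and rank on them. The sketch below is illustrative only: the `pick_winner` function, the input format, and the scores are all hypothetical, and it assumes you have already collected a test result, a complexity figure, and a coverage percentage for each candidate by some other means.

```shell
#!/bin/sh
# Hypothetical input format, one candidate per line:
#   <name> <tests_pass: 0|1> <cyclomatic_complexity> <coverage_pct>
# Candidates with failing tests are disqualified. The rest are ranked by
# coverage (higher is better), then by complexity (lower is better).
pick_winner() {
  awk '$2 == 1' \
    | sort -t' ' -k4,4nr -k3,3n \
    | head -n 1 \
    | cut -d' ' -f1
}

# Made-up scores for illustration, not real results.
printf '%s\n' \
  'candidate-1 1 14 82' \
  'candidate-2 0 9 91' \
  'candidate-3 1 11 82' \
  | pick_winner
# prints: candidate-3
```

Here candidate-2 has the best coverage but is dropped because its tests fail, and candidate-3 beats candidate-1 on the complexity tiebreak. If no candidate passes its tests, the function prints nothing, which is a reasonable signal to rerun rather than pick a winner.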
# Best-of-3 with automated voting
# Launch three independent runs in parallel, each on its own branch.
for i in 1 2 3; do
  codex cloud --branch "candidate-${i}" \
    "implement a rate limiter for the API gateway. \
Requirements: token bucket algorithm, Redis-backed, \
configurable per-route limits, graceful degradation. \
Include comprehensive tests." &
done
# Block until all three background runs have finished.
wait

# Voting: evaluate all candidates
codex cloud "review branches candidate-1, candidate-2, candidate-3. \
For each, evaluate:
1. Correctness: do all tests pass?
2. Completeness: are all requirements met?
3. Code quality: complexity, readability, error handling
4. Test coverage: line and branch coverage
Produce a ranked comparison table and recommend the winner."
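One caveat about the script above: a bare `wait` returns once every background job has finished, but it does not report whether any of them failed. The following sketch waits on each job individually so a failed run can be excluded before voting. It makes no assumptions about the Codex CLI itself: `run_candidate` is a hypothetical stand-in for the real launch command, rigged so that the second run fails.

```shell
#!/bin/sh
# Hypothetical stand-in for the real launch command.
# Candidate 2 fails on purpose so the detection path is exercised.
run_candidate() {
  [ "$1" -ne 2 ]
}

launch_all() {
  n=$1
  pids=""
  i=1
  # Start all runs in the background and remember each PID in order.
  while [ "$i" -le "$n" ]; do
    run_candidate "$i" &
    pids="$pids $!"
    i=$((i + 1))
  done
  # Wait on each PID separately so its exit status is visible.
  failed=""
  i=1
  for pid in $pids; do
    wait "$pid" || failed="$failed candidate-$i"
    i=$((i + 1))
  done
  echo "failed:${failed:- none}"
}

launch_all 3
# prints: failed: candidate-2
```

With the list of failed runs in hand, the voting prompt can be built from the surviving branches only, instead of asking the reviewer to evaluate a branch that was never completed.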
Do This
- Use best-of-N for tasks where quality varies significantly between runs
- Define evaluation criteria before reviewing candidates, so the ranking is not skewed toward whichever candidate you read last (recency bias)
- Consider synthesizing the best elements from multiple candidates instead of picking one wholesale
Avoid This
- Running best-of-N on deterministic tasks, which wastes N-1 executions
- Using N > 5 routinely, since the marginal improvement per additional candidate drops sharply
- Skipping the voting/evaluation step and just picking the first candidate that looks good
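The point about large N can be illustrated with a simple model. This is an assumption for intuition, not a measurement: suppose each run independently produces an acceptable result with probability p. Then the chance that at least one of N runs is acceptable is 1 - (1 - p)^N, and each extra run adds less than the one before. The helper name `best_of_n_table` is made up for this sketch.

```shell
#!/bin/sh
# Assumption: each run is independently "acceptable" with probability p.
# P(at least one acceptable result in N runs) = 1 - (1 - p)^N.
# Usage: best_of_n_table <p> <max_n>
best_of_n_table() {
  awk -v p="$1" -v max="$2" 'BEGIN {
    prev = 0
    for (n = 1; n <= max; n++) {
      hit = 1 - (1 - p) ^ n
      # gain = improvement contributed by the n-th run alone
      printf "N=%d  P(hit)=%.3f  gain=%.3f\n", n, hit, hit - prev
      prev = hit
    }
  }'
}

best_of_n_table 0.5 6
```

Under this model with p = 0.5, three runs already reach 0.875, and the sixth run adds only about 0.016. Real runs are not perfectly independent and quality is rarely binary, so treat the numbers as a rough guide to where the curve flattens.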