CDX-301e · Module 2

Best-of-N & Voting Patterns

Best-of-N execution runs the same task N times independently and selects the best result. Runs differ because generation is sampled: at nonzero temperature the model can follow a different reasoning path on each run and arrive at different code. The value of best-of-N therefore grows with task ambiguity. For a near-deterministic task like "rename variable X to Y," all N runs are likely to produce the same change, so N-1 of them are wasted. For an open-ended task like "design a caching architecture," the N runs produce N different designs, and you can select the best one or synthesize elements from several candidates.
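One way to check whether a task was too deterministic for best-of-N is to compare the candidates directly: if several runs produced byte-identical changes, the extra runs bought nothing. This is a minimal sketch, assuming each candidate's change has already been exported to a file, for example with `git diff main...candidate-1 > candidate-1.diff` (the base branch name `main` is an assumption). The helper name `count_distinct_candidates` is hypothetical. It compares contents with the POSIX `cksum` utility, which is adequate for spotting duplicates but is not a cryptographic hash.

```shell
# count_distinct_candidates FILE...
# Hypothetical helper: prints how many distinct contents appear among the
# given candidate files (for example, per-branch diffs). A result of 1 means
# every run produced the same change.
count_distinct_candidates() {
  for f in "$@"; do
    cksum < "$f"   # prints "<crc> <byte count>" for the contents only
  done | sort -u | wc -l | tr -d ' '
}

# Example (assumes the three candidate diffs already exist):
# count_distinct_candidates candidate-1.diff candidate-2.diff candidate-3.diff
```

If this prints 1 for a task you routinely run with N = 3, that task is a poor fit for best-of-N.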

Voting patterns extend best-of-N with automated selection. Instead of a human reviewing the candidates, a final task evaluates all N against defined criteria and either selects a winner or synthesizes the best elements of several candidates into a composite solution. The voting task receives all N outputs as input and applies a rubric, for example: correctness (do all tests pass?), complexity (lines of code, cyclomatic complexity), performance (benchmark results), and style (adherence to project conventions). This automates the selection step, but the result is only as good as the criteria: they must be concrete enough that two reviewers applying them to the same candidate would reach the same verdict.
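Once per-criterion scores exist (from test runs, coverage reports, or the voting task's comparison table), the selection rule itself can be made explicit. This sketch treats correctness as a hard gate and ranks the remaining candidates by a weighted sum of the other three criteria used in this module's voting prompt. The input format, the 0-10 scale, the weights (0.4/0.3/0.3), and the helper name `rank_candidates` are all illustrative assumptions, not a standard rubric.

```shell
# rank_candidates: hypothetical helper. Reads rows of
#   name tests_pass completeness quality coverage
# on stdin (tests_pass is 1 or 0; the other scores are 0-10) and prints
# "name score" for every candidate whose tests pass, best first.
rank_candidates() {
  LC_ALL=C awk 'NF == 5 && $2 == 1 {
         # Correctness is a gate: failing candidates never reach scoring.
         # Illustrative weights: completeness 0.4, quality 0.3, coverage 0.3.
         printf "%s %.1f\n", $1, 0.4 * $3 + 0.3 * $4 + 0.3 * $5
       }' | LC_ALL=C sort -k2,2nr
}

printf '%s\n' \
  "candidate-1 1 8 6 7" \
  "candidate-2 1 9 8 8" \
  "candidate-3 0 10 10 10" | rank_candidates
# candidate-2 8.4
# candidate-1 7.1
```

candidate-3 has the highest raw scores but is dropped because its tests fail. Gating correctness, rather than averaging it in with the other criteria, keeps a polished but broken solution from winning.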

```shell
# Best-of-3 with automated voting
pids=""
for i in 1 2 3; do
  codex cloud --branch "candidate-${i}" \
    "implement a rate limiter for the API gateway. \
     Requirements: token bucket algorithm, Redis-backed, \
     configurable per-route limits, graceful degradation. \
     Include comprehensive tests." &
  pids="${pids} $!"   # record each background job's PID
done

# A bare `wait` always exits 0, hiding failed runs; wait on each PID instead.
failed=0
for pid in ${pids}; do
  wait "${pid}" || failed=$((failed + 1))
done
[ "${failed}" -eq 0 ] || echo "warning: ${failed} of 3 candidate runs exited non-zero" >&2

# Voting: evaluate all candidates
codex cloud "review branches candidate-1, candidate-2, candidate-3. \
  For each, evaluate:
  1. Correctness: do all tests pass?
  2. Completeness: are all requirements met?
  3. Code quality: complexity, readability, error handling
  4. Test coverage: line and branch coverage
  Produce a ranked comparison table and recommend the winner."
```
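The voting prompt asks the reviewing task whether all tests pass, but that is a fact you can measure rather than ask a model to judge. This is a minimal sketch, assuming the candidate branches are available in a local clone and that the project's test suite runs from a single shell command. It checks each branch out into a temporary `git worktree`, runs the command there, and prints `<branch> 1` or `<branch> 0`, a format that could feed the correctness column of a rubric. The helper name `gate_candidates` and the `npm test` command in the example are hypothetical.

```shell
# gate_candidates TEST_CMD BRANCH...
# Hypothetical helper. For each branch: check it out into a throwaway git
# worktree, run TEST_CMD inside it, and print "<branch> 1" if the command
# exits 0, otherwise "<branch> 0". Run it from inside the repository; none
# of the branches may already be checked out in another worktree.
gate_candidates() {
  local test_cmd branch base status
  test_cmd=$1
  shift
  for branch in "$@"; do
    base=$(mktemp -d)
    if ! git worktree add "$base/wt" "$branch" >/dev/null 2>&1; then
      echo "$branch 0"   # a branch that cannot be checked out counts as failing
      rm -rf "$base"
      continue
    fi
    if (cd "$base/wt" && sh -c "$test_cmd") >/dev/null 2>&1; then
      status=1
    else
      status=0
    fi
    git worktree remove --force "$base/wt" >/dev/null 2>&1
    rm -rf "$base"
    echo "$branch $status"
  done
  git worktree prune   # drop metadata for any worktree that was not removed cleanly
}

# Example (the test command is an assumption about your project):
# gate_candidates 'npm test' candidate-1 candidate-2 candidate-3
```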

Do This

Use best-of-N for tasks where quality varies significantly between runs
Define evaluation criteria before reviewing candidates, so the last candidate you read does not set the standard (recency bias)
Consider synthesizing the best elements from multiple candidates instead of picking one wholesale

Avoid This

Run best-of-N on deterministic tasks — it wastes N-1 executions
Use N > 5 routinely; the marginal improvement from each additional candidate drops sharply, while cost grows linearly with N
Skip the voting/evaluation step and just pick the first candidate that looks good
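The advice to keep N small can be made concrete with a deliberately simple model: suppose each run independently produces an acceptable result with probability P, and the voting step reliably recognizes it. The chance that at least one of N runs succeeds is 1 - (1 - P)^N, and the Nth run adds only P(1 - P)^(N-1), which shrinks geometrically. Both the independence assumption and the helper name `best_of_n_table` belong to this sketch, not to any tool. Runs of the same prompt on the same model are usually correlated, so real gains are likely smaller than this model suggests.

```shell
# best_of_n_table P MAX_N
# Hypothetical helper. Assumes each run independently succeeds with
# probability P. For N = 1..MAX_N prints:
#   N  P(at least one success) = 1 - (1 - P)^N  gain from run N = P(1 - P)^(N-1)
best_of_n_table() {
  LC_ALL=C awk -v p="$1" -v max="$2" 'BEGIN {
    for (n = 1; n <= max; n++) {
      at_least_one = 1 - (1 - p) ^ n
      gain = p * (1 - p) ^ (n - 1)
      printf "%d %.3f %.3f\n", n, at_least_one, gain
    }
  }'
}

best_of_n_table 0.4 6
# 1 0.400 0.400
# 2 0.640 0.240
# 3 0.784 0.144
# 4 0.870 0.086
# 5 0.922 0.052
# 6 0.953 0.031
```

Under this model with P = 0.4, five runs reach about a 92% chance of at least one acceptable result, and a sixth run adds only about 3 percentage points while costing as much as the first.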