PM-301b · Module 3

How Many Examples

4 min read

More examples do not linearly improve performance. There is a diminishing returns curve: the first example provides maximum signal, the second adds significant information, subsequent examples add progressively less unless they cover genuinely new cases. Every example costs tokens. At production scale, token cost is a real constraint.

0-Shot Zero examples. Best when: the task is well-specified by instructions alone, the output format is simple, or the model has strong priors for this task type. Test zero-shot first — if it works, it is the cheapest option.
1-Shot One example. Establishes format and tone. Works for tasks where one demonstration is sufficient to anchor the output pattern. Effective for style transfer and format-heavy tasks where the format can be demonstrated in a single instance.
3-Shot Three examples. The production default for most tasks. Covers representativeness, variety, and edge case with reasonable token cost. Use 3-shot when zero-shot and 1-shot do not produce consistent enough outputs.
5-10 Shot Five to ten examples. Use when: input distribution is highly varied, format compliance is critical and must be demonstrated across many cases, or early experiments show 3-shot is not sufficient. Test empirically — add examples until quality plateaus.
10+ Shot More than ten examples. Rarely needed. If you find yourself here, consider whether fine-tuning is more appropriate than few-shot prompting. Fine-tuning is cheaper at inference time and more consistent than very long example sets.

Test protocol for determining optimal example count:

1. Baseline: run 100 test inputs with 0 examples. Measure quality score.
2. 1-shot: run same 100 inputs with 1 example. Measure delta.
3. 3-shot: run same 100 inputs with 3 examples. Measure delta.
4. Continue adding examples in sets until:
   - Quality improvement per additional example < 2%
   - OR you have reached the token budget limit

Report format:
| Examples | Quality Score | Delta | Token Cost per Request |
|----------|--------------|-------|----------------------|
| 0        | 71%          | -     | 450 tokens           |
| 1        | 84%          | +13%  | 620 tokens           |
| 3        | 91%          | +7%   | 960 tokens           |
| 5        | 92%          | +1%   | 1,300 tokens         |

Decision: 3-shot. The 5th and 6th examples add 1% quality at 340 tokens each — not worth it.