OC-301i · Module 2
Prompt Efficiency
3 min read
Every token in the prompt costs money. Prompt efficiency is the discipline of achieving the same output quality with fewer input tokens. This is not about making prompts shorter — it is about making prompts leaner. A lean prompt includes everything the model needs and nothing it does not.
Three techniques drive prompt efficiency.

First, context pruning: load only the context that is relevant to the current task. A sales agent analyzing a specific deal does not need the entire client history; it needs the last 90 days of interactions and the current deal terms. Prune the context to the relevant window.

Second, instruction compression: replace verbose instructions with concise equivalents. "Please analyze the following document and provide a detailed summary that captures the key themes, major findings, and any notable conclusions" becomes "Summarize: key themes, major findings, notable conclusions." Same output, roughly 70% fewer instruction tokens.

Third, output specification: specify a maximum output length. Without a length constraint, models tend to be verbose, and every output token costs money.
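Context pruning can be as simple as a date filter over the interaction log before it ever reaches the prompt. A minimal sketch, assuming a hypothetical `interactions` list of dicts with `date` and `note` fields (the field names and the `prune_context` helper are illustrative, not from any particular SDK):

```python
from datetime import datetime, timedelta

def prune_context(interactions, days=90):
    """Keep only interactions inside the task-relevant window (default: last 90 days)."""
    cutoff = datetime.now() - timedelta(days=days)
    return [i for i in interactions if i["date"] >= cutoff]

# Hypothetical client history: only the recent entry survives pruning.
history = [
    {"date": datetime.now() - timedelta(days=400), "note": "initial outreach"},
    {"date": datetime.now() - timedelta(days=30), "note": "renewal discussion"},
]
recent = prune_context(history)
```

The point is that the filter runs before prompt assembly, so the stale 400-day-old entry never costs a single input token.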
Do This
- Prune context to the task-relevant window — 90 days of history instead of all history
- Compress instructions to the minimum words that produce the same output quality
- Specify output length constraints — "200 words max" prevents verbose completions that cost tokens
Avoid This
- Load the entire knowledge base into every prompt — most of it is irrelevant and all of it costs tokens
- Use polite, verbose instructions — "Please kindly" is 2 tokens that add no information
- Allow unconstrained output length — models default to verbose, and verbosity costs money