OC-301c · Module 1

Context Window Management

The context window is finite. Every token used for memory is a token not available for reasoning. Context window management is the engineering discipline of loading the right memories at the right time while preserving enough window for the agent to actually think about the current task.

The management strategy is to prioritize by relevance, not recency. The most relevant memories for the current task might be from three months ago, not from yesterday. Relevance scoring uses the current task description to query the memory store and retrieve the memories most likely to improve the agent's performance on this specific task. An example retrieval prompt: "Given this task [description], which of these memories are relevant? Score each 0-10 for relevance. Load only memories scoring 7+." Loading selectively in this way keeps the context window lean: roughly 20-30% for memory, leaving 70-80% available for reasoning.
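The scoring-and-threshold step can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: the `Memory` class, `build_retrieval_prompt`, `select_relevant`, and the `keyword_overlap_score` stand-in are all hypothetical names introduced here. In practice the 0-10 score would presumably come from a model answering the retrieval prompt; the keyword scorer only keeps the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Memory:
    text: str
    age_days: int  # kept for illustration only; never used for ranking


def build_retrieval_prompt(task: str, memories: List[Memory]) -> str:
    """Assemble the retrieval prompt from the task and candidate memories."""
    listing = "\n".join(f"{i}. {m.text}" for i, m in enumerate(memories, start=1))
    return (
        f"Given this task: {task}\n"
        "Which of these memories are relevant? Score each 0-10 for relevance.\n"
        f"{listing}"
    )


def keyword_overlap_score(task: str, memory: Memory) -> int:
    """Stand-in scorer (assumption): share of task words found in the memory, scaled to 0-10."""
    task_words = set(task.lower().split())
    memory_words = set(memory.text.lower().split())
    if not task_words:
        return 0
    return 10 * len(task_words & memory_words) // len(task_words)


def select_relevant(
    task: str,
    memories: List[Memory],
    score_fn: Callable[[str, Memory], int] = keyword_overlap_score,
    threshold: int = 7,
) -> List[Memory]:
    """Keep only memories scoring at or above the threshold, best first."""
    scored = [(score_fn(task, m), m) for m in memories]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in kept]


memories = [
    Memory("user prefers dark mode in the dashboard", age_days=1),
    Memory("billing invoice export failed with a timeout", age_days=90),
]
selected = select_relevant("fix billing invoice export", memories)
print([m.text for m in selected])
```

With this toy data, only the 90-day-old billing memory clears the threshold, while yesterday's unrelated memory is left out. Age plays no part in the selection.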

Do This

  • Load memories by relevance to the current task, not by recency: old memories can be highly relevant
  • Budget the context window at 20-30% for memory and 70-80% for reasoning: memory should inform, not crowd
  • Summarize loaded memories to reduce token cost: full transcripts waste context on details the task does not need

Avoid This

  • Loading the last N memories regardless of relevance: the most recent memory is not necessarily the most relevant
  • Filling the context window with memory and leaving no room for reasoning: the agent cannot think if it is drowning in context
  • Loading full episodic transcripts when a summary would serve the task: summarization preserves window space
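
The budgeting and summarization guidance above could be combined along these lines. This is a sketch under stated assumptions: `approx_tokens` counts whitespace-separated words as a crude proxy for tokens, `truncate_summary` is a placeholder for a real summarizer, and `pack_memories` with its 25% default is a hypothetical helper chosen to sit inside the 20-30% range.

```python
from typing import Callable, List, Tuple


def approx_tokens(text: str) -> int:
    """Crude assumption: one token per whitespace-separated word."""
    return len(text.split())


def truncate_summary(text: str, max_words: int = 12) -> str:
    """Placeholder summarizer (assumption): keep the first words of the memory."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words]) + " ..."


def pack_memories(
    ranked_memories: List[str],
    window_tokens: int,
    memory_fraction: float = 0.25,
    summarize: Callable[[str], str] = truncate_summary,
) -> Tuple[List[str], int, int]:
    """Load summarized memories, most relevant first, without exceeding the memory budget.

    Returns (loaded summaries, tokens used for memory, tokens left for reasoning).
    """
    budget = int(window_tokens * memory_fraction)
    loaded: List[str] = []
    used = 0
    for text in ranked_memories:
        summary = summarize(text)
        cost = approx_tokens(summary)
        if used + cost > budget:
            continue  # does not fit; a shorter, lower-ranked memory may still fit
        loaded.append(summary)
        used += cost
    return loaded, used, window_tokens - used


# Input is assumed to be already sorted by relevance, highest first.
ranked = [" ".join(["a"] * 20), " ".join(["b"] * 10), " ".join(["c"] * 5)]
loaded, used, remaining = pack_memories(ranked, window_tokens=100)
print(len(loaded), used, remaining)
```

Skipping a memory that does not fit, rather than stopping at the first overflow, is one possible design choice; stopping early would keep strict relevance order at the cost of leaving some budget unused.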