KM-301b · Module 1

The Knowledge Schema

A knowledge base without a schema is a collection of text files with titles. The schema defines what types of knowledge exist, what fields each type requires, and what structure a contributor must follow. Without a schema, every contributor invents their own format. At scale, this produces a corpus that is human-navigable with effort and machine-readable only with heroic parsing.

Five knowledge types cover the majority of enterprise knowledge base content. Each requires a distinct schema because each serves a distinct retrieval purpose.

Articles (Explanatory Content)

Required fields: title, summary (150 words max, plain text), body, author, last-reviewed date, topic facets, audience facet.

Articles explain concepts. They answer "what is this?" and "why does it work this way?" The summary field is mandatory because it powers search result snippets and AI retrieval. Articles without summaries tend to be downranked, often silently, by retrieval systems that depend on that field.

Runbooks (Procedural Content)

Required fields: title, trigger condition (when to use this runbook), numbered steps, expected outcome, rollback procedure, owner, last-tested date.

Runbooks describe how to perform a specific task. The trigger condition field answers "when does someone reach for this?" Without it, a retrieval system cannot surface the runbook to anyone who does not already know to look for it. The last-tested date enables staleness detection: a runbook that has not been exercised recently can be flagged for re-testing.
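
As a rough illustration of staleness detection, the sketch below flags runbooks whose last-tested date is older than a threshold. The 180-day window, the `is_stale` helper, and the sample records are assumptions made for this example; the schema itself does not prescribe a re-test interval.

```python
from datetime import date, timedelta

# Assumed threshold: the schema does not say how old is "too old".
MAX_AGE = timedelta(days=180)

def is_stale(last_tested: date, today: date, max_age: timedelta = MAX_AGE) -> bool:
    """Return True when the runbook was last tested more than max_age ago."""
    return today - last_tested > max_age

# Hypothetical runbook records carrying only the fields needed here.
runbooks = [
    {"title": "Reset API key", "last_tested": date(2023, 11, 1)},
    {"title": "Deploy hotfix", "last_tested": date(2024, 6, 1)},
]

today = date(2024, 7, 1)
stale_titles = [r["title"] for r in runbooks if is_stale(r["last_tested"], today)]
print(stale_titles)
```

Passing `today` in explicitly, rather than calling `date.today()` inside the helper, keeps the check deterministic and easy to test.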

Decision Records (Architectural Decisions)

Required fields: title, date, status (proposed/accepted/superseded/deprecated), context (what the situation was), decision (what was chosen), rationale (why), consequences (what this commits us to), supersedes (link to the prior decision this replaces, if any).

Decision records are the most undervalued knowledge type. They capture why, not just what, and their absence is a leading reason teams relitigate decisions that were already resolved.
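
One practical use of the supersession links is finding which decision is currently in force. The sketch below assumes records are held in a dictionary keyed by a hypothetical content ID and carry the `superseded_by` field from the schema; the `current_decision` helper and the sample IDs are illustrative only.

```python
def current_decision(records: dict[str, dict], start_id: str) -> str:
    """Follow superseded_by links from start_id to the record still in force."""
    seen = set()
    current = start_id
    while True:
        if current in seen:
            # A loop means the links are inconsistent and need manual repair.
            raise ValueError(f"cycle in supersession chain at {current}")
        seen.add(current)
        next_id = records[current].get("superseded_by")
        if next_id is None:
            return current
        current = next_id

# Hypothetical records: dr-001 was replaced by dr-002, which was replaced by dr-003.
records = {
    "dr-001": {"status": "superseded", "superseded_by": "dr-002"},
    "dr-002": {"status": "superseded", "superseded_by": "dr-003"},
    "dr-003": {"status": "accepted"},
}

latest = current_decision(records, "dr-001")
print(latest)
```

The cycle check matters in practice because supersession links are entered by hand and can end up pointing at each other.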

References (Structured Lookup)

Required fields: title, reference type, content (structured table, list, or specification), source, last-updated date.

References are stable, factual content: API endpoint tables, configuration parameter lists, compliance checklists, contact directories. They are not explanatory. They answer "what is the value of X?" rather than "why does X work this way?" Reference content benefits from strict formatting constraints: once entries deviate from the expected structure, the content becomes hard to parse programmatically.

FAQs (Question-Scoped Answers)

Required fields: question (verbatim, as a user would ask it), answer, related articles (links), topic facets.

FAQs are optimized for question-matching retrieval, not for human navigation. The question field must be written as a user would actually phrase it, not as an internal team member would: "How do I reset my API key?" rather than "API Key Reset Procedure." The mismatch between internal vocabulary and user vocabulary is where FAQ retrieval fails.
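
The vocabulary mismatch can be made concrete with a toy lexical measure. The sketch below scores a user query against the two phrasings from the FAQ description using Jaccard word overlap. This is a deliberately simple stand-in chosen for illustration, not how any particular retrieval system ranks results; the `tokens` and `overlap` helpers and the sample query are assumptions.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(query: str, question: str) -> float:
    """Jaccard similarity between the word sets of a query and a stored question."""
    q, c = tokens(query), tokens(question)
    union = q | c
    return len(q & c) / len(union) if union else 0.0

query = "how can I reset my api key"          # what a user might type
user_phrased = "How do I reset my API key?"   # question field in user vocabulary
internal_phrased = "API Key Reset Procedure"  # question field in internal vocabulary

user_score = overlap(query, user_phrased)
internal_score = overlap(query, internal_phrased)
print(f"user phrasing: {user_score:.2f}, internal phrasing: {internal_score:.2f}")
```

Under this measure the user-phrased question shares six of eight distinct words with the query, while the internal title shares only three of eight, so the user-phrased entry scores twice as high.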

# Knowledge schema — content type definitions
content_types:
  article:
    required:
      - title: string (max 100 chars)
      - summary: string (max 150 words, plain text — no markdown)
      - body: markdown
      - author: user_id
      - last_reviewed: date
      - topic: facet (multi-value, from controlled list)
      - audience: facet (single-value, from controlled list)
    optional:
      - related_articles: list[content_id]
      - resources: list[url]

  runbook:
    required:
      - title: 'string (start with verb: "Reset," "Deploy," "Diagnose")'
      - trigger_condition: string (max 200 chars — when does someone reach for this?)
      - steps: 'ordered_list[{step: string, expected_result: string}]'
      - expected_outcome: string
      - rollback_procedure: string (or "N/A — not reversible")
      - owner: user_id
      - last_tested: date
      - complexity: facet (Tier 1 / Tier 2 / Tier 3)
    optional:
      - prerequisites: list[content_id]
      - tools_required: list[string]

  decision_record:
    required:
      - title: 'string (format: "Decision: [what was decided]")'
      - date: date
      - status: enum [proposed, accepted, superseded, deprecated]
      - context: string (situation before the decision)
      - decision: string (what was chosen)
      - rationale: string (why this option over alternatives)
      - consequences: string (what this commits the team to)
    optional:
      - alternatives_considered: list[string]
      - supersedes: content_id
      - superseded_by: content_id

  reference:
    required:
      - title: string
      - reference_type: string
      - content: structured (table, list, or specification)
      - source: string
      - last_updated: date

  faq:
    required:
      - question: string (verbatim user phrasing — not internal vocabulary)
      - answer: string (max 300 words)
      - topic: facet (multi-value)
    optional:
      - related_articles: list[content_id]
      - last_validated: date
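
A schema only helps if contributions are checked against it. The following is a minimal sketch of such a check, assuming content items arrive as plain dictionaries. The required-field lists mirror the definitions above; the `reference` entry uses the fields named in the References description with assumed snake_case names, and the `validate` helper is hypothetical rather than part of any specific knowledge base product.

```python
# Required field names per content type, copied from the schema definitions.
REQUIRED_FIELDS = {
    "article": ["title", "summary", "body", "author",
                "last_reviewed", "topic", "audience"],
    "runbook": ["title", "trigger_condition", "steps", "expected_outcome",
                "rollback_procedure", "owner", "last_tested", "complexity"],
    "decision_record": ["title", "date", "status", "context",
                        "decision", "rationale", "consequences"],
    # Assumed names, derived from the prose list of reference fields.
    "reference": ["title", "reference_type", "content", "source", "last_updated"],
    "faq": ["question", "answer", "topic"],
}

SUMMARY_MAX_WORDS = 150  # article summary limit stated in the schema

def validate(content_type: str, item: dict) -> list[str]:
    """Return a list of problems; an empty list means the item passes."""
    if content_type not in REQUIRED_FIELDS:
        return [f"unknown content type: {content_type}"]
    # A field counts as missing when absent or left empty.
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS[content_type]
        if item.get(name) in (None, "", [])
    ]
    summary = item.get("summary")
    if (content_type == "article" and isinstance(summary, str)
            and len(summary.split()) > SUMMARY_MAX_WORDS):
        problems.append(f"summary exceeds {SUMMARY_MAX_WORDS} words")
    return problems

# Hypothetical draft runbook with several required fields not yet filled in.
draft = {
    "title": "Reset API key",
    "steps": [{"step": "Revoke the old key", "expected_result": "Old key is rejected"}],
    "owner": "u-123",
}

problems = validate("runbook", draft)
for problem in problems:
    print(problem)
```

Returning every problem at once, instead of stopping at the first, lets a contributor fix a draft in a single pass. A fuller check would also enforce the length limits, enums, and controlled facet lists, which this sketch leaves out.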