KM-301b · Module 1
The Evaluation Matrix
5 min read
Platform selection without a structured evaluation produces a decision based on whoever gave the most compelling demo. The evaluation matrix makes the decision criteria explicit before any platform is evaluated, scores each dimension independently, and forces a conversation about what the organization actually needs rather than what is impressive in a sales context.
Four evaluation dimensions matter for knowledge base infrastructure: retrieval quality, governance features, AI-readiness, and integration surface. These are not equally weighted for every organization. Define the weights before the evaluation, not after.
- Retrieval Quality: How well does the platform surface relevant content given a query? Sub-dimensions: full-text search accuracy, fuzzy matching for typos and partial terms, semantic search capability (understanding query intent, not just keyword matching), search scoping (ability to search within a category or facet), and result ranking quality. Test with real queries from your actual users — not the vendor's curated demo queries. A platform that ranks poorly on your actual query patterns is the wrong platform regardless of its feature list.
- Governance Features: Can the platform support your content lifecycle? Sub-dimensions: review and approval workflows, content expiration and staleness flags, versioning and audit trails, role-based permissions (read, write, review, publish, deprecate), and analytics on content usage and contribution. Governance features are frequently the differentiator between platforms that look identical on a feature checklist. Ask for a live demo of the approval workflow — not a screenshot.
- AI-Readiness: How well does the platform support AI-connected knowledge retrieval? Sub-dimensions: export API with structured content format, embedding generation or external embedding support, webhook or event stream for change notification (so AI systems can stay synchronized), native RAG integrations, and the data model's compatibility with chunking strategies. Platforms that are "AI-ready" as a marketing claim and platforms that are actually AI-ready as an architectural reality are different products. Verify against your actual RAG architecture.
- Integration Surface: How does the platform connect to the rest of the stack? Sub-dimensions: SSO and identity provider compatibility, API completeness (can you do everything the UI does via API?), webhooks and event streaming, native integrations with CRM, ticketing, Slack, and the development toolchain, and data export formats. An island knowledge base — one that does not connect to the tools where work happens — will not get used. Integration surface is not a nice-to-have.
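The retrieval-quality test described above — scoring hit rate at fixed rank positions over real logged queries — can be sketched as follows. This is an illustrative outline, not any platform's API; the query-log format and document IDs are assumptions for the example.

```python
# Sketch of the retrieval-quality test: hit rate at positions 1, 3, and 5
# over logged user queries. Each entry pairs the platform's ranked results
# with the document the user actually needed (all names illustrative).

def hit_rate_at_k(results: list[tuple[list[str], str]], k: int) -> float:
    """Fraction of queries whose expected doc appears in the top-k results."""
    hits = sum(1 for ranked, expected in results if expected in ranked[:k])
    return hits / len(results)

# Three logged queries scored against the doc each user eventually opened.
logged = [
    (["doc_a", "doc_b", "doc_c", "doc_d", "doc_e"], "doc_a"),  # hit at 1
    (["doc_x", "doc_b", "doc_y", "doc_z", "doc_q"], "doc_y"),  # hit at 3
    (["doc_m", "doc_n", "doc_o", "doc_p", "doc_b"], "doc_b"),  # hit at 5
]

for k in (1, 3, 5):
    print(f"hit@{k}: {hit_rate_at_k(logged, k):.2f}")
```

Run the same logged query set against every candidate platform so the scores are directly comparable; the matrix's test method calls for 20 queries rather than the three shown here.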
```yaml
# Platform evaluation matrix — Knowledge Base Infrastructure
# Score each dimension 1–5. Weight reflects organizational priority.
evaluation_dimensions:
  retrieval_quality:
    weight: 0.30
    sub_criteria:
      - full_text_search_accuracy
      - semantic_search_capability
      - faceted_filter_support
      - result_ranking_quality
    test_method: "Run 20 real user queries from search logs. Score hit rate at position 1, 3, 5."
  governance_features:
    weight: 0.25
    sub_criteria:
      - review_and_approval_workflow
      - content_expiration_flags
      - version_history_and_audit_trail
      - role_based_permissions
    test_method: "Walk through full content lifecycle: draft → review → publish → deprecate."
  ai_readiness:
    weight: 0.25
    sub_criteria:
      - structured_export_api
      - change_event_stream_or_webhook
      - chunking_friendly_data_model
      - native_rag_or_embedding_support
    test_method: "Verify API can export full content corpus with metadata in under 60 seconds."
  integration_surface:
    weight: 0.20
    sub_criteria:
      - sso_compatibility
      - api_completeness
      - slack_and_crm_integrations
      - export_format_options
    test_method: "Confirm SSO works with existing IdP. Test API against 5 common operations."

# Scoring:
#   5 = Exceeds requirement
#   4 = Meets requirement fully
#   3 = Meets requirement with workaround
#   2 = Partial — significant gap
#   1 = Does not meet requirement
```
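Once every dimension is scored 1–5, the per-platform comparison reduces to a weighted sum using the weights defined in the matrix. A minimal sketch, with the platform names and scores invented for illustration:

```python
# Combine per-dimension scores (1-5) into a weighted composite per platform.
# Weights mirror the evaluation matrix; scores below are illustrative only.

WEIGHTS = {
    "retrieval_quality": 0.30,
    "governance_features": 0.25,
    "ai_readiness": 0.25,
    "integration_surface": 0.20,
}

def composite(scores: dict[str, int]) -> float:
    """Weighted composite score; requires every dimension to be scored."""
    assert set(scores) == set(WEIGHTS), "score every dimension before comparing"
    return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

# Hypothetical candidates: one retrieval-strong, one governance-strong.
platform_a = {"retrieval_quality": 4, "governance_features": 3,
              "ai_readiness": 5, "integration_surface": 2}
platform_b = {"retrieval_quality": 3, "governance_features": 5,
              "ai_readiness": 3, "integration_surface": 4}

print(f"Platform A: {composite(platform_a):.2f}")  # → 3.60
print(f"Platform B: {composite(platform_b):.2f}")  # → 3.70
```

Note that the composite is only as honest as the weights, which is why the matrix insists they be fixed before any platform is scored: adjusting weights after seeing the scores turns the exercise back into a justification for the best demo.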