PM-301c · Module 1

Schema-Based Specification

Providing the actual output schema in the prompt is more precise than describing the output in natural language. The schema is the specification: it defines exactly which fields the output must contain, which of them are required, what type each must be, and what constraints each must satisfy. Three schema formats work reliably for this: JSON Schema, TypeScript interfaces, and annotated examples.

# FORMAT 1: JSON Schema
Return output conforming to this schema:
{
  "type": "object",
  "required": ["summary", "risk_level", "recommendations"],
  "properties": {
    "summary": { "type": "string", "maxLength": 200 },
    "risk_level": { "type": "string", "enum": ["low", "medium", "high", "critical"] },
    "recommendations": {
      "type": "array",
      "items": { "type": "string" },
      "minItems": 1,
      "maxItems": 5
    }
  }
}

# FORMAT 2: TypeScript Interface
Return output matching this TypeScript interface (as JSON):
interface ContractReview {
  summary: string;          // max 200 characters
  risk_level: 'low' | 'medium' | 'high' | 'critical';
  recommendations: string[]; // 1 to 5 items
}

# FORMAT 3: Annotated Example
Return output in this exact format (replace bracketed content):
{
  "summary": "[max 200 char summary of the contract's key terms and issues]",
  "risk_level": "[one of: low | medium | high | critical]",
  "recommendations": [
    "[first recommendation]",
    "[additional recommendations, 1-5 total]"
  ]
}
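
All three formats above encode the same constraints, so a reply can be checked against them in code once it comes back. The following is a minimal sketch of such a check, hand-written rather than using a JSON Schema library. The names `validateContractReview` and `RISK_LEVELS` are illustrative assumptions, not part of any library or of the prompt itself.

```typescript
// Hypothetical helper: checks a raw model reply against the constraints
// stated in the schema formats above (required fields, types, enum, lengths).
const RISK_LEVELS: readonly string[] = ["low", "medium", "high", "critical"];

// Returns a list of violations; an empty list means the reply conforms.
function validateContractReview(raw: string): string[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return ["reply is not valid JSON"];
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return ["reply is not a JSON object"];
  }
  const obj = data as Record<string, unknown>;
  const errors: string[] = [];

  // summary: required string, at most 200 characters.
  // Note: .length counts UTF-16 code units, which only approximates
  // JSON Schema's maxLength (defined in Unicode characters).
  const summary = obj["summary"];
  if (typeof summary !== "string") {
    errors.push("summary: missing or not a string");
  } else if (summary.length > 200) {
    errors.push("summary: longer than 200 characters");
  }

  // risk_level: required string from the fixed set of values.
  const risk = obj["risk_level"];
  if (typeof risk !== "string" || RISK_LEVELS.indexOf(risk) === -1) {
    errors.push("risk_level: not one of low, medium, high, critical");
  }

  // recommendations: required array of 1 to 5 strings.
  const recs = obj["recommendations"];
  if (!Array.isArray(recs) || !recs.every((r) => typeof r === "string")) {
    errors.push("recommendations: missing or not an array of strings");
  } else if (recs.length < 1 || recs.length > 5) {
    errors.push("recommendations: expected 1 to 5 items");
  }

  return errors;
}
```

A reply with `"risk_level": "severe"` and an empty `recommendations` array would produce two violations under this sketch, one for each field. In a production system a dedicated JSON Schema validator would likely replace the hand-written checks.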

Do This

  • Use JSON Schema when you need machine-readable schema validation
  • Use TypeScript interfaces when your team is already in a TypeScript context
  • Use annotated examples when the schema needs human-readable comments
  • Choose one format and use it consistently within a system
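
One way to keep a single format in use across a system is to store the schema once and generate the prompt text from it. The sketch below assumes the JSON Schema format; `CONTRACT_REVIEW_SCHEMA` and `buildPrompt` are hypothetical names chosen for illustration.

```typescript
// Hypothetical single source of truth for the output schema.
const CONTRACT_REVIEW_SCHEMA = {
  type: "object",
  required: ["summary", "risk_level", "recommendations"],
  properties: {
    summary: { type: "string", maxLength: 200 },
    risk_level: { type: "string", enum: ["low", "medium", "high", "critical"] },
    recommendations: {
      type: "array",
      items: { type: "string" },
      minItems: 1,
      maxItems: 5,
    },
  },
};

// Hypothetical helper: appends the schema to a task description together
// with an explicit instruction to conform to it.
function buildPrompt(task: string, schema: object): string {
  return [
    task,
    "",
    "Return output conforming to this schema:",
    JSON.stringify(schema, null, 2),
  ].join("\n");
}

const prompt = buildPrompt(
  "Review the attached contract.",
  CONTRACT_REVIEW_SCHEMA
);
```

Because every prompt is generated from the same constant, a change to the schema reaches all prompts that use it, and the same object could also be handed to a validator.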

Avoid This

  • Mix schema formats in the same prompt (pick one and use it throughout instead)
  • Provide a schema without the instruction to conform to it
  • Omit enum constraints for string fields that have a defined set of valid values
  • Use annotated examples with fictional data that could be confused for real output
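
When the annotated-example format is used with bracketed placeholders, a reply can be scanned for placeholders the model echoed back instead of replacing. The sketch below is a heuristic under that assumption: it flags any string value that is entirely wrapped in square brackets, so a legitimate value of that shape would be a false positive. `findUnfilledPlaceholders` is a hypothetical name.

```typescript
// Hypothetical helper: walks a parsed JSON value and returns the paths of
// string values that still look like "[placeholder text]".
function findUnfilledPlaceholders(value: unknown, path: string = "$"): string[] {
  if (typeof value === "string") {
    // Heuristic: the whole (trimmed) string is wrapped in square brackets.
    return /^\[[\s\S]*\]$/.test(value.trim()) ? [path] : [];
  }
  const hits: string[] = [];
  if (Array.isArray(value)) {
    for (let i = 0; i < value.length; i++) {
      hits.push(...findUnfilledPlaceholders(value[i], `${path}[${i}]`));
    }
  } else if (typeof value === "object" && value !== null) {
    const obj = value as Record<string, unknown>;
    for (const key of Object.keys(obj)) {
      hits.push(...findUnfilledPlaceholders(obj[key], `${path}.${key}`));
    }
  }
  return hits;
}
```

For a reply whose `summary` is still `"[max 200 char summary]"` and whose other fields are filled in, the function returns `["$.summary"]`.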