GC-301b · Module 2

Building Extensions from Scratch

Building a Gemini CLI extension starts with the MCP server. The Model Context Protocol (MCP) defines how tools are declared (name, description, input schema) and how results are returned (content blocks, such as a text block with a type and a text field). Your MCP server is a process that speaks JSON-RPC over stdio: Gemini CLI starts it, sends tool call requests to its stdin, and reads responses from its stdout. The @modelcontextprotocol/sdk package provides the TypeScript scaffolding, but you can implement the protocol in any language.
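To make the wire format concrete, here is a minimal sketch of one tool call and its reply without the SDK. The interfaces are illustrative rather than the SDK's own types, the field names follow the MCP tools/call shape as I understand it, and the request values are hypothetical.

```typescript
// Illustrative JSON-RPC 2.0 shapes for an MCP tool call (not the SDK's own types).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

interface TextBlock {
  type: "text";
  text: string;
}

interface ToolCallResponse {
  jsonrpc: "2.0";
  id: number;
  result: { content: TextBlock[]; isError?: boolean };
}

// Build a reply that echoes the request id so the client can pair the two messages.
function makeTextResponse(request: ToolCallRequest, text: string): ToolCallResponse {
  return {
    jsonrpc: "2.0",
    id: request.id,
    result: { content: [{ type: "text", text }] },
  };
}

// What Gemini CLI might write to the server's stdin (hypothetical values).
const exampleRequest: ToolCallRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: { name: "query_database", arguments: { sql: "SELECT 1" } },
};

// On the stdio transport each message is a single line of JSON terminated by a newline.
const wireLine: string =
  JSON.stringify(makeTextResponse(exampleRequest, "1 row returned")) + "\n";
```

In practice the SDK handles this framing for you; the sketch only shows what travels over the pipe.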

The critical design decision is tool granularity. Too many tools overwhelm the model's tool selection; too few force complex inputs to be crammed into a single tool call. The sweet spot is 3-7 tools per extension, each with a single clear purpose. A database extension might expose query (read), mutate (write), schema (introspect), and migrate (alter): four tools covering reads, writes, schema metadata, and schema changes. Each tool name should be a verb-noun pair that describes exactly what it does, so in practice query would be registered as query_database rather than under the bare verb.
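These guidelines can be checked mechanically before an extension ships. The helper below is a hypothetical sketch, not part of Gemini CLI or the SDK; it assumes snake_case names and only checks that a name has at least two words, since it cannot tell whether the first word is really a verb.

```typescript
// Hypothetical lint helper for the granularity guidelines described above.
const MIN_TOOLS = 3;
const MAX_TOOLS = 7;

// Lowercase snake_case with at least two words, e.g. "query_database".
const VERB_NOUN_PATTERN = /^[a-z]+(_[a-z]+)+$/;

// Returns a list of human-readable problems; an empty list means the set passes.
function lintToolNames(names: string[]): string[] {
  const problems: string[] = [];
  if (names.length < MIN_TOOLS || names.length > MAX_TOOLS) {
    problems.push(`expected ${MIN_TOOLS}-${MAX_TOOLS} tools, found ${names.length}`);
  }
  for (const name of names) {
    if (!VERB_NOUN_PATTERN.test(name)) {
      problems.push(`"${name}" is not a verb_noun name`);
    }
  }
  return problems;
}
```

For example, a set such as query_database, mutate_rows, inspect_schema, and run_migration (names made up for illustration) passes, while a single tool called command is flagged twice: once for the count and once for the name.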

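import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { ListToolsRequestSchema } from "@modelcontextprotocol/sdk/types.js";

// Declare the server's identity and advertise that it provides tools.
const server = new Server(
  { name: "my-extension", version: "1.0.0" },
  { capabilities: { tools: {} } }
);

// Answer tools/list requests. The SDK keys handlers by request schema, not by method string.
server.setRequestHandler(ListToolsRequestSchema, async () => ({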
  tools: [
    {
      name: "query_database",
      description: "Execute a read-only SQL query against the project database",
      inputSchema: {
        type: "object",
        properties: {
          sql: { type: "string", description: "SELECT query to execute" },
          limit: { type: "number", description: "Max rows (default 100)" }
        },
        required: ["sql"]
      }
    }
  ]
}));

const transport = new StdioServerTransport();
await server.connect(transport);
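The listing above only declares the tool; the server also has to answer tools/call requests. The following is a minimal sketch of that dispatch logic as a plain function, so it runs without the SDK. runReadOnlyQuery is a hypothetical in-memory stand-in for a real database client, and a real handler would be asynchronous. With the SDK, this logic would typically live in a handler registered for the tools/call request (CallToolRequestSchema in the SDK's types module, to my knowledge).

```typescript
// Result shape described earlier: content blocks, plus an optional error flag.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Hypothetical stand-in for a database client; it ignores the SQL text entirely.
function runReadOnlyQuery(sql: string, limit: number): Record<string, unknown>[] {
  void sql;
  const rows = [
    { id: 1, name: "alice" },
    { id: 2, name: "bob" },
    { id: 3, name: "carol" },
  ];
  return rows.slice(0, limit);
}

// Route a tool call to its implementation by tool name.
function handleToolCall(name: string, args: Record<string, unknown>): ToolResult {
  switch (name) {
    case "query_database": {
      const sql = String(args["sql"]);
      const rawLimit = args["limit"];
      // Fall back to the default of 100 rows named in the schema description.
      const limit = typeof rawLimit === "number" ? rawLimit : 100;
      const rows = runReadOnlyQuery(sql, limit);
      return { content: [{ type: "text", text: JSON.stringify(rows) }] };
    }
    default:
      // Report unknown tools as a tool-level error the model can read.
      return { content: [{ type: "text", text: `Unknown tool: ${name}` }], isError: true };
  }
}
```

Returning an error result instead of throwing keeps the failure visible to the model as ordinary text it can act on.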

Do This

Expose 3-7 tools with clear verb-noun names and detailed descriptions
Use input schemas with required fields and descriptive property descriptions
Return structured results that the model can parse and present to the user
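As one possible reading of "structured results", the sketch below wraps query rows in a small summary object before serializing it into a text block. The QuerySummary shape and its field names are assumptions made for illustration, not something MCP prescribes.

```typescript
type Row = Record<string, unknown>;

// Hypothetical summary shape: enough context for the model to describe the result.
interface QuerySummary {
  rowCount: number;   // rows the query produced before truncation
  truncated: boolean; // true when rows were dropped to respect the limit
  columns: string[];  // column names taken from the first row
  rows: Row[];        // at most `limit` rows
}

function summarizeRows(allRows: Row[], limit: number = 100): QuerySummary {
  const rows = allRows.slice(0, limit);
  const first = rows[0];
  return {
    rowCount: allRows.length,
    truncated: allRows.length > limit,
    columns: first ? Object.keys(first) : [],
    rows,
  };
}

// Serialize the summary into the text content block a tool result carries.
function toTextBlock(summary: QuerySummary): { type: "text"; text: string } {
  return { type: "text", text: JSON.stringify(summary) };
}
```

The truncated flag lets the model tell the user that more rows exist rather than presenting a partial result as complete.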

Avoid This

Create one mega-tool that accepts a "command" string — the model cannot reason about it well
Skip input validation — bad inputs produce cryptic errors that confuse the model and the user
Return raw API responses without structuring them — the model wastes tokens parsing noise
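Input validation for the query_database tool could look roughly like the sketch below. The function name, result shape, and messages are hypothetical. The SELECT prefix check only exists to produce a readable error; it is not a security boundary, and a real extension would still need a read-only database role or similar.

```typescript
// Either normalized arguments or a message the model can relay to the user.
type Validation =
  | { status: "ok"; sql: string; limit: number }
  | { status: "invalid"; message: string };

function validateQueryArgs(args: Record<string, unknown>): Validation {
  const sql = args["sql"];
  if (typeof sql !== "string" || sql.trim() === "") {
    return { status: "invalid", message: 'Missing required string field "sql".' };
  }
  // Readability check only: rejects statements that do not start with SELECT.
  if (!/^select\b/i.test(sql.trim())) {
    return { status: "invalid", message: "Only SELECT queries are accepted by this tool." };
  }
  // Apply the default of 100 rows when no limit is supplied.
  const rawLimit = args["limit"];
  const limit = rawLimit === undefined ? 100 : rawLimit;
  if (typeof limit !== "number" || Math.floor(limit) !== limit || limit < 1) {
    return { status: "invalid", message: '"limit" must be a positive integer.' };
  }
  return { status: "ok", sql: sql.trim(), limit };
}
```

A handler would call this first and, on an invalid status, return the message as an error result instead of passing the arguments to the database.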