GC-301b · Module 2
Building Extensions from Scratch
Building a Gemini CLI extension starts with the MCP server. The Model Context Protocol defines how tools are declared (name, description, input schema) and how results are returned (content blocks with type and text). Your MCP server is a process that speaks JSON-RPC over stdio — Gemini CLI starts it, sends tool call requests, and reads responses. The @modelcontextprotocol/sdk package provides the TypeScript scaffolding, but you can implement the protocol in any language.
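To make the request/response flow concrete, here is a minimal sketch of what a single tool call exchange might look like on the wire. The interfaces are simplified for illustration and are not the SDK's own types; `makeTextResponse` is a hypothetical helper, and the tool name and SQL are placeholder values.

```typescript
// Simplified shapes for a tools/call exchange (illustrative, not the SDK's types).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

interface TextContent {
  type: "text";
  text: string;
}

interface ToolCallResponse {
  jsonrpc: "2.0";
  id: number;
  result: { content: TextContent[]; isError?: boolean };
}

// Hypothetical helper: wrap plain text in a response that echoes the request id,
// which is how JSON-RPC pairs a response with its request.
function makeTextResponse(request: ToolCallRequest, text: string): ToolCallResponse {
  return {
    jsonrpc: "2.0",
    id: request.id,
    result: { content: [{ type: "text", text }] },
  };
}

// What the client might send when the model decides to call a tool.
const exampleRequest: ToolCallRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: { name: "query_database", arguments: { sql: "SELECT 1" } },
};

const exampleResponse = makeTextResponse(exampleRequest, "1 row returned");

// Over stdio, each message is serialized as a single line of JSON.
const wireLine = JSON.stringify(exampleResponse);
```

In practice the SDK handles this framing for you; the sketch only shows what it abstracts away.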
The critical design decision is tool granularity. Too many tools make it harder for the model to select the right one. Too few force the model to cram complex inputs into a single tool call. As a rule of thumb, aim for 3-7 tools per extension, each with a single clear purpose. A database extension might expose query (read), mutate (write), schema (introspect), and migrate (alter): four tools covering all CRUD operations plus metadata. Each registered tool name should be a verb-noun pair, such as query_database, that describes exactly what the tool does.
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { ListToolsRequestSchema } from "@modelcontextprotocol/sdk/types.js";

// Declare the server's identity and advertise that it exposes tools.
const server = new Server(
  { name: "my-extension", version: "1.0.0" },
  { capabilities: { tools: {} } }
);

// setRequestHandler expects a request schema from the SDK, not a method-name string.
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "query_database",
      description: "Execute a read-only SQL query against the project database",
      inputSchema: {
        type: "object",
        properties: {
          sql: { type: "string", description: "SELECT query to execute" },
          limit: { type: "number", description: "Max rows (default 100)" }
        },
        required: ["sql"]
      }
    }
  ]
}));

// Gemini CLI launches this process and exchanges messages over stdin/stdout.
const transport = new StdioServerTransport();
await server.connect(transport);
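The listing above declares query_database but does not handle calls to it. The following is a rough sketch of the logic such a handler might contain, written as plain functions so it runs without the SDK. `parseQueryArgs`, `handleQueryDatabase`, and the `RunQuery` callback are hypothetical names, and the database client is assumed rather than shown. In a real server this logic would likely be registered through a second `setRequestHandler` call using the SDK's `CallToolRequestSchema`.

```typescript
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Hypothetical stand-in for a real database client.
type RunQuery = (sql: string, limit: number) => Promise<Record<string, unknown>[]>;

type ParsedArgs =
  | { ok: true; sql: string; limit: number }
  | { ok: false; message: string };

// Validate the arguments against what the input schema promises.
function parseQueryArgs(args: Record<string, unknown> | undefined): ParsedArgs {
  const sql = args?.["sql"];
  // A prefix check is only a first line of defense; a real server should also
  // connect with a read-only database role.
  if (typeof sql !== "string" || !/^\s*select\b/i.test(sql)) {
    return { ok: false, message: "'sql' must be a string containing a SELECT statement." };
  }
  const rawLimit = args?.["limit"];
  // Fall back to the default of 100 documented in the schema description.
  const limit = typeof rawLimit === "number" && rawLimit > 0 ? Math.floor(rawLimit) : 100;
  return { ok: true, sql, limit };
}

async function handleQueryDatabase(
  args: Record<string, unknown> | undefined,
  runQuery: RunQuery
): Promise<ToolResult> {
  const parsed = parseQueryArgs(args);
  if (!parsed.ok) {
    // Report the problem as a tool result so the model can read it and retry.
    return { content: [{ type: "text", text: `Error: ${parsed.message}` }], isError: true };
  }
  const rows = await runQuery(parsed.sql, parsed.limit);
  return { content: [{ type: "text", text: JSON.stringify(rows) }] };
}
```

Returning validation failures as a result with `isError` set, rather than throwing, gives the model a readable message it can act on.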
Do This
- Expose 3-7 tools with clear verb-noun names and detailed descriptions
- Use input schemas with required fields and descriptive property descriptions
- Return structured results that the model can parse and present to the user
Avoid This
- Create one mega-tool that accepts a "command" string — the model cannot reason about it well
- Skip input validation — bad inputs produce cryptic errors that confuse the model and the user
- Return raw API responses without structuring them — the model wastes tokens parsing noise
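As one way to apply the advice on structured results, the sketch below trims a raw driver response down to the fields the model is likely to need. The `RawQueryResponse` shape is invented for illustration and does not correspond to any particular database library; `structureResult` and `toToolResult` are hypothetical helpers.

```typescript
// Hypothetical raw response shape, including noise the model does not need.
interface RawQueryResponse {
  rows: Record<string, unknown>[];
  fields: { name: string; dataTypeID: number }[];
  command: string;
  durationMs: number;
  connection: { host: string; pid: number };
}

interface StructuredResult {
  rowCount: number;
  columns: string[];
  rows: Record<string, unknown>[];
  truncated: boolean;
}

// Keep only what helps the model answer: counts, column names, and capped rows.
function structureResult(raw: RawQueryResponse, maxRows = 100): StructuredResult {
  return {
    rowCount: raw.rows.length,
    columns: raw.fields.map((f) => f.name),
    rows: raw.rows.slice(0, maxRows),
    truncated: raw.rows.length > maxRows,
  };
}

// Wrap the structured result in a single text content block.
function toToolResult(result: StructuredResult) {
  return { content: [{ type: "text" as const, text: JSON.stringify(result) }] };
}
```

The `truncated` flag tells the model that more rows exist, so it can narrow the query instead of assuming the result is complete.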