GC-301f · Module 3

Playwright Integration

Playwright integration gives Gemini CLI the ability to interact with web pages: navigate to URLs, click elements, fill forms, read content, and capture screenshots. Launch the browser once, at MCP server startup, and reuse it across tool calls; do not start a new browser per tool invocation. Reusing one long-lived browser context keeps cookies, localStorage, and session state between actions, which is essential for authenticated workflows. (A context created with newContext holds this state in memory only, so it is lost when the server process exits.) Use Chromium for compatibility, headless mode for CI, and headed mode for debugging.
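The launch-once rule and the headless/headed split can be sketched as two small helpers. This is a minimal sketch under stated assumptions, not the module's own implementation: `once`, `launchOptions`, and the `HEADED` environment variable are hypothetical names introduced here, and `CI` is only the conventional variable many CI systems set. `headless` and `slowMo` are real Playwright launch options.

```typescript
// Options passed to chromium.launch(); only the two fields used here are listed.
type LaunchOptions = { headless: boolean; slowMo?: number };

// Pick launch options from the environment.
// Assumption: HEADED=1 is a hypothetical opt-in for a visible browser while debugging.
function launchOptions(env: Record<string, string | undefined>): LaunchOptions {
  const headed = env["HEADED"] === "1" && !env["CI"];
  // slowMo delays each Playwright operation so a human can follow along.
  return headed ? { headless: false, slowMo: 250 } : { headless: true };
}

// Run `init` at most once; every later call reuses the same promise.
// If `init` fails, the cache is cleared so the next call can retry.
function once<T>(init: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!cached) {
      cached = init().catch((err) => {
        cached = undefined;
        throw err;
      });
    }
    return cached;
  };
}
```

With Playwright this might be wired up as `const getBrowser = once(() => chromium.launch(launchOptions(process.env)))`, with each tool awaiting `getBrowser()` instead of launching its own browser.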

Page interaction tools should be granular. Build separate tools for navigate, click, type, read_text, screenshot, and wait_for_element. Gemini chains these tools to accomplish complex workflows: navigate to a URL, wait for a selector, read the page content, decide what to click based on the content. Bundling multiple actions into a single "do everything" tool removes Gemini's ability to reason between steps. Each tool call is a decision point where Gemini can adapt based on what it sees.

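import { chromium, Browser, Page } from "playwright";

// Module-level handles: created once and shared by every tool call.
let browser: Browser;
let page: Page;

// Call once at MCP server startup, before any tool runs.
async function initBrowser() {
  browser = await chromium.launch({ headless: true });
  const context = await browser.newContext({
    viewport: { width: 1280, height: 720 },
    userAgent: "GeminiCLI-Automation/1.0",
  });
  page = await context.newPage();
}

// Tool: navigate. Returns the final URL (after redirects) and the page title.
async function navigateTo(url: string) {
  // "networkidle" waits until the network has been quiet for 500 ms. Pages that
  // poll continuously can hit the navigation timeout; "load" is a safer choice there.
  await page.goto(url, { waitUntil: "networkidle" });
  return { url: page.url(), title: await page.title() };
}

// Tool: screenshot. Captures the full scrollable page, not just the viewport.
async function screenshot(path: string) {
  await page.screenshot({ path, fullPage: true });
  // Report the real viewport rather than a hard-coded string.
  const size = page.viewportSize();
  return {
    saved: path,
    fullPage: true,
    viewport: size ? `${size.width}x${size.height}` : null,
  };
}

// Tool: read_text. Waits up to 5 s for the selector, then returns its text.
async function readText(selector: string) {
  const el = await page.waitForSelector(selector, { timeout: 5000 });
  const text = await el?.textContent();
  return { text: text ?? "" };
}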
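The listing above covers navigate, screenshot, and read_text. The remaining granular tools (click, type, wait_for_element) might look like the sketch below. Several things here are assumptions rather than part of the original module: the `PageLike` interface is a hypothetical structural subset of Playwright's `Page` (its `click`, `fill`, and `waitForSelector` methods), `makeTools` and `attempt` are illustrative names, and returning failures as data instead of throwing is one possible design choice, made so that Gemini has something concrete to reason about at the next step.

```typescript
// Hypothetical structural subset of Playwright's Page used by these tools.
interface PageLike {
  click(selector: string, options?: { timeout?: number }): Promise<void>;
  fill(selector: string, value: string, options?: { timeout?: number }): Promise<void>;
  waitForSelector(selector: string, options?: { timeout?: number }): Promise<unknown>;
}

type ToolResult = { ok: true; detail: string } | { ok: false; error: string };

// Run one page action and convert any exception into a result object.
async function attempt(detail: string, action: () => Promise<unknown>): Promise<ToolResult> {
  try {
    await action();
    return { ok: true, detail };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Build one small tool per action; each call is a separate decision point.
function makeTools(page: PageLike, timeout = 5000) {
  return {
    click: (selector: string) =>
      attempt(`clicked ${selector}`, () => page.click(selector, { timeout })),
    type: (selector: string, text: string) =>
      attempt(`typed into ${selector}`, () => page.fill(selector, text, { timeout })),
    wait_for_element: (selector: string) =>
      attempt(`found ${selector}`, () => page.waitForSelector(selector, { timeout })),
  };
}
```

Passing the shared `page` from `initBrowser` into `makeTools(page)` would give three tools that each do exactly one thing, leaving the sequencing to Gemini.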

Screenshot capture is the bridge between browser state and Gemini's understanding. After a navigation or interaction, capture a screenshot, save it to a temp directory, and return the file path from the tool. Gemini can reference the screenshot to verify that the page loaded correctly, identify UI elements, and decide the next action. Clean up old screenshots periodically, because a long session can generate hundreds of images.
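One way to handle the temp directory and the periodic cleanup is sketched below, using only Node's built-in `fs`, `os`, and `path` modules. The function names, the `gemini-shots-` prefix, and the keep-the-newest-N policy are assumptions chosen for illustration; a time-based policy would work equally well.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Create a dedicated directory for this session's screenshots under the OS temp dir.
function createScreenshotDir(prefix = "gemini-shots-"): string {
  return fs.mkdtempSync(path.join(os.tmpdir(), prefix));
}

// Build the next screenshot path. Zero-padded counters sort in capture order.
let shotCounter = 0;
function nextScreenshotPath(dir: string): string {
  shotCounter += 1;
  return path.join(dir, `shot-${String(shotCounter).padStart(5, "0")}.png`);
}

// Keep only the newest `keep` PNG files in `dir`; return the names that were deleted.
function pruneScreenshots(dir: string, keep: number): string[] {
  const files = fs
    .readdirSync(dir)
    .filter((name) => name.endsWith(".png"))
    .sort();
  const stale = files.slice(0, Math.max(0, files.length - keep));
  for (const name of stale) {
    fs.rmSync(path.join(dir, name), { force: true });
  }
  return stale;
}
```

A screenshot tool could call `nextScreenshotPath(dir)` for each capture and run `pruneScreenshots(dir, 50)` afterwards, so the directory never holds more than the 50 most recent images.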