GC-301f · Module 2
Vector Stores
3 min read
Vector store tools connect Gemini CLI to embedding-based search systems — Pinecone, Weaviate, Chroma, pgvector, Qdrant. The core operation is semantic search: convert a text query into an embedding vector, find the nearest neighbors in the vector space, and return the matching documents with their similarity scores. The embedding model must match the one used during indexing. If documents were indexed with text-embedding-3-small, queries must use the same model. A dimension mismatch at least fails loudly: most stores reject the query vector outright. The dangerous case is a different model at the same dimensionality, which fails silently by returning confident but irrelevant neighbors.
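The core operation can be sketched in plain TypeScript with a brute-force cosine-similarity scan. This is illustrative only: real stores use approximate indexes (HNSW, IVF) rather than scanning every document, and the Doc shape here is an assumption, not any store's actual schema.

```typescript
// Brute-force nearest-neighbor search over in-memory embeddings.
type Doc = { id: string; embedding: number[]; text: string };

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    // Surface dimension mismatches loudly instead of letting them slip by.
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function nearestNeighbors(query: number[], docs: Doc[], topK: number) {
  // Score every document, rank by similarity, keep the top K.
  return docs
    .map((d) => ({ id: d.id, text: d.text, score: cosineSimilarity(query, d.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

The explicit length check is the point: a mismatch between query and index dimensions should be an error you see, not a degraded result you don't.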
Similarity queries need metadata filtering to be useful. Pure vector search returns the semantically closest documents regardless of context. A query about "Q4 revenue" might return documents from any year. Combine vector similarity with metadata filters: namespace by document type, filter by date range, restrict by access level. Most vector databases support pre-filter (filter then search) and post-filter (search then filter). Pre-filter is almost always what you want — it reduces the search space before the expensive similarity computation.
import { Pinecone } from "@pinecone-database/pinecone";
import OpenAI from "openai";

const pc = new Pinecone({ apiKey: process.env.PINECONE_KEY! });
const openai = new OpenAI();
const index = pc.index("knowledge-base");

async function searchDocs(query: string, filters?: { docType?: string; after?: string }) {
  // Generate embedding with the same model used for indexing
  const embedding = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
  });

  const metadataFilter: Record<string, any> = {};
  if (filters?.docType) metadataFilter.docType = { $eq: filters.docType };
  if (filters?.after) metadataFilter.date = { $gte: filters.after };

  const results = await index.query({
    vector: embedding.data[0].embedding,
    topK: 5,
    includeMetadata: true,
    filter: Object.keys(metadataFilter).length ? metadataFilter : undefined,
  });

  return results.matches.map((m) => ({
    score: m.score,
    title: m.metadata?.title,
    source: m.metadata?.source,
    text: m.metadata?.text,
  }));
}
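The pre-filter versus post-filter distinction is easiest to see over an in-memory collection. A hedged sketch, assuming unit-normalized embeddings (so a dot product stands in for cosine similarity) and an illustrative StoredDoc shape:

```typescript
type StoredDoc = { embedding: number[]; meta: { docType: string; date: string }; text: string };

const dot = (a: number[], b: number[]) => a.reduce((s, x, i) => s + x * b[i], 0);

// Pre-filter: narrow by metadata first, then score only the survivors.
// Guarantees up to topK results that all satisfy the filter.
function preFilterSearch(query: number[], docs: StoredDoc[], docType: string, topK: number) {
  return docs
    .filter((d) => d.meta.docType === docType)
    .map((d) => ({ text: d.text, score: dot(query, d.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Post-filter: score everything, take topK, then drop non-matching results.
// Can return fewer than topK — or nothing — if the closest neighbors fail the filter.
function postFilterSearch(query: number[], docs: StoredDoc[], docType: string, topK: number) {
  return docs
    .map((d) => ({ text: d.text, score: dot(query, d.embedding), docType: d.meta.docType }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .filter((r) => r.docType === docType);
}
```

Beyond skipping similarity computation on filtered-out documents, pre-filtering avoids the post-filter failure mode where the top-K nearest neighbors all fail the metadata check and the caller gets an empty result despite relevant documents existing.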
RAG integration combines vector search with Gemini's generation. The pattern: Gemini receives a user question, calls the vector search tool to retrieve relevant documents, then synthesizes an answer grounded in the retrieved context. The tool returns document chunks with source attribution — title, URL, page number. Gemini can then cite sources in its response. Keep retrieved chunks under 2000 tokens each. Five chunks at 2000 tokens is 10K tokens of context — enough to ground an answer without overwhelming the conversation.
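The chunk budget can be enforced before retrieved context reaches Gemini. A minimal sketch, assuming a rough four-characters-per-token estimate in place of a real tokenizer; the Chunk shape and buildContext name are illustrative:

```typescript
type Chunk = { title: string; source: string; text: string };

// Crude token estimate — swap in a real tokenizer for production use.
const estimateTokens = (s: string) => Math.ceil(s.length / 4);

function buildContext(chunks: Chunk[], maxPerChunk = 2000, maxTotal = 10_000): string {
  const parts: string[] = [];
  let used = 0;
  for (const c of chunks) {
    const tokens = estimateTokens(c.text);
    if (tokens > maxPerChunk) continue; // skip oversized chunks entirely
    if (used + tokens > maxTotal) break; // stop once the total budget is spent
    used += tokens;
    // Keep attribution attached to each chunk so the model can cite sources.
    parts.push(`[${c.title}](${c.source})\n${c.text}`);
  }
  return parts.join("\n\n---\n\n");
}
```

Because chunks arrive ranked by similarity, truncating at the total budget drops the least relevant chunks first.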