MP-201b · Module 2

File System & Document Resources

File system resources expose directories, files, and parsed documents through MCP. The official filesystem MCP server provides a reference implementation: it mounts a set of allowed directories and exposes their contents as file:// URIs. The server can list directory trees, read file contents, and watch for changes using the operating system's native filesystem events. Crucially, the server enforces path boundaries — it rejects any URI that resolves outside the mounted directories, preventing path traversal attacks.
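The boundary check can be sketched in a few lines. This is a minimal illustration that assumes POSIX paths and Python 3.9+ (for Path.is_relative_to). It is not the official server's code, and resolve_file_uri and PathOutsideRootsError are hypothetical names. The important step is resolving the path before comparing it. That collapses .. segments and follows symbolic links, so both traversal tricks are caught.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


class PathOutsideRootsError(PermissionError):
    """Raised when a file:// URI resolves outside every mounted directory."""


def resolve_file_uri(uri: str, allowed_roots: list[str]) -> Path:
    """Map a file:// URI to a real path, rejecting anything outside allowed_roots."""
    parsed = urlparse(uri)
    if parsed.scheme != "file" or parsed.netloc not in ("", "localhost"):
        raise ValueError(f"not a local file URI: {uri!r}")
    # Decode percent-escapes first so %2e%2e is treated like '..'.
    candidate = Path(unquote(parsed.path))
    # resolve() collapses '..' segments and follows symlinks to their targets.
    real = candidate.resolve()
    for root in allowed_roots:
        # Resolve the root too, so mounts that are themselves symlinks compare correctly.
        if real.is_relative_to(Path(root).resolve()):
            return real
    raise PathOutsideRootsError(f"{uri!r} resolves outside the mounted directories")
```

Comparing resolved Path objects component by component also avoids a classic prefix bug. A plain string check such as startswith("/data") would wrongly accept /data-evil.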

Raw file content is useful for text files, but enterprise data often lives in structured documents: PDFs, CSVs, Excel spreadsheets, JSON configuration files. A document-aware MCP server adds a parsing layer that converts binary formats (PDF, Excel) and structured text (CSV) into text the model can consume. CSV files become JSON arrays. PDFs become extracted text with page markers. Excel files become sheet-by-sheet JSON. The key design decision is whether to parse eagerly (at list time) or lazily (at read time). Lazy parsing is almost always the right choice: listing a directory with 500 PDFs should not trigger 500 PDF extractions.
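The difference between list time and read time can be sketched as two functions. This minimal sketch assumes a Python server and handles only CSV. PDF and Excel extraction would need third-party libraries, so they are left out. list_resources, read_resource, and the MIME_TYPES table are hypothetical names. The dictionaries mirror two MCP shapes: the uri/name/mimeType fields of a resource listing, and the uri/mimeType/text fields of a read result.

```python
import csv
import json
from pathlib import Path

# Hypothetical extension -> MIME type table; a real server would cover more formats.
MIME_TYPES = {".csv": "text/csv", ".json": "application/json", ".txt": "text/plain"}


def list_resources(root: str) -> list[dict]:
    """Describe every file under root without opening it (no parsing at list time)."""
    resources = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            resources.append({
                "uri": path.resolve().as_uri(),
                "name": path.name,
                # Derived from the extension alone, so listing stays cheap.
                "mimeType": MIME_TYPES.get(path.suffix.lower(), "application/octet-stream"),
            })
    return resources


def read_resource(path: str) -> dict:
    """Parse a file only when a client reads it; CSV rows become a JSON array of objects."""
    p = Path(path)
    uri = p.resolve().as_uri()
    suffix = p.suffix.lower()
    if suffix == ".csv":
        with p.open(newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f))
        return {"uri": uri, "mimeType": "application/json", "text": json.dumps(rows)}
    if suffix in MIME_TYPES:
        return {"uri": uri, "mimeType": MIME_TYPES[suffix], "text": p.read_text(encoding="utf-8")}
    raise ValueError(f"no parser registered for {p.name}")
```

The two MIME types differ on purpose. The listing advertises text/csv because that is what the file is. The read result says application/json because that is what the parsing layer returns. In a real server, a path-boundary check would run before either function touches the disk.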

File watching ties into MCP subscriptions. When a client subscribes to a file resource with a resources/subscribe request, the server registers a filesystem watcher (inotify on Linux, FSEvents on macOS, ReadDirectoryChangesW on Windows). When the file changes, the server sends a notifications/resources/updated notification carrying the resource URI, and the client can re-read the resource with resources/read. This pattern is particularly valuable for log files and configuration files that change during an AI session: the model learns about changes without the client having to poll.
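The subscription side can be sketched as a registry. It maps watched paths to subscribed URIs and turns change events into notifications. This sketch assumes a Python server and leaves the watcher backend abstract. A library such as the third-party watchdog package, which wraps inotify, FSEvents, and ReadDirectoryChangesW, is assumed to call on_file_changed. SubscriptionRegistry and its methods are hypothetical names. The message shape follows the MCP notifications/resources/updated notification.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Callable


class SubscriptionRegistry:
    """Tracks subscribed resources and turns file-change events into
    notifications/resources/updated messages.

    Assumption: a native watcher backend (not shown) calls on_file_changed(path)
    whenever a watched file changes.
    """

    def __init__(self, send: Callable[[str], None]):
        self._send = send  # writes one JSON-RPC message to the client transport
        self._uris_by_path: dict[Path, set[str]] = defaultdict(set)

    def subscribe(self, uri: str, path: str) -> None:
        # A real server would also start the OS watcher for this path here.
        self._uris_by_path[Path(path).resolve()].add(uri)

    def unsubscribe(self, uri: str, path: str) -> None:
        key = Path(path).resolve()
        self._uris_by_path[key].discard(uri)
        if not self._uris_by_path[key]:
            del self._uris_by_path[key]  # a real server would stop the watcher here

    def on_file_changed(self, path: str) -> None:
        # Called by the watcher backend; changes to unsubscribed paths are ignored.
        for uri in sorted(self._uris_by_path.get(Path(path).resolve(), ())):
            # JSON-RPC notification: no "id", so the client sends no response.
            self._send(json.dumps({
                "jsonrpc": "2.0",
                "method": "notifications/resources/updated",
                "params": {"uri": uri},
            }))
```

A server that supports this also advertises it during initialization by setting subscribe to true in its resources capability. That is how clients know they can send resources/subscribe.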

Do This

  • Mount specific directories, never the root filesystem
  • Parse documents lazily — extract content only on read, not on list
  • Return MIME types so the client knows how to display the content
  • Use filesystem watchers for subscription-based change detection

Avoid This

  • Expose the home directory or project root without filtering, since config files such as .env often hold secrets
  • Parse every file at startup; a directory with thousands of files will time out
  • Return raw binary content; convert it to text, or base64-encode it and attach a MIME type
  • Ignore symbolic links; a link inside a mounted directory can point outside it, so resolve links before checking boundaries
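The MIME-type and binary-content rules in both lists come together when the server builds a read result. This minimal sketch assumes a Python server that guesses MIME types from file extensions with the standard mimetypes module. to_resource_contents and TEXT_MIME_TYPES are hypothetical names. It returns one of the two shapes MCP resource contents can take, each with a mimeType: a text field for textual files, or a base64-encoded blob field for everything else.

```python
import base64
import mimetypes
from pathlib import Path

# Non-text/* types that are still safe to return as text (hypothetical; extend as needed).
TEXT_MIME_TYPES = {"application/json", "application/xml"}


def to_resource_contents(path: str) -> dict:
    """Build one resource-contents entry: text for textual files, base64 blob otherwise."""
    p = Path(path)
    # guess_type looks only at the extension; unknown types fall back to octet-stream.
    mime = mimetypes.guess_type(p.name)[0] or "application/octet-stream"
    data = p.read_bytes()
    entry = {"uri": p.resolve().as_uri(), "mimeType": mime}
    if mime.startswith("text/") or mime in TEXT_MIME_TYPES:
        try:
            entry["text"] = data.decode("utf-8")
            return entry
        except UnicodeDecodeError:
            pass  # labeled as text but not valid UTF-8: fall through to a blob
    # Raw bytes cannot travel in JSON, so encode them as base64.
    entry["blob"] = base64.b64encode(data).decode("ascii")
    return entry
```

The sketch decides by MIME type rather than by whether the bytes happen to decode as UTF-8, so a binary file is never sent as garbled text. The decode check is only a safety net for files labeled as text that are not valid UTF-8.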