MP-301g · Module 3

Keepalive & Proxy Traversal


Keepalive heartbeats serve two purposes: they confirm the connection is still alive, and they prevent intermediate network devices from killing idle connections. Corporate proxies, load balancers, and firewalls all enforce idle connection timeouts, typically 60-120 seconds. If an MCP session stays idle for longer than an intermediary's timeout, the intermediary silently drops the connection, and the client's next request fails with a connection reset error that gives no indication of the cause. Application-level keepalives (a ping/pong exchange every 30 seconds, comfortably under the shortest typical timeout) keep the connection active through these intermediaries.
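The timing logic behind a 30-second ping/pong can be sketched as a small monitor that decides when a ping is due and how many intervals have passed without hearing from the peer. This is a minimal sketch, not a reference implementation: the class name KeepaliveMonitor, the max_missed threshold of 2, and the injectable clock are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class KeepaliveMonitor:
    """Tracks when to send a ping and whether the peer has gone quiet."""

    interval: float = 30.0      # seconds between pings (value from the text)
    max_missed: int = 2         # assumption: give up after two silent intervals
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    last_sent: float = field(init=False, default=0.0)
    last_seen: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        now = self.clock()
        self.last_sent = now
        self.last_seen = now

    def ping_due(self) -> bool:
        """True once a full interval has elapsed since the last ping."""
        return self.clock() - self.last_sent >= self.interval

    def mark_sent(self) -> None:
        """Call right after writing a ping to the connection."""
        self.last_sent = self.clock()

    def mark_seen(self) -> None:
        """Call on every pong, or on any other inbound traffic."""
        self.last_seen = self.clock()

    def missed(self) -> int:
        """Whole intervals elapsed without hearing from the peer."""
        return int((self.clock() - self.last_seen) // self.interval)

    def is_dead(self) -> bool:
        """True when enough intervals have passed to treat the link as lost."""
        return self.missed() >= self.max_missed
```

A caller would check ping_due() in its event loop, send the ping, call mark_sent(), and log a warning whenever missed() is greater than zero.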

Corporate proxy traversal is one of the hardest problems in MCP deployment. Many enterprises route all HTTP traffic through an inspecting proxy that terminates TLS, scans content, and re-encrypts. SSE streams are particularly problematic: some proxies buffer the entire response before forwarding it, which defeats the purpose of streaming. Others enforce a maximum connection duration (5-10 minutes) and kill long-lived SSE connections when it expires. The workarounds are environment-specific: configure proxy bypass rules for MCP endpoints, use HTTP/2 (which some proxies handle better than SSE over HTTP/1.1), or fall back to polling.

When MCP traffic must traverse a proxy that does not support SSE, the fallback is polling: the client sends periodic GET requests to check for server-initiated messages instead of maintaining a persistent stream. This periodic polling (often loosely called long-polling, although true long-polling holds each request open until a message is ready) increases latency, because messages wait until the next poll, and server load, because each poll is a new HTTP request. In exchange, it works through virtually any proxy, firewall, or CDN. Implement it as a degraded mode that the client activates automatically when SSE connection attempts fail repeatedly.
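The polling fallback described above can be sketched as a loop that fetches pending messages, hands each one to a handler, and sleeps between polls. This is a minimal sketch under stated assumptions: fetch_pending is a hypothetical stand-in for the HTTP GET that a real client would issue, and the function name, the max_polls parameter, and the injectable sleep are illustrative rather than part of any MCP SDK.

```python
import time
from typing import Callable, Iterable, Optional


def poll_for_messages(
    fetch_pending: Callable[[], Iterable[dict]],
    handle: Callable[[dict], None],
    interval: float = 2.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until max_polls is reached (or forever if it is None).

    fetch_pending: hypothetical callable standing in for one HTTP GET;
                   returns the messages queued since the previous poll.
    handle:        callback invoked once per message, in arrival order.
    Returns the total number of messages delivered.
    """
    polls = 0
    delivered = 0
    while max_polls is None or polls < max_polls:
        for message in fetch_pending():
            handle(message)
            delivered += 1
        polls += 1
        # Sleep only when another poll will follow.
        if max_polls is None or polls < max_polls:
            sleep(interval)
    return delivered
```

Because a message can sit on the server for up to one full interval before the next poll picks it up, the interval is a direct trade-off between added latency and request volume.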

Implement application-level keepalive: Send a ping event on the SSE stream every 30 seconds. For stdio, send a JSON-RPC notification with method "$/keepalive" on the same interval. Log missed keepalives as warnings.
Detect proxy interference: If the SSE connection drops within the first 60 seconds, or if the first data event arrives more than 5 seconds after the connection opens, suspect proxy buffering. Log the failure pattern for diagnosis.
Implement polling fallback: After 3 consecutive SSE connection failures, switch to polling mode. Poll every 2 seconds for pending messages. Log the mode switch clearly so operators know the client is running in degraded mode.
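The three steps above can be sketched together in a few small helpers. This is only an illustration under stated assumptions: the SSE event name "ping" with an empty JSON body, the newline-delimited framing for stdio, and the names suspect_proxy_interference, Mode, and TransportSelector are all hypothetical. The thresholds (60 seconds, 5 seconds, 3 failures) and the "$/keepalive" method name come from the text.

```python
import json
import logging
from dataclasses import dataclass
from enum import Enum
from typing import Optional

log = logging.getLogger("mcp.transport")  # logger name is an assumption


def sse_ping_frame() -> str:
    """One SSE event named 'ping' (assumed name), ended by a blank line."""
    return "event: ping\ndata: {}\n\n"


def stdio_keepalive_line() -> str:
    """JSON-RPC 2.0 notification (no 'id' member), one message per line."""
    return json.dumps({"jsonrpc": "2.0", "method": "$/keepalive"}) + "\n"


def suspect_proxy_interference(
    dropped_after: Optional[float], first_event_delay: Optional[float]
) -> bool:
    """Heuristic from the text: early drop (<60 s) or slow first event (>5 s).

    Pass None for a measurement that does not apply to this connection.
    """
    early_drop = dropped_after is not None and dropped_after < 60.0
    slow_first = first_event_delay is not None and first_event_delay > 5.0
    return early_drop or slow_first


class Mode(Enum):
    SSE = "sse"
    POLLING = "polling"


@dataclass
class TransportSelector:
    """Counts consecutive SSE failures and switches to polling at the limit."""

    max_failures: int = 3
    failures: int = 0
    mode: Mode = Mode.SSE

    def record_failure(self) -> Mode:
        self.failures += 1
        if self.mode is Mode.SSE and self.failures >= self.max_failures:
            self.mode = Mode.POLLING
            log.warning(
                "SSE failed %d times in a row; switching to degraded polling mode",
                self.failures,
            )
        return self.mode

    def record_success(self) -> None:
        """A successful SSE connection resets the consecutive-failure count."""
        self.failures = 0
```

Resetting the counter on success is what makes the failures "consecutive": two failures, a success, and two more failures leave the client in SSE mode.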