CDX-301d · Module 1
Snapshot/Restore & Memory Management
Firecracker supports full VM snapshots — freezing the entire state of a running microVM (memory contents, CPU registers, device state) to disk and restoring it later. This enables Codex Cloud to pre-warm sandboxes: boot a VM, clone the repository, install dependencies, load AGENTS.md, then snapshot the result. When a task arrives, the system restores from the snapshot instead of booting from scratch, reducing startup time from 15-60 seconds to under 2 seconds.
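The snapshot and restore steps can be sketched against Firecracker's HTTP API, which is served over a Unix socket. This is a minimal sketch, not a description of how Codex Cloud drives Firecracker internally: the helper names and all file and socket paths are hypothetical, and the request fields follow the upstream Firecracker snapshot API, so check them against the Firecracker version you run.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP over a Unix domain socket (the transport Firecracker's API uses)."""

    def __init__(self, socket_path):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def pause_request():
    # The microVM must be paused before a snapshot is created.
    return ("PATCH", "/vm", {"state": "Paused"})


def create_snapshot_request(snapshot_path, mem_file_path):
    # Writes device/CPU state to snapshot_path and guest memory to mem_file_path.
    return ("PUT", "/snapshot/create", {
        "snapshot_type": "Full",
        "snapshot_path": snapshot_path,
        "mem_file_path": mem_file_path,
    })


def load_snapshot_request(snapshot_path, mem_file_path, resume=True):
    # Sent to a fresh Firecracker process; resume_vm starts the guest right away.
    return ("PUT", "/snapshot/load", {
        "snapshot_path": snapshot_path,
        "mem_backend": {"backend_type": "File", "backend_path": mem_file_path},
        "resume_vm": resume,
    })


def send(socket_path, request):
    """Send one (method, path, body) request; return (status, response text)."""
    method, path, body = request
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request(method, path, body=json.dumps(body),
                     headers={"Content-Type": "application/json"})
        response = conn.getresponse()
        return response.status, response.read().decode()
    finally:
        conn.close()
```

With a running Firecracker process listening on a socket (for example a hypothetical `/tmp/firecracker.sock`), sending `pause_request()` and then `create_snapshot_request(...)` writes the snapshot. Restoring means starting a new Firecracker process and sending it `load_snapshot_request(...)` with the same two paths.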
Memory management inside a Firecracker microVM combines demand paging with a balloon device. The hypervisor sets a maximum memory limit for the guest, but host pages are backed only when the guest first touches them; the balloon device lets the host reclaim pages the guest no longer needs. If a task touches 6 GB in an 8 GB VM, roughly 6 GB of host memory is consumed. When the VM is destroyed, all of its pages are reclaimed immediately. There is no swap by default: if a task exceeds its memory limit, the guest's OOM killer terminates the process. This hard boundary is intentional. Swap would let tasks run slowly instead of failing fast, which makes resource overcommitment harder to detect.
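The accounting described above can be illustrated with a toy model. This is a deliberately simplified sketch: `MicroVMMemory` and `GuestOOM` are hypothetical names, the model ignores the guest kernel's own memory use, and it treats the OOM kill as an exception rather than a signal.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


class GuestOOM(Exception):
    """Raised when the guest would exceed its memory limit (no swap to absorb it)."""


@dataclass
class MicroVMMemory:
    limit_bytes: int
    touched_bytes: int = 0  # pages the guest has touched, so the host backs them
    peak_bytes: int = 0     # high-water mark, useful for right-sizing

    def touch(self, nbytes: int) -> int:
        """Touch nbytes of new memory; return host bytes now backing the VM."""
        if self.touched_bytes + nbytes > self.limit_bytes:
            raise GuestOOM(
                f"would need {self.touched_bytes + nbytes} bytes, "
                f"limit is {self.limit_bytes}"
            )
        self.touched_bytes += nbytes
        self.peak_bytes = max(self.peak_bytes, self.touched_bytes)
        return self.touched_bytes

    def destroy(self) -> int:
        """Destroy the VM; return the host bytes reclaimed."""
        reclaimed = self.touched_bytes
        self.touched_bytes = 0
        return reclaimed


vm = MicroVMMemory(limit_bytes=8 * GIB)
host_used = vm.touch(6 * GIB)
print(host_used // GIB)  # 6
```

Touching another 3 GB on this VM raises `GuestOOM`, since 9 GB exceeds the 8 GB limit, and `vm.destroy()` returns the 6 GB to the host in one step.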
# Snapshot lifecycle
1. Base VM boots → repo cloned → deps installed → AGENTS.md loaded
2. VM state frozen → memory pages + CPU state written to snapshot file
3. Snapshot stored (typically 200-800 MB compressed)
4. Task arrives → snapshot restored → VM resumes in <2 seconds
5. Task executes → diff extracted → VM destroyed
# Memory allocation model
Max allocation: 8 GB (configurable per tier)
Physical backing: On-demand (demand paging; balloon device for reclaim)
Swap: None (OOM kill on exceed)
Overcommit: Host-level only (not guest-visible)
Reclamation: Instant on VM destroy
# OOM behavior
- Process exceeds limit → OOM killer fires
- Task fails with clear error → no silent degradation
- Logs capture peak memory usage for debugging
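The fail-fast behavior above can be sketched from the point of view of a process supervising a task. This is an assumption-laden sketch, not the platform's own mechanism: `classify_exit` and `run_task` are hypothetical helpers, and a SIGKILL exit is only a heuristic for an OOM kill, since other things can send SIGKILL too. The `resource` module is available on Unix only.

```python
import resource
import signal
import subprocess
import sys


def classify_exit(returncode):
    """Map a subprocess return code to a coarse task status."""
    if returncode == 0:
        return "ok"
    if returncode == -signal.SIGKILL:
        # Killed by SIGKILL: consistent with the OOM killer, but not proof of it.
        return "killed (possible OOM)"
    return "failed"


def run_task(argv):
    """Run a task; return (status, peak child RSS in MiB)."""
    completed = subprocess.run(argv)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    # Note: this is the largest peak among all waited-for children of this
    # process so far, not only the task that was just run.
    peak_mib = usage.ru_maxrss / divisor
    return classify_exit(completed.returncode), peak_mib
```

A call such as `run_task([sys.executable, "-c", "pass"])` returns the status together with a peak figure that can be written to the task log, which makes a "possible OOM" result immediately actionable.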
Do This
- Monitor peak memory usage in task logs to right-size your VM memory allocation
- Use snapshots for frequently executed task patterns — the amortized boot time approaches zero
- Design tasks to fail fast on OOM rather than degrade silently with swap
Avoid This
- Ignore OOM errors — they indicate your task needs more memory or a smaller working set
- Assume snapshots are always fresh — dependency updates require snapshot regeneration
- Over-allocate memory "just in case" — unused allocations still reserve host resources in warm pools
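One way to avoid stale snapshots is to key each snapshot on the inputs that were baked into it and regenerate when the key changes. The sketch below is one possible scheme under stated assumptions, not a documented Codex Cloud feature: `snapshot_key` and `needs_regeneration` are hypothetical helpers, and the choice of inputs (a base image identifier, the dependency lockfile, and AGENTS.md) is illustrative.

```python
import hashlib


def snapshot_key(base_image_id: str, lockfile: bytes, agents_md: bytes) -> str:
    """Derive a stable key from everything baked into the snapshot."""
    digest = hashlib.sha256()
    for part in (base_image_id.encode(), lockfile, agents_md):
        # Length-prefix each part so that different splits of the same bytes
        # cannot produce the same digest.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


def needs_regeneration(stored_key: str, base_image_id: str,
                       lockfile: bytes, agents_md: bytes) -> bool:
    """True when the current inputs no longer match the stored snapshot."""
    return stored_key != snapshot_key(base_image_id, lockfile, agents_md)
```

Storing the key next to the snapshot file means a dependency bump changes the lockfile bytes, the key no longer matches, and the warm pool knows to rebuild rather than restore an outdated environment.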