CDX-301d · Module 2

GPU Access & Acceleration

3 min read

GPU access in Codex Cloud microVMs is not available by default. Standard and enhanced tiers provide CPU-only execution. GPU-enabled tiers use PCIe passthrough or vGPU (virtual GPU) to expose a physical GPU device to the microVM guest. The guest sees a standard CUDA or ROCm device and can run GPU-accelerated workloads — ML inference, image processing, CUDA compilation — as if the GPU were physically installed. The key constraint: GPU VMs are significantly more expensive and have limited availability.

GPU passthrough gives the guest exclusive access to a physical GPU. No other VM can use that GPU simultaneously, which means GPU VMs have a higher resource cost and longer cold start times (GPU allocation adds 5-15 seconds to boot). vGPU (MIG — Multi-Instance GPU on NVIDIA A100/H100) allows multiple VMs to share a single GPU, reducing cost but also reducing available VRAM and compute per VM. The choice between passthrough and vGPU depends on your workload: inference tasks with small models fit in vGPU partitions; training or large-model inference needs full passthrough.

# GPU access models

Passthrough:    Full GPU → 1 VM (exclusive)
  - Full VRAM (e.g., 80 GB on A100)
  - Full compute (all SMs)
  - Boot penalty: +5-15 seconds
  - Cost: highest tier

vGPU (MIG):     1 GPU → up to 7 VMs (shared)
  - Partitioned VRAM (e.g., 10 GB per partition)
  - Partitioned compute (fraction of SMs)
  - Boot penalty: +2-5 seconds
  - Cost: moderate premium over CPU-only
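The passthrough-vs-vGPU decision above can be sketched as a small selection function. The tier names, the 10 GB MIG partition, and the 80 GB A100 figure are taken from the comparison; the function itself is illustrative and not a real Codex Cloud API.

```python
# Hypothetical sketch: pick the cheapest GPU access model whose VRAM
# fits the workload. Sizes mirror the comparison above (A100 figures).
def pick_gpu_tier(vram_needed_gb: float) -> dict:
    """Return the cheapest tier whose partition fits the workload."""
    MIG_PARTITION_GB = 10   # per-partition VRAM on a MIG slice
    PASSTHROUGH_GB = 80     # full A100 VRAM, exclusive to one VM
    if vram_needed_gb <= MIG_PARTITION_GB:
        return {"tier": "vgpu-mig", "vram_gb": MIG_PARTITION_GB,
                "boot_penalty_s": (2, 5)}
    if vram_needed_gb <= PASSTHROUGH_GB:
        return {"tier": "passthrough", "vram_gb": PASSTHROUGH_GB,
                "boot_penalty_s": (5, 15)}
    raise ValueError("workload exceeds a single GPU; shard or use multi-GPU")
```

For example, a 7B-parameter model in fp16 needs roughly 14 GB of weights alone, so `pick_gpu_tier(14)` lands on passthrough, while an 8 GB inference task fits a MIG slice.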

# When to use GPU in Codex tasks
- ML model inference during tests
- Image/video processing pipelines
- CUDA kernel compilation and testing
- Large-scale data processing with GPU-accelerated libraries
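Before running any of the GPU workloads above, a task script should confirm the microVM actually sees a device. One minimal sketch, assuming the standard `nvidia-smi` CSV query format is available on GPU tiers; the CPU-only fallback is illustrative, not a documented Codex Cloud behavior.

```python
# Sketch: detect visible CUDA devices by parsing `nvidia-smi` CSV output.
import shutil
import subprocess

def parse_smi_csv(csv_text: str) -> list[dict]:
    """Parse 'name, memory.total' CSV rows into device dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, mem = (field.strip() for field in line.split(","))
        gpus.append({"name": name, "vram_mib": int(mem)})
    return gpus

def visible_gpus() -> list[dict]:
    """Return one dict per visible GPU, or [] on CPU-only tiers."""
    if shutil.which("nvidia-smi") is None:
        return []  # no driver installed: CPU-only VM
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_smi_csv(out)
```

A test step can then call `visible_gpus()` and skip GPU-dependent suites when it returns an empty list, instead of failing mid-run.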

# When NOT to use GPU
- Standard code generation and refactoring
- Text-based testing and linting
- Documentation and configuration tasks
# How to decide
1. Identify GPU-dependent tasks. Audit your test suite and build process: which steps actually invoke GPU APIs (CUDA, OpenCL, Metal)? Only those tasks need GPU-enabled VMs.
2. Choose the right GPU model. If your GPU task needs less than 10 GB of VRAM, use vGPU partitions. Full passthrough is only justified for large-model inference or multi-GPU training tests.
3. Benchmark CPU vs GPU. For borderline tasks (image processing, small inference), benchmark both CPU and GPU tiers. If the CPU tier completes within your time budget, the GPU premium is not worth paying.
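The benchmarking rule in step 3 reduces to a simple comparison once you have measured runtimes for both tiers. A minimal sketch; the tier names and the idea of a "split-task" fallback are assumptions, not Codex Cloud terminology.

```python
# Sketch: decide a tier from measured CPU and GPU runtimes and a time budget.
def choose_tier(cpu_runtime_s: float, gpu_runtime_s: float,
                budget_s: float) -> str:
    """Prefer CPU whenever it meets the budget; GPU only when required."""
    if cpu_runtime_s <= budget_s:
        return "cpu"        # within budget: the GPU premium isn't worth paying
    if gpu_runtime_s <= budget_s:
        return "gpu"        # only the GPU tier meets the deadline
    return "split-task"     # neither fits: break the task into smaller pieces
```

For instance, an image-processing step that takes 120 s on CPU against a 300 s budget should stay on the CPU tier, even if the GPU tier would finish in 20 s.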