LLM GPU Memory (VRAM) Calculator
Estimate how much GPU memory a model needs for inference or training, from its parameter count, numeric precision and optimizer. Useful for choosing a GPU, picking a quantization level, or checking whether a model will fit. Everything runs in your browser.
How is this calculated? Read the guide →
Activations, CUDA context and framework workspace. ~15–25% is typical.
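As a rough illustration of an inference estimate, the sketch below multiplies the parameter count by the bytes each parameter occupies and then adds a fractional overhead. The function name, the precision table and the choice to apply overhead as a multiplier on weight memory are assumptions made for this example; the calculator's exact formula is not stated on this page. Quantized formats also carry some extra scale metadata that is ignored here.

```python
# Hypothetical helper, not the calculator's actual code.
# Bytes per parameter for common storage formats.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def inference_vram_gb(params_billion: float, precision: str, overhead: float = 0.20) -> float:
    """Weight memory plus a fractional overhead, in GB (1 GB = 1e9 bytes)."""
    weight_bytes = params_billion * 1e9 * BYTES_PER_PARAM[precision]
    # Assumption: overhead is applied as a fraction of weight memory.
    return weight_bytes * (1.0 + overhead) / 1e9


# 7B parameters in fp16: 14 GB of weights plus 20% overhead.
print(round(inference_vram_gb(7, "fp16"), 1))  # 16.8
```

Halving the precision (for example fp16 to int8) roughly halves the weight term, which is why quantization is usually the first lever to try when a model does not fit.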
Optional: KV cache (advanced)
Fill these in to include the attention KV cache in the estimate. Leave layers or hidden size at 0 to skip it.
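A common way to size the KV cache is sketched below: each layer stores one key tensor and one value tensor, each holding one hidden-size vector per token. This assumes standard multi-head attention where keys and values span the full hidden size; models using grouped-query or multi-query attention need less. The function and parameter names are hypothetical, and the calculator may compute this differently.

```python
def kv_cache_gb(layers: int, hidden_size: int, seq_len: int,
                batch_size: int = 1, bytes_per_elem: float = 2.0) -> float:
    """Approximate KV cache size in GB (1 GB = 1e9 bytes)."""
    # Mirror the form's behaviour: a zero in layers or hidden size skips the cache.
    if layers == 0 or hidden_size == 0:
        return 0.0
    # The factor 2 accounts for one key tensor and one value tensor per layer.
    total_bytes = 2 * layers * hidden_size * seq_len * batch_size * bytes_per_elem
    return total_bytes / 1e9


# 32 layers, hidden size 4096, 4096 tokens, batch 1, fp16 cache.
print(round(kv_cache_gb(32, 4096, 4096), 2))  # 2.15
```

The cache grows linearly with both sequence length and batch size, so long contexts or large batches can make it comparable to the weights themselves.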
Optimizer states and the master copy of the weights are stored in fp32. Activation memory depends on batch size and sequence length and is not included, so add headroom.
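For training, a per-parameter byte count is a convenient way to reason about the total. The sketch below assumes mixed-precision training with 2-byte weights and 2-byte gradients, an fp32 master copy (4 bytes), and fp32 optimizer states (two for Adam, one for SGD with momentum). The gradient precision and the optimizer table are assumptions for illustration; frameworks differ, and activations are excluded as noted above.

```python
# Hypothetical table: fp32 optimizer state bytes per parameter.
OPTIMIZER_STATE_BYTES = {
    "adam": 8.0,          # first and second moments, 4 bytes each
    "sgd_momentum": 4.0,  # one momentum buffer
    "sgd": 0.0,           # no extra state
}


def training_vram_gb(params_billion: float, optimizer: str = "adam",
                     weight_bytes: float = 2.0, grad_bytes: float = 2.0) -> float:
    """Weights + gradients + fp32 master copy + optimizer states, in GB."""
    master_copy_bytes = 4.0  # fp32 master weights
    per_param = (weight_bytes + grad_bytes + master_copy_bytes
                 + OPTIMIZER_STATE_BYTES[optimizer])
    return params_billion * 1e9 * per_param / 1e9


# 7B parameters with Adam: 16 bytes per parameter, before activations.
print(training_vram_gb(7))  # 112.0
```

Under these assumptions, Adam training needs about eight times the memory of the fp16 weights alone, before any activation memory is counted.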
Estimated total VRAM
These are estimates. Real usage depends on the framework, kernels, attention implementation, memory fragmentation and activations. Use the result for planning and leave headroom; don't size a GPU to the exact number.