

LLM GPU Memory (VRAM) Calculator

Estimate how much GPU memory a model needs for inference or training, from its parameter count, numeric precision and optimizer. Useful for choosing a GPU, picking a quantization level, or checking whether a model will fit. Everything runs in your browser.
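Here is a rough sketch of the arithmetic behind this kind of estimate. It is not necessarily this tool's exact formula. Weight memory is the parameter count times the bytes per parameter for the chosen precision. Training adds gradients and optimizer state on top of that. The bytes-per-format table and the optimizer names are illustrative choices. The sketch also assumes that gradients share the weights' precision and that optimizer states are kept in FP32.

```python
# Bytes per parameter for common numeric formats (illustrative mapping;
# int4 ignores the small extra storage for quantization scales).
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

# Optimizer state slots per parameter; each slot is assumed to be FP32 (4 bytes).
OPTIMIZER_STATES = {
    "none": 0,          # inference: no optimizer
    "sgd": 0,           # plain SGD keeps no extra state
    "sgd_momentum": 1,  # one momentum buffer
    "adam": 2,          # first and second moment estimates
}

GIB = 1024 ** 3


def weights_bytes(num_params: float, precision: str) -> float:
    """Memory for the model weights alone."""
    return num_params * BYTES_PER_PARAM[precision]


def training_bytes(num_params: float, precision: str, optimizer: str) -> float:
    """Weights + gradients (assumed same dtype as weights) + FP32 optimizer states."""
    weights = weights_bytes(num_params, precision)
    gradients = weights  # assumption: one gradient value per weight
    optimizer_states = num_params * OPTIMIZER_STATES[optimizer] * 4.0
    return weights + gradients + optimizer_states


if __name__ == "__main__":
    print(f"{weights_bytes(7e9, 'fp16') / GIB:.2f} GiB")              # 13.04 GiB
    print(f"{training_bytes(7e9, 'fp16', 'adam') / GIB:.2f} GiB")     # 78.23 GiB
```

Take 7B parameters in fp16 as an example. The weights come to 14e9 bytes (about 13.04 GiB). Under these assumptions, training the same model with Adam needs 84e9 bytes (about 78.23 GiB) before any overhead. Mixed-precision setups that also keep an FP32 master copy of the weights add another 4 bytes per parameter.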


Mode: inference or training.

Overhead: extra memory for activations, the CUDA context and framework workspace. An allowance of about 15–25% is typical.
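Below is one way to apply that allowance. The sketch assumes the percentage is added on top of the base estimate (weights plus any training state). The function name and the 20% default are illustrative.

```python
def with_overhead(base_bytes: float, overhead_pct: float = 20.0) -> float:
    """Add an overhead allowance for activations, CUDA context and workspace.

    Assumption: the percentage scales the base estimate multiplicatively.
    """
    if overhead_pct < 0:
        raise ValueError("overhead_pct must be non-negative")
    return base_bytes * (1 + overhead_pct / 100)


if __name__ == "__main__":
    base = 14e9  # e.g. 7B parameters in fp16
    for pct in (15, 25):
        print(pct, with_overhead(base, pct) / 1024 ** 3)
```

Sweeping the percentage across the typical 15–25% range gives a band rather than a single number. That band is often more useful for planning.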

Optional: KV cache (advanced)

Fill in these fields to include the attention KV cache in the estimate. Leave the number of layers or the hidden size at 0 to skip it.
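In standard multi-head attention, each layer caches one key vector and one value vector per token, each of width equal to the hidden size. The cache is therefore roughly 2 × layers × hidden size × sequence length × batch size × bytes per value. The sketch below assumes that layout. Sequence length, batch size and bytes per value are assumed inputs that do not appear in the fields described above. Models using grouped-query or multi-query attention cache fewer key/value heads, so this formula overestimates for them.

```python
def kv_cache_bytes(
    num_layers: int,
    hidden_size: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_value: float = 2.0,  # assumption: FP16/BF16 cache
) -> float:
    """Approximate KV cache size; returns 0 when layers or hidden size is 0."""
    if num_layers == 0 or hidden_size == 0:
        return 0.0  # mirrors "leave layers or hidden size at 0 to skip"
    # Factor 2: one key tensor and one value tensor per layer.
    return 2 * num_layers * hidden_size * seq_len * batch_size * bytes_per_value


if __name__ == "__main__":
    print(kv_cache_bytes(32, 4096, 4096) / 1024 ** 3)  # 2.0
```

Take illustrative values of 32 layers, hidden size 4096, a 4096-token context and an FP16 cache. These give exactly 2 GiB per sequence. The cache grows linearly with both context length and batch size.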

Estimated total VRAM

These are estimates. Real usage depends on the framework, kernels, attention implementation, fragmentation and activations. Use the result for planning and leave headroom — don't size a GPU to the exact number.
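To turn an estimate into a go/no-go check, you can reserve a fixed fraction of the card's memory as headroom before comparing. This is a minimal sketch. The function name and the 10% default headroom are assumptions, not values used by this tool.

```python
GIB = 1024 ** 3


def fits_on_gpu(estimate_bytes: float, gpu_gib: float, headroom_pct: float = 10.0) -> bool:
    """Return True if the estimate fits after reserving headroom_pct of GPU memory."""
    usable_bytes = gpu_gib * GIB * (1 - headroom_pct / 100)
    return estimate_bytes <= usable_bytes


if __name__ == "__main__":
    print(fits_on_gpu(14e9, 16))  # True
    print(fits_on_gpu(14e9, 12))  # False
```

Consider an estimate of 14e9 bytes (about 13.04 GiB). It passes on a 16 GiB card with 10% headroom (14.4 GiB usable) but fails on a 12 GiB card (10.8 GiB usable).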
