LLM GPU Memory (VRAM) Calculator

Estimate how much GPU memory a model needs for inference or training, from its parameter count, numeric precision and optimizer. Useful for choosing a GPU, picking a quantization level, or checking whether a model will fit. Everything runs in your browser.

How is this calculated? Read the guide →

Parameters (billions)

Mode

Weight precision

Runtime overhead (%)

Covers activations, the CUDA context and framework workspace. 15–25% is typical.
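As a rough illustration of how the fields above can combine, here is a minimal Python sketch. It is not the calculator's actual code: the function names, the precision table and the training byte counts are assumptions chosen for illustration. It assumes weights take parameter count times bytes per parameter, that the runtime overhead is applied as a percentage on top, and that results are in decimal gigabytes (1 GB = 1e9 bytes). The training defaults assume mixed-precision Adam (fp32 master weights plus two fp32 moment buffers, 12 bytes per parameter). Training activations can far exceed a flat percentage, so treat that figure as a lower bound.

```python
# Bytes per parameter for common weight precisions.
# Assumed set for illustration; the tool's own options may differ.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def estimate_inference_vram_gb(params_billions, precision="fp16", overhead_pct=20.0):
    """Weights plus a percentage overhead, in decimal GB."""
    weight_bytes = params_billions * 1e9 * BYTES_PER_PARAM[precision]
    total_bytes = weight_bytes * (1 + overhead_pct / 100)
    return total_bytes / 1e9


def estimate_training_vram_gb(params_billions, weight_bytes=2.0, grad_bytes=2.0,
                              optimizer_bytes=12.0, overhead_pct=20.0):
    """Weights + gradients + optimizer state, plus overhead, in decimal GB.

    Defaults assume mixed-precision Adam (an assumption, not the tool's formula):
    fp16 weights and gradients, fp32 master weights and two fp32 moments.
    """
    per_param = weight_bytes + grad_bytes + optimizer_bytes
    total_bytes = params_billions * 1e9 * per_param * (1 + overhead_pct / 100)
    return total_bytes / 1e9


if __name__ == "__main__":
    print(round(estimate_inference_vram_gb(7, "fp16", 20), 1))  # 16.8
    print(round(estimate_training_vram_gb(7), 1))               # 134.4
```

Under these assumptions a 7B model in fp16 needs about 16.8 GB for inference and about 134 GB for full fine-tuning with Adam, before any KV cache is added.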

Optional: KV cache (advanced)

Fill in these fields to add the attention KV cache to the estimate. Leave layers or hidden size at 0 to skip it.

Layers

Hidden size

Context length (tokens)

Batch size
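The four KV cache fields can be combined as in the sketch below. This is an assumed formula, not necessarily the one the calculator uses: it stores one key and one value vector of width `hidden_size` per token per layer, which matches standard multi-head attention. Models using grouped-query or multi-query attention store fewer KV heads, so this will overestimate for them. The function name and the `bytes_per_value` parameter (2 for an fp16 cache) are hypothetical.

```python
def kv_cache_gb(layers, hidden_size, context_len, batch_size=1, bytes_per_value=2.0):
    """Approximate KV cache size in decimal GB (1 GB = 1e9 bytes).

    Assumes full multi-head attention, where the cached key and value
    each have hidden_size elements per token per layer.
    """
    # Mirror the form's behaviour: zero layers or hidden size means "skip".
    if layers == 0 or hidden_size == 0:
        return 0.0
    # Factor of 2: one key tensor and one value tensor per layer.
    total_bytes = 2 * layers * hidden_size * context_len * batch_size * bytes_per_value
    return total_bytes / 1e9


if __name__ == "__main__":
    # 32 layers, hidden size 4096, 4096-token context, batch size 1, fp16 cache.
    print(round(kv_cache_gb(32, 4096, 4096), 2))  # 2.15
```

The cache grows linearly with both context length and batch size, so doubling either doubles this term while the weight memory stays fixed.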

Estimated total VRAM

These are estimates. Real usage depends on the framework, kernels, attention implementation, memory fragmentation and activations. Use the result for planning and leave headroom: don't size a GPU to the exact number.
