LoRA Parameter Calculator

Understand the math behind Low-Rank Adaptation and configure your PEFT training

How LoRA Works

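Instead of updating all model weights, LoRA freezes the pretrained weights and adds a trainable low-rank update (the product of two small matrices) to selected weight matrices in each layer, typically the attention projections. This dramatically reduces the number of trainable parameters while often matching the quality of full fine-tuning.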

Original Weight Matrix

W
d × d (frozen)

LoRA Matrices

B and A
B: d × r, A: r × d (trainable)

Output

W + (alpha/r)·BA
Can be merged into a single d × d matrix at inference

LoRA params per adapted matrix = r × (d_in + d_out)
Scaling factor = alpha / r
Output = W·x + (alpha/r)·B·A·x, where x is the layer input
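As a minimal sketch of the formulas above, the pure-Python code below uses tiny made-up matrices (d = 3, r = 1, alpha = 2; all values are hypothetical) to check that applying the adapter on the fly, W·x + (alpha/r)·B·A·x, gives the same output as first merging W + (alpha/r)·BA into one matrix. The helpers matmul, matvec, lora_forward, and merge are illustrative names, not a library API.

```python
def matmul(P, Q):
    """Multiply matrix P (m x n) by matrix Q (n x p); matrices are lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def matvec(M, v):
    """Multiply matrix M (m x n) by vector v (length n)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """Unmerged LoRA: W x + (alpha / r) * B (A x). A is r x d, B is d x r."""
    scale = alpha / r
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * dl for b, dl in zip(base, delta)]


def merge(W, A, B, alpha, r):
    """Merged weight W' = W + (alpha / r) * B A, same shape as W."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * ba for w, ba in zip(w_row, ba_row)]
            for w_row, ba_row in zip(W, BA)]


# Hypothetical toy values: d = 3, r = 1, alpha = 2 (scaling factor 2.0)
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # frozen
A = [[0.5, -1.0, 2.0]]                                   # r x d, trainable
B = [[1.0], [0.0], [-0.5]]                               # d x r, trainable
x = [1.0, 2.0, 3.0]

unmerged = lora_forward(W, A, B, x, alpha=2, r=1)
merged = matvec(merge(W, A, B, alpha=2, r=1), x)
print(unmerged)  # [10.0, 2.0, -1.5]
print(merged)    # [10.0, 2.0, -1.5]
```

Because the merged matrix has the same shape as W, a merged adapter adds no extra computation at inference; keeping A and B separate instead lets you swap adapters on one frozen base model.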

Key insight: If d=4096 and r=16, a single weight matrix goes from about 16.8M params (4096 × 4096) to just 131K (16 × (4096 + 4096) = 131,072), a reduction of more than 99%!
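The key-insight arithmetic can be checked with two small helpers. The names lora_params and full_params are hypothetical, chosen for illustration, and are not part of PEFT.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable params for one adapted matrix: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r  # equals r * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Params in the frozen d_out x d_in weight matrix W."""
    return d_in * d_out


d, r = 4096, 16
full = full_params(d, d)
lora = lora_params(d, d, r)
reduction = 100 * (1 - lora / full)
print(f"full={full:,} lora={lora:,} reduction={reduction:.2f}%")
# full=16,777,216 lora=131,072 reduction=99.22%
```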

Configure LoRA

Rank (r): 16
Lower = fewer params, higher = more capacity. Common values: 4, 8, 16, 32, 64.
Alpha: 32
Scaling factor. Common practice: alpha = 2*r or alpha = r.
Target modules: more modules mean more capacity but also more trainable params. Targeting the attention projections (q, k, v, o) is the standard choice.
QLoRA: 4-bit quantization of the frozen weights makes it possible to fine-tune a 65B model on a single 48GB GPU!
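To see how the rank and alpha settings interact, here is a small sketch for a single square projection, assuming d = 4096 (the size used in the key-insight example). It tabulates trainable parameters for the common ranks and compares the scaling factor under alpha = 2*r against a fixed alpha = 16; the fixed value is only an illustrative choice.

```python
D = 4096  # assumed hidden size of one square projection


def lora_params(d_in, d_out, r):
    """Trainable params for one adapted matrix: r * (d_in + d_out)."""
    return r * (d_in + d_out)


rows = []
for r in (4, 8, 16, 32, 64):
    params = lora_params(D, D, r)
    scale_2r = (2 * r) / r   # alpha = 2*r keeps alpha/r fixed at 2.0
    scale_fixed = 16 / r     # a fixed alpha = 16 shrinks the scaling as r grows
    rows.append((r, params, scale_2r, scale_fixed))
    print(f"r={r:>2}  params={params:>9,}  alpha=2r -> {scale_2r:.2f}  alpha=16 -> {scale_fixed:.2f}")
```

Parameters grow linearly with r. Tying alpha to r holds the update strength constant while you vary capacity, whereas a fixed alpha weakens the update at higher ranks.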

Results

Total model params: 7,000M
LoRA trainable params: 4.19M
Parameter reduction: 99.94%
Alpha/r scaling: 2.0x

Trainable Parameters Comparison

Full fine-tuning: 7,000M params (100%)
LoRA: 4.19M params (0.06%)
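The headline percentages follow from simple ratios. This sketch plugs in the calculator's inputs, taking the 4.19M trainable parameters as 4,194,304 (= 32 × 131,072); the variable names are illustrative.

```python
total_params = 7_000_000_000  # total model params shown above
lora_params = 4_194_304       # ~4.19M LoRA trainable params shown above
alpha, r = 32, 16

trainable_pct = 100 * lora_params / total_params
reduction_pct = 100 - trainable_pct
print(f"trainable: {trainable_pct:.2f}%  reduction: {reduction_pct:.2f}%  alpha/r: {alpha / r:.1f}x")
# trainable: 0.06%  reduction: 99.94%  alpha/r: 2.0x
```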

Estimated GPU Memory

Model weights (FP16): 14.0 GB
LoRA adapters: 8.4 MB
Optimizer states (LoRA params only): 16.8 MB
Gradients (LoRA params only): 8.4 MB
Activations (est., batch=4): ~2 GB
Total estimated: ~16.0 GB
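The memory breakdown above can be reproduced with a rough estimator. This is a sketch under the assumptions the figures imply: 2 bytes per frozen weight (FP16), 2 bytes per LoRA parameter for adapters and gradients, two 2-byte optimizer values per LoRA parameter, and the table's fixed ~2 GB for activations. The function name estimate_memory_gb is hypothetical. Many mixed-precision setups keep Adam states in FP32, which would double the optimizer line; in practice activations depend on batch size, sequence length, and gradient checkpointing.

```python
def estimate_memory_gb(total_params, lora_params, bytes_per_weight=2.0,
                       bytes_per_lora=2, optimizer_states=2, activations_gb=2.0):
    """Rough training-memory estimate using decimal units (1 GB = 1e9 bytes).

    Frozen weights use bytes_per_weight each; LoRA adapters and their gradients
    use bytes_per_lora each; the optimizer keeps optimizer_states values per
    LoRA param, also at bytes_per_lora each. Activations are a fixed guess.
    """
    weights = total_params * bytes_per_weight
    adapters = lora_params * bytes_per_lora
    grads = lora_params * bytes_per_lora
    optimizer = lora_params * bytes_per_lora * optimizer_states
    parts = {
        "weights_gb": weights / 1e9,
        "adapters_mb": adapters / 1e6,
        "optimizer_mb": optimizer / 1e6,
        "grads_mb": grads / 1e6,
    }
    parts["total_gb"] = (weights + adapters + grads + optimizer) / 1e9 + activations_gb
    return parts


est = estimate_memory_gb(7_000_000_000, 4_194_304)
for name, value in est.items():
    print(f"{name}: {value:.1f}")
```

The frozen weights dominate the total, which is why quantizing them (as QLoRA does) saves far more memory than shrinking the adapters.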

Recommendations

  • This config fits comfortably on a 24GB GPU (RTX 3090/4090)
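  • Consider r=32 or r=64 if your task needs more adaptation capacity; the adapters stay in the megabyte range next to the 14 GB of frozen weights, so a higher rank barely changes memory use
  • Alpha=2*r is a common default that keeps the alpha/r scaling at 2.0 as you change the rank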
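A quick sketch of how adapter-related memory grows with rank, assuming the same target modules (so LoRA params scale linearly from the calculator's 4.19M at r=16) and the per-parameter costs implied by the memory breakdown: 2 bytes for the adapter weight, 2 for its gradient, and 4 for two 16-bit optimizer values. The constant and function names are illustrative.

```python
BASE_RANK = 16
BASE_LORA_PARAMS = 4_194_304        # calculator's ~4.19M trainable params at r=16
BYTES_PER_LORA_PARAM = 2 + 2 + 4    # adapter + gradient + two 16-bit optimizer states


def lora_overhead_mb(r):
    """Adapter, gradient, and optimizer memory in decimal MB for rank r."""
    params = BASE_LORA_PARAMS * r // BASE_RANK  # linear in r for fixed modules
    return params * BYTES_PER_LORA_PARAM / 1e6


for r in (16, 32, 64):
    print(f"r={r}: ~{lora_overhead_mb(r):.1f} MB of adapter-related memory")
```

Even at r=64 this stays around 134 MB, a small fraction of a 24GB card, so rank is mainly a capacity and overfitting choice rather than a memory one.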

PEFT Config

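from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder ID: replace with the base model you are fine-tuning
base_model = AutoModelForCausalLM.from_pretrained("your-base-model")

lora_config = LoraConfig(
    r=16,                   # LoRA rank
    lora_alpha=32,          # scaling factor = lora_alpha / r = 2.0
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    lora_dropout=0.05,      # dropout on the input to the LoRA branch
    bias="none",            # keep all bias terms frozen
    task_type="CAUSAL_LM",  # or "SEQ_CLS", "SEQ_2_SEQ_LM"
)

model = get_peft_model(base_model, lora_config)
# Prints trainable params, all params, and trainable%; exact counts depend on the base model's architecture
model.print_trainable_parameters()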
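The trainable count reported for this config is a sum of r × (d_in + d_out) over every matched module. The sketch below assumes a hypothetical Llama-style 7B layout (32 decoder layers, hidden size 4096, square q/k/v/o projections) and uses a simplified suffix match on module names as a stand-in for PEFT's own target_modules matching, which handles more cases. Under these assumptions, r=16 on all four projections gives 16,777,216 (about 16.8M) trainable params. The 4.19M in Results equals 32 × 131,072, which matches one adapted 4096 × 4096 matrix per layer at r=16, so the calculator appears to count fewer adapted matrices than this config lists.

```python
# Hypothetical Llama-style layout: 32 layers, hidden size 4096, square projections.
NUM_LAYERS, HIDDEN = 32, 4096
module_shapes = {}  # module name -> (d_in, d_out)
for i in range(NUM_LAYERS):
    for proj in ("q_proj", "k_proj", "v_proj", "o_proj"):
        module_shapes[f"model.layers.{i}.self_attn.{proj}"] = (HIDDEN, HIDDEN)


def count_lora_params(shapes, target_modules, r):
    """Sum r * (d_in + d_out) over modules whose name matches a target entry."""
    total = 0
    for name, (d_in, d_out) in shapes.items():
        # Simplified matching: exact name or a dotted suffix such as ".q_proj"
        if any(name == t or name.endswith("." + t) for t in target_modules):
            total += r * (d_in + d_out)
    return total


all_four = count_lora_params(module_shapes, ["q_proj", "k_proj", "v_proj", "o_proj"], r=16)
q_and_v = count_lora_params(module_shapes, ["q_proj", "v_proj"], r=16)
print(f"{all_four:,}")  # 16,777,216
print(f"{q_and_v:,}")   # 8,388,608
```

Real architectures differ (for example, grouped-query attention shrinks k_proj and v_proj), so substitute your model's actual module shapes.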

Test Your Understanding

1. If you double the LoRA rank from 16 to 32, what happens to the trainable parameters?

They stay the same (rank doesn't affect param count)
They roughly double
They quadruple
They increase by 50%
Correct! LoRA params = r × (d_in + d_out) per adapted matrix. Since r appears linearly in the formula, doubling r doubles the LoRA parameters for a fixed set of target modules. This is why choosing the right rank is a key decision!

2. What does the alpha/r ratio control?

The number of trainable layers
The scaling of the LoRA output before adding to frozen weights
The learning rate
The dropout probability
Correct! The scaling factor (alpha/r) determines how much the LoRA adaptation affects the output. Higher scaling = stronger adaptation. This lets you tune the "strength" of LoRA without changing the architecture.

3. Why is QLoRA (4-bit quantization + LoRA) so memory efficient?

It reduces the model size by removing layers
It stores the frozen weights in 4-bit but trains the LoRA adapters in 16-bit precision
It uses CPU offloading
It compresses the LoRA matrices
Correct! QLoRA's insight: frozen weights don't need full precision! By storing them in 4-bit (the NF4 data type) and training only the small LoRA adapters in 16-bit precision (BF16 in the QLoRA paper), you can fine-tune a 65B model on a single 48GB GPU. The adapters stay in higher precision so their gradient updates remain accurate.
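A back-of-the-envelope sketch of why 4-bit storage matters, counting only the frozen weights and ignoring quantization constants, adapters, optimizer states, and activations. The helper name weight_memory_gb is illustrative.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Memory for the frozen weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


for bits in (16, 4):
    print(f"65B @ {bits}-bit: {weight_memory_gb(65e9, bits):.1f} GB")
# 65B @ 16-bit: 130.0 GB
# 65B @ 4-bit: 32.5 GB
```

At 16-bit the weights alone exceed a 48GB card several times over; at 4-bit they take roughly two thirds of it, leaving room for the small adapters and the remaining training state.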