LoRA Parameter Calculator

How LoRA Works

Instead of updating all model weights, LoRA freezes the pretrained weights and injects a pair of trainable low-rank matrices alongside selected weight matrices in each layer. This cuts the number of trainable parameters dramatically, typically with little loss in task performance.

Original Weight Matrix: W, d x d (frozen)

LoRA Matrices: B (d x r) and A (r x d), both trainable

Output: W + (alpha/r) * BA, merged at inference

LoRA params per layer = r x (d_in + d_out)
Scaling factor = alpha / r
Output = W*x + (alpha/r) * B*A*x
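The formulas above can be checked numerically. The following is a minimal NumPy sketch, not the PEFT implementation: the sizes are small illustrative values, and B is given a nonzero value only so the comparison is not trivial (in LoRA, B starts at zero so training begins from the unmodified model).

```python
import numpy as np

rng = np.random.default_rng(0)

# Small illustrative sizes (assumed for the demo, not taken from a real model)
d_in, d_out, r, alpha = 64, 48, 4, 8
scale = alpha / r  # scaling factor = alpha / r

W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable, r x d_in
B = rng.normal(size=(d_out, r)) * 0.01   # trainable, d_out x r (nonzero here for the demo)
x = rng.normal(size=(d_in,))


def lora_forward(x, W, A, B, scale):
    """Output = W*x + (alpha/r) * B*A*x, without ever forming the d_out x d_in product B @ A."""
    return W @ x + scale * (B @ (A @ x))


def merge(W, A, B, scale):
    """Fold the adapter into the frozen weight for inference."""
    return W + scale * (B @ A)


y_adapter = lora_forward(x, W, A, B, scale)
y_merged = merge(W, A, B, scale) @ x
trainable = A.size + B.size  # r * (d_in + d_out)

print(np.allclose(y_adapter, y_merged), trainable, scale)
```

The two outputs agree, which is why a merged adapter adds no extra latency at inference: after merging there is a single matrix of the original shape.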

Key insight: If d=4096 and r=16, you go from about 16.8M params per weight matrix to just 131K - a reduction of more than 99%!
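The arithmetic behind that figure is short enough to write out. This sketch assumes a square d x d weight matrix; `lora_params` is a helper name introduced here for illustration.

```python
def lora_params(r: int, d_in: int, d_out: int) -> int:
    """LoRA params for one adapted weight matrix: r x (d_in + d_out)."""
    return r * (d_in + d_out)


d, r = 4096, 16
full = d * d                  # params in the frozen d x d matrix
lora = lora_params(r, d, d)   # params in A and B together
reduction = 1 - lora / full

print(f"full: {full:,}  lora: {lora:,}  reduction: {reduction:.2%}")

# r appears linearly, so doubling the rank doubles the adapter size
assert lora_params(2 * r, d, d) == 2 * lora
```

For a square matrix the ratio simplifies to 2r/d, so at a fixed rank the relative saving grows with the width of the model.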

Configure LoRA

Base Model

LoRA Rank (r)

Lower = fewer params, Higher = more capacity. Common: 4, 8, 16, 32, 64

LoRA Alpha

Sets the scaling factor alpha/r applied to the LoRA update. Common practice: alpha = 2*r or alpha = r

Target Modules

q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, embed

More modules = more capacity but more params. Attention layers (q,k,v,o) are standard.
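To see how the module choice moves the parameter count, here is a rough sketch. The shapes are assumptions for a Llama-style 7B decoder (hidden size 4096, MLP size 11008, 32 layers, full multi-head attention so k_proj and v_proj are square); they are not read from a checkpoint, and the embedding option is left out because it is not a per-layer module.

```python
# Assumed shapes for a Llama-style 7B decoder (hypothetical, for illustration only)
HIDDEN, INTERMEDIATE, NUM_LAYERS = 4096, 11008, 32

# module name -> (d_in, d_out)
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def total_lora_params(r, targets):
    """Sum r x (d_in + d_out) over the targeted modules, times the layer count."""
    per_layer = sum(r * (MODULE_SHAPES[name][0] + MODULE_SHAPES[name][1]) for name in targets)
    return per_layer * NUM_LAYERS


attention_only = total_lora_params(16, ["q_proj", "k_proj", "v_proj", "o_proj"])
all_linear = total_lora_params(16, list(MODULE_SHAPES))

print(f"attention only: {attention_only:,}")
print(f"all linear:     {all_linear:,}")
```

Under these assumptions, adding the three MLP projections a bit more than doubles the adapter size, because each MLP matrix is wider than an attention projection.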

Quantization

QLoRA: 4-bit quantization enables training 65B models on a single 48GB GPU!
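A back-of-the-envelope check of why 4-bit storage makes this possible. This sketch counts only the frozen weights, uses decimal gigabytes, and ignores quantization constants, adapters, and activations, so real usage will be somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the frozen weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


num_params = 65e9  # the 65B model mentioned above

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(num_params, bits):.1f} GB")
```

At 16 bits the weights alone need well over 48 GB, while at 4 bits they leave room on a 48GB card for the adapters, optimizer state, and activations.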

Results

Total Model Params 7,000M

LoRA Trainable Params 4.19M

Parameter Reduction 99.94%

Alpha/r Scaling 2.0x

Trainable Parameters Comparison

Full Fine-Tuning 7,000M params (100%)

LoRA 4.19M params (0.06%)
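The percentages above follow directly from the two parameter counts. The sketch below redoes that arithmetic; the last step assumes square 4096 x 4096 projections at r=16 (an assumption, since the page does not state the base model's shapes) to show how many adapted matrices a 4.19M total corresponds to.

```python
total_params = 7_000_000_000   # "7,000M" total model params
lora_trainable = 4_194_304     # "4.19M" LoRA trainable params

trainable_pct = 100 * lora_trainable / total_params
reduction_pct = 100 - trainable_pct

print(f"trainable: {trainable_pct:.2f}%  reduction: {reduction_pct:.2f}%")

# Assumption: each adapted matrix is 4096 x 4096 with r=16, i.e. 131,072 params per adapter
per_matrix = 16 * (4096 + 4096)
adapted_matrices = lora_trainable // per_matrix
print(f"adapted matrices at r=16, d=4096: {adapted_matrices}")
```

Under that assumption the total corresponds to 32 adapted matrices, for example one projection in each of 32 layers; targeting more modules per layer scales the count up proportionally.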

Estimated GPU Memory

Model Weights 14.0 GB

LoRA Adapters 8.4 MB

Optimizer States (LoRA only) 16.8 MB

Gradients (LoRA only) 8.4 MB

Activations (est. batch=4) ~2 GB

Total Estimated ~16.0 GB
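The breakdown above can be reproduced with a few lines. This is a rough sketch under assumptions chosen to match the table: 2 bytes per value for weights, adapters, and gradients, two Adam moment buffers per trainable parameter at 2 bytes each, decimal units, and a fixed 2 GB guess for activations. Setups that keep optimizer state in 32-bit would need about twice the optimizer figure.

```python
BYTES_PER_VALUE = 2   # assumption: 16-bit storage
ADAM_STATES = 2       # assumption: Adam keeps two moment buffers per trainable param


def estimate_memory_gb(total_params, lora_params, activations_gb=2.0):
    """Rough GPU memory breakdown in decimal GB (1 GB = 1e9 bytes)."""
    adapters = lora_params * BYTES_PER_VALUE / 1e9
    breakdown = {
        "weights": total_params * BYTES_PER_VALUE / 1e9,
        "adapters": adapters,
        "optimizer": ADAM_STATES * adapters,
        "gradients": adapters,
        "activations": activations_gb,   # placeholder estimate, depends on batch and sequence length
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


mem = estimate_memory_gb(7_000_000_000, 4_194_304)
for name, gb in mem.items():
    print(f"{name:<12} {gb:8.4f} GB")
```

The frozen weights dominate: everything LoRA adds (adapters, gradients, optimizer state) totals roughly 34 MB here, which is why the rank has little effect on the overall footprint.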

Recommendations

This config fits comfortably on a 24GB GPU (RTX 3090/4090)
Consider r=32 or r=64 if you have more GPU memory available
Alpha=2*r is a good default for stable training

PEFT Config

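from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the frozen base model; "your-base-model" is a placeholder for a Hub ID or local path.
base_model = AutoModelForCausalLM.from_pretrained("your-base-model")

lora_config = LoraConfig(
    r=16,                  # LoRA rank
    lora_alpha=32,         # scaling factor alpha/r = 2.0
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",           # leave bias terms frozen
    task_type="CAUSAL_LM"  # or "SEQ_CLS", "SEQ_2_SEQ_LM"
)

# Wrap the base model so that only the LoRA matrices are trainable.
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# Calculator estimate (actual counts depend on the base model and target modules):
# trainable params: 4,194,304 || all params: 7,000,000,000 || trainable%: 0.06%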

Test Your Understanding

1. If you double the LoRA rank from 16 to 32, what happens to the trainable parameters?

They stay the same (rank doesn't affect param count)

They roughly double

They quadruple

They increase by 50%

Correct! LoRA params = r x (d_in + d_out) per layer. Since r appears linearly in the formula, doubling r doubles the parameters. This is why choosing the right rank is a key decision!

2. What does the alpha/r ratio control?

The number of trainable layers

The scaling of the LoRA output before adding to frozen weights

The learning rate

The dropout probability

Correct! The scaling factor (alpha/r) determines how much the LoRA adaptation affects the output. Higher scaling = stronger adaptation. This lets you tune the "strength" of LoRA without changing the architecture.

3. Why is QLoRA (4-bit quantization + LoRA) so memory efficient?

It reduces the model size by removing layers

It stores frozen weights in 4-bit but trains LoRA adapters in 16-bit precision

It uses CPU offloading

It compresses the LoRA matrices

Correct! QLoRA's insight: frozen weights don't need full precision! By storing them in 4-bit and only training small LoRA adapters in 16-bit precision, you can fine-tune a 65B model on a single 48GB GPU. The LoRA adapters stay at the higher precision for accurate gradient updates.
