Configure LoRA
Rank (r): lower = fewer trainable parameters, higher = more capacity. Common values: 4, 8, 16, 32, 64.
Alpha (lora_alpha): scaling factor applied to the LoRA update as alpha / r. Common practice: alpha = 2*r or alpha = r.
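To make the role of the scaling factor concrete, here is a minimal pure-Python sketch of the standard LoRA update, W_eff = W + (alpha / r) * B @ A. The helper names (`matmul`, `lora_effective_weight`) and the toy matrices are illustrative assumptions, not part of any library.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A), where r is the number of rows of A."""
    r = len(a)                    # rank = inner dimension shared by B and A
    scale = alpha / r             # the "Alpha/r Scaling" value
    delta = matmul(b, a)          # low-rank update, shape d_out x d_in
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example (hypothetical values): d_out = 2, d_in = 3, r = 1
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]             # r x d_in
B = [[0.5], [0.0]]                # d_out x r
W_eff = lora_effective_weight(W, A, B, alpha=2)
print(W_eff)                      # [[2.0, 2.0, 3.0], [0.0, 1.0, 0.0]]

# With r=16 and alpha=32 the update is scaled by 32 / 16 = 2.0
print(32 / 16)                    # 2.0
```

Because the update is multiplied by alpha / r, raising r while holding alpha fixed shrinks the effective step, which is one reason alpha is often tied to r.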
Target modules: adapting more modules adds capacity but also adds trainable parameters. The attention projections (q, k, v, o) are the standard choice.
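The trade-off between rank, target modules, and parameter count can be sketched with a short calculation. Each adapted weight of shape (d_in, d_out) adds two matrices, A (r x d_in) and B (d_out x r), so r * (d_in + d_out) parameters. The dimensions below (hidden size 4096, 32 layers, square attention projections) are assumptions chosen to resemble a typical 7B decoder, and `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(r, module_shapes, num_layers):
    """Count LoRA parameters: r * (d_in + d_out) per adapted matrix, per layer."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in module_shapes.values())
    return per_layer * num_layers


# Assumed architecture (not taken from the page): hidden size 4096, 32 layers
HIDDEN = 4096
LAYERS = 32
attn = {name: (HIDDEN, HIDDEN) for name in ("q_proj", "k_proj", "v_proj", "o_proj")}
qv_only = {name: attn[name] for name in ("q_proj", "v_proj")}

print(lora_param_count(8, qv_only, LAYERS))   # 4194304
print(lora_param_count(16, attn, LAYERS))     # 16777216
```

Under these assumed dimensions, doubling the rank or doubling the number of adapted modules each doubles the count. The exact totals depend on the real shapes of the base model, so a figure reported by a tool may rest on different assumptions than the ones used here.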
QLoRA: quantizing the frozen base weights to 4 bits makes it possible to fine-tune a 65B-parameter model on a single 48 GB GPU.
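A rough back-of-the-envelope calculation shows why 4-bit storage matters at this scale. The sketch below counts only the base weights; it ignores quantization constants, adapters, optimizer state, and activations, so it is a lower bound and not a full memory estimate. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


fp16_gb = weight_memory_gb(65e9, 16)
int4_gb = weight_memory_gb(65e9, 4)
print(fp16_gb)   # 130.0
print(int4_gb)   # 32.5
```

At 16 bits the weights alone exceed a 48 GB card several times over, while at 4 bits they leave headroom for adapters and activations.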
Results
Total Model Params: 7,000M
LoRA Trainable Params: 4.19M
Parameter Reduction: 99.94%
Alpha/r Scaling: 2.0x
[Chart: Trainable Parameters Comparison]
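The reduction figure follows directly from the two parameter counts. A minimal sketch of the arithmetic, using the totals reported above (7,000M total, 4,194,304 trainable):

```python
total_params = 7_000_000_000
trainable_params = 4_194_304

trainable_pct = 100 * trainable_params / total_params   # share of weights that are trained
reduction_pct = 100 - trainable_pct                      # share that stays frozen

print(f"{trainable_pct:.2f}% trainable, {reduction_pct:.2f}% reduction")
# 0.06% trainable, 99.94% reduction
```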
Estimated GPU Memory
Model Weights: 14.0 GB
LoRA Adapters: 8.4 MB
Optimizer States (LoRA only): 16.8 MB
Gradients (LoRA only): 8.4 MB
Activations (est. batch=4): ~2 GB
Total Estimated: ~16.0 GB
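The breakdown above can be reproduced with simple arithmetic. The sketch below assumes 2 bytes per value (16-bit) for weights, adapters, gradients, and both Adam moment estimates, which is the assumption that matches the listed figures; optimizers that keep 32-bit state would need proportionally more. The activation figure is taken as given, and `estimate_memory_bytes` is a hypothetical helper.

```python
BYTES_PER_VALUE = 2  # assumption: 16-bit storage for every tensor counted here


def estimate_memory_bytes(total_params, lora_params, activation_bytes):
    """Return a per-component memory estimate in bytes, plus the total."""
    parts = {
        "weights": total_params * BYTES_PER_VALUE,
        "adapters": lora_params * BYTES_PER_VALUE,
        "optimizer": 2 * lora_params * BYTES_PER_VALUE,  # two Adam moments per trainable param
        "gradients": lora_params * BYTES_PER_VALUE,
        "activations": activation_bytes,
    }
    parts["total"] = sum(parts.values())
    return parts


est = estimate_memory_bytes(7_000_000_000, 4_194_304, activation_bytes=2_000_000_000)
for name, n_bytes in est.items():
    print(f"{name:>12}: {n_bytes / 1e9:.4f} GB")
```

The frozen base weights dominate: everything specific to LoRA (adapters, gradients, optimizer state) adds up to roughly 34 MB in this configuration.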
Recommendations
- This config fits comfortably on a 24GB GPU (RTX 3090/4090)
- Consider r=32 or r=64 for more adapter capacity if you have GPU memory to spare
- Alpha=2*r is a good default for stable training
PEFT Config
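from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                   # rank of the low-rank update matrices
    lora_alpha=32,          # scaling factor (alpha / r = 2.0)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    lora_dropout=0.05,      # dropout applied on the LoRA path
    bias="none",            # do not train bias terms
    task_type="CAUSAL_LM",  # or "SEQ_CLS", "SEQ_2_SEQ_LM"
)

# base_model is assumed to be an already-loaded model (e.g. a transformers causal LM).
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# Example output (from the estimate above; exact counts depend on the base model):
# trainable params: 4,194,304 || all params: 7,000,000,000 || trainable%: 0.06%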