Transformers Fine-Tuning & Multilingual Evaluation
"A training run is a reviewable artifact. Your PR should make it obvious: what you changed, why you changed it, how you measured it, and what still fails."
Master transformer fine-tuning through hands-on exploration
Interactive decision tree to choose between full fine-tuning and LoRA/PEFT based on your constraints.
Interactive guide to Hugging Face TrainingArguments. Understand each parameter's impact.
Design slice-based evaluation plans that expose hidden failures across languages and scripts.
See how different languages and scripts are tokenized. Understand why tokenization matters for multilingual NLP.
Generate the complete "evidence bundle" for a reviewable training PR. Create all required artifacts.
Calculate and compare trainable parameters for LoRA vs full fine-tuning. Understand the trade-offs.
Put your skills to the test! Fine-tune a model on SST-2 and compete for the highest accuracy.
Team learning activity: five teams each learn and teach critical concepts before the demo.
Use the tools above to optimize your training config, then compete for the highest accuracy!
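To make the LoRA versus full fine-tuning comparison concrete, the trainable-parameter arithmetic can be sketched in plain Python. This is a rough estimate under stated assumptions, not the calculator's actual logic: the function names are hypothetical, each adapted weight is treated as a square hidden x hidden matrix, and biases, embeddings, and the classification head are ignored. The example numbers (hidden size 768, 12 layers, two adapted matrices per layer, rank 8) are illustrative only.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds to one d_out x d_in weight.

    LoRA freezes the original weight and trains two low-rank factors:
    A with shape (rank, d_in) and B with shape (d_out, rank).
    """
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Parameters trained when the same weight is fully fine-tuned (bias ignored)."""
    return d_in * d_out


def compare(hidden: int, layers: int, matrices_per_layer: int, rank: int) -> dict:
    """Compare trainable parameters across all adapted matrices (hypothetical helper)."""
    n = layers * matrices_per_layer
    lora = n * lora_params(hidden, hidden, rank)
    full = n * full_params(hidden, hidden)
    return {"lora": lora, "full": full, "ratio": lora / full}


if __name__ == "__main__":
    # Illustrative settings only; substitute the values for your own model.
    result = compare(hidden=768, layers=12, matrices_per_layer=2, rank=8)
    print(f"LoRA trainable params: {result['lora']:,}")
    print(f"Full fine-tune params (same matrices): {result['full']:,}")
    print(f"LoRA / full: {result['ratio']:.2%}")
```

For square matrices the ratio simplifies to 2 * rank / hidden, so it shrinks as the model gets wider and grows linearly with the chosen rank.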
Essential terms for tonight's session
Hugging Face Trainer pattern
from transformers import Trainer, TrainingArguments

# model, train_ds, eval_ds, and compute_metrics are assumed to be defined earlier.
args = TrainingArguments(
    output_dir="runs/experiment",
    eval_strategy="epoch",  # named evaluation_strategy in transformers releases before 4.41
    save_strategy="epoch",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    seed=42,  # Reproducibility!
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    compute_metrics=compute_metrics,
)
trainer.train()
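The pattern above passes a compute_metrics callable without showing one. A minimal sketch for a classification task such as SST-2 might look like the following, assuming the predictions are a single array of per-class logits (some models return a tuple instead, which this does not handle). It is written in plain Python so it works on nested lists as well as NumPy arrays; an argmax over NumPy arrays is the more common form in practice.

```python
def compute_metrics(eval_pred):
    """Return accuracy from a (logits, labels) pair.

    Trainer calls this with an object that unpacks into predictions and
    label ids. Each row of logits holds one score per class.
    """
    logits, labels = eval_pred
    # Argmax per row: the index of the highest-scoring class.
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    if not preds:
        return {"accuracy": 0.0}
    correct = sum(int(p == int(y)) for p, y in zip(preds, labels))
    return {"accuracy": correct / len(preds)}


if __name__ == "__main__":
    # Made-up logits and labels, only to show the call shape.
    toy_logits = [[0.1, 0.9], [2.0, -1.0], [0.3, 0.7]]
    toy_labels = [1, 0, 0]
    print(compute_metrics((toy_logits, toy_labels)))
```

When evaluation runs inside Trainer, the returned keys are reported with an eval_ prefix, so this metric appears as eval_accuracy in the logs.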
By the end of this session, you'll be able to:
Choose between full fine-tuning and LoRA based on compute, data, and deployment constraints
Use the Trainer API with checkpointing, evaluation, and artifact saving
Build slice-based evaluations with per-language metrics and error analysis
Generate reviewable PRs with config, metrics, failures, and limitations
Identify multilingual tokenization issues and create stress tests
Write architecture decision records (ADRs) for training choices that reviewers can audit
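The per-language metrics and error analysis named in the objectives above can be sketched in a few lines. This is an illustrative outline, not the session's own tooling: the record fields (lang, label, pred) and both helper names are assumptions, and the toy records are made up purely to show how an aggregate score can hide a weak slice.

```python
from collections import defaultdict


def accuracy_by_slice(records, slice_key="lang"):
    """Accuracy per slice value (e.g. per language or per script)."""
    totals, hits = defaultdict(int), defaultdict(int)
    for rec in records:
        name = rec[slice_key]
        totals[name] += 1
        hits[name] += int(rec["pred"] == rec["label"])
    return {name: hits[name] / totals[name] for name in totals}


def collect_failures(records):
    """Misclassified records, kept whole so they can be read during error analysis."""
    return [rec for rec in records if rec["pred"] != rec["label"]]


# Toy records for illustration only; real ones would come from your eval set.
records = [
    {"lang": "en", "label": 1, "pred": 1},
    {"lang": "en", "label": 0, "pred": 0},
    {"lang": "hi", "label": 1, "pred": 0},
    {"lang": "hi", "label": 0, "pred": 0},
]

overall = sum(r["pred"] == r["label"] for r in records) / len(records)
per_lang = accuracy_by_slice(records)
failures = collect_failures(records)

if __name__ == "__main__":
    print(f"overall accuracy: {overall:.2f}")
    for lang, acc in sorted(per_lang.items()):
        print(f"  {lang}: {acc:.2f}")
    print(f"failures to review: {len(failures)}")
```

Reporting the per-slice table next to the overall number, along with the raw failures, gives reviewers what they need to see which languages the model still gets wrong.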