W17D2: Advanced NLP Engineering - Transformers Fine-Tuning

The Reviewable Training Run

"A training run is a reviewable artifact. Your PR should make it obvious: what you changed, why you changed it, how you measured it, and what still fails."

Interactive Learning Modules

Master transformer fine-tuning through hands-on exploration

Learn

🎯

Fine-Tuning Strategy Selector

Interactive decision tree to choose between full fine-tuning and LoRA/PEFT based on your constraints.

Compute and memory constraints
Dataset size considerations
Deployment requirements
Parameter efficiency analysis
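
One input to the decision above is whether full fine-tuning fits in memory at all. The sketch below assumes fp32 weights and the Adam optimizer, which cost roughly 16 bytes per trainable parameter (4 for the weight, 4 for its gradient, 8 for Adam's two moment estimates), and it ignores activations. The function names and the `headroom` factor are hypothetical, chosen only for illustration. Dataset size and deployment requirements are separate constraints that this check does not cover.

```python
BYTES_PER_PARAM_FULL = 16  # fp32 weight (4) + gradient (4) + Adam moments (8)


def full_finetune_state_gb(n_params: int) -> float:
    """Approximate GB for weights, gradients, and Adam state (no activations)."""
    return n_params * BYTES_PER_PARAM_FULL / 1e9


def suggest_strategy(n_params: int, gpu_memory_gb: float, headroom: float = 0.5) -> str:
    """Suggest 'full' only if the training state fits in a fraction of GPU memory.

    `headroom` is a hypothetical safety factor that leaves room for
    activations and batch data; tune it for your own setup.
    """
    if full_finetune_state_gb(n_params) <= gpu_memory_gb * headroom:
        return "full"
    return "lora"


if __name__ == "__main__":
    for name, n in [("110M parameters", 110_000_000), ("7B parameters", 7_000_000_000)]:
        gb = full_finetune_state_gb(n)
        print(f"{name}: about {gb:.2f} GB of state -> {suggest_strategy(n, 16.0)}")
```

Treat the result as a first filter, not a verdict: a model that fits may still be better served by LoRA if you need to ship many task-specific variants.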

Learn

⚙️

TrainingArguments Explorer

Interactive guide to Hugging Face TrainingArguments. Understand each parameter's impact.

Batch size and learning rate
Evaluation and save strategies
Checkpointing options
Generate config snippets

Practice

🌍

Multilingual Evaluation Planner

Design slice-based evaluation plans that expose hidden failures across languages and scripts.

Language and script slices
Code-switching test cases
Failure mode categories
Export evaluation plan
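
To see why slices expose hidden failures, it helps to compute the same metric once overall and once per slice. The sketch below is a minimal, dependency-free illustration. The record layout (`language`, `label`, `prediction` keys) and the function name are assumptions, and the toy records are made up to show the input shape, not real results.

```python
from collections import defaultdict


def accuracy_by_slice(examples, slice_key="language"):
    """Return {slice_value: accuracy} for records with 'label' and 'prediction'."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        key = ex[slice_key]
        total[key] += 1
        correct[key] += int(ex["label"] == ex["prediction"])
    return {key: correct[key] / total[key] for key in total}


if __name__ == "__main__":
    # Toy, made-up records purely to show the shape of the input.
    toy = [
        {"language": "en", "label": 1, "prediction": 1},
        {"language": "en", "label": 0, "prediction": 0},
        {"language": "hi", "label": 1, "prediction": 0},
        {"language": "hi", "label": 0, "prediction": 0},
    ]
    overall = sum(ex["label"] == ex["prediction"] for ex in toy) / len(toy)
    print("overall:", overall)
    print("by slice:", accuracy_by_slice(toy))
```

The same function works for script or domain slices by passing a different `slice_key`, provided each record carries that field.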

Learn

🔤

Tokenization Visualizer

See how different languages and scripts are tokenized. Understand why tokenization matters for multilingual NLP.

Compare tokenization across languages
Visualize subword splits
Identify tokenization stress tests
Understand coverage issues
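
A quick way to find stress tests is to compare how many tokens each sample costs. The sketch below accepts any `tokenize` callable. The `utf8_byte_tokenize` stand-in is a hypothetical toy that mimics a byte-level vocabulary with no learned merges; it is not a real tokenizer. In practice you would pass something like a Hugging Face tokenizer's `tokenize` method instead.

```python
def utf8_byte_tokenize(text):
    """Toy stand-in: one token per UTF-8 byte (no learned merges)."""
    return list(text.encode("utf-8"))


def tokens_per_char(text, tokenize=utf8_byte_tokenize):
    """Tokens per Unicode code point; higher means the text costs more tokens."""
    if not text:
        return 0.0
    return len(tokenize(text)) / len(text)


if __name__ == "__main__":
    samples = {"English": "hello", "Hindi": "नमस्ते", "Japanese": "こんにちは"}
    for name, text in samples.items():
        n_tokens = len(utf8_byte_tokenize(text))
        print(f"{name}: {n_tokens} tokens, {tokens_per_char(text):.1f} per character")
```

Samples whose ratio is far above the English baseline are good candidates for the stress-test set, since they use up sequence length faster and are more likely to be truncated.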

Challenge

📋

PR Evidence Builder

Generate the complete "evidence bundle" for a reviewable training PR. Create all required artifacts.

Config and metrics tables
Failure analysis templates
Limitations statements
Reproducibility commands
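
The artifacts listed above can be assembled into one block of text for the PR description. The following is a minimal sketch, not a required format: the function name, section labels, and the `train.py` command are hypothetical, and the metric value is a placeholder rather than a real result.

```python
def render_evidence(config, metrics, failures, limitations, repro_command):
    """Render the evidence bundle as plain text for a PR description."""
    lines = ["Config"]
    lines += [f"  {key} = {value}" for key, value in sorted(config.items())]
    lines.append("Metrics")
    lines += [f"  {name} = {value:.4f}" for name, value in sorted(metrics.items())]
    lines.append("Known failures")
    lines += [f"  - {item}" for item in failures] or ["  (none recorded)"]
    lines.append("Limitations")
    lines += [f"  - {item}" for item in limitations] or ["  (none recorded)"]
    lines.append("Reproduce")
    lines.append(f"  {repro_command}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_evidence(
        config={"learning_rate": 2e-5, "num_train_epochs": 3, "seed": 42},
        metrics={"eval_accuracy": 0.0},  # placeholder, not a real result
        failures=["<describe a failing slice here>"],
        limitations=["<state what was not evaluated>"],
        repro_command="python train.py --seed 42",  # hypothetical script name
    ))
```

Printing "(none recorded)" for an empty section is deliberate: it makes a missing failure analysis visible to the reviewer instead of silently omitting the heading.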

Learn

🔧

LoRA Parameter Calculator

Calculate and compare trainable parameters for LoRA vs full fine-tuning. Understand the trade-offs.

Rank and alpha settings
Parameter count comparison
Memory estimation
Config generator
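
The parameter comparison comes down to simple arithmetic. For one weight matrix of shape d_out x d_in, full fine-tuning trains d_in * d_out values, while LoRA at rank r trains two small matrices totalling r * (d_in + d_out). The sketch below works that out for a single matrix. The helper names are hypothetical, and the 768 x 768 shape is just an example size.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns A (rank x d_in) and B (d_out x rank); the base matrix stays frozen.
    """
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole matrix is updated."""
    return d_in * d_out


def lora_scaling(alpha: float, rank: int) -> float:
    """Factor applied to the low-rank update in standard LoRA (alpha / rank)."""
    return alpha / rank


if __name__ == "__main__":
    d, r, alpha = 768, 8, 16
    lora, full = lora_params(d, d, r), full_params(d, d)
    print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.4f}")
    print(f"scaling factor: {lora_scaling(alpha, r)}")
```

Multiply by the number of adapted matrices to estimate a whole model. Note that the frozen base weights still occupy memory, so the saving applies to gradients and optimizer state rather than to the model itself.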

Competition

🏆

SST-2 Competition

Put your skills to the test! Fine-tune a model on SST-2 and compete for the highest accuracy.

Starter notebook template
Competition rules & scoring
Leaderboard submission
Real training practice

Prep

📚

Concepts Reference

Team learning activity: 5 teams each learn and teach critical concepts before the demo.

Foundation & Transformers
Models & Tokenization
Training & Hyperparameters
LoRA & Evaluation

Suggested Learning Path

🎯 Strategy (15 min)
⚙️ Training Args (15 min)
🌍 Eval Slices (20 min)
🔤 Tokenization (10 min)
📋 Evidence PR (20 min)
🏆 Competition (30 min)

🏆 SST-2 Fine-Tuning Competition

Use the tools above to optimize your training config, then compete for the highest accuracy!

Resources: Competition Rules, Python Script, Jupyter Notebook

Key Concepts

Essential terms for tonight's session

Fine-tuning

Updating pretrained weights on task-specific data to adapt behavior

LoRA

Low-rank adapters that reduce trainable parameters while keeping base weights fixed

PEFT

Parameter-Efficient Fine-Tuning: methods that adapt a model while training far fewer parameters than full fine-tuning

Trainer

Hugging Face API that manages the training loop, evaluation, checkpoint saving, and logging

TrainingArguments

Structured config for batch size, learning rate, epochs, and save strategy

Evaluation Slice

Subset of eval data by condition (language, script, domain)

Code-switching

Input mixing multiple languages within the same text

Reproducibility

Ability to rerun training/eval and get consistent results

Code You'll Work With

Hugging Face Trainer pattern


from transformers import Trainer, TrainingArguments

# Assumes model, train_ds, eval_ds, and compute_metrics are defined earlier.
args = TrainingArguments(
    output_dir="runs/experiment",
    evaluation_strategy="epoch",  # named eval_strategy in newer transformers releases
    save_strategy="epoch",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    seed=42,  # fixed seed for reproducibility
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    compute_metrics=compute_metrics,
)
trainer.train()
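
The pattern above passes a `compute_metrics` callable without defining it. A minimal sketch for a classification task such as SST-2 might look like the following, assuming the Trainer hands the function a `(logits, labels)` pair at evaluation time. It is written without NumPy so that it also runs on plain lists; `np.argmax(logits, axis=-1)` is the usual shortcut when NumPy is available.

```python
def compute_metrics(eval_pred):
    """Accuracy from a (logits, labels) pair, one row of logits per example."""
    logits, labels = eval_pred
    # Predicted class = index of the largest logit in each row.
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(pred == int(label)) for pred, label in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


if __name__ == "__main__":
    # Made-up logits for three examples and two classes, only to show the call shape.
    toy_logits = [[0.1, 0.9], [2.0, -1.0], [0.3, 0.4]]
    toy_labels = [1, 0, 0]
    print(compute_metrics((toy_logits, toy_labels)))
```

Returning a dictionary keeps the door open for adding further metrics later without changing the Trainer call.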

Tonight's Learning Objectives

By the end of this session, you'll be able to:

🎯

Choose Fine-Tuning Strategy

Choose between full fine-tuning and LoRA based on compute, data, and deployment constraints

🔄

Run Reproducible Training

Use the Trainer API with checkpointing, evaluation, and artifact saving

🌍

Evaluate Multilingually

Create slice-based evaluation with per-language metrics and error analysis

📋

Produce Evidence

Generate reviewable PRs with config, metrics, failures, and limitations

🔤

Understand Tokenization

Identify multilingual tokenization issues and create stress tests

📝

Document Decisions

Write ADRs for training choices that reviewers can audit
