# W17D2: SST-2 Fine-Tuning Competition

## Overview

Welcome to the SST-2 Sentiment Classification Competition! Your goal is to achieve the highest validation accuracy on the Stanford Sentiment Treebank (SST-2) dataset while following sound ML engineering practices.

- **Dataset:** SST-2 (binary sentiment: positive/negative)
- **Metric:** Accuracy on the validation set
- **Time Limit:** 30 minutes of training time

---

## The Challenge

Fine-tune a transformer model to classify movie review sentences as positive or negative. The baselines to beat:

| Model | Baseline Accuracy |
|-------|-------------------|
| DistilBERT | ~89% |
| BERT-base | ~92% |
| RoBERTa-base | ~94% |

Can you beat these baselines with smart hyperparameter tuning?

---

## Required Workflow

You **MUST** use the provided HTML tools. This is not optional: working through them is how you learn the concepts!

### Step 1: Choose Your Strategy
**Tool:** `strategy-selector.html`

Answer the decision tree questions:
- How much GPU memory do you have?
- How much training data?
- Do you need fast iteration?

**Output:** Decision on Full Fine-Tune vs LoRA

### Step 2: Configure Training Arguments
**Tool:** `training-args.html`

Configure your training hyperparameters:
- Learning rate (try: 1e-5 to 5e-5)
- Batch size (try: 8, 16, 32)
- Epochs (try: 2-5)
- Warmup ratio
- Weight decay

**Output:** Copy the generated `TrainingArguments` code
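
To see how batch size, epochs, and warmup ratio interact, the sketch below estimates the number of optimizer steps for a run. It assumes the standard GLUE SST-2 training split (67,349 sentences), a single GPU, and no gradient accumulation; `training_schedule` is a hypothetical helper name, not something the tools generate.

```python
import math

SST2_TRAIN_EXAMPLES = 67_349  # size of the GLUE SST-2 training split


def training_schedule(batch_size, epochs, warmup_ratio,
                      n_examples=SST2_TRAIN_EXAMPLES):
    """Estimate step counts for one run (single GPU, no gradient accumulation)."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    # Warmup covers the first `warmup_ratio` fraction of all optimizer steps.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# Compare the suggested batch sizes at 3 epochs with 10% warmup.
for batch_size in (8, 16, 32):
    print(batch_size, training_schedule(batch_size, epochs=3, warmup_ratio=0.1))
```

Smaller batches mean more steps per epoch, which usually means longer wall-clock time, so this is worth checking against the 30-minute limit before launching a run.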

### Step 3: (If Using LoRA) Configure PEFT
**Tool:** `lora-calculator.html`

If you chose LoRA:
- Set rank `r` (try: 8, 16, 32)
- Set alpha (typically 2x rank)
- Choose target modules

**Output:** Copy the generated `LoraConfig` code
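
As a rough guide to what the rank setting costs, the sketch below counts the parameters LoRA adds and the resulting `alpha / r` scaling factor. The example numbers assume `roberta-base` (hidden size 768, 12 layers) with the query and value projections as target modules; that choice of targets is an assumption for illustration, and the count excludes the classification head. The function names are hypothetical.

```python
def lora_added_params(r, d_in, d_out, n_modules):
    """Each adapted weight gains two low-rank factors: A (r x d_in) and B (d_out x r)."""
    return n_modules * r * (d_in + d_out)


def lora_scaling(r, alpha):
    """Standard LoRA scales the low-rank update by alpha / r."""
    return alpha / r


# Assumed setup: roberta-base, query + value projections in every layer.
HIDDEN, LAYERS, TARGETS_PER_LAYER = 768, 12, 2

for r in (8, 16, 32):
    alpha = 2 * r  # the "2x rank" rule of thumb
    added = lora_added_params(r, HIDDEN, HIDDEN, LAYERS * TARGETS_PER_LAYER)
    print(f"r={r:<2} alpha={alpha:<2} scaling={lora_scaling(r, alpha):.1f} "
          f"added_params={added:,}")
```

With alpha fixed at twice the rank, the scaling factor stays at 2.0 while the number of trainable parameters grows linearly with `r`.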

### Step 4: Train & Evaluate
**Tool:** `student_starter.ipynb`

1. Paste your configs into the notebook
2. Run training
3. Record your accuracy and training time
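
If the notebook does not already report accuracy and timing, something along these lines may help. It is a minimal sketch: `compute_metrics` here is a plain-Python accuracy function that takes a `(logits, labels)` pair, and the commented-out `trainer.train()` call stands in for whatever the starter notebook actually defines.

```python
import time


def compute_metrics(eval_pred):
    """Accuracy from a (logits, labels) pair; logits is one row of scores per example."""
    logits, labels = eval_pred
    # Predicted class = index of the largest score in each row.
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return {"accuracy": correct / len(preds)}


start = time.perf_counter()
# trainer.train()  # assumption: the notebook provides a `trainer` object
elapsed_seconds = time.perf_counter() - start

# Toy check with three made-up examples (two classes each).
demo = compute_metrics(([[0.2, 1.5], [2.0, -1.0], [0.1, 0.3]], [1, 0, 0]))
print(f"accuracy={demo['accuracy']:.2%}  training_time={elapsed_seconds:.1f}s")
```

Measuring the time around the training call itself keeps model download and tokenization out of the number you report.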

### Step 5: Document Results
**Tool:** `evidence-builder.html`

Generate your METRICS.md with:
- Final accuracy
- Training configuration
- What you tried
- What worked/didn't work

---

## Scoring Rubric

| Component | Weight | Criteria |
|-----------|--------|----------|
| **Accuracy** | 60% | Higher accuracy = more points |
| **Documentation** | 20% | Used evidence-builder, clear explanations |
| **Efficiency** | 20% | Training time, smart resource use |

### Accuracy Scoring
```
Points = (Your Accuracy - 85) * 10

Examples:
- 90% accuracy = (90 - 85) * 10 = 50 points
- 93% accuracy = (93 - 85) * 10 = 80 points
- 95% accuracy = (95 - 85) * 10 = 100 points
```
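
The formula above translates directly into a small helper. This sketch takes accuracy in percent and applies no clamping, since the rubric does not say how scores below 85% or above 95% are handled; `accuracy_points` is a hypothetical name.

```python
def accuracy_points(accuracy_pct):
    """Points = (accuracy - 85) * 10, with accuracy given in percent (e.g. 92.4)."""
    # Rounding only hides floating-point noise; no floor or cap is applied.
    return round((accuracy_pct - 85) * 10, 1)


for acc in (90, 93, 95):
    print(f"{acc}% accuracy -> {accuracy_points(acc)} points")
```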

### Documentation Scoring
- Used all HTML tools with screenshots: 15 points
- Clear METRICS.md from evidence-builder: 5 points
- Missing documentation: 0 points

### Efficiency Scoring
- Under 10 minutes: 20 points
- 10-20 minutes: 15 points
- 20-30 minutes: 10 points
- Over 30 minutes: 0 points (disqualified)
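
The time bands can be expressed as a lookup like the one below. The handling of exact boundaries is an assumption, because the bands above share their endpoints: this sketch puts exactly 10 minutes in the 15-point band, exactly 20 minutes in the 10-point band, and treats exactly 30 minutes as still within the limit. Check with your instructor if your run lands on a boundary.

```python
def efficiency_points(training_minutes):
    """Map training time in minutes to efficiency points (boundary handling assumed)."""
    if training_minutes > 30:
        return 0   # over the time limit: disqualified
    if training_minutes < 10:
        return 20
    if training_minutes < 20:
        return 15
    return 10      # 20 to 30 minutes inclusive


for minutes in (8.5, 12, 25, 31):
    print(f"{minutes} min -> {efficiency_points(minutes)} points")
```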

---

## Constraints

1. **Time Limit:** Maximum 30 minutes of training time
2. **Model Choice:** Any Hugging Face model is allowed
3. **Tools Required:** Must use HTML tools (screenshots required)
4. **No Cheating:**
   - No models that have already been fine-tuned on SST-2
   - No external data augmentation
   - No ensemble methods
5. **Documentation:** Must submit METRICS.md from evidence-builder

---

## Submission Format

### Leaderboard Entry
Submit a row for the class leaderboard:

```
| Rank | Name | Accuracy | Model | Strategy | Time |
|------|------|----------|-------|----------|------|
| - | Your Name | XX.XX% | model-name | Full/LoRA | Xm Xs |
```
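
If your notebook does not print the row for you, a helper like this one produces it in the format above. It is a sketch with placeholder values: `leaderboard_row` is a hypothetical name, accuracy is passed as a fraction between 0 and 1, and time is passed in seconds.

```python
def leaderboard_row(name, accuracy, model, strategy, seconds):
    """Format one leaderboard row; rank is left as '-' for the instructor to fill in."""
    minutes, secs = divmod(round(seconds), 60)
    return (f"| - | {name} | {accuracy:.2%} | {model} | "
            f"{strategy} | {minutes}m {secs}s |")


# Placeholder values for illustration only.
print(leaderboard_row("Your Name", 0.9243, "roberta-base", "LoRA", 754))
# | - | Your Name | 92.43% | roberta-base | LoRA | 12m 34s |
```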

### Required Artifacts
1. **Leaderboard entry** (from notebook output)
2. **Screenshots** of your HTML tool configurations:
   - strategy-selector.html decision
   - training-args.html config
   - lora-calculator.html config (if applicable)
3. **METRICS.md** from evidence-builder.html

---

## Tips for Success

### Quick Wins
- Start with DistilBERT for fast iteration
- Use the "Quick Test" preset first to verify everything works
- Learning rate is often the most impactful hyperparameter

### Advanced Strategies
- Try RoBERTa-base for best accuracy (slower but worth it)
- Experiment with learning rate schedules (`warmup_ratio`)
- Consider LoRA if you want to try larger models

### Common Mistakes
- Training for too many epochs (overfitting)
- Batch size too large for your GPU
- Not saving checkpoints (losing progress)
- Forgetting to set `load_best_model_at_end=True`
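
The last two mistakes are related: `load_best_model_at_end=True` only works if checkpoints are saved on the same schedule as evaluation. The settings below are one way to arrange that, written as a plain dictionary of `TrainingArguments` keyword arguments. The output path is a placeholder, and note that `eval_strategy` is called `evaluation_strategy` in older `transformers` releases, so check which name your installed version expects.

```python
# Keyword arguments intended for transformers.TrainingArguments.
best_model_kwargs = {
    "output_dir": "sst2-run",             # placeholder path
    "eval_strategy": "epoch",             # `evaluation_strategy` in older releases
    "save_strategy": "epoch",             # must match the evaluation schedule
    "load_best_model_at_end": True,       # restore the best checkpoint after training
    "metric_for_best_model": "accuracy",  # needs an "accuracy" key from compute_metrics
    "greater_is_better": True,
    "save_total_limit": 2,                # cap the number of checkpoints kept on disk
}

# Sanity check before training: best-model loading needs matching schedules.
assert best_model_kwargs["save_strategy"] == best_model_kwargs["eval_strategy"]

# In the notebook (assuming transformers is installed):
# args = TrainingArguments(**best_model_kwargs)
```

Evaluating and saving once per epoch also makes overfitting visible: if validation accuracy drops in a later epoch, the earlier checkpoint is the one that gets restored.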

### Model Recommendations
| Goal | Recommended Model |
|------|-------------------|
| Fastest training | `distilbert-base-uncased` |
| Best accuracy | `roberta-base` |
| Good balance | `bert-base-uncased` |
| Memory constrained | `microsoft/deberta-v3-small` |

---

## Honor Code

By participating, you agree to:

1. Only use models and data as specified
2. Not share solutions during the competition
3. Report accurate results (no fabrication)
4. Use the HTML tools as required
5. Submit your own work

---

## Timeline

| Phase | Duration |
|-------|----------|
| Instructor Demo | 15 min |
| Strategy Selection | 10 min |
| Config Generation | 10 min |
| Training & Tuning | 30 min |
| Documentation | 10 min |
| Submission & Leaderboard | 10 min |

---

## Getting Help

- **Stuck on tools?** Re-read the HTML tool instructions
- **Training errors?** Check your batch size (reduce if OOM)
- **Low accuracy?** Try a different learning rate
- **Out of time?** Submit your best result so far

---

## Leaderboard

*Results will be posted here during class*

| Rank | Name | Accuracy | Model | Strategy | Time |
|------|------|----------|-------|----------|------|
| 1 | - | -% | - | - | - |
| 2 | - | -% | - | - | - |
| 3 | - | -% | - | - | - |

---

Good luck, and may the best hyperparameters win!
