{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# W17D2: SST-2 Competition Starter\n",
    "\n",
    "**Your Mission:** Achieve the highest accuracy on SST-2 sentiment classification.\n",
    "\n",
    "## Required Workflow\n",
    "\n",
    "You **MUST** use the HTML tools to generate your configs:\n",
    "\n",
    "1. `strategy-selector.html` - Decide: Full fine-tune or LoRA?\n",
    "2. `training-args.html` - Generate your TrainingArguments\n",
    "3. `lora-calculator.html` - (If using LoRA) Configure PEFT\n",
    "4. `evidence-builder.html` - Document your final results\n",
    "\n",
    "## Scoring\n",
    "\n",
    "| Component | Weight | Notes |\n",
    "|-----------|--------|-------|\n",
    "| Accuracy | 60% | Higher = better |\n",
    "| Documentation | 20% | Must use evidence-builder |\n",
    "| Efficiency | 20% | Training time, resource usage |\n",
    "\n",
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Your Info\n",
    "\n",
    "Fill this out before submitting:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ============================================================\n",
    "# STUDENT INFO - Fill this out!\n",
    "# ============================================================\n",
    "\n",
    "STUDENT_NAME = \"Your Name Here\"\n",
    "MODEL_CHOICE = \"distilbert-base-uncased\"  # What model are you using?\n",
    "STRATEGY = \"Full Fine-Tune\"  # or \"LoRA\"\n",
    "\n",
    "print(f\"Student: {STUDENT_NAME}\")\n",
    "print(f\"Model: {MODEL_CHOICE}\")\n",
    "print(f\"Strategy: {STRATEGY}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Setup (Don't Modify)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import time\n",
    "from datetime import datetime\n",
    "\n",
    "os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n",
    "\n",
    "import torch\n",
    "from datasets import load_dataset\n",
    "from transformers import (\n",
    "    AutoTokenizer,\n",
    "    AutoModelForSequenceClassification,\n",
    "    TrainingArguments,\n",
    "    Trainer,\n",
    "    DataCollatorWithPadding,\n",
    ")\n",
    "import numpy as np\n",
    "from sklearn.metrics import accuracy_score, f1_score\n",
    "\n",
    "# Device detection\n",
    "if torch.backends.mps.is_available():\n",
    "    DEVICE = \"mps\"\n",
    "elif torch.cuda.is_available():\n",
    "    DEVICE = \"cuda\"\n",
    "else:\n",
    "    DEVICE = \"cpu\"\n",
    "\n",
    "print(f\"PyTorch: {torch.__version__}\")\n",
    "print(f\"Device: {DEVICE}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Load Dataset (Don't Modify)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load SST-2 dataset\n",
    "print(\"Loading SST-2 dataset...\")\n",
    "dataset = load_dataset(\"glue\", \"sst2\")\n",
    "\n",
    "print(f\"Train: {len(dataset['train']):,} examples\")\n",
    "print(f\"Validation: {len(dataset['validation']):,} examples\")\n",
    "print(\"Labels: 0=negative, 1=positive\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Load Your Model\n",
    "\n",
    "Choose your model. Options (accuracies are rough SST-2 validation figures and will vary with your hyperparameters):\n",
    "- `distilbert-base-uncased` - Fast, ~89%\n",
    "- `bert-base-uncased` - Good, ~92%\n",
    "- `roberta-base` - Best base, ~94%\n",
    "- `microsoft/deberta-v3-small` - Efficient, ~93%\n",
    "- Or any other HuggingFace model!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ============================================================\n",
    "# YOUR MODEL CHOICE\n",
    "# ============================================================\n",
    "\n",
    "MODEL_NAME = MODEL_CHOICE  # Uses your choice from above\n",
    "\n",
    "print(f\"Loading model: {MODEL_NAME}...\")\n",
    "tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)\n",
    "model = AutoModelForSequenceClassification.from_pretrained(\n",
    "    MODEL_NAME,\n",
    "    num_labels=2,\n",
    "    id2label={0: \"negative\", 1: \"positive\"},\n",
    "    label2id={\"negative\": 0, \"positive\": 1},\n",
    ")\n",
    "\n",
    "total_params = sum(p.numel() for p in model.parameters())\n",
    "print(f\"Total parameters: {total_params:,}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. (Optional) Apply LoRA\n",
    "\n",
    "If you chose LoRA in `strategy-selector.html`, paste your config from `lora-calculator.html` here.\n",
    "\n",
    "**Skip this cell if doing full fine-tuning.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ============================================================\n",
    "# PASTE YOUR LoraConfig FROM lora-calculator.html HERE\n",
    "# (Skip this cell if doing full fine-tuning)\n",
    "# ============================================================\n",
    "\n",
    "USE_LORA = False  # Set to True if using LoRA\n",
    "\n",
    "if USE_LORA:\n",
    "    from peft import LoraConfig, get_peft_model, TaskType\n",
    "    \n",
    "    # PASTE YOUR CONFIG BELOW:\n",
    "    lora_config = LoraConfig(\n",
    "        r=16,\n",
    "        lora_alpha=32,\n",
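    "        # Names must match your model's layers. These are DistilBERT's attention projections;\n",
    "        # BERT/RoBERTa use \"query\", \"key\", \"value\" (check model.named_modules()).\n",
    "        target_modules=[\"q_lin\", \"k_lin\", \"v_lin\", \"out_lin\"],\n",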
    "        lora_dropout=0.05,\n",
    "        bias=\"none\",\n",
    "        task_type=TaskType.SEQ_CLS,\n",
    "    )\n",
    "    \n",
    "    model = get_peft_model(model, lora_config)\n",
    "    model.print_trainable_parameters()\n",
    "else:\n",
    "    print(\"Using full fine-tuning (no LoRA)\")\n",
    "    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)\n",
    "    print(f\"Trainable parameters: {trainable:,}\")"
   ]
  },
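  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`target_modules` in a `LoraConfig` has to name layers that actually exist in the loaded model, and those names differ between architectures. The sketch below is one way to look them up. The layer names are our assumption about the Hugging Face implementations (confirm with `model.named_modules()`), and `LORA_TARGETS` and `lora_targets_for` are hypothetical helpers, not part of PEFT.\n",
    "\n",
    "```python\n",
    "# Hypothetical lookup: attention projection layer names per model family.\n",
    "# These names are assumptions; confirm them with model.named_modules().\n",
    "# \"bert\" comes last because it is a substring of the other family names.\n",
    "LORA_TARGETS = {\n",
    "    \"distilbert\": [\"q_lin\", \"v_lin\"],\n",
    "    \"roberta\": [\"query\", \"value\"],\n",
    "    \"deberta\": [\"query_proj\", \"value_proj\"],\n",
    "    \"bert\": [\"query\", \"value\"],\n",
    "}\n",
    "\n",
    "def lora_targets_for(model_name):\n",
    "    \"\"\"Return candidate target_modules for the first matching family.\"\"\"\n",
    "    name = model_name.lower()\n",
    "    for family, targets in LORA_TARGETS.items():\n",
    "        if family in name:\n",
    "            return targets\n",
    "    raise ValueError(f\"No known LoRA targets for {model_name!r}\")\n",
    "\n",
    "print(lora_targets_for(\"distilbert-base-uncased\"))\n",
    "```\n",
    "\n",
    "If you adapt it, the result can be passed as `target_modules=lora_targets_for(MODEL_NAME)`."
   ]
  },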
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Tokenize Dataset (Don't Modify)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def tokenize_function(examples):\n",
    "    return tokenizer(\n",
    "        examples[\"sentence\"],\n",
    "        truncation=True,\n",
    "        max_length=128,\n",
    "    )\n",
    "\n",
    "print(\"Tokenizing...\")\n",
    "tokenized_dataset = dataset.map(\n",
    "    tokenize_function,\n",
    "    batched=True,\n",
    "    remove_columns=[\"sentence\", \"idx\"],\n",
    ")\n",
    "print(\"Done!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Metrics (Don't Modify)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def compute_metrics(eval_pred):\n",
    "    predictions, labels = eval_pred\n",
    "    predictions = np.argmax(predictions, axis=1)\n",
    "    return {\n",
    "        \"accuracy\": accuracy_score(labels, predictions),\n",
    "        \"f1\": f1_score(labels, predictions, average='binary'),\n",
    "    }"
   ]
  },
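  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the two reported numbers concrete, here is a small hand-worked sketch of what `compute_metrics` does: take the argmax of each logit row, then compare the result with the labels. The logits and labels are made-up toy values, and the F1 formula is written out in plain Python instead of calling scikit-learn.\n",
    "\n",
    "```python\n",
    "# Toy logits for four sentences: [negative_score, positive_score].\n",
    "logits = [[2.0, -1.0], [0.1, 0.9], [-0.5, 1.5], [1.2, 0.3]]\n",
    "labels = [0, 1, 0, 1]\n",
    "\n",
    "# argmax over the class axis picks the higher-scoring label.\n",
    "preds = [row.index(max(row)) for row in logits]\n",
    "\n",
    "# Confusion counts for the positive class (label 1).\n",
    "tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))\n",
    "fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))\n",
    "fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))\n",
    "\n",
    "accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)\n",
    "f1 = 2 * tp / (2 * tp + fp + fn)  # binary F1 for the positive class\n",
    "\n",
    "print(preds, accuracy, f1)\n",
    "```"
   ]
  },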
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. YOUR Training Arguments\n",
    "\n",
    "**PASTE YOUR CONFIG FROM `training-args.html` HERE!**\n",
    "\n",
    "This is where the competition happens. Experiment with:\n",
    "- Learning rate\n",
    "- Batch size\n",
    "- Number of epochs\n",
    "- Warmup ratio\n",
    "- Weight decay"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# ============================================================\n",
    "# PASTE YOUR TrainingArguments FROM training-args.html HERE!\n",
    "# ============================================================\n",
    "\n",
    "training_args = TrainingArguments(\n",
    "    # === PASTE YOUR CONFIG BELOW ===\n",
    "    output_dir=\"./my_sst2_results\",\n",
    "    eval_strategy=\"epoch\",\n",
    "    save_strategy=\"epoch\",\n",
    "    learning_rate=2e-5,              # <-- Experiment with this!\n",
    "    per_device_train_batch_size=16,  # <-- And this!\n",
    "    per_device_eval_batch_size=32,\n",
    "    num_train_epochs=3,              # <-- And this!\n",
    "    weight_decay=0.01,\n",
    "    warmup_ratio=0.1,\n",
    "    load_best_model_at_end=True,\n",
    "    metric_for_best_model=\"accuracy\",\n",
    "    logging_steps=100,\n",
    "    fp16=False,  # Keep False on MPS or CPU; consider True only on a CUDA GPU\n",
    "    report_to=\"none\",\n",
    "    # === END YOUR CONFIG ===\n",
    ")\n",
    "\n",
    "print(\"Your training config:\")\n",
    "print(f\"  Learning rate: {training_args.learning_rate}\")\n",
    "print(f\"  Batch size: {training_args.per_device_train_batch_size}\")\n",
    "print(f\"  Epochs: {training_args.num_train_epochs}\")\n",
    "print(f\"  Warmup: {training_args.warmup_ratio}\")"
   ]
  },
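  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Warmup is given as a ratio, so the number of warmup steps changes whenever you change the batch size or the number of epochs. A rough sketch of that arithmetic, assuming a single device, no gradient accumulation, and a 67,349-example SST-2 training split. `schedule_steps` is a hypothetical helper, and the Trainer's own rounding may differ slightly.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "def schedule_steps(num_examples, batch_size, epochs, warmup_ratio):\n",
    "    \"\"\"Estimate optimizer steps (single device, no gradient accumulation).\"\"\"\n",
    "    steps_per_epoch = math.ceil(num_examples / batch_size)\n",
    "    total_steps = steps_per_epoch * epochs\n",
    "    warmup_steps = math.ceil(total_steps * warmup_ratio)\n",
    "    return steps_per_epoch, total_steps, warmup_steps\n",
    "\n",
    "# Defaults from the config cell; 67,349 is the assumed train-set size.\n",
    "print(schedule_steps(67_349, batch_size=16, epochs=3, warmup_ratio=0.1))\n",
    "```\n",
    "\n",
    "Doubling the batch size roughly halves every number here, which is one reason learning rate and batch size are usually tuned together."
   ]
  },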
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Train!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create trainer\n",
    "data_collator = DataCollatorWithPadding(tokenizer=tokenizer)\n",
    "\n",
    "trainer = Trainer(\n",
    "    model=model,\n",
    "    args=training_args,\n",
    "    train_dataset=tokenized_dataset[\"train\"],\n",
    "    eval_dataset=tokenized_dataset[\"validation\"],\n",
    "    tokenizer=tokenizer,\n",
    "    data_collator=data_collator,\n",
    "    compute_metrics=compute_metrics,\n",
    ")\n",
    "\n",
    "print(\"Trainer ready!\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# START TRAINING\n",
    "print(\"=\" * 50)\n",
    "print(f\"TRAINING STARTED - {datetime.now().strftime('%H:%M:%S')}\")\n",
    "print(\"=\" * 50)\n",
    "\n",
    "start_time = time.time()\n",
    "trainer.train()\n",
    "training_time = time.time() - start_time\n",
    "\n",
    "print(f\"\\nTraining completed in {int(training_time//60)}m {int(training_time%60)}s\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9. Your Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Final evaluation\n",
    "eval_results = trainer.evaluate()\n",
    "\n",
    "# Calculate final values\n",
    "final_accuracy = eval_results['eval_accuracy']\n",
    "final_f1 = eval_results['eval_f1']\n",
    "minutes = int(training_time // 60)\n",
    "seconds = int(training_time % 60)\n",
    "\n",
    "print(\"\\n\" + \"=\" * 50)\n",
    "print(\"YOUR FINAL RESULTS\")\n",
    "print(\"=\" * 50)\n",
    "print(f\"Student: {STUDENT_NAME}\")\n",
    "print(f\"Model: {MODEL_CHOICE}\")\n",
    "print(f\"Strategy: {STRATEGY}\")\n",
    "print(f\"Training time: {minutes}m {seconds}s\")\n",
    "print()\n",
    "print(f\">>> ACCURACY: {final_accuracy*100:.2f}% <<<\")\n",
    "print(f\">>> F1 SCORE: {final_f1:.4f} <<<\")\n",
    "print(\"=\" * 50)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 10. Generate Submission\n",
    "\n",
    "Copy this output for the leaderboard:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate submission string\n",
    "submission = f\"| {STUDENT_NAME} | {final_accuracy*100:.2f}% | {MODEL_CHOICE} | {STRATEGY} | {minutes}m {seconds}s |\"\n",
    "\n",
    "print(\"\\n\" + \"=\" * 50)\n",
    "print(\"LEADERBOARD SUBMISSION\")\n",
    "print(\"=\" * 50)\n",
    "print(\"\\nCopy this line for the leaderboard:\\n\")\n",
    "print(submission)\n",
    "print(\"\\n\" + \"=\" * 50)\n",
    "print(\"\\nDon't forget to:\")\n",
    "print(\"1. Take screenshots of your HTML tool configs\")\n",
    "print(\"2. Use evidence-builder.html to generate METRICS.md\")\n",
    "print(\"3. Submit both this result AND your documentation\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 11. (Optional) Test Your Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save model first\n",
    "trainer.save_model(\"./my_sst2_model\")\n",
    "tokenizer.save_pretrained(\"./my_sst2_model\")\n",
    "\n",
    "# Quick inference test\n",
    "from transformers import pipeline\n",
    "\n",
    "classifier = pipeline(\n",
    "    \"sentiment-analysis\",\n",
    "    model=\"./my_sst2_model\",\n",
    "    device=DEVICE,  # reuse the device detected in Setup (mps, cuda, or cpu)\n",
    ")\n",
    "\n",
    "test_sentences = [\n",
    "    \"This is the best thing ever!\",\n",
    "    \"Absolutely terrible, waste of money.\",\n",
    "    \"It's okay I guess.\",\n",
    "]\n",
    "\n",
    "print(\"\\nQuick inference test:\")\n",
    "for s in test_sentences:\n",
    "    r = classifier(s)[0]\n",
    "    print(f\"  [{r['label']}] ({r['score']:.1%}) {s}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "## Submission Checklist\n",
    "\n",
    "Before submitting, make sure you have:\n",
    "\n",
    "- [ ] Your accuracy score from section 9\n",
    "- [ ] Screenshots of your HTML tool configs\n",
    "- [ ] METRICS.md from evidence-builder.html\n",
    "- [ ] Training time noted\n",
    "\n",
    "**Good luck!**"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}