{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"
"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"# Tutorial 1: Task definition, application, relations and impacts on generalization\n",
"\n",
"**Week 1, Day 2: Comparing Tasks**\n",
"\n",
"**By Neuromatch Academy**\n",
"\n",
"__Content creators:__ Deying Song, Leila Wehbe\n",
"\n",
"__Content reviewers:__ Samuele Bolotta, Hlib Solodzhuk, RyeongKyung Yoon, Lily Chamakura, Yizhou Chen, Ruiyi Zhang, Patrick Mineault\n",
"\n",
"__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk, Patrick Mineault\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"___\n",
"\n",
"\n",
"# Tutorial Objectives\n",
"\n",
"*Estimated timing of tutorial: 90 minutes*\n",
"\n",
"In this tutorial, we'll explore how task specification affects generalization. We will use the same base architecture (a convolutional neural network or CNN) to perform different tasks with different outputs. We will explore the number of training points and number of epochs needed to train these networks. Additionally, we will explore how well representations learned for a given task generalize, and whether they can be used to solve the other tasks.\n",
"\n",
"Today's learning objectives are:\n",
"\n",
"1. Formulate different tasks in terms of cost functions.\n",
"2. Train a network to accomplish these tasks and compare the performance of these networks.\n",
"3. Measure how well different representations generalize\n",
"\n",
"Let's get started.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"executionInfo": {
"elapsed": 172,
"status": "ok",
"timestamp": 1718208610982,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @markdown\n",
"from IPython.display import IFrame\n",
"from ipywidgets import widgets\n",
"out = widgets.Output()\n",
"with out:\n",
" print(f\"If you want to download the slides: https://osf.io/download/x4y79/\")\n",
" display(IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/x4y79/?direct%26mode=render%26action=download%26mode=render\", width=730, height=410))\n",
"display(out)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Setup\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install and import feedback gadget\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"executionInfo": {
"elapsed": 183983,
"status": "ok",
"timestamp": 1718208795199,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Install and import feedback gadget\n",
"\n",
"!pip install vibecheck numpy matplotlib torch torchvision tqdm ipywidgets memory-profiler requests scikit-learn torchmetrics --quiet\n",
"\n",
"\n",
"from vibecheck import DatatopsContentReviewContainer\n",
"def content_review(notebook_section: str):\n",
" return DatatopsContentReviewContainer(\n",
" \"\", # No text prompt\n",
" notebook_section,\n",
" {\n",
" \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n",
" \"name\": \"neuromatch_neuroai\",\n",
" \"user_key\": \"wb2cxze8\",\n",
" },\n",
" ).render()\n",
"\n",
"feedback_prefix = \"W1D2_T1\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import dependencies\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"executionInfo": {
"elapsed": 5196,
"status": "ok",
"timestamp": 1718208800392,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Import dependencies\n",
"\n",
"# Import standard library dependencies\n",
"import os\n",
"import time\n",
"import gc\n",
"import logging\n",
"from pathlib import Path\n",
"import zipfile\n",
"import random\n",
"import contextlib\n",
"import io\n",
"\n",
"# Import third-party libraries\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import torch\n",
"import torch.nn as nn\n",
"import torch.optim as optim\n",
"import torch.nn.functional as F\n",
"import torchvision\n",
"import torchvision.transforms as transforms\n",
"from tqdm.notebook import tqdm\n",
"from ipywidgets import Layout\n",
"from memory_profiler import profile\n",
"import requests\n",
"from sklearn.metrics import confusion_matrix\n",
"from torchmetrics import Accuracy\n",
"from torch.utils.data import DataLoader\n",
"import vibecheck\n",
"import torch\n",
"import torch.nn as nn\n",
"import torch.nn.functional as F\n",
"from torch.utils.data import DataLoader, SubsetRandomSampler\n",
"from pathlib import Path\n",
"import time\n",
"from tqdm import tqdm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Figure settings\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Figure settings\n",
"\n",
"logging.getLogger('matplotlib.font_manager').disabled = True\n",
"\n",
"%matplotlib inline\n",
"%config InlineBackend.figure_format = 'retina' # perfrom high definition rendering for images and plots\n",
"plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set device (GPU or CPU)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Set device (GPU or CPU)\n",
"\n",
"def set_device():\n",
" \"\"\"\n",
" Determines and sets the computational device for PyTorch operations based on the availability of a CUDA-capable GPU.\n",
"\n",
" Outputs:\n",
" - device (str): The device that PyTorch will use for computations ('cuda' or 'cpu'). This string can be directly used\n",
" in PyTorch operations to specify the device.\n",
" \"\"\"\n",
"\n",
" device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
" if device != \"cuda\":\n",
" print(\"GPU is not enabled in this notebook. \\n\"\n",
" \"If you want to enable it, in the menu under `Runtime` -> \\n\"\n",
" \"`Hardware accelerator.` and select `GPU` from the dropdown menu\")\n",
" else:\n",
" print(\"GPU is enabled in this notebook. \\n\"\n",
" \"If you want to disable it, in the menu under `Runtime` -> \\n\"\n",
" \"`Hardware accelerator.` and select `None` from the dropdown menu\")\n",
"\n",
" return device\n",
"\n",
"device = set_device()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Helper functions\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"executionInfo": {
"elapsed": 187,
"status": "ok",
"timestamp": 1718208800782,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Helper functions\n",
"\n",
"\n",
"\n",
"class BottleneckLayer(nn.Module):\n",
" def __init__(self, M):\n",
" super(BottleneckLayer, self).__init__()\n",
" self.fc = nn.Linear(LATENT_DIM, M)\n",
"\n",
" def forward(self, x):\n",
" x = F.relu(self.fc(x))\n",
" return x\n",
"\n",
"class ConvNeuralNetDecoder(nn.Module):\n",
" def __init__(self, M):\n",
" super(ConvNeuralNetDecoder, self).__init__()\n",
" self.fc3 = nn.Linear(M, LATENT_DIM)\n",
" self.fc2 = nn.Linear(84, 120)\n",
" self.fc1 = nn.Linear(120, 16 * 5 * 5)\n",
"\n",
" self.convT2 = nn.ConvTranspose2d(16, 6, 5, stride=2, padding=0, output_padding=1)\n",
" self.convT1 = nn.ConvTranspose2d(6, 1, 5, stride=2, padding=0, output_padding=1)\n",
"\n",
" def forward(self, x):\n",
" x = F.relu(self.fc3(x))\n",
" x = F.relu(self.fc2(x))\n",
" x = F.relu(self.fc1(x))\n",
"\n",
" x = x.view(-1, 16, 5, 5)\n",
"\n",
" x = F.relu(self.convT2(x))\n",
" x = self.convT1(x)\n",
"\n",
" return x\n",
"\n",
"def get_random_sample_dataloader(dataset, batch_size, M):\n",
" indices = torch.randperm(len(dataset))[:M]\n",
" sampler = SubsetRandomSampler(indices)\n",
" sampled_loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler, pin_memory=True)\n",
"\n",
" return sampled_loader\n",
"\n",
"def get_random_sample_train_val(train_dataset, val_dataset, batch_size, N_train_data):\n",
"\n",
" sampled_train_loader = get_random_sample_dataloader(train_dataset, batch_size, N_train_data)\n",
"\n",
" N_val_data = int(N_train_data / 9.0)\n",
" if N_val_data < 30:\n",
" N_val_data = 30\n",
" sampled_val_loader = get_random_sample_dataloader(val_dataset, batch_size, N_val_data)\n",
"\n",
" return sampled_train_loader, sampled_val_loader\n",
"\n",
"class Accuracy:\n",
" def __init__(self, task='multiclass', num_classes=10):\n",
" assert task == 'multiclass', \"Only supports `multiclass` task accuracy!\"\n",
" self.num_classes = num_classes\n",
"\n",
" def __call__(self, predicted, target):\n",
" correct = predicted.eq(target.view_as(predicted)).sum().item()\n",
" return correct / predicted.size(0)\n",
"\n",
"def save_model(model, task_name, N_train_data, epoch, train_loss, val_loss):\n",
" MODEL_PATH = Path(\"models\")\n",
" MODEL_PATH.mkdir(parents=True, exist_ok=True)\n",
"\n",
" MODEL_NAME = f\"ConvNet_{task_name}_{N_train_data}_epoch_{epoch}.pth\"\n",
" MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME\n",
"\n",
" print(f\"Saving the model: {MODEL_SAVE_PATH}\")\n",
"\n",
" checkpoint = {\n",
" 'model_state_dict': model.state_dict(),\n",
" 'train_loss': train_loss,\n",
" 'val_loss': val_loss\n",
" }\n",
"\n",
" torch.save(obj=checkpoint, f=MODEL_SAVE_PATH)\n",
"\n",
"def train(model, train_dataloader, val_dataloader, test_dataloader, cost_fn, optimizer, epochs_max, acc_flag, triplet_flag, task_name, N_train_data):\n",
" tstart = time.time()\n",
" accuracy = Accuracy(task='multiclass', num_classes=10)\n",
"\n",
" epoch = 0\n",
" val_cost_last = 100000.0\n",
" val_cost_current = 100000.0\n",
"\n",
" my_epoch = []\n",
" my_train_cost = []\n",
" my_test_cost = []\n",
"\n",
" train_losses = []\n",
" val_losses = []\n",
"\n",
" if triplet_flag:\n",
" for epoch in tqdm(range(1, epochs_max + 1), desc=\"Training epochs\", unit=\"epoch\"):\n",
" my_epoch.append(epoch)\n",
"\n",
" train_cost = 0.0\n",
" for batch_idx, (anchor_img, positive_img, negative_img) in enumerate(train_dataloader):\n",
" model.train()\n",
" anchor_img, positive_img, negative_img = anchor_img.cuda(), positive_img.cuda(), negative_img.cuda()\n",
" anchor_reconstruct = model(anchor_img)\n",
" positive_reconstruct = model(positive_img)\n",
" negative_reconstruct = model(negative_img)\n",
" cost = cost_fn(anchor_reconstruct, positive_reconstruct, negative_reconstruct)\n",
" train_cost += cost.item()\n",
" optimizer.zero_grad()\n",
" cost.backward()\n",
" optimizer.step()\n",
" train_cost /= len(train_dataloader)\n",
" train_losses.append(train_cost)\n",
" my_train_cost.append(train_cost)\n",
"\n",
" val_cost = 0.0\n",
" model.eval()\n",
" with torch.no_grad():\n",
" for batch_idx, (anchor_img, positive_img, negative_img) in enumerate(val_dataloader):\n",
" anchor_img, positive_img, negative_img = anchor_img.cuda(), positive_img.cuda(), negative_img.cuda()\n",
" anchor_reconstruct = model(anchor_img)\n",
" positive_reconstruct = model(positive_img)\n",
" negative_reconstruct = model(negative_img)\n",
" cost = cost_fn(anchor_reconstruct, positive_reconstruct, negative_reconstruct)\n",
" val_cost += cost.item()\n",
" val_cost /= len(val_dataloader)\n",
" val_cost_last = val_cost_current\n",
" val_cost_current = val_cost\n",
" val_losses.append(val_cost)\n",
"\n",
" test_cost = 0.0\n",
" model.eval()\n",
" with torch.no_grad():\n",
" for batch_idx, (anchor_img, positive_img, negative_img) in enumerate(test_dataloader):\n",
" anchor_img, positive_img, negative_img = anchor_img.cuda(), positive_img.cuda(), negative_img.cuda()\n",
" anchor_reconstruct = model(anchor_img)\n",
" positive_reconstruct = model(positive_img)\n",
" negative_reconstruct = model(negative_img)\n",
" cost = cost_fn(anchor_reconstruct, positive_reconstruct, negative_reconstruct)\n",
" test_cost += cost.item()\n",
" test_cost /= len(test_dataloader)\n",
" my_test_cost.append(test_cost)\n",
"\n",
" print(f\"Epoch: {epoch}| Train cost: {train_cost: .5f}| \" +\n",
" f\"Val cost: {val_cost: .5f}| \" +\n",
" f\"Test cost: {test_cost: .5f}|\")\n",
"\n",
" save_model(model, task_name, N_train_data, epoch, train_cost, val_cost)\n",
"\n",
" else:\n",
" for epoch in tqdm(range(1, epochs_max + 1), desc=\"Training epochs\", unit=\"epoch\"):\n",
" my_epoch.append(epoch)\n",
"\n",
" train_cost, train_acc = 0.0, 0.0\n",
" for batch_idx, (X, y) in enumerate(train_dataloader):\n",
" model.train()\n",
"\n",
" predictions = model(X)\n",
" cost = cost_fn(predictions, y)\n",
" train_cost += cost.item()\n",
" if acc_flag:\n",
" _, predicted_classes = torch.max(predictions, 1)\n",
" acc = accuracy(predicted_classes, y)\n",
" train_acc += acc\n",
" optimizer.zero_grad()\n",
" cost.backward()\n",
" optimizer.step()\n",
" train_cost /= len(train_dataloader)\n",
" if acc_flag:\n",
" train_acc /= len(train_dataloader)\n",
" train_losses.append(train_cost)\n",
" my_train_cost.append(train_cost)\n",
"\n",
" val_cost, val_acc = 0.0, 0.0\n",
" model.eval()\n",
" with torch.no_grad():\n",
" for batch_idx, (X, y) in enumerate(val_dataloader):\n",
"\n",
" predictions = model(X)\n",
" cost = cost_fn(predictions, y)\n",
" val_cost += cost.item()\n",
" if acc_flag:\n",
" _, predicted_classes = torch.max(predictions, 1)\n",
" acc = accuracy(predicted_classes, y)\n",
" val_acc += acc\n",
" val_cost /= len(val_dataloader)\n",
" val_cost_last = val_cost_current\n",
" val_cost_current = val_cost\n",
" if acc_flag:\n",
" val_acc /= len(val_dataloader)\n",
" val_losses.append(val_cost)\n",
"\n",
" test_cost, test_acc = 0.0, 0.0\n",
" model.eval()\n",
" with torch.no_grad():\n",
" for batch_idx, (X, y) in enumerate(test_dataloader):\n",
"\n",
" predictions = model(X)\n",
" cost = cost_fn(predictions, y)\n",
" test_cost += cost.item()\n",
" if acc_flag:\n",
" _, predicted_classes = torch.max(predictions, 1)\n",
" acc = accuracy(predicted_classes, y)\n",
" test_acc += acc\n",
" test_cost /= len(test_dataloader)\n",
" my_test_cost.append(test_cost)\n",
"\n",
" if acc_flag:\n",
" test_acc /= len(test_dataloader)\n",
"\n",
" if acc_flag:\n",
" print(f\"Epoch: {epoch}| Train cost: {train_cost: .5f}| Train acc: {train_acc: .5f}| \" +\n",
" f\"Val cost: {val_cost: .5f}| Val acc: {val_acc: .5f}| \" +\n",
" f\"Test cost: {test_cost: .5f}| Test acc: {test_acc: .5f}\")\n",
" else:\n",
" print(f\"Epoch: {epoch}| Train cost: {train_cost: .5f}| \" +\n",
" f\"Val cost: {val_cost: .5f}| \" +\n",
" f\"Test cost: {test_cost: .5f}|\")\n",
"\n",
" save_model(model, task_name, N_train_data, epoch, train_cost, val_cost)\n",
"\n",
" elapsed = time.time() - tstart\n",
" print('Elapsed: %s' % elapsed)\n",
"\n",
" loss_data = {'train_losses': train_losses, 'val_losses': val_losses}\n",
" torch.save(loss_data, 'loss_data.pth')\n",
"\n",
" return my_epoch, my_train_cost, val_losses, my_test_cost\n",
"\n",
"def train_transfer(model, train_dataloader, val_dataloader, test_dataloader, cost_fn, optimizer, epochs_max, acc_flag, triplet_flag, task_name, N_train_data):\n",
" tstart = time.time()\n",
" accuracy = Accuracy(task='multiclass', num_classes=10)\n",
"\n",
" epoch = 0\n",
" val_cost_last = 100000.0\n",
" val_cost_current = 100000.0\n",
"\n",
" my_epoch = []\n",
" my_train_cost = []\n",
" my_test_cost = []\n",
"\n",
" train_losses = []\n",
" val_losses = []\n",
"\n",
" if triplet_flag:\n",
" for epoch in tqdm(range(1, epochs_max + 1), desc=\"Training epochs\", unit=\"epoch\"):\n",
" my_epoch.append(epoch)\n",
"\n",
" train_cost = 0.0\n",
" for batch_idx, (anchor_img, positive_img, negative_img) in enumerate(train_dataloader):\n",
" model.train()\n",
" anchor_img, positive_img, negative_img = anchor_img.cuda(), positive_img.cuda(), negative_img.cuda()\n",
" anchor_reconstruct = model(anchor_img)\n",
" positive_reconstruct = model(positive_img)\n",
" negative_reconstruct = model(negative_img)\n",
" cost = cost_fn(anchor_reconstruct, positive_reconstruct, negative_reconstruct)\n",
" train_cost += cost.item()\n",
" optimizer.zero_grad()\n",
" cost.backward()\n",
" optimizer.step()\n",
" train_cost /= len(train_dataloader)\n",
" train_losses.append(train_cost)\n",
" my_train_cost.append(train_cost)\n",
"\n",
" val_cost = 0.0\n",
" model.eval()\n",
" with torch.no_grad():\n",
" for batch_idx, (anchor_img, positive_img, negative_img) in enumerate(val_dataloader):\n",
" anchor_img, positive_img, negative_img = anchor_img.cuda(), positive_img.cuda(), negative_img.cuda()\n",
" anchor_reconstruct = model(anchor_img)\n",
" positive_reconstruct = model(positive_img)\n",
" negative_reconstruct = model(negative_img)\n",
" cost = cost_fn(anchor_reconstruct, positive_reconstruct, negative_reconstruct)\n",
" val_cost += cost.item()\n",
" val_cost /= len(val_dataloader)\n",
" val_cost_last = val_cost_current\n",
" val_cost_current = val_cost\n",
" val_losses.append(val_cost)\n",
"\n",
" test_cost = 0.0\n",
" model.eval()\n",
" with torch.no_grad():\n",
" for batch_idx, (anchor_img, positive_img, negative_img) in enumerate(test_dataloader):\n",
" anchor_img, positive_img, negative_img = anchor_img.cuda(), positive_img.cuda(), negative_img.cuda()\n",
" anchor_reconstruct = model(anchor_img)\n",
" positive_reconstruct = model(positive_img)\n",
" negative_reconstruct = model(negative_img)\n",
" cost = cost_fn(anchor_reconstruct, positive_reconstruct, negative_reconstruct)\n",
" test_cost += cost.item()\n",
" test_cost /= len(test_dataloader)\n",
" my_test_cost.append(test_cost)\n",
"\n",
" print(f\"Epoch: {epoch}| Train cost: {train_cost: .5f}| \" +\n",
" f\"Val cost: {val_cost: .5f}| \" +\n",
" f\"Test cost: {test_cost: .5f}|\")\n",
"\n",
" save_model(model, task_name, N_train_data, epoch, train_cost, val_cost)\n",
"\n",
" else:\n",
" for epoch in tqdm(range(1, epochs_max + 1), desc=\"Training epochs\", unit=\"epoch\"):\n",
" my_epoch.append(epoch)\n",
"\n",
" train_cost, train_acc = 0.0, 0.0\n",
" for batch_idx, (X, y) in enumerate(train_dataloader):\n",
" model.train()\n",
"\n",
" predictions = model(X)\n",
" cost = cost_fn(predictions, y)\n",
" train_cost += cost.item()\n",
" if acc_flag:\n",
" _, predicted_classes = torch.max(predictions, 1)\n",
" acc = accuracy(predicted_classes, y)\n",
" train_acc += acc\n",
" optimizer.zero_grad()\n",
" cost.backward()\n",
" optimizer.step()\n",
" train_cost /= len(train_dataloader)\n",
" if acc_flag:\n",
" train_acc /= len(train_dataloader)\n",
" train_losses.append(train_cost)\n",
" my_train_cost.append(train_cost)\n",
"\n",
" val_cost, val_acc = 0.0, 0.0\n",
" model.eval()\n",
" with torch.no_grad():\n",
" for batch_idx, (X, y) in enumerate(val_dataloader):\n",
"\n",
" predictions = model(X)\n",
" cost = cost_fn(predictions, y)\n",
" val_cost += cost.item()\n",
" if acc_flag:\n",
" _, predicted_classes = torch.max(predictions, 1)\n",
" acc = accuracy(predicted_classes, y)\n",
" val_acc += acc\n",
" val_cost /= len(val_dataloader)\n",
" val_cost_last = val_cost_current\n",
" val_cost_current = val_cost\n",
" if acc_flag:\n",
" val_acc /= len(val_dataloader)\n",
" val_losses.append(val_cost)\n",
"\n",
" test_cost, test_acc = 0.0, 0.0\n",
" model.eval()\n",
" with torch.no_grad():\n",
" for batch_idx, (X, y) in enumerate(test_dataloader):\n",
"\n",
" predictions = model(X)\n",
" cost = cost_fn(predictions, y)\n",
" test_cost += cost.item()\n",
" if acc_flag:\n",
" _, predicted_classes = torch.max(predictions, 1)\n",
" acc = accuracy(predicted_classes, y)\n",
" test_acc += acc\n",
" test_cost /= len(test_dataloader)\n",
" my_test_cost.append(test_cost)\n",
"\n",
" if acc_flag:\n",
" test_acc /= len(test_dataloader)\n",
"\n",
" if acc_flag:\n",
" print(f\"Epoch: {epoch}| Train cost: {train_cost: .5f}| Train acc: {train_acc: .5f}| \" +\n",
" f\"Val cost: {val_cost: .5f}| Val acc: {val_acc: .5f}| \" +\n",
" f\"Test cost: {test_cost: .5f}| Test acc: {test_acc: .5f}\")\n",
" else:\n",
" print(f\"Epoch: {epoch}| Train cost: {train_cost: .5f}| \" +\n",
" f\"Val cost: {val_cost: .5f}| \" +\n",
" f\"Test cost: {test_cost: .5f}|\")\n",
"\n",
" save_model(model, task_name, N_train_data, epoch, train_cost, val_cost)\n",
"\n",
" elapsed = time.time() - tstart\n",
" print('Elapsed: %s' % elapsed)\n",
"\n",
" loss_data = {'train_losses': train_losses, 'val_losses': val_losses}\n",
" torch.save(loss_data, 'loss_data.pth')\n",
"\n",
" return my_epoch, my_train_cost, val_losses, my_test_cost"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting functions\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Plotting functions\n",
"\n",
"def plot_reconstructions(original_images, reconstructed_images, N_train_data, epochs):\n",
" fig = plt.figure(figsize=(10, 5))\n",
" rows, cols = 2, 6\n",
" image_count = 0\n",
" for i in range(1, rows * cols, 2):\n",
" fig.add_subplot(rows, cols, i)\n",
" plt.imshow(np.squeeze(original_images[image_count]), cmap='gray')\n",
" plt.title(f\"Original {image_count+1}\", fontsize=8)\n",
" plt.axis('off')\n",
"\n",
" fig.add_subplot(rows, cols, i + 1)\n",
" plt.imshow(np.squeeze(reconstructed_images[image_count]), cmap='gray')\n",
" plt.title(f\"Reconstructed {image_count+1}\", fontsize=8)\n",
" plt.axis('off')\n",
"\n",
" image_count += 1\n",
" fig.suptitle(f\"Training for {epochs} epochs with {N_train_data} points\")\n",
" plt.show()\n",
"\n",
"def cost_classification(output, target):\n",
" criterion = nn.CrossEntropyLoss()\n",
" target = target.to(torch.int64)\n",
" cost = criterion(output, target)\n",
" return cost\n",
"\n",
"def cost_regression(output, target):\n",
" criterion = nn.MSELoss()\n",
" cost = criterion(output, target)\n",
" return cost\n",
"\n",
"def cost_autoencoder(output, target):\n",
" criterion = nn.MSELoss()\n",
" output_flat = output.view(output.size(0), -1)\n",
" target_flat = target.view(target.size(0), -1)\n",
" cost = criterion(output_flat, target_flat)\n",
" return cost"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Data retrieval\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Data retrieval\n",
"\n",
"import os\n",
"import requests\n",
"import hashlib\n",
"import zipfile\n",
"\n",
"def download_file(fname, url, expected_md5):\n",
" \"\"\"\n",
" Downloads a file from the given URL and saves it locally.\n",
" \"\"\"\n",
" if not os.path.isfile(fname):\n",
" try:\n",
" r = requests.get(url)\n",
" except requests.ConnectionError:\n",
" print(\"!!! Failed to download data !!!\")\n",
" return\n",
" if r.status_code != requests.codes.ok:\n",
" print(\"!!! Failed to download data !!!\")\n",
" return\n",
" if hashlib.md5(r.content).hexdigest() != expected_md5:\n",
" print(\"!!! Data download appears corrupted !!!\")\n",
" return\n",
" with open(fname, \"wb\") as fid:\n",
" fid.write(r.content)\n",
"\n",
"def extract_zip(zip_fname):\n",
" \"\"\"\n",
" Extracts a ZIP file to the current directory.\n",
" \"\"\"\n",
" with zipfile.ZipFile(zip_fname, 'r') as zip_ref:\n",
" zip_ref.extractall(\".\")\n",
"\n",
"# Details for the zip files to be downloaded and extracted\n",
"zip_files = [\n",
" {\n",
" \"fname\": \"models.zip\",\n",
" \"url\": \"https://osf.io/dms2n/download\",\n",
" \"expected_md5\": \"2c88be8804ae546da6c6985226bc98e7\"\n",
" }\n",
"]\n",
"\n",
"# Process zip files: download and extract\n",
"for zip_file in zip_files:\n",
" download_file(zip_file[\"fname\"], zip_file[\"url\"], zip_file[\"expected_md5\"])\n",
" extract_zip(zip_file[\"fname\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set random seed\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"executionInfo": {
"elapsed": 2,
"status": "ok",
"timestamp": 1718208800782,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Set random seed\n",
"\n",
"def set_seed(seed=None, seed_torch=True):\n",
" if seed is None:\n",
" seed = np.random.choice(2 ** 32)\n",
" random.seed(seed)\n",
" np.random.seed(seed)\n",
" if seed_torch:\n",
" torch.manual_seed(seed)\n",
" torch.cuda.manual_seed_all(seed)\n",
" torch.cuda.manual_seed(seed)\n",
" torch.backends.cudnn.benchmark = False\n",
" torch.backends.cudnn.deterministic = True\n",
"\n",
"set_seed(seed = 42)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Section 1: Formalize different tasks as cost functions and train the same architecture to achieve these tasks"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tutorial Video\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @title Tutorial Video\n",
"\n",
"from ipywidgets import widgets\n",
"from IPython.display import YouTubeVideo\n",
"from IPython.display import IFrame\n",
"from IPython.display import display\n",
"\n",
"\n",
"class PlayVideo(IFrame):\n",
" def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n",
" self.id = id\n",
" if source == 'Bilibili':\n",
" src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n",
" elif source == 'Osf':\n",
" src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n",
" super(PlayVideo, self).__init__(src, width, height, **kwargs)\n",
"\n",
"\n",
"def display_videos(video_ids, W=400, H=300, fs=1):\n",
" tab_contents = []\n",
" for i, video_id in enumerate(video_ids):\n",
" out = widgets.Output()\n",
" with out:\n",
" if video_ids[i][0] == 'Youtube':\n",
" video = YouTubeVideo(id=video_ids[i][1], width=W,\n",
" height=H, fs=fs, rel=0)\n",
" print(f'Video available at https://youtube.com/watch?v={video.id}')\n",
" else:\n",
" video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n",
" height=H, fs=fs, autoplay=False)\n",
" if video_ids[i][0] == 'Bilibili':\n",
" print(f'Video available at https://www.bilibili.com/video/{video.id}')\n",
" elif video_ids[i][0] == 'Osf':\n",
" print(f'Video available at https://osf.io/{video.id}')\n",
" display(video)\n",
" tab_contents.append(out)\n",
" return tab_contents\n",
"\n",
"\n",
"video_ids = [('Youtube', 'HxovwyQTi0o'), ('Bilibili', 'BV1Ri421e7mR')]\n",
"tab_contents = display_videos(video_ids, W=730, H=410)\n",
"tabs = widgets.Tab()\n",
"tabs.children = tab_contents\n",
"for i in range(len(tab_contents)):\n",
" tabs.set_title(i, video_ids[i][0])\n",
"display(tabs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Tutorial_Video\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Review of CNNs"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"In this tutorial, we will use a simple Convolutional Neural Network (CNN) architecture and a subset of the MNIST dataset, which consists of images of handwritten digits. We will use the same base architecture and training datasets to accomplish different tasks by creating various output layers and training them with different objective functions.\n",
"\n",
"A Convolutional Neural Network (CNN) is a deep learning algorithm designed to process input images, assign importance (learnable weights and biases) to various features within the images, and distinguish between different objects. Unlike pure feedforward neural networks that flatten the input into a one-dimensional array, CNNs preserve the spatial hierarchy of the input images. This makes them particularly effective for processing data with a grid-like structure, such as images. A CNN architecture is engineered to automatically and adaptively learn spatial hierarchies of features, ranging from low-level to high-level patterns.\n",
"\n",
"The core components of CNNs are convolutional layers, pooling layers, and fully connected layers. A schematic of a CNN is shown below.\n",
"\n",
"- Convolutional layers apply convolution operations to the input and pass the results to the next layer. This enables the network to be deep with fewer parameters, enhancing the learning of feature hierarchies.\n",
"- Pooling layers reduce the dimensions of the data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer.\n",
"- Fully connected layers connect every neuron in one layer to every neuron in the next layer and are typically used at the end of the network to make class predictions.\n",
"\n",
"Due to their ability to capture the spatial and temporal dependencies in images through the application of relevant filters, CNNs are extensively used in image and video recognition, recommender systems, image classification, medical image analysis, and natural language processing."
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
""
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Here we'll replicate the structure of LeNet up to the fully connected `fc2` layer. The latent representation in this layer is 84 dimensional. We'll add various decoder heads and bottleneck layers to this core, train on different objectives, and see how the representations change."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 3,
"status": "ok",
"timestamp": 1718208800783,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"LATENT_DIM = 84\n",
"\n",
"class ConvNeuralNet(nn.Module):\n",
"\n",
" def __init__(self):\n",
" super(ConvNeuralNet, self).__init__()\n",
" self.conv1 = nn.Conv2d(1, 6, 5)\n",
" self.conv2 = nn.Conv2d(6, 16, 5)\n",
" self.fc1 = nn.Linear(16 * 5 * 5, 120)\n",
" self.fc2 = nn.Linear(120, LATENT_DIM)\n",
"\n",
" def forward(self, x):\n",
" x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))\n",
" x = F.max_pool2d(F.relu(self.conv2(x)), 2)\n",
" x = torch.flatten(x, 1)\n",
" x = F.relu(self.fc1(x))\n",
" x = F.relu(self.fc2(x))\n",
" return x"
]
},
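{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"As a quick sanity check (a minimal sketch, assuming a batch of 32x32 grayscale inputs as prepared below), we can pass a dummy batch through the core network and confirm that it produces an 84-dimensional latent vector per image."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Sanity check: a dummy batch of four 1-channel 32x32 images should map to a\n",
"# (4, LATENT_DIM) latent matrix; the batch size of 4 is arbitrary.\n",
"dummy_batch = torch.randn(4, 1, 32, 32)\n",
"latents = ConvNeuralNet()(dummy_batch)\n",
"print(latents.shape)  # expected: torch.Size([4, 84])"
]
},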
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Preparing the data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 4069,
"status": "ok",
"timestamp": 1718208804850,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"with contextlib.redirect_stdout(io.StringIO()):\n",
"\n",
" # Define a transformation pipeline for the MNIST dataset\n",
" mnist_transform = transforms.Compose([\n",
" transforms.Resize((32, 32)), # Resize the images to 32x32 pixels\n",
" transforms.ToTensor(), # Convert images to PyTorch tensors\n",
" transforms.Normalize(mean=(0.1307,), std=(0.3081,)) # Normalize the images with mean and standard deviation\n",
" ])\n",
"\n",
" # Load the MNIST training dataset with transformations applied\n",
" train_val_dataset = torchvision.datasets.MNIST(\n",
" root='./data', # Directory to store/load the data\n",
" train=True, # Specify to load the training set\n",
" transform=mnist_transform, # Apply the transformation pipeline defined earlier\n",
" download=True # Download the dataset if it's not already present\n",
" )\n",
"\n",
" # Load the MNIST test dataset with transformations applied\n",
" test_dataset = torchvision.datasets.MNIST(\n",
" root='./data', # Directory to store/load the data\n",
" train=False, # Specify to load the test set\n",
" transform=mnist_transform, # Apply the transformation pipeline defined earlier\n",
" download=True # Download the dataset if it's not already present\n",
" )\n",
"\n",
" # Split the training dataset into training and validation sets\n",
" train_size = int(0.9 * len(train_val_dataset)) # Calculate the size of the training set (90% of the original)\n",
" val_size = len(train_val_dataset) - train_size # Calculate the size of the validation set (remaining 10%)\n",
" train_dataset, val_dataset = torch.utils.data.random_split(\n",
" dataset=train_val_dataset, # Original training dataset to split\n",
" lengths=[train_size, val_size] # Lengths of the resulting splits\n",
" )\n",
"\n",
" # Split the test dataset into two halves: original and transfer sets\n",
" test_size_original = int(0.5 * len(test_dataset)) # Calculate the size of the original test set (50% of the original)\n",
" test_size_transfer = len(test_dataset) - test_size_original # Calculate the size of the transfer test set (remaining 50%)\n",
" test_dataset_original, test_dataset_transfer = torch.utils.data.random_split(\n",
" dataset=test_dataset, # Original test dataset to split\n",
" lengths=[test_size_original, test_size_transfer] # Lengths of the resulting splits\n",
" )\n",
"\n",
" # Display the training dataset object\n",
" train_dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Visualizing some samples from the dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 3195,
"status": "ok",
"timestamp": 1718208808042,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"# Retrieve the class names (labels) from the training dataset\n",
"class_names = train_dataset.dataset.classes\n",
"\n",
"# Set a manual seed for PyTorch to ensure reproducibility of results\n",
"torch.manual_seed(10)\n",
"\n",
"# Create a figure for displaying the images\n",
"fig = plt.figure(figsize=(8, 4)) # Set the figure size to 8x4 inches\n",
"rows, cols = 2, 5 # Define the number of rows and columns for the subplot grid\n",
"\n",
"# Define the mean and standard deviation used for normalization\n",
"mean = 0.1307\n",
"std = 0.3081\n",
"\n",
"# Loop to display a grid of sample images from the training dataset\n",
"for i in range(1, (rows*cols) + 1):\n",
" rand_ind = torch.randint(0, len(train_dataset), size=[1]).item()\n",
" img, label = train_dataset[rand_ind]\n",
" img_tensor = img * std + mean\n",
" img_tensor = img_tensor / 2 + 0.5\n",
" img_np = np.squeeze(img_tensor.numpy())\n",
" fig.add_subplot(rows, cols, i)\n",
" plt.imshow(img_np, cmap='gray')\n",
" plt.title(f\"{class_names[label]}\")\n",
" plt.axis(False)\n",
" plt.tight_layout()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Preparing the data loaders"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 7,
"status": "ok",
"timestamp": 1718208808042,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"batch_size = 32\n",
"\n",
"# Create a DataLoader for the training dataset\n",
"train_loader = torch.utils.data.DataLoader(\n",
" dataset=train_dataset, # The dataset to load data from\n",
" batch_size=batch_size, # The number of samples per batch\n",
" shuffle=True # Shuffle the data at every epoch\n",
")\n",
"\n",
"# Create a DataLoader for the validation dataset\n",
"val_loader = torch.utils.data.DataLoader(\n",
" dataset=val_dataset, # The dataset to load data from\n",
" batch_size=batch_size, # The number of samples per batch\n",
" shuffle=True # Shuffle the data at every epoch\n",
")\n",
"\n",
"# Create a DataLoader for the original test dataset\n",
"test_loader_original = torch.utils.data.DataLoader(\n",
" dataset=test_dataset_original, # The dataset to load data from\n",
" batch_size=batch_size, # The number of samples per batch\n",
" shuffle=True # Shuffle the data at every epoch\n",
")\n",
"\n",
"# Create a DataLoader for the transfer test dataset\n",
"test_loader_transfer = torch.utils.data.DataLoader(\n",
" dataset=test_dataset_transfer, # The dataset to load data from\n",
" batch_size=batch_size, # The number of samples per batch\n",
" shuffle=True # Shuffle the data at every epoch\n",
")"
]
},
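{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"As a quick illustrative check, we can pull a single batch from the training loader and confirm its shape."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Pull one batch: images are (batch_size, 1, 32, 32), labels are (batch_size,)\n",
"images, labels = next(iter(train_loader))\n",
"print(images.shape, labels.shape)"
]
},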
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 6,
"status": "ok",
"timestamp": 1718208808042,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"# Defining epochs and batch size\n",
"epochs_max = 20\n",
"batch_size = 32"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 1.1: Classification\n",
"\n",
"In this task, we'll train the CNN to classify digits into one of 10 different classes."
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Code exercise 1: Cost Function"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"#### Training\n",
"\n",
"In this task, we aim to classify hand-written digits from images, where each digit ranges from 0 to 9. To achieve this, we add a classification head to the CNN.\n",
"\n",
"We introduce an output layer Y with 10 nodes, each representing one of the possible digits. The output layer uses the softmax activation function to produce probability scores for each class:\n",
"\n",
"$$p(y=j|x) = \\frac{e^{\\mu_j}}{\\sum_{k=1}^{10} e^{\\mu_k}}$$\n",
"\n",
"where $\\mu_j = \\text{CNN}_j(x)$ is the output of the $j^{th}$ node in the output layer."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 6,
"status": "ok",
"timestamp": 1718208808042,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"class ClassificationOutputLayer(nn.Module):\n",
"\n",
" def __init__(self):\n",
" super(ClassificationOutputLayer, self).__init__()\n",
" self.fc = nn.Linear(LATENT_DIM, 10)\n",
"\n",
" def forward(self, x):\n",
" x = F.softmax(self.fc(x), dim=1)\n",
"\n",
" return x"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Thus, the network outputs a probability distribution over the 10 possible classes for each input image.\n",
"\n",
"#### Cost Function\n",
"\n",
"To train the network effectively, we implement a cost function based on the cross-entropy. The loss function is:\n",
"\n",
"$$\\mathcal{L} = -\\frac{1}{N} \\sum_{i=1}^{N} \\sum_{j=1}^{10} y_{ij} \\log(p(y_{ij}|x_i))$$\n",
"\n",
"where:\n",
"\n",
"* $N$ is the number of samples\n",
"* $y_{ij}$ is the true label for the $i^{th}$ sample, encoded as a one-hot vector\n",
"* $p(y_{ij}|x_i)$ is the predicted probability of the $j^{th}$ class for the $i^{th}$ sample."
]
},
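{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"For intuition: for a single sample whose true class is $j$, only the $y_{ij} = 1$ term survives, so the loss reduces to $-\\log p(y_i = j|x_i)$. If the network assigns probability $0.7$ to the correct class, the loss is $-\\log 0.7 \\approx 0.357$; a confident correct prediction of $0.99$ gives $\\approx 0.01$."
]
},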
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 7,
"status": "ok",
"timestamp": 1718208808043,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"############################################################\n",
"raise NotImplementedError(\"Student exercise: Calculate the loss using the criterion\")\n",
"############################################################\n",
"\n",
"def cost_classification(output, target):\n",
" criterion = nn.CrossEntropyLoss()\n",
" target = target.to(torch.int64)\n",
" cost = ...\n",
" return cost"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {},
"executionInfo": {
"elapsed": 6,
"status": "ok",
"timestamp": 1718208808043,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"source": [
"[*Click for solution*](https://github.com/neuromatch/NeuroAI_Course/tree/main/tutorials/W1D2_ComparingTasks/solutions/W1D2_Tutorial1_Solution_53f79ab6.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"By implementing this cost function, the model is trained to minimize the difference between the predicted probability distributions and the actual one-hot encoded targets."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Cost_Function\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Defining the model\n",
"\n",
"We define a CNN model with a classification head to classify the digits."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 6,
"status": "ok",
"timestamp": 1718208808043,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"class ClassificationConvNet(nn.Module):\n",
"\n",
" def __init__(self, ConvNet, Output):\n",
" super(ClassificationConvNet, self).__init__()\n",
" self.ConvNet = ConvNet\n",
" self.Output = Output\n",
"\n",
" def forward(self, x):\n",
" conv_intermediate = self.ConvNet(x)\n",
" output = self.Output(conv_intermediate)\n",
"\n",
" return output"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Here, ConvNet represents the convolutional part of the network responsible for feature extraction, while Output is the classification output layer described earlier. \n",
"\n",
"### Training on varying datapoints\n",
"\n",
"We conduct training experiments with this model on varying dataset sizes (10, 100, 1000, 10000). This approach helps us understand how the model's performance scales with the amount of training data available (sample complexity). Larger datasets typically improve the model's ability to generalize to the test set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 160722,
"status": "ok",
"timestamp": 1718208968759,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"set_seed(42)\n",
"\n",
"# Usage example for classification task\n",
"training_points = np.array([10, 100, 1000, 10000])\n",
"task_name_classification = \"classification\"\n",
"acc_flag_classification = True\n",
"triplet_flag_classification = False\n",
"epochs_max_classification = 10\n",
"\n",
"my_epoch_Classification = []\n",
"my_train_cost_Classification = [] # Add a list to store training costs\n",
"my_val_cost_Classification = [] # Add a list to store val costs\n",
"my_test_cost_Classification = [] # Add a list to store test costs\n",
"conf_matrices = [] # List to store confusion matrices\n",
"\n",
"for N_train_data in training_points:\n",
" model = ClassificationConvNet(ConvNeuralNet(), ClassificationOutputLayer()).to(device)\n",
"\n",
" sampled_train_loader, sampled_val_loader = get_random_sample_train_val(train_dataset, val_dataset, batch_size, N_train_data)\n",
" optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001)\n",
"\n",
" # Update the train function call to get training costs\n",
" my_epoch, my_train_cost, my_val_cost, my_test_cost = train(\n",
" model,\n",
" sampled_train_loader,\n",
" sampled_val_loader,\n",
" test_loader_original,\n",
" cost_classification,\n",
" optimizer,\n",
" epochs_max_classification,\n",
" acc_flag_classification,\n",
" triplet_flag_classification,\n",
" task_name_classification,\n",
" N_train_data\n",
" )\n",
"\n",
" my_epoch_Classification.append(my_epoch)\n",
" my_train_cost_Classification.append(my_train_cost)\n",
" my_val_cost_Classification.append(my_val_cost)\n",
" my_test_cost_Classification.append(my_test_cost)\n",
"\n",
" # Compute predictions and confusion matrix for the validation set\n",
" all_preds = []\n",
" all_labels = []\n",
" model.eval()\n",
" with torch.no_grad():\n",
" for batch_idx, (X, y) in enumerate(sampled_val_loader):\n",
" X, y = X.to(device), y.to(device)\n",
" predictions = model(X)\n",
" _, predicted_classes = torch.max(predictions, 1)\n",
" all_preds.extend(predicted_classes.cpu().numpy())\n",
" all_labels.extend(y.cpu().numpy())\n",
"\n",
" # Compute confusion matrix\n",
" conf_matrix = confusion_matrix(all_labels, all_preds)\n",
" conf_matrices.append((N_train_data, conf_matrix)) # Store the confusion matrix with the number of training points"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Test performance\n",
"\n",
"The test performance of the model is evaluated by plotting the test cost across training epochs for different sample sizes. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 1975,
"status": "ok",
"timestamp": 1718210554892,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"# Create a single plot for all training costs with a logarithmic scale\n",
"with plt.xkcd():\n",
" plt.figure(figsize=(8, 6)) # Set the figure size\n",
"\n",
" for i, n in enumerate(training_points):\n",
" epochs = my_epoch_Classification[i]\n",
" test_cost = my_test_cost_Classification[i]\n",
" plt.plot(epochs, test_cost, marker='o', linestyle='-', label=f'{n} training points')\n",
"\n",
" plt.xlabel('Epochs')\n",
" plt.ylabel('Test cost (log scale)')\n",
" plt.title('Test cost over epochs for different training points (classification)')\n",
" plt.yscale('log')\n",
" plt.legend()\n",
" plt.grid(True)\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Discussion point 1"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Now that you have trained your network with different sample sizes, plot the performance on the test dataset for each network across epochs. How does sample size interact with number of training epochs?"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/neuromatch/NeuroAI_Course/tree/main/tutorials/W1D2_ComparingTasks/solutions/W1D2_Tutorial1_Solution_17513eb4.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Discussion_Point_1\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 1.2: Regression\n",
"\n",
"After examining the use of network architecture for digit classification, we now transition to a regression task using the same architecture. We are going to pick a very simple output task that is not directly related to the classification or the identification of the digits. In this task, given an image of a handwritten digit, our objective is to predict the number of pixels that are 'ON' (i.e., pixel values greater than 0.5). This can be achieved by the network performing operations similar to the simple addition of pixels (though it might still take time and data for the network to find this simple solution). Thus, we don't expect that the network will learn rich representations that are useful for other tasks, such as classification.\n",
"\n",
"### Task objective\n",
"\n",
"This regression task, while relatively simple as it involves summing pixel values, serves to illustrate how well a Convolutional Neural Network (CNN) can adapt to learning a continuous output instead of discrete class labels. Note that we'll regress the normalized images.\n",
"\n",
"### Output layer\n",
"\n",
"The output layer for this regression task consists of a single node that predicts the number of 'ON' pixels in the image. This necessitates a different cost function compared to the classification task."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 4,
"status": "ok",
"timestamp": 1718208969924,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"class RegressionOutputLayer(nn.Module):\n",
"\n",
" def __init__(self):\n",
" super(RegressionOutputLayer, self).__init__()\n",
" self.fc = nn.Linear(LATENT_DIM, 1)\n",
"\n",
" def forward(self, x):\n",
" x = self.fc(x)\n",
" return x"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Here, the `RegressionOutputLayer` outputs a single continuous value.\n",
"\n",
"### Code exercise 2: Cost function\n",
"\n",
"Here we implement the mean squared error loss, which measures the average squared difference between the predicted and actual values:\n",
"\n",
"$$\\mathcal{L} = \\frac{1}{N} \\sum_{i=1}^{N} (y_i - \\mu_i)^2$$\n",
"\n",
"where:\n",
"\n",
"- $N$ is the number of samples\n",
"- $y_i$ is the true label for the $i^{th}$ sample, the number of on pixels\n",
"- $\\mu_i = \\text{CNN}(x_i)$ is the output of the model for the $i^{th}$ sample"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 4,
"status": "ok",
"timestamp": 1718208969924,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"############################################################\n",
"# Hint for criterion: The criterion used for regression tasks is designed\n",
"# to minimize the average squared difference between predicted and actual values.\n",
"# Hint for cost: To compute the cost, apply the criterion function to\n",
"# the predicted output and the actual target values, which will return the mean squared error loss.\n",
"raise NotImplementedError(\"Student exercise\")\n",
"############################################################\n",
"\n",
"def cost_regression(output, target):\n",
" criterion = ...\n",
" cost = ...\n",
" return cost"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {},
"executionInfo": {
"elapsed": 4,
"status": "ok",
"timestamp": 1718208969925,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"source": [
"[*Click for solution*](https://github.com/neuromatch/NeuroAI_Course/tree/main/tutorials/W1D2_ComparingTasks/solutions/W1D2_Tutorial1_Solution_5ddb1f2c.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Cost_Function_2\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"This cost function computes the MSE loss between the predicted number of 'ON' pixels and the actual count, guiding the model to make accurate continuous predictions.\n",
"\n",
"### Training\n",
"\n",
"We train the network on varying dataset sizes (10, 100, 1000, 10000) to observe the impact of sample size on the model's performance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 183,
"status": "ok",
"timestamp": 1718208970104,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"class RegressionConvNet(nn.Module):\n",
"\n",
" def __init__(self, ConvNet, Output):\n",
" super(RegressionConvNet, self).__init__()\n",
" self.ConvNet = ConvNet\n",
" self.Output = Output\n",
"\n",
" def forward(self, x):\n",
" conv_intermediate = self.ConvNet(x)\n",
" output = self.Output(conv_intermediate)\n",
"\n",
" return output"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"The RegressionConvNet integrates the convolutional feature extraction network with the regression output layer.\n",
"\n",
"### Dataset preparation\n",
"\n",
"We adapt the MNIST dataset for the regression task by computing the number of 'ON' pixels for each image."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 6,
"status": "ok",
"timestamp": 1718208970104,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"class RegressionMNIST(torch.utils.data.Dataset):\n",
" def __init__(self, mnist_dataset):\n",
" self.dataset = mnist_dataset.dataset\n",
"\n",
" def __getitem__(self, index):\n",
" X, _ = self.dataset[index]\n",
" updated_label = torch.sum(X > 0.0).float() / X.shape[-1] ** 2 - 0.1307\n",
" return X, updated_label\n",
"\n",
" def __len__(self):\n",
" return len(self.dataset)"
]
},
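{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"To see what the regression target looks like (an illustrative check; the index 0 is arbitrary), we can wrap the training set and inspect one sample."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Illustrative check: inspect one (image, target) pair from the wrapped dataset\n",
"regression_demo = RegressionMNIST(train_dataset)\n",
"X_demo, y_demo = regression_demo[0]\n",
"print(f\"Image shape: {X_demo.shape}, regression target: {y_demo:.4f}\")"
]
},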
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"This custom dataset class transforms the images and computes the target values required for regression.\n",
"\n",
"### Model training and evaluation\n",
"\n",
"We initialize datasets and data loaders for the regression task, and define functions to evaluate models across different sample sizes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 199703,
"status": "ok",
"timestamp": 1718209169802,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"set_seed(42)\n",
"\n",
"training_points = np.array([10, 100, 1000, 10000])\n",
"task_name_regression = \"regression\"\n",
"acc_flag = False\n",
"triplet_flag = False\n",
"epochs_max_regression = 10\n",
"\n",
"my_epoch_Regression = []\n",
"my_train_cost_Regression = []\n",
"my_val_cost_Regression = []\n",
"my_test_cost_Regression = []\n",
"\n",
"train_dataset_regression = RegressionMNIST(train_dataset)\n",
"val_dataset_regression = RegressionMNIST(val_dataset)\n",
"test_dataset_original_regression = RegressionMNIST(test_dataset_original)\n",
"test_loader_original_regression = torch.utils.data.DataLoader(dataset = test_dataset_original_regression,\n",
" batch_size = batch_size,\n",
" shuffle = True)\n",
"\n",
"for N_train_data in training_points:\n",
" model = RegressionConvNet(ConvNeuralNet(), RegressionOutputLayer()).to(device)\n",
"\n",
" sampled_train_loader, sampled_val_loader = get_random_sample_train_val(train_dataset_regression, val_dataset_regression, batch_size, N_train_data)\n",
" optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001)\n",
"\n",
" my_epoch, my_train_cost, my_val_cost, my_test_cost = train(model, sampled_train_loader, sampled_val_loader, test_loader_original_regression, cost_regression, optimizer, epochs_max_regression, acc_flag, triplet_flag, task_name_regression, N_train_data)\n",
" my_epoch_Regression.append(my_epoch)\n",
" my_train_cost_Regression.append(my_train_cost) # Append the training costs\n",
" my_val_cost_Regression.append(my_val_cost) # Append the val costs\n",
" my_test_cost_Regression.append(my_test_cost) # Append the test costs"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Discussion point 2\n",
"\n",
"Now that you have trained your network with different sample sizes, plot the test performance for each network across epochs. How does sample size interact with the number of training epochs?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 2488,
"status": "ok",
"timestamp": 1718210592452,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"# Create a single plot for all test costs with a logarithmic scale\n",
"\n",
"with plt.xkcd():\n",
" plt.figure(figsize=(8, 6)) #Set the figure size\n",
"\n",
" for i, n in enumerate(training_points):\n",
" epochs = my_epoch_Regression[i]\n",
" test_cost = my_test_cost_Regression[i]\n",
" plt.plot(epochs, test_cost, marker='o', linestyle='-', label=f'{n} training points')\n",
"\n",
" plt.xlabel('Epochs')\n",
" plt.ylabel('Test cost (log scale)')\n",
" plt.title('Test cost over epochs for different training points (regression)')\n",
" plt.yscale('log')\n",
" plt.legend()\n",
" plt.grid(True)\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/neuromatch/NeuroAI_Course/tree/main/tutorials/W1D2_ComparingTasks/solutions/W1D2_Tutorial1_Solution_eb15e56d.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Discussion_Point_2\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 1.3: Auto-encoder\n",
"\n",
"Now, we extend our network architecture to an unsupervised learning task. Specifically, we aim to develop an autoencoder capable of compressing an image of a handwritten digit into a lower-dimensional representation of size $M$ and then reconstructing the original image with minimal error.\n",
"\n",
"### Autoencoder architecture\n",
"\n",
"An autoencoder consists of three main components: an encoder, a bottleneck layer, and a decoder. The encoder compresses the input into a smaller representation, the bottleneck layer holds this compressed representation, and the decoder reconstructs the original image from this representation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 6,
"status": "ok",
"timestamp": 1718209171272,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"class Autoencoder(nn.Module):\n",
"\n",
" def __init__(self, encoder, bottleneck, decoder):\n",
" super(Autoencoder, self).__init__()\n",
" self.encoder = encoder\n",
" self.bottleneck = bottleneck\n",
" self.decoder = decoder\n",
"\n",
" def forward(self, x):\n",
" encoded = self.encoder(x)\n",
" bottlenecked = self.bottleneck(encoded)\n",
" decoded = self.decoder(bottlenecked)\n",
" return decoded"
]
},
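{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"To see the wrapper's data flow in isolation, here is a minimal sketch that wires `Autoencoder` to simple fully connected stand-ins. The `nn.Flatten`/`nn.Linear` modules and their sizes are assumptions for illustration only; the tutorial's actual encoder, bottleneck, and decoder are the convolutional modules described below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"\n",
"# Stand-in components with made-up sizes (illustration only)\n",
"toy_encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64))\n",
"toy_bottleneck = nn.Linear(64, 16)  # compressed representation, M = 16\n",
"toy_decoder = nn.Linear(16, 28 * 28)\n",
"\n",
"toy_ae = Autoencoder(toy_encoder, toy_bottleneck, toy_decoder)\n",
"x = torch.randn(8, 1, 28, 28)  # a batch of 8 fake images\n",
"print(toy_ae(x).shape)  # torch.Size([8, 784]): one flattened reconstruction per image"
]
},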
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"In our architecture:\n",
"\n",
"* The encoder will be a CNN\n",
"* The bottleneck layer will be a fully connected layer of size $M$.\n",
"* The decoder layer will be a deconvolutional neural network, which does the operations of a CNN in reverse: it goes from a dense representation to a low-resolution image, and then upsamples that image in subsequence layers.\n",
"\n",
"### Code exercise 3: Cost Function\n",
"\n",
"We'll use Mean Squared Error (MSE) loss for the autoencoder. This loss function measures the average squared difference between the original and reconstructed images, guiding the network to minimize the reconstruction error."
]
},
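{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"For intuition before you fill in the exercise, here is `nn.MSELoss` on two tiny hand-made tensors (the values are arbitrary): it returns the mean of the squared elementwise differences."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"import torch\n",
"import torch.nn as nn\n",
"\n",
"a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])\n",
"b = torch.tensor([[1.5, 2.0], [2.0, 4.0]])\n",
"\n",
"# Mean of squared elementwise differences: (0.25 + 0 + 1 + 0) / 4\n",
"print(nn.MSELoss()(a, b))  # tensor(0.3125)"
]
},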
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 6,
"status": "ok",
"timestamp": 1718209171272,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"############################################################\n",
"# Hint for output_flat: To flatten the output tensor for comparison, reshape it to\n",
"# have a size of (batch_size, -1) where batch_size is the number of samples.\n",
"# Hint for target_flat: Similarly, flatten the target tensor to match the shape\n",
"# of the flattened output tensor, ensuring it has a size of (batch_size, -1).\n",
"raise NotImplementedError(\"Student exercise\")\n",
"############################################################\n",
"\n",
"def cost_autoencoder(output, target):\n",
" criterion = nn.MSELoss()\n",
" output_flat = ...\n",
" target_flat = ...\n",
" cost = criterion(output_flat, target_flat)\n",
" return cost"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {},
"executionInfo": {
"elapsed": 5,
"status": "ok",
"timestamp": 1718209171272,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"source": [
"[*Click for solution*](https://github.com/neuromatch/NeuroAI_Course/tree/main/tutorials/W1D2_ComparingTasks/solutions/W1D2_Tutorial1_Solution_48232671.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Cost_Function_3\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Dataset\n",
"\n",
"This custom dataset class prepares the MNIST images for the autoencoder task, applying necessary transformations and using the images themselves as targets for reconstruction."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 5,
"status": "ok",
"timestamp": 1718209171272,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"class AutoencoderMNIST(torch.utils.data.Dataset):\n",
" def __init__(self, mnist_dataset):\n",
" self.dataset = mnist_dataset\n",
"\n",
" def __getitem__(self, index):\n",
" X, y = self.dataset[index]\n",
" return X, X\n",
"\n",
" def __len__(self):\n",
" return len(self.dataset)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Model training and evaluation\n",
"\n",
"We train separate autoencoder networks on different dataset sizes (10, 100, 1000, 10000) to analyze how the amount of data influences the model's performance. The training continues until the validation performance ceases to improve, and test performance is recorded at each epoch.\n",
"\n",
"Note: we also plot here some of the original validation images and how they were reconstructed by each network after 10 iterations."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 168488,
"status": "ok",
"timestamp": 1718209378567,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"set_seed(42)\n",
"\n",
"# Define constants for autoencoder task\n",
"training_points = np.array([10, 100, 1000, 10000])\n",
"task_name_autoencoder = \"autoencoder\"\n",
"\n",
"# Size of the bottleneck. We'll keep this consistent across experiments.\n",
"M = 16\n",
"acc_flag_autoencoder = False\n",
"triplet_flag_autoencoder = False\n",
"epochs_max_autoencoder = 10\n",
"\n",
"train_dataset_autoencoder = AutoencoderMNIST(train_dataset)\n",
"val_dataset_autoencoder = AutoencoderMNIST(val_dataset)\n",
"test_dataset_original_autoencoder = AutoencoderMNIST(test_dataset_original)\n",
"test_loader_original_autoencoder = torch.utils.data.DataLoader(\n",
" dataset=test_dataset_original_autoencoder,\n",
" batch_size=batch_size,\n",
" shuffle=True\n",
")\n",
"\n",
"my_epoch_Autoencoder = []\n",
"my_train_cost_Autoencoder = []\n",
"my_val_cost_Autoencoder = []\n",
"my_test_cost_Autoencoder = []\n",
"reconstructions = []\n",
"\n",
"for N_train_data in training_points:\n",
" model = Autoencoder(ConvNeuralNet(), BottleneckLayer(M), ConvNeuralNetDecoder(M)).to(device)\n",
"\n",
" sampled_train_loader, sampled_val_loader = get_random_sample_train_val(\n",
" train_dataset_autoencoder,\n",
" val_dataset_autoencoder,\n",
" batch_size,\n",
" N_train_data\n",
" )\n",
" optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001)\n",
"\n",
" my_epoch, my_train_cost, my_val_cost, my_test_cost = train(\n",
" model,\n",
" sampled_train_loader,\n",
" sampled_val_loader,\n",
" test_loader_original_autoencoder,\n",
" cost_autoencoder,\n",
" optimizer,\n",
" epochs_max_autoencoder,\n",
" acc_flag_autoencoder,\n",
" triplet_flag_autoencoder,\n",
" task_name_autoencoder,\n",
" N_train_data\n",
" )\n",
" my_epoch_Autoencoder.append(my_epoch)\n",
" my_train_cost_Autoencoder.append(my_train_cost)\n",
" my_val_cost_Autoencoder.append(my_val_cost)\n",
" my_test_cost_Autoencoder.append(my_test_cost)\n",
"\n",
" original_images = []\n",
" reconstructed_images = []\n",
" model.eval()\n",
" with torch.no_grad():\n",
" for batch_idx, (X, _) in enumerate(sampled_val_loader):\n",
" if batch_idx == 0:\n",
" outputs = model(X)\n",
" orig = X.numpy()\n",
" original_images.extend(orig)\n",
" recon = outputs.numpy()\n",
" reconstructed_images.extend(recon)\n",
" plot_reconstructions(original_images, reconstructed_images, N_train_data, epochs_max_autoencoder)\n",
" break\n",
"\n",
" reconstructions.append((N_train_data, original_images, reconstructed_images))"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Discussion point 3"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"- Plot the performance of the network across epochs. What's the relationship between sample size and iteration complexity?\n",
"- What do you think of the images plotted above? Does the autoencoding task require more or less data than the two previous tasks (classification of digit and regression of number of ON pixels)?\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 1773,
"status": "ok",
"timestamp": 1718210609824,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"# Create a single plot for all test costs with a logarithmic scale\n",
"\n",
"with plt.xkcd():\n",
" plt.figure(figsize=(8, 6)) # Set the figure size\n",
"\n",
" for i, n in enumerate(training_points):\n",
" epochs = my_epoch_Autoencoder[i]\n",
" test_cost = my_test_cost_Autoencoder[i]\n",
" plt.plot(epochs, test_cost, marker='o', linestyle='-', label=f'{n} training points')\n",
"\n",
" plt.xlabel('Epochs')\n",
" plt.ylabel('Test cost (log scale)')\n",
" plt.title('Test cost over epochs for different training points (autoencoder)')\n",
" plt.yscale('log')\n",
" plt.legend()\n",
" plt.grid(True)\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/neuromatch/NeuroAI_Course/tree/main/tutorials/W1D2_ComparingTasks/solutions/W1D2_Tutorial1_Solution_e18cfe86.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Discussion_Point_3\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Section 1.4: Self-supervised - Inpainting\n",
"\n",
"In this section, we tackle a self-supervised task using the same architecture. Given an image of a handwritten digit with a patch of size $N×N$ masked out, the objective is to reconstruct the image by accurately predicting the pixel values in the masked region. Thus, the network must use the surrounding context to effectively \"inpaint\" the missing portion.\n",
"\n",
"### Task objective\n",
"\n",
"The task is to build an autoencoder that can fill in missing parts of an image, a process known as inpainting. This involves training the model to reconstruct the entire image. Implicitely, the model will predict and reconstruct the obstructed regions of the image using the contextual information from the unobstructed parts.\n",
"\n",
"Important note: this is a simplified inpainting task and not how inpainting is usually define. Usually, the region to be inpainted is provided as part of the input.\n",
"\n",
"### Random masking\n",
"\n",
"First, we implement a function to randomly mask a part of the image. This function will be used to generate the training data for our inpainting task."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 2,
"status": "ok",
"timestamp": 1718209379977,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"def random_mask(images, mask_size=8):\n",
" \"\"\"\n",
" Randomly mask an N x N patch in a batch of images.\n",
"\n",
" Parameters:\n",
" - images: A batch of images as a PyTorch tensor, shape (batch_size, channels, height, width)\n",
" - mask_size: Size of the square mask (N)\n",
"\n",
" Returns:\n",
" - A new batch of images with the masked portions zeroed out.\n",
" \"\"\"\n",
" # Clone the images to avoid modifying the original data\n",
" obstructed_images = images.clone()\n",
"\n",
" batch_size, height, width = images.size()\n",
"\n",
" for i in range(batch_size):\n",
" # Choose a random location for the mask\n",
" y = np.random.randint(0, height - mask_size)\n",
" x = np.random.randint(0, width - mask_size)\n",
"\n",
" # Apply the mask by setting the pixel values to 0 (or another value)\n",
" obstructed_images[i, y:y + mask_size, x:x + mask_size] = 0\n",
"\n",
" return obstructed_images"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Here's one example of a masked image."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"plt.figure(figsize=(4, 2))\n",
"ind = 12\n",
"img, label = train_dataset[ind]\n",
"plt.subplot(121)\n",
"plt.imshow(img.numpy().squeeze(), cmap='gray')\n",
"plt.title(f\"Original\")\n",
"plt.axis(False)\n",
"plt.tight_layout()\n",
"plt.subplot(122)\n",
"img_masked = random_mask(img, mask_size=12)\n",
"plt.imshow(img_masked.numpy().squeeze(), cmap='gray')\n",
"plt.title(f\"Masked\")\n",
"plt.axis(False)\n",
"plt.tight_layout()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"This function randomly places a N×N mask in each image by setting the pixel values within this region to zero.\n",
"\n",
"### Autoencoder and cost function\n",
"\n",
"We re-use the same autoencoder architecture as in the previous sections, with an encoder, bottleneck layer, and decoder. We also use the Mean Squared Error (MSE) loss, as it measures the reconstruction error between the predicted and actual pixel values."
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Dataset\n",
"\n",
"This custom dataset class prepares the MNIST images for the inpainting task, applying necessary transformations and adding random masking to create the training data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 4,
"status": "ok",
"timestamp": 1718209379979,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"class InpaintingMNIST(torch.utils.data.Dataset):\n",
" def __init__(self, mnist_dataset):\n",
" self.dataset = mnist_dataset\n",
"\n",
" def __getitem__(self, index):\n",
" X, y = self.dataset[index]\n",
" obstructed = random_mask(X, mask_size=8)\n",
" return obstructed, X\n",
"\n",
" def __len__(self):\n",
" return len(self.dataset)"
]
},
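{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"A quick check of what one item looks like: the input is the masked image and the target is the clean original. This assumes `train_dataset` from the earlier setup is available."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"inpainting_demo = InpaintingMNIST(train_dataset)\n",
"obstructed, target = inpainting_demo[0]\n",
"\n",
"# Same shape for input and target; only the input has a zeroed-out patch\n",
"print(obstructed.shape, target.shape)\n",
"print((obstructed != target).sum())  # pixels changed by the mask (at most 8 * 8 = 64)"
]
},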
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Model training and evaluation\n",
"\n",
"We train the autoencoder on different dataset sizes (10, 100, 1000, 10000) to evaluate how the sample size affects the model's performance. Training continues until validation performance stops improving, and test performance is recorded at each epoch. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 64228,
"status": "ok",
"timestamp": 1718209620094,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"# Define constants\n",
"set_seed(42)\n",
"\n",
"training_points = np.array([10, 100, 1000, 10000])\n",
"task_name_inpainting = \"inpainting\"\n",
"\n",
"acc_flag_inpainting = False\n",
"triplet_flag_inpainting = False\n",
"epochs_max_inpainting = 10\n",
"\n",
"my_epoch_Inpainting = []\n",
"my_train_cost_Inpainting = []\n",
"my_val_cost_Inpainting = []\n",
"my_test_cost_Inpainting = []\n",
"reconstructions_inpainting = []\n",
"\n",
"# Create inpainting versions of the training, validation, and test datasets\n",
"train_dataset_inpainting = InpaintingMNIST(train_dataset)\n",
"val_dataset_inpainting = InpaintingMNIST(val_dataset)\n",
"test_dataset_original_inpainting = InpaintingMNIST(test_dataset_original)\n",
"\n",
"# Create a data loader for the inpainting test dataset\n",
"test_loader_original_inpainting = torch.utils.data.DataLoader(\n",
" dataset=test_dataset_original_inpainting,\n",
" batch_size=batch_size,\n",
" shuffle=True\n",
")\n",
"\n",
"for N_train_data in training_points:\n",
" model = Autoencoder(ConvNeuralNet(), BottleneckLayer(M), ConvNeuralNetDecoder(M)).to(device)\n",
"\n",
" sampled_train_loader, sampled_val_loader = get_random_sample_train_val(\n",
" train_dataset_inpainting,\n",
" val_dataset_inpainting,\n",
" batch_size,\n",
" N_train_data\n",
" )\n",
" optimizer = torch.optim.Adam(params=model.parameters(), lr=0.001)\n",
"\n",
" # Update the train function call to get training costs\n",
" my_epoch, my_train_cost, my_val_cost, my_test_cost = train(\n",
" model,\n",
" sampled_train_loader,\n",
" sampled_val_loader,\n",
" test_loader_original_inpainting,\n",
" cost_autoencoder,\n",
" optimizer,\n",
" epochs_max_inpainting,\n",
" acc_flag_inpainting,\n",
" triplet_flag_inpainting,\n",
" task_name_inpainting,\n",
" N_train_data\n",
" )\n",
"\n",
" my_epoch_Inpainting.append(my_epoch)\n",
" my_train_cost_Inpainting.append(my_train_cost)\n",
" my_val_cost_Inpainting.append(my_val_cost)\n",
" my_test_cost_Inpainting.append(my_test_cost)\n",
" original_images = []\n",
" reconstructed_images = []\n",
" model.eval()\n",
" with torch.no_grad():\n",
" for batch_idx, (X, _) in enumerate(sampled_val_loader):\n",
" if batch_idx == 0: # Only visualize the first batch for simplicity\n",
" outputs = model(X)\n",
" orig = X.numpy()\n",
" original_images.extend(orig)\n",
" recon = outputs.numpy()\n",
" reconstructed_images.extend(recon)\n",
" fig = plt.figure(figsize=(8, 4))\n",
" rows, cols = 2, 6\n",
" image_count = 1\n",
" for i in range(1,(rows*cols),2 ):\n",
" fig.add_subplot(rows, cols, i)\n",
" plt.imshow(np.squeeze(orig[image_count]), cmap='gray')\n",
" fig.add_subplot(rows, cols, i+1)\n",
" plt.imshow(np.squeeze(recon[image_count]), cmap='gray')\n",
" image_count+=1\n",
" break\n",
" plt.suptitle(\"Training for 10 epochs with {} points\".format(N_train_data))\n",
"\n",
" reconstructions_inpainting.append((N_train_data, original_images, reconstructed_images))"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Discussion point 4"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"- Plot the performance of the model on the test dataset for each network across epochs. What's the relationship between sample size and number of training epochs?\n",
"- How do these compare with the other examples above?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 1019,
"status": "ok",
"timestamp": 1718210627109,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"# Create a single plot for all test costs with a logarithmic scale\n",
"with plt.xkcd():\n",
" plt.figure(figsize=(8, 6)) # Set the figure size\n",
"\n",
" for i, n in enumerate(training_points):\n",
" epochs = my_epoch_Inpainting[i]\n",
" test_cost = my_test_cost_Inpainting[i]\n",
" plt.plot(epochs, test_cost, marker='o', linestyle='-', label=f'{n} training points')\n",
"\n",
" plt.xlabel('Epochs')\n",
" plt.ylabel('Test cost (log scale)')\n",
" plt.title('Test cost over epochs for different training points (inpainting)')\n",
" plt.yscale('log')\n",
" plt.legend()\n",
" plt.grid(True)\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/neuromatch/NeuroAI_Course/tree/main/tutorials/W1D2_ComparingTasks/solutions/W1D2_Tutorial1_Solution_9836554e.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Discussion_Point_5\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Section 2: Generalization of representations\n",
"\n",
"In the first part of this tutorial, we focused on training different networks on different tasks, measuring how fast these networks learned in terms of training samples and training epochs. Here, we shift our attention to how well these representations generalize across different tasks. \n",
"\n",
"## Section 2.1: Transfer\n",
"\n",
"### Taskonomy\n",
"\n",
"Humans can perform a wide range of tasks. The representations within this system must be general enough to handle multiple tasks, yet specific enough to accommodate differing requirements. We assess the ability of the representations we have learned to perform different tasks by testing generalization from one model to another.\n",
"\n",
"### Transfer learning\n",
"\n",
"We'll measure the ability of a representation learned for one task to transfer to other tasks. For each of the four tasks, we'll use the network trained for the highest number of training points and epochs. A new test set will be provided for each task. To test generalization, we follow these steps:\n",
"\n",
"1. Create a new network for the destination task using the same architecture as in part 1.\n",
"2. Copy the weights for the first N layers from the source network.\n",
"3. Fix these copied weights so they do not change during training.\n",
"4. Train the remaining layers on the destination network.\n",
"\n",
"After training, we'll compare the new performance to the original performance to assess transfer performance between each pair of networks. \n",
"\n",
"Because we have N=4 different models, we could measure $2*(N-1)*(N-2)=12$ transfer directions. In the interest of time, we'll concentrate on measuring 3 transfer performances from the last 3 models to the classification model.\n",
"\n",
"### Copy weights\n",
"\n",
"We start by defining a function to copy the weights and freeze the layers from the source model to the destination model."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 2,
"status": "ok",
"timestamp": 1718209621274,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"def copy_and_freeze_layers(source_model, destination_model, N):\n",
" \"\"\"\n",
" Copies the weights of the first N layers from the source model to the\n",
" destination model and freezes them.\n",
"\n",
" Parameters:\n",
" - source_model: The model from which weights are copied.\n",
" - destination_model: The model to which weights are copied and trained.\n",
" - N: The number of layers to transfer and freeze.\n",
"\n",
" Note: This function assumes the first N layers are directly accessible and\n",
" does not handle nested modules automatically.\n",
" \"\"\"\n",
"\n",
" # Ensure both models are in the same mode (train/eval).\n",
" source_model.eval()\n",
" destination_model.train()\n",
"\n",
" # Counter to track the number of transferred layers\n",
" transferred_layers = 0\n",
"\n",
" source_dict = source_model.state_dict()\n",
" dest_dict = destination_model.state_dict()\n",
"\n",
" # Variables to map between layers\n",
" source_prefix = list(source_dict.keys())[0]\n",
" source_prefix = source_prefix[:source_prefix.find('.')]\n",
" source_dict_names = [s[s.find('.')+1:] for s in list(source_dict.keys())]\n",
" dest_prefix = list(dest_dict.keys())[0]\n",
" dest_prefix = dest_prefix[:dest_prefix.find('.')]\n",
" dest_dict_names = [s[s.find('.')+1:] for s in list(dest_dict.keys())]\n",
" print(dest_dict_names)\n",
"\n",
" # Transfer layers by matching keys\n",
" for name in source_dict_names:\n",
" if name in dest_dict_names and transferred_layers < N:\n",
" try:\n",
" dest_dict[dest_prefix+'.'+name].copy_(source_dict[source_prefix+'.'+name])\n",
" transferred_layers += 1\n",
" print(f\"Copied {name}\")\n",
" except Exception as e:\n",
" print(f\"Could not copy {name}: {e}\")\n",
"\n",
" # Load the updated state dictionary back to the destination model\n",
" destination_model.load_state_dict(dest_dict)\n",
" print(transferred_layers)\n",
" # Freeze the transferred layers\n",
" for name, param in destination_model.named_parameters():\n",
" if name[name.find('.')+1:] in source_dict_names and transferred_layers > 0:\n",
" print('froze ',name)\n",
" param.requires_grad = False\n",
" transferred_layers -= 1\n",
"\n",
" print(f\"Copied and froze {N} layers.\")"
]
},
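{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Before applying it to the tutorial models, here is a minimal sketch of `copy_and_freeze_layers` on two tiny wrapper models that share a backbone. The toy classes and sizes are made up for illustration; only the shared-prefix naming scheme mirrors the tutorial's models."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"import torch.nn as nn\n",
"\n",
"class ToySource(nn.Module):\n",
"    def __init__(self):\n",
"        super().__init__()\n",
"        self.net = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))\n",
"\n",
"class ToyDestination(nn.Module):\n",
"    def __init__(self):\n",
"        super().__init__()\n",
"        self.net = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 3))\n",
"\n",
"toy_src, toy_dst = ToySource(), ToyDestination()\n",
"\n",
"# Copies net.0.weight and net.0.bias (N = 2 parameter tensors) and freezes them\n",
"copy_and_freeze_layers(toy_src, toy_dst, N=2)\n",
"print([name for name, p in toy_dst.named_parameters() if not p.requires_grad])"
]
},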
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Transfer example 1: regression to classification\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"We first copy the weights from a regression network to a classification network and freeze them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 0,
"status": "ok",
"timestamp": 1718209621291,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"MODEL_NAME_SOURCE = \"models/ConvNet_regression_10000_epoch_10.pth\"\n",
"model_source = RegressionConvNet(ConvNeuralNet(), RegressionOutputLayer())\n",
"\n",
"# Load the checkpoint\n",
"checkpoint = torch.load(MODEL_NAME_SOURCE)\n",
"\n",
"# Extract the state dictionary from the checkpoint\n",
"model_state_dict = checkpoint['model_state_dict']\n",
"\n",
"# Load the state dictionary into the model\n",
"model_source.load_state_dict(model_state_dict)\n",
"\n",
"# Define the destination model\n",
"model_destination = ClassificationConvNet(ConvNeuralNet(), ClassificationOutputLayer())\n",
"\n",
"# Number of layers to transfer and freeze\n",
"N = 8\n",
"\n",
"# Transfer and freeze layers\n",
"copy_and_freeze_layers(model_source, model_destination, N)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"#### Training\n",
"\n",
"We train the destination network on the classification task with the transferred weights."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 72190,
"status": "ok",
"timestamp": 1718209693462,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"set_seed(42)\n",
"\n",
"batch_size = 64 # Define your batch size\n",
"\n",
"test_loader_transfer = torch.utils.data.DataLoader(dataset = test_dataset_transfer,\n",
" batch_size = batch_size,\n",
" shuffle = True)\n",
"sampled_train_loader, sampled_val_loader = get_random_sample_train_val(train_dataset, val_dataset, batch_size, N_train_data)\n",
"\n",
"# Optimizer\n",
"optimizer = optim.Adam(params=model_destination.parameters(), lr=0.001)\n",
"\n",
"# Training parameters\n",
"N_train_data = 10000\n",
"task_name = \"regression_classification\"\n",
"epochs_max = 10 # Number of epochs\n",
"acc_flag = False # Whether to calculate accuracy\n",
"triplet_flag = False # Whether to use triplet loss\n",
"\n",
"# Call the train_transfer function\n",
"my_epoch_transfer1, my_train_cost_transfer1, my_val_cost_transfer1, my_test_cost_transfer1 = train_transfer(\n",
" model_destination,\n",
" sampled_train_loader,\n",
" sampled_val_loader,\n",
" test_loader_transfer,\n",
" cost_classification,\n",
" optimizer,\n",
" epochs_max,\n",
" acc_flag,\n",
" triplet_flag,\n",
" task_name,\n",
" N_train_data\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"We compare the performance of the transferred model with the original network."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 1484,
"status": "ok",
"timestamp": 1718210650543,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"# Create a single plot for all training costs with a logarithmic scale\n",
"with plt.xkcd():\n",
" plt.figure(figsize=(8, 6)) # Set the figure size\n",
"\n",
" epochs = my_epoch_Classification[-1]\n",
" epochs_transfer_1 = my_epoch_transfer1\n",
" test_cost = my_test_cost_Classification[-1]\n",
" test_cost_1 = my_test_cost_transfer1\n",
"\n",
" # Plot the autoencoder training cost\n",
" plt.plot(epochs, test_cost, marker='o', linestyle='-', label='Original Classifier (10000 training points)')\n",
"\n",
" # Plot the classification_autoencoder training cost\n",
" plt.plot(epochs_transfer_1, test_cost_1, marker='x', linestyle='-', label='Regression to Classification (10000 training points)')\n",
"\n",
" # Set the labels and title\n",
" plt.xlabel('Epochs')\n",
" plt.ylabel('Test cost (log scale)')\n",
" plt.title('Test cost over epochs')\n",
" plt.yscale('log') # Set the y-axis to a logarithmic scale\n",
"\n",
" # Add the legend and grid\n",
" plt.legend()\n",
" plt.grid(True)\n",
"\n",
" # Show the plot\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Discussion point 5\n",
"\n",
"What did the regression model learn? What does its performance on the classification task after fine-tuning indicate?"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/neuromatch/NeuroAI_Course/tree/main/tutorials/W1D2_ComparingTasks/solutions/W1D2_Tutorial1_Solution_430915db.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Discussion_Point_5\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Transfer example 2: autoencoder to classification"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"Now we do the same, but we transfer the weights of the network to predict the total number of on pixels to the autoencoder task. How well do you think this will work?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 0,
"status": "ok",
"timestamp": 1718209694545,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"MODEL_NAME_SOURCE = \"models/ConvNet_autoencoder_10000_epoch_10.pth\"\n",
"model_source = Autoencoder(ConvNeuralNet(), BottleneckLayer(M), ConvNeuralNetDecoder(M))\n",
"\n",
"# Load the checkpoint\n",
"checkpoint = torch.load(MODEL_NAME_SOURCE)\n",
"\n",
"# Extract the state dictionary from the checkpoint\n",
"model_state_dict = checkpoint['model_state_dict']\n",
"\n",
"# Load the state dictionary into the model\n",
"model_source.load_state_dict(model_state_dict)\n",
"\n",
"# Define the destination model\n",
"\n",
"model_destination = ClassificationConvNet(ConvNeuralNet(), ClassificationOutputLayer())\n",
"\n",
"# Number of layers to transfer and freeze\n",
"N = 8\n",
"\n",
"# Transfer and freeze layers\n",
"copy_and_freeze_layers(model_source, model_destination, N)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 72995,
"status": "ok",
"timestamp": 1718209767523,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"set_seed(42)\n",
"\n",
"batch_size = 64 # Define your batch size\n",
"\n",
"test_loader_transfer = torch.utils.data.DataLoader(dataset = test_dataset_transfer,\n",
" batch_size = batch_size,\n",
" shuffle = True)\n",
"sampled_train_loader, sampled_val_loader = get_random_sample_train_val(train_dataset, val_dataset, batch_size, N_train_data)\n",
"\n",
"# Optimizer\n",
"optimizer = optim.Adam(params=model_destination.parameters(), lr=0.001)\n",
"\n",
"# Training parameters\n",
"N_train_data = 10000\n",
"task_name = \"autoencoder_classification\"\n",
"epochs_max = 10 # Number of epochs\n",
"acc_flag = False # Whether to calculate accuracy\n",
"triplet_flag = False # Whether to use triplet loss\n",
"\n",
"# Call the train_transfer function\n",
"my_epoch_transfer2, my_train_cost_transfer2, my_val_cost_transfer2, my_test_cost_transfer2 = train_transfer(\n",
" model_destination,\n",
" sampled_train_loader,\n",
" sampled_val_loader,\n",
" test_loader_transfer,\n",
" cost_classification,\n",
" optimizer,\n",
" epochs_max,\n",
" acc_flag,\n",
" triplet_flag,\n",
" task_name,\n",
" N_train_data\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"We compare the performance of the transferred model with the original network. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {},
"executionInfo": {
"elapsed": 1969,
"status": "ok",
"timestamp": 1718210668696,
"user": {
"displayName": "Leila Wehbe",
"userId": "09974289515413278669"
},
"user_tz": 240
}
},
"outputs": [],
"source": [
"# Create a single plot for all training costs with a logarithmic scale\n",
"with plt.xkcd():\n",
" plt.figure(figsize=(8, 6)) # Set the figure size\n",
"\n",
" epochs = my_epoch_Classification[-1]\n",
" epochs_transfer_1 = my_epoch_transfer1\n",
" epochs_transfer_2 = my_epoch_transfer2\n",
" test_cost = my_test_cost_Classification[-1]\n",
" test_cost_1 = my_test_cost_transfer1\n",
" test_cost_2 = my_test_cost_transfer2\n",
"\n",
" # Plot the autoencoder training cost\n",
" plt.plot(epochs, test_cost, marker='o', linestyle='-', label='Original Classifier (10000 training points)')\n",
"\n",
" # Plot the classification_autoencoder training cost\n",
" plt.plot(epochs_transfer_1, test_cost_1, marker='x', linestyle='-', label='Regression to Classification (10000 training points)')\n",
"\n",
"# Plot the classification_autoencoder training cost\n",
" plt.plot(epochs_transfer_2, test_cost_2, marker='x', linestyle='-', label='Autoencoder to Classification (10000 training points)')\n",
"\n",
" # Set the labels and title\n",
" plt.xlabel('Epochs')\n",
" plt.ylabel('Test cost (log scale)')\n",
" plt.title('Test cost over epochs')\n",
" plt.yscale('log') # Set the y-axis to a logarithmic scale\n",
"\n",
" # Add the legend and grid\n",
" plt.legend()\n",
" plt.grid(True)\n",
"\n",
" # Show the plot\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Discussion point 6\n",
"\n",
"How well does the representation learned on the autoencoder transfer to classification?"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/neuromatch/NeuroAI_Course/tree/main/tutorials/W1D2_ComparingTasks/solutions/W1D2_Tutorial1_Solution_733ba3ce.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Discussion_Point_6\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"## Transfer example 3: inpainting to classification\n",
"\n",
"Finally, we'll transfer the weights of the inpainting network to the classification network. How well do you think this will work?"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"MODEL_NAME_SOURCE = \"models/ConvNet_inpainting_10000_epoch_10.pth\"\n",
"M = 16\n",
"model_source = Autoencoder(ConvNeuralNet(), BottleneckLayer(M), ConvNeuralNetDecoder(M))\n",
"\n",
"# Load the checkpoint\n",
"checkpoint = torch.load(MODEL_NAME_SOURCE)\n",
"\n",
"# Extract the state dictionary from the checkpoint\n",
"model_state_dict = checkpoint['model_state_dict']\n",
"\n",
"# Load the state dictionary into the model\n",
"model_source.load_state_dict(model_state_dict)\n",
"\n",
"# Define the destination model\n",
"\n",
"model_destination = ClassificationConvNet(ConvNeuralNet(), ClassificationOutputLayer())\n",
"\n",
"# Number of layers to transfer and freeze\n",
"N = 8\n",
"\n",
"# Transfer and freeze layers\n",
"copy_and_freeze_layers(model_source, model_destination, N)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"set_seed(42)\n",
"\n",
"batch_size = 64 # Define your batch size\n",
"\n",
"test_loader_transfer = torch.utils.data.DataLoader(dataset = test_dataset_transfer,\n",
" batch_size = batch_size,\n",
" shuffle = True)\n",
"sampled_train_loader, sampled_val_loader = get_random_sample_train_val(train_dataset, val_dataset, batch_size, N_train_data)\n",
"\n",
"# Optimizer\n",
"optimizer = optim.Adam(params=model_destination.parameters(), lr=0.001)\n",
"\n",
"# Training parameters\n",
"N_train_data = 10000\n",
"task_name = \"inpainting_classification\"\n",
"epochs_max = 10 # Number of epochs\n",
"acc_flag = False # Whether to calculate accuracy\n",
"triplet_flag = False # Whether to use triplet loss\n",
"\n",
"# Call the train_transfer function\n",
"my_epoch_transfer3, my_train_cost_transfer3, my_val_cost_transfer3, my_test_cost_transfer3 = train_transfer(\n",
" model_destination,\n",
" sampled_train_loader,\n",
" sampled_val_loader,\n",
" test_loader_transfer,\n",
" cost_classification,\n",
" optimizer,\n",
" epochs_max,\n",
" acc_flag,\n",
" triplet_flag,\n",
" task_name,\n",
" N_train_data\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"execution": {}
},
"outputs": [],
"source": [
"# Create a single plot for all training costs with a logarithmic scale\n",
"with plt.xkcd():\n",
" plt.figure(figsize=(8, 6)) # Set the figure size\n",
"\n",
" epochs = my_epoch_Classification[-1]\n",
" epochs_transfer_1 = my_epoch_transfer1\n",
" epochs_transfer_2 = my_epoch_transfer2\n",
" epochs_transfer_3 = my_epoch_transfer3\n",
" test_cost = my_test_cost_Classification[-1]\n",
" test_cost_1 = my_test_cost_transfer1\n",
" test_cost_2 = my_test_cost_transfer2\n",
" test_cost_3 = my_test_cost_transfer3\n",
"\n",
" plt.plot(epochs, test_cost, marker='o', linestyle='-', label='Original Classifier (10000 training points)')\n",
" plt.plot(epochs_transfer_1, test_cost_1, marker='x', linestyle='-', label='Regression to Classification (10000 training points)')\n",
" plt.plot(epochs_transfer_2, test_cost_2, marker='x', linestyle='-', label='Autoencoder to Classification (10000 training points)')\n",
" plt.plot(epochs_transfer_3, test_cost_3, marker='x', linestyle='-', label='Inpainting to Classification (10000 training points)')\n",
"\n",
" # Set the labels and title\n",
" plt.xlabel('Epochs')\n",
" plt.ylabel('Test cost (log scale)')\n",
" plt.title('Test cost over epochs')\n",
" plt.yscale('log') # Set the y-axis to a logarithmic scale\n",
"\n",
" # Add the legend and grid\n",
" plt.legend()\n",
" plt.grid(True)\n",
"\n",
" # Show the plot\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Discussion point 7\n",
"\n",
"What does the test loss across these three networks (regression, autoencoder, inpainting) tell us about the representations learned by the networks?"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/neuromatch/NeuroAI_Course/tree/main/tutorials/W1D2_ComparingTasks/solutions/W1D2_Tutorial1_Solution_329eb9d7.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Discussion_Point_7\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"### Bonus discussion point 8\n",
"\n",
"\n",
"How would you find out if the representations learned by the networks are similar or different, apart from their performance on downstream tasks?"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {}
},
"source": [
"[*Click for solution*](https://github.com/neuromatch/NeuroAI_Course/tree/main/tutorials/W1D2_ComparingTasks/solutions/W1D2_Tutorial1_Solution_9e82edae.py)\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Submit your feedback\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Submit your feedback\n",
"content_review(f\"{feedback_prefix}_Discussion_Point_8\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Summary"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"In this tutorial, we've explored the concept of generalization of representation in AI. We trained a network on several distinct tasks, including classification, regression, autoencoding, and inpainting. We explored well a network trained on one task transfers to another. We found that richer tasks, including inpainting and autoencoding, lead to more useful representations for downstream tasks like classification."
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"collapsed_sections": [],
"gpuType": "T4",
"include_colab_link": true,
"name": "W1D2_Tutorial1",
"provenance": [],
"toc_visible": true
},
"kernel": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.19"
}
},
"nbformat": 4,
"nbformat_minor": 4
}