{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"execution": {},
"id": "view-in-github"
},
"source": [
"
"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"# Tutorial 4: Biological meta reinforcement learning \n",
"\n",
"**Week 2, Day 4: Macro-Learning**\n",
"\n",
"**By Neuromatch Academy**\n",
"\n",
"__Content creators:__ Hlib Solodzhuk, Ximeng Mao, Grace Lindsay\n",
"\n",
"__Content reviewers:__ Aakash Agrawal, Alish Dipani, Hossein Rezaei, Yousef Ghanbari, Mostafa Abdollahi, Hlib Solodzhuk, Ximeng Mao, Samuele Bolotta, Grace Lindsay\n",
"\n",
"__Production editors:__ Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"___\n",
"\n",
"\n",
"# Tutorial Objectives\n",
"\n",
"*Estimated timing of tutorial: 70 minutes*\n",
"\n",
"In this tutorial, you will observe how meta-learning may occur in the brain, specifically through reinforcement learning and the Baldwin effect."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"remove-input"
]
},
"outputs": [],
"source": [
"# @markdown\n",
"from IPython.display import IFrame\n",
"from ipywidgets import widgets\n",
"out = widgets.Output()\n",
"with out:\n",
" print(f\"If you want to download the slides: https://osf.io/download/t36w8/\")\n",
" display(IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/t36w8/?direct%26mode=render%26action=download%26mode=render\", width=730, height=410))\n",
"display(out)"
]
},
{
"cell_type": "markdown",
"metadata": {
"execution": {}
},
"source": [
"---\n",
"# Setup\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install and import feedback gadget\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Install and import feedback gadget\n",
"\n",
"!pip install numpy matplotlib ipywidgets jupyter_ui_poll torch tqdm vibecheck datatops --quiet\n",
"\n",
"from vibecheck import DatatopsContentReviewContainer\n",
"def content_review(notebook_section: str):\n",
" return DatatopsContentReviewContainer(\n",
" \"\", # No text prompt\n",
" notebook_section,\n",
" {\n",
" \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n",
" \"name\": \"neuromatch_neuroai\",\n",
" \"user_key\": \"wb2cxze8\",\n",
" },\n",
" ).render()\n",
"\n",
"\n",
"feedback_prefix = \"W2D4_T4\""
]
},
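{
"cell_type": "markdown",
"metadata": {},
"source": [
"Feedback cells throughout the tutorial call `content_review` with a tag built from `feedback_prefix`. Below is a minimal sketch of that pattern; `Example_Section` is a placeholder name, not a real tutorial section.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch of how feedback cells use the gadget.\n",
"# \"Example_Section\" is a placeholder tag, not a real section name.\n",
"content_review(f\"{feedback_prefix}_Example_Section\")"
]
},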
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Imports\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Imports\n",
"\n",
"#working with data\n",
"import numpy as np\n",
"import random\n",
"\n",
"#plotting\n",
"import matplotlib.pyplot as plt\n",
"import logging\n",
"\n",
"#interactive display\n",
"import ipywidgets as widgets\n",
"from IPython.display import display, clear_output\n",
"from jupyter_ui_poll import ui_events\n",
"import time\n",
"from tqdm import tqdm\n",
"\n",
"#modeling\n",
"import copy\n",
"import torch\n",
"import torch.nn as nn\n",
"import torch.nn.functional as F\n",
"import torch.optim as optim\n",
"from torch.autograd import Variable"
]
},
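{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reproducible runs, you may want to seed the random number generators imported above. A minimal sketch follows; the seed value is arbitrary.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: seed Python, NumPy, and PyTorch RNGs so reruns give the same results.\n",
"seed = 42  # arbitrary choice\n",
"random.seed(seed)\n",
"np.random.seed(seed)\n",
"torch.manual_seed(seed)"
]
},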
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Figure settings\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Figure settings\n",
"\n",
"logging.getLogger('matplotlib.font_manager').disabled = True\n",
"\n",
"%matplotlib inline\n",
"%config InlineBackend.figure_format = 'retina' # perfrom high definition rendering for images and plots\n",
"plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting functions\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Plotting functions\n",
"\n",
"def plot_cumulative_rewards(rewards):\n",
" \"\"\"\n",
" Plot the cumulative rewards over time.\n",
"\n",
" Inputs:\n",
" - rewards (list): list containing the cumulative rewards at each time step.\n",
" \"\"\"\n",
" with plt.xkcd():\n",
" plt.plot(range(len(rewards)), rewards)\n",
" plt.xlabel('Time Step')\n",
" plt.ylabel('Cumulative Reward')\n",
" plt.title('Cumulative Reward Over Time')\n",
" plt.show()\n",
"\n",
"\n",
"def plot_boxplot_scores(scores):\n",
" \"\"\"\n",
" Plots a boxplot of the given scores.\n",
"\n",
" Inputs:\n",
" scores (list): list of scores.\n",
" \"\"\"\n",
" with plt.xkcd():\n",
" plt.boxplot(scores, labels = [''])\n",
" plt.xlabel('')\n",
" plt.ylabel('Score')\n",
" plt.title('Distribution of Scores')\n",
" plt.show()\n",
"\n",
"def plot_two_boxplot_scores(newbie_scores, experienced_scores):\n",
" \"\"\"\n",
" Plots two boxplots of the given scores.\n",
"\n",
" Inputs:\n",
" scores (list): list of scores.\n",
" \"\"\"\n",
" with plt.xkcd():\n",
" plt.boxplot([newbie_scores, experienced_scores], labels=['Newbie', 'Experienced'])\n",
" plt.xlabel('Agent')\n",
" plt.ylabel('Score')\n",
" plt.title('Distribution of Scores')\n",
" plt.show()"
]
},
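{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check that the plotting helpers work, you can call them on synthetic data. This is a minimal sketch: the random-walk rewards and the two score samples below are made up for illustration and carry no meaning for the experiment.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Exercise the plotting helpers on synthetic data (illustration only).\n",
"demo_rewards = np.cumsum(np.random.choice([-1, 1], size=50))\n",
"plot_cumulative_rewards(list(demo_rewards))\n",
"\n",
"# Two made-up score samples for the side-by-side boxplot.\n",
"plot_two_boxplot_scores(list(np.random.normal(2, 1, 20)),\n",
"                        list(np.random.normal(4, 1, 20)))"
]
},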
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Helper functions\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"execution": {},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# @title Helper functions\n",
"\n",
"def generate_symbols():\n",
" \"\"\"\n",
" Generate random symbols for playing Harlow experiment.\n",
"\n",
" Outputs:\n",
" - symbols (list): list of symbols.\n",
" \"\"\"\n",
" symbols = []\n",
" symbol_types = ['circle', 'square', 'triangle', 'star', 'pentagon', 'hexagon', 'octagon', 'diamond', 'arrow', 'rectangle']\n",
" symbol_types = np.random.permutation(symbol_types)\n",
"\n",
" for symbol_type in symbol_types:\n",
" color = np.random.choice(['red', 'blue', 'green', 'yellow'])\n",
"\n",
" if symbol_type == 'circle':\n",
" symbol = plt.Circle((0.5, 0.5), 0.3, color=color)\n",
" elif symbol_type == 'square':\n",
" symbol = plt.Rectangle((0.2, 0.2), 0.6, 0.6, color=color)\n",
" elif symbol_type == 'triangle':\n",
" symbol = plt.Polygon([(0.2, 0.2), (0.5, 0.8), (0.8, 0.2)], closed=True, color=color)\n",
" elif symbol_type == 'star':\n",
" symbol = plt.Polygon([(0.5, 1), (0.6, 0.7), (0.8, 0.7), (0.65, 0.5), (0.75, 0.3),\n",
" (0.5, 0.45), (0.25, 0.3), (0.35, 0.5), (0.2, 0.7), (0.4, 0.7)], closed=True, color=color)\n",
" elif symbol_type == 'pentagon':\n",
" symbol = plt.Polygon([(0.5 + 0.2*np.cos(2*np.pi*i/5), 0.5 + 0.2*np.sin(2*np.pi*i/5)) for i in range(5)], closed=True, color=color)\n",
" elif symbol_type == 'hexagon':\n",
" symbol = plt.Polygon([(0.5 + 0.2*np.cos(2*np.pi*i/6), 0.5 + 0.2*np.sin(2*np.pi*i/6)) for i in range(6)], closed=True, color=color)\n",
" elif symbol_type == 'octagon':\n",
" symbol = plt.Polygon([(0.5 + 0.2*np.cos(2*np.pi*i/8), 0.5 + 0.2*np.sin(2*np.pi*i/8)) for i in range(8)], closed=True, color=color)\n",
" elif symbol_type == 'diamond':\n",
" symbol = plt.Polygon([(0.5, 0.7), (0.3, 0.5), (0.5, 0.3), (0.7, 0.5)], closed=True, color=color)\n",
" elif symbol_type == 'arrow':\n",
" symbol = plt.Polygon([(0.3, 0.3), (0.5, 0.7), (0.7, 0.3), (0.5, 0.5)], closed=True, color=color)\n",
" elif symbol_type == 'rectangle':\n",
" symbol = plt.Rectangle((0.4, 0.2), 0.2, 0.6, color=color)\n",
" symbols.append(symbol)\n",
"\n",
" return symbols\n",
"\n",
"def run_dummy_agent(env):\n",
" \"\"\"\n",
" Implement dummy agent strategy: chooses the last rewarded action.\n",
"\n",
" Inputs:\n",
" - env (HarlowExperimentEnv): An environment.\n",
" \"\"\"\n",
" action = 0\n",
" cumulative_reward = 0\n",
" rewards = [cumulative_reward]\n",
"\n",
" for _ in (range(num_trials)):\n",
" _, reward = env.step(action)\n",
" cumulative_reward += reward\n",
" rewards.append(cumulative_reward)\n",
"\n",
" #dummy agent\n",
" if reward == -1:\n",
" action = 1 - action\n",
" return rewards\n",
"\n",
"def game():\n",
" \"\"\"\n",
" Create interactive game which resembles one famous experiment!\n",
" \"\"\"\n",
" total_reward = 0\n",
" symbols = generate_symbols()\n",
" message = \"Start of the game!\"\n",
" total_attempts = 5 * 6 # Assuming 5 sets with 6 attempts each\n",
"\n",
" left_button = widgets.Button(description=\"Left\")\n",
" right_button = widgets.Button(description=\"Right\")\n",
" button_box = widgets.HBox([left_button, right_button])\n",
"\n",
" def define_choice(button):\n",
" \"\"\"\n",
" Change `choice` variable with respect to the pressed button.\n",
" \"\"\"\n",
" nonlocal choice\n",
" display(widgets.HTML(f\"