

Macrocircuits: Leveraging neural architectural priors and modularity in embodied agents

By Neuromatch Academy

Content creators: Divyansha Lachi, Kseniia Shilova

Content reviewers: Eva Dyer, Hannah Choi

Production editors: Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk


This project explores how to build a biologically inspired artificial neural network (ANN) architecture, derived from the C. elegans motor circuit, for the control of a simulated Swimmer agent. Traditional motor control ANNs often rely on generic, fully connected multilayer perceptrons (MLPs), which demand extensive training data, offer limited transferability, and possess complex internal dynamics that are hard to interpret. The project aims to understand how a biologically motivated ANN, whose highly structured and sparse architecture has been shaped by evolution, could help solve these problems and provide advantages in the domain of motor control. We will train MLPs using algorithms such as PPO, DDPG, and ES, and compare their performance in terms of rewards and sample efficiency with our bio-inspired ANN. The project also includes visualizing the C. elegans connectome and building the network using this circuitry. We will conduct various ablation analyses by removing sign and weight-sharing constraints, and by altering environmental parameters such as the swimmer’s length or the viscosity. These investigations aim to understand how architecture and modularity impact performance and learning across different environments. Finally, the project aims to build an agent that is robust to environmental variations and can navigate towards specific targets, enhancing our understanding of bio-inspired motor control.

Relevant references:

This notebook uses code from the following GitHub repository: ncap by Nikhil X. Bhattasali and Anthony M. Zador and Tatiana A. Engel.

Infrastructure note: This notebook contains installation instructions for GPU runtimes as well as for CPU setups on different operating systems.

Install and import feedback gadget

# @title Install and import feedback gadget

!pip install vibecheck datatops --quiet

from vibecheck import DatatopsContentReviewContainer
def content_review(notebook_section: str):
    return DatatopsContentReviewContainer(
        "",  # No text prompt
        notebook_section,
        {
            "url": "",
            "name": "neuromatch_neuroai",
            "user_key": "wb2cxze8",
        },
    ).render()

feedback_prefix = "Project_Macrocircuits"

Project Background

Submit your feedback

# @title Submit your feedback

Project slides

If you want to download the slides:

Project Template

#@title Project Template
from IPython.display import Image, display
import os
from pathlib import Path

url = ""



Tutorial links

This project connects several distinct ideas explored throughout the course. First, the innate ability to learn a certain set of actions quickly is the main topic of Tutorial 4 for W2D4 on biological meta-learning. The motivation is the observation that the brain is not a generic architecture but a highly structured and optimized hierarchy of modules (the importance of which is highlighted in Tutorial 3 for W2D1), which forms an inductive bias for efficient motor control. The default model for the agent used here is the familiar Actor-Critic, which you have already seen in the tutorials mentioned above as well as in Tutorial 3 for W1D2.

Section 0: Initial setup

IF USING COLAB (recommended):

Uncomment the cell below and run it.

Installing Dependencies (Colab GPU case, uncomment if you want to use this one)

#@title Installing Dependencies (Colab GPU case, uncomment if you want to use this one)

# import distutils.util
# import os
# import subprocess
# if subprocess.run('nvidia-smi').returncode:
#   raise RuntimeError(
#       'Cannot communicate with GPU. '
#       'Make sure you are using a GPU Colab runtime. '
#       'Go to the Runtime menu and select Choose runtime type.')

# # Add an ICD config so that glvnd can pick up the Nvidia EGL driver.
# # This is usually installed as part of an Nvidia driver package, but the Colab
# # kernel doesn't install its driver via APT, and as a result the ICD is missing.
# # (
# NVIDIA_ICD_CONFIG_PATH = '/usr/share/glvnd/egl_vendor.d/10_nvidia.json'
# if not os.path.exists(NVIDIA_ICD_CONFIG_PATH):
#   with open(NVIDIA_ICD_CONFIG_PATH, 'w') as f:
#     f.write("""{
#     "file_format_version" : "1.0.0",
#     "ICD" : {
#         "library_path" : ""
#     }
# }
# """)

# print('Installing dm_control...')
# !pip install -q "dm_control>=1.0.16"  # Quoted so the shell does not treat ">=" as a redirect.

# # Configure dm_control to use the EGL rendering backend (requires GPU)
# %env MUJOCO_GL=egl

# !echo Installed dm_control $(pip show dm_control | grep -Po "(?<=Version: ).+")
# !pip install -q dm-acme[envs]
# !mkdir output_videos

IF USING KAGGLE (recommended):

Uncomment the cell below and run it.

Installing Dependencies (Kaggle GPU case, uncomment if you want to use this one)

#@title Installing Dependencies (Kaggle GPU case, uncomment if you want to use this one)

# import subprocess

# subprocess.run(["sudo", "apt-get", "install", "-y", "libgl1-mesa-glx", "libosmesa6"])
# subprocess.run(["pip", "install", "-q", "imageio[ffmpeg]"])

# print('Installing dm_control...')
# !pip install -q "dm_control>=1.0.16"  # Quoted so the shell does not treat ">=" as a redirect.

# %env MUJOCO_GL=osmesa

# !echo Installed dm_control $(pip show dm_control | grep -Po "(?<=Version: ).+")
# !pip install -q dm-acme[envs]
# !mkdir output_videos


Uncomment the relevant lines of code depending on your OS.

Installing Dependencies (CPU case, comment if you want to use GPU one)

#@title Installing Dependencies (CPU case, comment if you want to use GPU one)

import subprocess
import os

############### For Linux #####################
# subprocess.run(["sudo", "apt-get", "install", "-y", "libglew-dev"])
# subprocess.run(["sudo", "apt-get", "install", "-y", "libglfw3"])
# subprocess.run(["sudo", "apt", "install", "ffmpeg"])

############### For MacOS #####################
# subprocess.run(["brew", "install", "glew"])
# subprocess.run(["brew", "install", "glfw"])
###############################################

# Python packages needed on every OS.
subprocess.run(["pip", "install", "-q", "ffmpeg"])
subprocess.run(["pip", "install", "-q", "dm-acme[envs]"])
subprocess.run(["pip", "install", "-q", "dm_control>=1.0.16"])

!mkdir output_videos

Imports and Utility Functions

Importing Libraries

#@title Importing Libraries
import numpy as np
import collections
import argparse
import os
import yaml
import typing as T
import imageio
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import pandas as pd
import seaborn as sns
from IPython.display import HTML

import dm_control as dm
import dm_control.suite.swimmer as swimmer
from dm_control.rl import control
from dm_control.utils import rewards
from dm_control import suite
from dm_control.suite.wrappers import pixels

from acme import wrappers

from torch import nn

Utility code for displaying videos

#@title Utility code for displaying videos
def write_video(
  filepath: os.PathLike,
  frames: T.Iterable[np.ndarray],
  fps: int = 60,
  macro_block_size: T.Optional[int] = None,
  quality: int = 10,
  verbose: bool = False,
  **kwargs,
):
  """
  Saves a sequence of frames as a video file.

  Parameters:
  - filepath (os.PathLike): Path to save the video file.
  - frames (Iterable[np.ndarray]): An iterable of frames, where each frame is a numpy array.
  - fps (int, optional): Frames per second, defaults to 60.
  - macro_block_size (Optional[int], optional): Macro block size for video encoding, can affect compression efficiency.
  - quality (int, optional): Quality of the output video, higher values indicate better quality.
  - verbose (bool, optional): If True, prints the file path where the video is saved.
  - **kwargs: Additional keyword arguments passed to the imageio.get_writer function.

  Returns:
  None. The video is written to the specified filepath.
  """
  with imageio.get_writer(filepath,
                        fps=fps,
                        macro_block_size=macro_block_size,
                        quality=quality,
                        **kwargs) as video:
    if verbose: print('Saving video to:', filepath)
    for frame in frames:
      video.append_data(frame)  # Append each frame to the video file.

def display_video(
  frames: T.Iterable[np.ndarray],
  filename='output_videos/temp.mp4',
  fps=60,
  **kwargs,
):
  """
  Displays a video within a Jupyter Notebook from an iterable of frames.

  Parameters:
  - frames (Iterable[np.ndarray]): An iterable of frames, where each frame is a numpy array.
  - filename (str, optional): Temporary filename to save the video before display, defaults to 'output_videos/temp.mp4'.
  - fps (int, optional): Frames per second for the video display, defaults to 60.
  - **kwargs: Additional keyword arguments passed to the write_video function.

  Returns:
  HTML object: An HTML video element that can be displayed in a Jupyter Notebook.
  """
  # Write video to a temporary file.
  filepath = os.path.abspath(filename)
  write_video(filepath, frames, fps=fps, verbose=False, **kwargs)

  height, width, _ = frames[0].shape
  dpi = 70
  orig_backend = matplotlib.get_backend()
  matplotlib.use('Agg')  # Switch to headless 'Agg' to inhibit figure rendering.
  fig, ax = plt.subplots(1, 1, figsize=(width / dpi, height / dpi), dpi=dpi)
  matplotlib.use(orig_backend)  # Switch back to the original backend.
  ax.set_position([0, 0, 1, 1])
  im = ax.imshow(frames[0])
  def update(frame):
    im.set_data(frame)  # Swap in the pixels of the current frame.
    return [im]
  interval = 1000/fps  # Delay between frames in milliseconds.
  anim = animation.FuncAnimation(fig=fig, func=update, frames=frames,
                                  interval=interval, blit=True, repeat=False)
  return HTML(anim.to_html5_video())

In this notebook we will explore the major components essential for this project.

  • Understanding the DeepMind Control Suite Swimmer Agent: We will begin by exploring the swimmer agent provided by the DeepMind Control Suite. This section includes a detailed exploration of the agent’s API, task customization capabilities, and how to adapt the environment to fit our experimental needs.

  • Training Models Using Various Reinforcement Learning Algorithms: Next, we will learn how to train models for the agents we created. We will use the Tonic_RL library for training, starting with a standard MLP model trained with the Proximal Policy Optimization (PPO) algorithm.

  • Training the NCAP model: Finally, we will define the NCAP model from the Neural Circuit Architectural Priors for Embodied Control paper. We will train it using PPO and compare it against the MLP model we trained before.


Section 1: Exploring the DeepMind Swimmer

1.1 Create a basic swim task for the swimmer environment

First, we’ll initialize a basic swimmer agent consisting of 6 links. Each agent requires a defined task and its corresponding reward function. In this instance, we’ve designed a swim forward task that involves the agent swimming forward in any direction.

The environment is flexible, allowing for modifications to introduce additional tasks such as “swim only in the x-direction” or “move towards a ball.”


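_SWIM_SPEED = 0.1  # Desired forward speed (assumed default value).

@swimmer.SUITE.add()  # Register the task so suite.load('swimmer', 'swim') can find it.
def swim(
  n_links=6,
  desired_speed=_SWIM_SPEED,
  time_limit=swimmer._DEFAULT_TIME_LIMIT,
  random=None,
  environment_kwargs=None,
):
  """Returns the Swim task for a n-link swimmer."""
  model_string, assets = swimmer.get_model_and_assets(n_links)
  physics = swimmer.Physics.from_xml_string(model_string, assets=assets)
  task = Swim(desired_speed=desired_speed, random=random)
  return control.Environment(
    physics,
    task,
    time_limit=time_limit,
    control_timestep=swimmer._CONTROL_TIMESTEP,
    **(environment_kwargs or {}),
  )

class Swim(swimmer.Swimmer):
  """Task to swim forwards at the desired speed."""
  def __init__(self, desired_speed=_SWIM_SPEED, **kwargs):
    super().__init__(**kwargs)
    self._desired_speed = desired_speed

  def initialize_episode(self, physics):
    super().initialize_episode(physics)
    # Hide target by setting alpha to 0.
    physics.named.model.mat_rgba['target', 'a'] = 0
    physics.named.model.mat_rgba['target_default', 'a'] = 0
    physics.named.model.mat_rgba['target_highlight', 'a'] = 0

  def get_observation(self, physics):
    """Returns an observation of joint angles and body velocities."""
    obs = collections.OrderedDict()
    obs['joints'] = physics.joints()
    obs['body_velocities'] = physics.body_velocities()
    return obs

  def get_reward(self, physics):
    """Returns a smooth reward that is 0 when stopped or moving backwards, and rises linearly to 1
    when moving forwards at the desired speed."""
    # Head velocity along the body axis; the sign convention here is assumed.
    forward_velocity = -physics.named.data.sensordata['head_vel'][1]
    return rewards.tolerance(
      forward_velocity,
      bounds=(self._desired_speed, float('inf')),
      margin=self._desired_speed,
      value_at_margin=0.,
      sigmoid='linear',
    )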
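
The reward shape described in the get_reward docstring can be sketched without any simulator. The snippet below is a toy stand-in and not the task's actual implementation: the function name `linear_speed_reward` and the default `desired_speed=0.1` are illustrative assumptions.

```python
def linear_speed_reward(forward_velocity, desired_speed=0.1):
    """Toy reward: 0 when stopped or moving backwards, 1 at or above desired_speed.

    Between 0 and desired_speed the reward rises linearly.
    The default desired_speed is an illustrative assumption.
    """
    if desired_speed <= 0:
        raise ValueError("desired_speed must be positive")
    return min(max(forward_velocity / desired_speed, 0.0), 1.0)


# Print the reward for a few hand-picked head velocities.
for velocity in (-0.05, 0.0, 0.05, 0.1, 0.2):
    print(f"{velocity:+.2f} -> {linear_speed_reward(velocity):.2f}")
```

Because the reward saturates at 1, swimming faster than the desired speed earns nothing extra, so an agent is only pushed to reach that speed, not to exceed it.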

1.2 Visualizing an agent that takes random actions in the environment

Let’s visualize the environment by executing a sequence of random actions on a swimmer agent. This involves applying random actions over a series of steps and compiling the rendered frames into a video to visualize the agent’s behavior.

""" Renders the current environment state to an image """
def render(env):
    return env.physics.render(camera_id=0, width=640, height=480)

""" Tests a DeepMind control suite environment by executing a series of random actions """
def test_dm_control(env):
    env = wrappers.CanonicalSpecWrapper(env, clip=True)
    env = wrappers.SinglePrecisionWrapper(env)

    spec = env.action_spec()
    timestep = env.reset()
    frames = [render(env)]

    for _ in range(60):
        action = np.random.uniform(low=spec.minimum, high=spec.maximum, size=spec.shape)
        timestep = env.step(action)
        frames.append(render(env))  # Record a frame after every step.
    return display_video(frames)

env = suite.load('swimmer', 'swim', task_kwargs={'random': 1})
test_dm_control(env)

1.3 Swimmer Agent API

For the 6-link swimmer, the observation space consists of 23 total dimensions, combining joint positions and body velocities, while the action space has 5 dimensions representing normalized joint forces.

Observation Space: 4k - 1 total (k = 6 \(\rightarrow\) 23)

  • k - 1: joint positions \(q_i \in [-\pi, \pi]\) (joints)

  • 3k: link linear velocities \(vx_i, vy_i \in \mathbb{R}\) and rotational velocity \(wz_i \in \mathbb{R}\) (body_velocities)

OrderedDict([('joints',
              Array(shape=(5,), dtype=dtype('float64'), name='joints')),
             ('body_velocities',
              Array(shape=(18,), dtype=dtype('float64'), name='body_velocities'))])

Action Space: k - 1 total (k = 6 \(\rightarrow\) 5)

  • k - 1: joint normalized force \(\ddot{q}_i \in [-1, 1]\)

BoundedArray(shape=(5,), dtype=dtype('float64'), name=None, minimum=[-1. -1. -1. -1. -1.], maximum=[1. 1. 1. 1. 1.])
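
The size formulas above are easy to check by hand. The following sketch simply encodes them, with a hypothetical helper name `swimmer_space_sizes`, so you can predict the observation and action sizes before changing the number of links.

```python
def swimmer_space_sizes(n_links):
    """Sizes implied by the formulas above for a k-link swimmer."""
    joints = n_links - 1             # k - 1 joint positions
    body_velocities = 3 * n_links    # vx, vy and wz for each of the k links
    return {
        "joints": joints,
        "body_velocities": body_velocities,
        "observation": joints + body_velocities,  # 4k - 1 in total
        "action": joints,                         # one normalized force per joint
    }


print(swimmer_space_sizes(6))
print(swimmer_space_sizes(12))
```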

1.4 Example of simple modification to the agent

Let’s make a new swimmer agent with 12 links instead of 6, introducing complexity. Additionally, we have the flexibility to adjust various other parameters.

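@swimmer.SUITE.add()  # Register the task so suite.load can find it by name.
def swim_12_links(
  n_links=12,
  desired_speed=_SWIM_SPEED,
  time_limit=swimmer._DEFAULT_TIME_LIMIT,
  random=None,
  environment_kwargs=None,
):
  """Returns the Swim task for a 12-link swimmer."""
  model_string, assets = swimmer.get_model_and_assets(n_links)
  physics = swimmer.Physics.from_xml_string(model_string, assets=assets)
  task = Swim(desired_speed=desired_speed, random=random)
  return control.Environment(
    physics, task, time_limit=time_limit,
    control_timestep=swimmer._CONTROL_TIMESTEP, **(environment_kwargs or {}))

env = suite.load('swimmer', 'swim_12_links', task_kwargs={'random': 1})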

We can visualize this longer agent using our previously defined test_dm_control function.

Using the API provided by DeepMind, we can make many kinds of changes to the agent and the environment.

Try to make the following changes to make yourself more familiar with the swimmer.

  • Adding a target (like a ball) to this environment at some x distance away from the agent.

  • Increasing the viscosity of the environment.

Have a look at the following links to see what kind of assets you will need to modify to make these changes.
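
As a starting point for the viscosity exercise, here is a minimal sketch of one possible approach: editing the model XML before it is turned into a physics object. It relies on the MJCF `option` element accepting a `viscosity` attribute. The helper name `set_viscosity` and the toy model string are hypothetical, and whether the real swimmer model already contains an `option` element is something to check in its XML.

```python
import xml.etree.ElementTree as ET


def set_viscosity(model_xml, viscosity):
    """Returns a copy of an MJCF model string with <option viscosity=...> set."""
    root = ET.fromstring(model_xml)
    option = root.find("option")
    if option is None:
        # Create the <option> element if the model does not have one yet.
        option = ET.SubElement(root, "option")
    option.set("viscosity", str(viscosity))
    return ET.tostring(root, encoding="unicode")


# Hypothetical, stripped-down model used only to demonstrate the edit.
toy_model = '<mujoco model="toy"><option timestep="0.002"/><worldbody/></mujoco>'
thick_model = set_viscosity(toy_model, 0.5)
print(thick_model)
```

In the notebook, the same idea would apply to the `model_string` returned by `swimmer.get_model_and_assets` before it is passed to `swimmer.Physics.from_xml_string`.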

# add your code


Section 2: Training models on the swim task

To train the agents we defined in the previous section, we will utilize standard reinforcement learning (RL) algorithms. For the purposes of this tutorial, we will employ the tonic_rl library, which provides a robust framework for training RL agents. Throughout most of this project, you will primarily be modifying the environment or the model architecture. Therefore, I suggest treating these algorithms as a “black box” for now. Simply put, you input an untrained model, and the algorithm processes and returns a well-trained model. This approach allows us to focus on the impact of different architectures and environmental settings without delving deeply into the algorithmic complexities at this stage.

Download and install tonic library for training agents

#@title Download and install tonic library for training agents

import contextlib
import io

with contextlib.redirect_stdout(io.StringIO()): #to suppress output
    !git clone
    %cd tonic

Section 2.1 Defining the train function

First, we define a general training function that can train any agent on any given environment with a variety of available algorithms. Some of the function’s parameters are described below. You’ll likely want to adjust these parameters to customize the training process for an agent in a specific environment using your chosen algorithm from the tonic library:

  • Header: Python code required to run before training begins, primarily for importing essential libraries or modules.

  • Agent: The agent that will undergo training; refer to section 3.2 and 4.2 for definitions of MLP and NCAP respectively.

  • Environment: The training environment for the agent. Ensure it is registered with the DeepMind Control Suite as detailed in section 2.

  • Name: The experiment’s name, which will be utilized for log and model saving purposes.

  • Trainer: The trainer instance selected for use. It allows the configuration of the training steps, model saving frequency, and other training-related parameters.
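
Note that the header, agent, environment and trainer are all passed to the training function as strings of Python code, which are executed or evaluated inside it. The toy sketch below illustrates that pattern using only the standard library; `build_from_spec` is a hypothetical helper, not part of tonic.

```python
def build_from_spec(header, spec):
    """Runs `header` (e.g. an import statement), then evaluates `spec` into an object."""
    namespace = {}
    if header:
        exec(header, namespace)   # Make the imported names available to the spec.
    return eval(spec, namespace)  # Turn the string expression into a real object.


# The "agent" here is just a Fraction, standing in for something like a PPO agent.
half = build_from_spec("import fractions", "fractions.Fraction(3, 6)")
print(half)
```

Passing strings rather than objects means the exact configuration can be written to disk with the logs and later re-evaluated to rebuild the same agent and environment. The cost is that `exec` and `eval` run arbitrary code, so only use this with strings you trust.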

import tonic
import tonic.torch

def train(
  header,
  agent,
  environment,
  name = 'test',
  trainer = 'tonic.Trainer()',
  before_training = None,
  after_training = None,
  parallel = 1,
  sequential = 1,
  seed = 0
):
  """
  Some additional parameters:

  - before_training: Python code to execute immediately before the training loop commences, suitable for setup actions needed after initialization but prior to training.
  - after_training: Python code to run once the training loop concludes, ideal for teardown or analytical purposes.
  - parallel: The count of environments to execute in parallel. Limited to 1 in a Colab notebook, but if additional resources are available, this number can be increased to expedite training.
  - sequential: The number of sequential steps the environment runs before sending observations back to the agent. This setting is useful for temporal batching. It can be disregarded for this tutorial's purposes.
  - seed: The experiment's random seed, guaranteeing the reproducibility of the training process.
  """
  # Capture the arguments to save them, e.g. to play with the trained agent.
  args = dict(locals())

  # Run the header first, e.g. to load an ML framework.
  if header:
    exec(header)

  # Build the train and test environments.
  _environment = environment
  environment = tonic.environments.distribute(lambda: eval(_environment), parallel, sequential)
  test_environment = tonic.environments.distribute(lambda: eval(_environment))

  # Build the agent.
  agent = eval(agent)
  agent.initialize(
    observation_space=test_environment.observation_space,
    action_space=test_environment.action_space, seed=seed)

  # Choose a name for the experiment.
  if hasattr(test_environment, 'name'):
    environment_name = test_environment.name
  else:
    environment_name = test_environment.__class__.__name__
  if not name:
    if hasattr(agent, 'name'):
      name = agent.name
    else:
      name = agent.__class__.__name__
    if parallel != 1 or sequential != 1:
      name += f'-{parallel}x{sequential}'

  # Initialize the logger to save data to the path environment/name/seed.
  path = os.path.join('data', 'local', 'experiments', 'tonic', environment_name, name)
  tonic.logger.initialize(path, script_path=None, config=args)

  # Build the trainer.
  trainer = eval(trainer)
  trainer.initialize(
    agent=agent,
    environment=environment,
    test_environment=test_environment,
  )
  # Run some code before training.
  if before_training:
    exec(before_training)

  # Train.
  trainer.run()

  # Run some code after training.
  if after_training:
    exec(after_training)

Section 2.2 Training MLP model on swim task

Now we are going to define a function for creating an actor-critic model suitable for Proximal Policy Optimization (PPO) using a Multi-Layer Perceptron (MLP) architecture.

from tonic.torch import models, normalizers
import torch

def ppo_mlp_model(
  actor_sizes=(64, 64),
  actor_activation=torch.nn.Tanh,
  critic_sizes=(64, 64),
  critic_activation=torch.nn.Tanh,
):
  """
  Constructs an ActorCritic model with specified architectures for the actor and critic networks.

  Parameters:
  - actor_sizes (tuple): Sizes of the layers in the actor MLP.
  - actor_activation (torch activation): Activation function used in the actor MLP.
  - critic_sizes (tuple): Sizes of the layers in the critic MLP.
  - critic_activation (torch activation): Activation function used in the critic MLP.

  Returns:
  - models.ActorCritic: An ActorCritic model comprising an actor and a critic with MLP torsos,
    equipped with a Gaussian policy head for the actor and a value head for the critic,
    along with observation normalization.
  """
  return models.ActorCritic(
    actor=models.Actor(
      encoder=models.ObservationEncoder(),
      torso=models.MLP(actor_sizes, actor_activation),
      head=models.DetachedScaleGaussianPolicyHead(),
    ),
    critic=models.Critic(
      encoder=models.ObservationEncoder(),
      torso=models.MLP(critic_sizes, critic_activation),
      head=models.ValueHead(),
    ),
    observation_normalizer=normalizers.MeanStd(),
  )

Next we call the train function which initiates the training process for the provided agent using the Tonic library. It specifies the components necessary for training, including the model, environment, and training parameters:

Agent: A Proximal Policy Optimization (PPO) agent with a custom Multi-Layer Perceptron (MLP) model architecture, configured with 256 units in each of two layers for both the actor and the critic.

Environment: The training environment is set to “swimmer-swim” from the Control Suite, a benchmark suite for continuous control tasks.

Name: The experiment is named ‘mlp_256’, which is useful for identifying logs and saved models associated with this training run.

Trainer: Specifies the training configuration, including the total number of steps (5e5) and the frequency of saving the model (1e5 steps).

Note: The model is checkpointed every ‘save_steps’ training steps.
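
To get a feel for what this configuration produces, the sketch below lists the steps at which checkpoints would be expected. It assumes a checkpoint is written each time the step count reaches a multiple of save_steps, which is a simplification of the trainer's actual bookkeeping. The helper name is hypothetical.

```python
def expected_checkpoint_steps(steps, save_steps):
    """Steps at which a checkpoint is expected, assuming one save per save_steps."""
    if save_steps <= 0:
        raise ValueError("save_steps must be positive")
    return list(range(save_steps, steps + 1, save_steps))


# Values taken from the trainer configuration described above.
print(expected_checkpoint_steps(int(5e5), int(1e5)))
```

With 5e5 total steps and a checkpoint every 1e5 steps, this gives five checkpoints, the last one at the end of training.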

The model can take some time to train, so feel free to skip the training for now. We have provided a pretrained model for you to play with. Move on to the next section to visualize an agent with the pretrained model.

Uncomment the cell below if you want to perform the training.

# train('import tonic.torch',
#       'tonic.torch.agents.PPO(model=ppo_mlp_model(actor_sizes=(256, 256), critic_sizes=(256,256)))',
#       'tonic.environments.ControlSuite("swimmer-swim")',
#       name = 'mlp_256',
#       trainer = 'tonic.Trainer(steps=int(5e5),save_steps=int(1e5))')

Try playing with the parameters of the trainer and the MLP model and see how it affects the performance.

  • How do the sizes of the actor and the critic models affect the performance?

  • Consider increasing the number of steps in trainer to train the model for longer.

  • Explore Tonic library to see what algorithms we can use to train our agents. (D4PG is usually faster than PPO)
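
When comparing model sizes, it helps to know roughly how many trainable parameters each choice implies. The sketch below counts weights and biases for the fully connected torso only. It ignores the policy and value heads and the observation normalizer, and it assumes an input size of 23, taken from the 6-link observation space described earlier. The helper name is hypothetical.

```python
def mlp_parameter_count(input_size, hidden_sizes):
    """Counts weights and biases of a fully connected stack of layers."""
    total = 0
    previous = input_size
    for size in hidden_sizes:
        total += previous * size + size  # Weight matrix plus bias vector.
        previous = size
    return total


for sizes in ((64, 64), (256, 256)):
    print(sizes, mlp_parameter_count(23, sizes))
```

Going from 64 to 256 units per layer multiplies the torso's parameter count by more than ten, which is worth keeping in mind when interpreting differences in sample efficiency.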

# add your code

Section 2.3 Function to run any model on the environment and generate video

One of the most fun things about these environments is their visualization. We don’t want to judge our model by the reward alone; we want to see how well the agent actually swims. This is particularly important for catching “reward hacking,” where an agent learns to exploit the reward function in ways that are unintended and potentially detrimental to the desired outcome. Moreover, visualizing the agent also helps us understand where the model is going wrong.

Here we define a function that generates videos of the agent using the input model. The function requires the path to the checkpoint folder and the environment on which you want to run the trained model.

def play_model(path, checkpoint='last',environment='default',seed=None, header=None):
  """
  Plays a model within an environment and renders the gameplay to a video.

  Parameters:
  - path (str): Path to the directory containing the model and checkpoints.
  - checkpoint (str): Specifies which checkpoint to use ('last', 'first', or a specific ID). 'none' indicates no checkpoint.
  - environment (str): The environment to use. 'default' uses the environment specified in the configuration file.
  - seed (int): Optional seed for reproducibility.
  - header (str): Optional Python code to execute before initializing the model, such as importing libraries.
  """
  if checkpoint == 'none':
    # Use no checkpoint, the agent is freshly created.
    checkpoint_path = None
    tonic.logger.log('Not loading any weights')
  else:
    checkpoint_path = os.path.join(path, 'checkpoints')
    if not os.path.isdir(checkpoint_path):
      tonic.logger.error(f'{checkpoint_path} is not a directory')
      checkpoint_path = None

  if checkpoint_path:
    # List all the checkpoints.
    checkpoint_ids = []
    for file in os.listdir(checkpoint_path):
      if file[:5] == 'step_':
        checkpoint_id = file.split('.')[0]
        checkpoint_ids.append(int(checkpoint_id[5:]))

    if checkpoint_ids:
      if checkpoint == 'last':
        # Use the last checkpoint.
        checkpoint_id = max(checkpoint_ids)
        checkpoint_path = os.path.join(checkpoint_path, f'step_{checkpoint_id}')
      elif checkpoint == 'first':
        # Use the first checkpoint.
        checkpoint_id = min(checkpoint_ids)
        checkpoint_path = os.path.join(checkpoint_path, f'step_{checkpoint_id}')
      else:
        # Use the specified checkpoint.
        checkpoint_id = int(checkpoint)
        if checkpoint_id in checkpoint_ids:
          checkpoint_path = os.path.join(checkpoint_path, f'step_{checkpoint_id}')
        else:
          tonic.logger.error(f'Checkpoint {checkpoint_id} not found in {checkpoint_path}')
          checkpoint_path = None
    else:
      tonic.logger.error(f'No checkpoint found in {checkpoint_path}')
      checkpoint_path = None

  # Load the experiment configuration.
  arguments_path = os.path.join(path, 'config.yaml')
  with open(arguments_path, 'r') as config_file:
    config = yaml.load(config_file, Loader=yaml.FullLoader)
  config = argparse.Namespace(**config)

  # Run the header first, e.g. to load an ML framework.
  try:
    if config.header:
      exec(config.header)
    if header:
      exec(header)
  except Exception as error:
    tonic.logger.error(f'Could not run the header: {error}')

  # Build the agent.
  agent = eval(config.agent)

  # Build the environment.
  if environment == 'default':
    environment  = tonic.environments.distribute(lambda: eval(config.environment))
  else:
    environment  = tonic.environments.distribute(lambda: eval(environment))
  if seed is not None:
    environment.seed(seed)

  # Initialize the agent.
  agent.initialize(
    observation_space=environment.observation_space,
    action_space=environment.action_space,
    seed=seed,
  )

  # Load the weights of the agent from a checkpoint.
  if checkpoint_path:
    agent.load(checkpoint_path)

  steps = 0
  test_observations = environment.start()
  frames = [environment.render('rgb_array',camera_id=0, width=640, height=480)[0]]
  score, length = 0, 0

  while True:
    # Select an action.
    actions = agent.test_step(test_observations, steps)
    assert not np.isnan(actions.sum())

    # Take a step in the environment.
    test_observations, infos = environment.step(actions)
    frames.append(environment.render('rgb_array',camera_id=0, width=640, height=480)[0])
    agent.test_update(**infos, steps=steps)

    score += infos['rewards'][0]
    length += 1

    # Stop at the end of the first episode.
    if infos['resets'][0]:
      break
  video_path = os.path.join(path, 'video.mp4')
  print('Reward for the run: ', score)
  return display_video(frames,video_path)
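
The checkpoint lookup is the fiddliest part of the function above, so here it is sketched in isolation, working on a plain list of file names so that it can be tried without any trained model. `select_checkpoint` is a hypothetical helper, and the `.pt` file extension in the example is an assumption.

```python
def select_checkpoint(filenames, checkpoint="last"):
    """Returns the chosen checkpoint step, or None if nothing matches.

    File names are expected to look like 'step_<number>.<extension>'.
    """
    ids = sorted({
        int(name.split(".")[0][5:])      # Text between 'step_' and the first dot.
        for name in filenames
        if name.startswith("step_")
    })
    if not ids:
        return None
    if checkpoint == "last":
        return ids[-1]
    if checkpoint == "first":
        return ids[0]
    wanted = int(checkpoint)             # A specific step, given as a string or int.
    return wanted if wanted in ids else None


files = ["step_100000.pt", "step_500000.pt", "step_200000.pt", "config.yaml"]
print(select_checkpoint(files, "last"))
print(select_checkpoint(files, "first"))
print(select_checkpoint(files, "200000"))
```

Converting the step numbers to integers before comparing matters: compared as strings, "step_500000" would sort after "step_1000000".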

Let’s visualize the agent with a pretrained MLP model. Once you have your pretrained model, you can replace the experiment path to visualize the agent with your model.

# play_model('data/local/experiments/tonic/swimmer-swim/mlp_256')

Loading weights from data/local/experiments/tonic/swimmer-swim/pretrained_mlp_ppo/checkpoints/
Reward for the run:  975.623381242156
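
To put this number in context, it helps to compare it with the best score an episode could reach. The sketch below assumes a per-step reward of at most 1, a 30-second time limit, and a 0.03-second control timestep. These values are assumptions about the default swimmer settings and should be checked against your environment.

```python
def max_episode_return(time_limit=30.0, control_timestep=0.03, max_step_reward=1.0):
    """Upper bound on the episode return under the assumed settings."""
    n_steps = round(time_limit / control_timestep)  # Number of control steps per episode.
    return n_steps * max_step_reward


observed = 975.623381242156  # Reward reported for the run above.
upper_bound = max_episode_return()
print(f"Upper bound: {upper_bound:.0f}")
print(f"Fraction achieved: {observed / upper_bound:.3f}")
```

Under these assumptions, the pretrained MLP collects close to the maximum reward, meaning it swims at or above the desired speed for most of the episode.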