
Tutorial 2: Continual learning#

Week 2, Day 4: Macro-Learning

By Neuromatch Academy

Content creators: Hlib Solodzhuk, Ximeng Mao, Grace Lindsay

Content reviewers: Aakash Agrawal, Alish Dipani, Hossein Rezaei, Yousef Ghanbari, Mostafa Abdollahi, Hlib Solodzhuk, Ximeng Mao, Samuele Bolotta, Grace Lindsay, Alex Murphy

Production editors: Konstantine Tsafatinos, Ella Batty, Spiros Chavlis, Samuele Bolotta, Hlib Solodzhuk, Alex Murphy


Tutorial Objectives#

Estimated timing of tutorial: 25 minutes

In this tutorial, we will discover how further training on new data or tasks causes forgetting of past tasks. This is akin to learning something new by overwriting something old. It is a major issue for current AI models and deep neural networks, and a very active area of research in the ML community. We will explore the problem in more detail and investigate related questions, for example, how different learning schedules affect performance.


Setup#

Install and import feedback gadget#

# @title Install and import feedback gadget

!pip install vibecheck datatops tqdm numpy matplotlib ipywidgets scikit-learn --quiet

from vibecheck import DatatopsContentReviewContainer
def content_review(notebook_section: str):
    return DatatopsContentReviewContainer(
        "",  # No text prompt
        notebook_section,
        {
            "url": "https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab",
            "name": "neuromatch_neuroai",
            "user_key": "wb2cxze8",
        },
    ).render()


feedback_prefix = "W2D4_T2"

Imports#

# @title Imports

# working with data
import numpy as np

# plotting
import matplotlib.pyplot as plt
import logging
from tqdm import tqdm

# interactive display
import ipywidgets as widgets

# modeling
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

Figure settings#

# @title Figure settings

logging.getLogger('matplotlib.font_manager').disabled = True

%matplotlib inline
%config InlineBackend.figure_format = 'retina' # perform high-definition rendering for images and plots
plt.style.use("https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle")

Plotting functions#

# @title Plotting functions

def plot_summer_autumn_predictions(summer_predictions, autumn_predictions):
    """
    Plots the true data (summer and autumn prices) along with the predicted summer and autumn prices using a scatter plot.

    Inputs:
        - summer_predictions (numpy.ndarray): Array containing the predicted prices for the summer season.
        - autumn_predictions (numpy.ndarray): Array containing the predicted prices for the autumn season.
    """

    with plt.xkcd():
      plt.plot(np.append(summer_days, autumn_days), np.append(summer_prices, autumn_prices), label = "True Data")
      plt.scatter(autumn_days_test, autumn_predictions, label = "Autumn Predictions", marker='o', color='g', zorder=2)
      plt.scatter(summer_days_test, summer_predictions, label = "Summer Predictions", marker='o', color='r', zorder=2)
      plt.xlabel('Week')
      plt.ylabel('Price')
      plt.legend()
      plt.show()

def plot_performance(num_epochs, summer_r_squared, autumn_r_squared):
    """
    Plots the R-squared values for the summer and autumn seasons during the training process.

    Inputs:
        - num_epochs (int): The number of training epochs.
        - summer_r_squared (list): List containing the R-squared values for the summer season at each epoch.
        - autumn_r_squared (list): List containing the R-squared values for the autumn season at each epoch.
    """

    print(f"Summmer final R-squared value is: {summer_r_squared[-1]:.02f}")
    print(f"Autumn final R-squared value is: {autumn_r_squared[-1]:.02f}")


    with plt.xkcd():
      plt.plot(np.arange(num_epochs), summer_r_squared, label = "Summer Fit")
      plt.plot(np.arange(num_epochs), autumn_r_squared, label = "Autumn Fit")
      plt.xlabel('Epoch')
      plt.ylabel('R-squared value')
      plt.legend()
      plt.show()

Set random seed#

# @title Set random seed

import random
import numpy as np

def set_seed(seed=None):
  if seed is None:
    seed = np.random.choice(2 ** 32)
  random.seed(seed)
  np.random.seed(seed)

set_seed(seed = 42)

Video 1: Continual learning#

Submit your feedback#

# @title Submit your feedback
content_review(f"{feedback_prefix}_continual_learning")

Section 1: Catastrophic Forgetting#

In this section, we will discuss the concept of continual learning and argue that training on new data does not guarantee that the model will remember how to handle the tasks it was trained on earlier. First, it is important to note that ML models can be trained in several ways. Some models take a training dataset, run a learning algorithm over it, and derive a solution in one go, as in the analytical solution for linear regression or in Support Vector Machine training. Other models learn by being iteratively updated with single data points (or batches). Models of this second kind support online learning, and it is specifically these models that can undergo catastrophic forgetting. Modern deep neural networks, and more generally any model trained with a variant of gradient descent, are online learners in this sense. The problems of continual learning arise only for models whose training can be paused and then resumed on new data, because the new updates can interfere with what was previously learned from earlier data. A minimal toy sketch after the definitions below illustrates this effect.

  • Catastrophic forgetting occurs when a neural network, or any machine learning model, forgets the information it previously learned upon learning new data. This is particularly problematic in scenarios where a model needs to perform well across multiple types of data or tasks that evolve over time.

  • Continual learning is an approach in machine learning that aims to mitigate the issues of catastrophic forgetting. The goal is to develop algorithms that can adapt to new data while preserving knowledge about the old data. This is essential for applications where the model must adapt to changes dynamically without losing the ability to perform tasks it was previously trained on.
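To make these definitions concrete before we return to the fruit stall, here is a minimal, self-contained sketch of catastrophic forgetting on hypothetical toy data (the sine/cosine tasks, variable names, and hyperparameters below are illustrative choices, not part of this tutorial's dataset). A network is first fit on one input range, then incrementally updated on a different range, and its score on the first range typically collapses.

# A toy illustration of catastrophic forgetting (hypothetical data, not the tutorial dataset)
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Task A: y = sin(x) on x in [0, 3]; Task B: y = cos(x) on x in [3, 6]
x_a = rng.uniform(0, 3, size=(200, 1))
y_a = np.sin(x_a).ravel()
x_b = rng.uniform(3, 6, size=(200, 1))
y_b = np.cos(x_b).ravel()

# a stochastic solver ('adam' here) is needed because we rely on partial_fit below
toy_model = MLPRegressor(hidden_layer_sizes=(50,), solver="adam", max_iter=2000, random_state=0)

# train to convergence on task A only
toy_model.fit(x_a, y_a)
print(f"Task A score after training on A: {toy_model.score(x_a, y_a):.2f}")

# continue training on task B only, one incremental pass at a time
for _ in range(300):
    toy_model.partial_fit(x_b, y_b)

print(f"Task A score after training on B: {toy_model.score(x_a, y_a):.2f}")  # typically drops sharply
print(f"Task B score after training on B: {toy_model.score(x_b, y_b):.2f}")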

Coding Exercise 1: Fitting new data#

We’re back at the fruit stall. Recall the modelling problem from Tutorial 1. Let’s assume now that we want our model to predict not only the summer prices but also the autumn ones. We have already trained the MLP to predict summer prices effectively, but observed that it performs poorly during the autumn period. Let’s make the model learn this new information about autumn prices and see whether it can remember both seasons.

First, we will need to retrain the model for this tutorial on summer days.

#define variables
A = .005
B = 0.1
phi = 0
C = 1

#define days (note that these are not 1, ..., 365 but rescaled proxy values that keep the price function simple)
days = np.arange(-26, 26 + 1/7, 1/7)
prices = A * days**2 + B * np.sin(np.pi * days + phi) + C

#take only the summer data for the initial training
summer_days = np.expand_dims(days[151:243], 1)
summer_prices = prices[151:243]

#take autumn data for further training
autumn_days = np.expand_dims(days[243:334], 1)
autumn_prices = prices[243:334]

#divide summer data into train and test sets
summer_days_train, summer_days_test, summer_prices_train, summer_prices_test = train_test_split(summer_days, summer_prices, random_state = 42)

#divide autumn data into train and test sets
autumn_days_train, autumn_days_test, autumn_prices_train, autumn_prices_test = train_test_split(autumn_days, autumn_prices, random_state = 42)

#apply normalization to the days (mean and std are computed over the whole summer-autumn period)
days_mean, days_std = np.mean(days[151:334]), np.std(days[151:334])

summer_days_train_norm = (summer_days_train - days_mean) / days_std
summer_days_test_norm = (summer_days_test - days_mean) / days_std

#note that the autumn data reuses the same normalization constants; if we re-normalized the autumn days with their own mean and std, they would map onto the same input values as the summer days and the model would trivially overwrite the old mapping
autumn_days_train_norm = (autumn_days_train - days_mean) / days_std
autumn_days_test_norm = (autumn_days_test - days_mean) / days_std

#define MLP
base_model = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=10000, random_state = 42, solver = "lbfgs") # L-BFGS tends to work better when only a small amount of data is available

#train MLP
base_model.fit(summer_days_train_norm, summer_prices_train)

#evaluate MLP on test data
print(f"R-squared value is: {base_model.score(summer_days_test_norm, summer_prices_test):.02f}.")
R-squared value is: 0.67.

Now, let’s incrementally fit the autumn data to the trained model and monitor the R-squared values on the summer and autumn test sets at each iteration. In the following code snippet, you are asked to complete the further training using the partial_fit function, which trains the existing model on new data (each call runs through the given data only once, so we control the number of iterations ourselves). As we iterate through the epochs, we append the new R-squared value for each epoch. We will then visually explore the model’s performance on both the summer and autumn datasets, as well as how the R-squared values evolve over epochs.
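For reference, the value returned by score for a regressor is the coefficient of determination on the given test set,

\[
R^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},
\]

where \(y_i\) are the true prices, \(\hat{y}_i\) the model’s predictions, and \(\bar{y}\) the mean of the true prices. A value of 1 indicates a perfect fit, 0 means the model does no better than always predicting the mean, and the value can even become negative when the predictions are worse than the mean, which is exactly what catastrophic forgetting can produce.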

###################################################################
## Fill out the following then remove
raise NotImplementedError("Student exercise: complete partial fit and calculate r-squared values for both summer and autumn data")
###################################################################

# Initial r-squared calculations
summer_r_squared = [base_model.score(summer_days_test_norm, summer_prices_test)]
autumn_r_squared = [base_model.score(autumn_days_test_norm, autumn_prices_test)]
num_epochs = 10

# Progress bar integration with tqdm
for _ in tqdm(range(num_epochs - 1), desc="Training Progress"):

    # Fit new data for one epoch
    base_model.partial_fit(..., ...)

    # Calculate r-squared values on test sets
    summer_r_squared.append(base_model.score(..., ...))
    autumn_r_squared.append(base_model.score(..., ...))

model = base_model
plot_performance(num_epochs, summer_r_squared, autumn_r_squared)

#predict for test sets
summer_prices_predictions = model.predict(summer_days_test_norm)
autumn_prices_predictions = model.predict(autumn_days_test_norm)

plot_summer_autumn_predictions(summer_prices_predictions, autumn_prices_predictions)

Click for solution

Example output:

Solution hint Solution hint

Notice how disruptive the change is to the R-squared values: even a single iteration is enough to drastically alter performance. The model learns to predict the autumn data almost perfectly, while its predictions for the summer days fall apart. In other words, the model forgot the relationships in the old data and lost its predictive power there while training on the new dataset. In the next section of the tutorial, we are going to explore a different approach: what if, instead of training sequentially, we train the model on both datasets together?

Submit your feedback#

Hide code cell source
# @title Submit your feedback
content_review(f"{feedback_prefix}_fitting_new_data")

Section 2: Joint training#

Estimated timing to here from start of tutorial: 10 minutes

In this section, we are going to explore whether jointly training on both datasets, under two different schedules, improves predictive performance and allows the model to perform well in both summer and autumn.

Coding Exercise 2a: Sequential joint training#

In this coding exercise, let us take a look at the following setup: we will sample \(K\) distinct random training examples from summer data and \(K\) random examples from autumn, training the model in total on \(2K\) examples.

In sequential joint training, epochs of each data type are alternated. So, for example, the first epoch will be the \(K\) examples from summer data, and then the second will be the \(K\) examples from autumn data, then the summer data again, then autumn again, and so on.

Please complete the partial fits on the corresponding data to implement sequential joint training. We use the partial_fit function to apply individual updates to the model. The scikit-learn library’s standard fit API performs full training on a dataset. When you want finer control over sequential training, for example to present individual batches of a particular data type, models that expose a partial_fit method are a good way to accomplish this. In the language of deep learning training loops, fit is similar to a full train_model() while partial_fit is more like a single update_batch() step.

Every call to partial_fit starts from the parameters left by the previous call and updates them further, just as each batch update in a deep neural network builds on the weights adjusted by the previous backpropagation step.
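If this distinction is new to you, the short sketch below (on hypothetical toy data and variable names, not the tutorial's arrays) shows the behavioural difference: each partial_fit call performs a single pass that continues from the current weights, whereas calling fit again re-initializes the network and trains from scratch.

# Toy contrast between fit and partial_fit (hypothetical data, for illustration only)
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
x_toy = rng.uniform(-1, 1, size=(100, 1))
y_toy = (2 * x_toy).ravel()

# partial_fit requires a stochastic solver ('sgd' or 'adam')
toy_model = MLPRegressor(hidden_layer_sizes=(20,), solver="adam", max_iter=500, random_state=0)

# the first partial_fit call initializes the network and performs exactly one pass
toy_model.partial_fit(x_toy, y_toy)
print(toy_model.n_iter_)  # 1

# every further call adds one more pass on top of the current weights
for _ in range(4):
    toy_model.partial_fit(x_toy, y_toy)
print(toy_model.n_iter_)  # 5

# fit, in contrast, discards the current weights and runs its own full training loop
toy_model.fit(x_toy, y_toy)
print(toy_model.n_iter_)  # restarts from a fresh count rather than continuing from 5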

def sequential_training(K: int, num_epochs: int):
    """
    Perform sequential training for a multi-layer perceptron (MLP) regression model.
    The function trains the model separately on the summer and autumn data in alternating epochs.

    Inputs:
        - K (int): The number of training examples to sample from each season (summer and autumn) in each epoch.
        - num_epochs (int): The number of training epochs.

    Returns:
        - model (MLPRegressor): The trained MLP regression model.
        - summer_r_squared (list): A list containing the R-squared values for the summer season at each epoch.
        - autumn_r_squared (list): A list containing the R-squared values for the autumn season at each epoch.
    """

    model = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=10000, random_state=42, solver="lbfgs")
    summer_r_squared = []
    autumn_r_squared = []

    for _ in tqdm(range(num_epochs // 2), desc="Training Progress"):
        ###################################################################
        ## Fill out the following then remove
        raise NotImplementedError("Student exercise: sample indices for summer and autumn data and sequentially train on summer and then on autumn data")
        ###################################################################

        # Sample random training examples from summer and autumn
        sampled_summer_indices = np.random.choice(np.arange(summer_days_train_norm.shape[0]), size=K, replace=False)
        sampled_autumn_indices = np.random.choice(np.arange(autumn_days_train_norm.shape[0]), size=K, replace=False)

        model.partial_fit(..., ...)

        summer_r_squared.append(model.score(summer_days_test_norm, summer_prices_test))
        autumn_r_squared.append(model.score(autumn_days_test_norm, autumn_prices_test))

        model.partial_fit(..., ...)

        summer_r_squared.append(model.score(summer_days_test_norm, summer_prices_test))
        autumn_r_squared.append(model.score(autumn_days_test_norm, autumn_prices_test))

    return model, summer_r_squared, autumn_r_squared

set_seed(42)
num_epochs = 100
K = 30

sequential_training_model, summer_r_squared, autumn_r_squared = sequential_training(K, num_epochs)

plot_performance(num_epochs, summer_r_squared, autumn_r_squared)

#predict for test sets
summer_prices_predictions = sequential_training_model.predict(summer_days_test_norm)
autumn_prices_predictions = sequential_training_model.predict(autumn_days_test_norm)

plot_summer_autumn_predictions(summer_prices_predictions, autumn_prices_predictions)

Click for solution

Example output:

Solution hint Solution hint

As we can see, this approach performs better than first fully learning from the summer data and then learning from the autumn data. Sequential joint training helps maintain the model’s performance across both datasets by continually refreshing its memory with information from both periods. This method prevents the model from completely forgetting the relationships learned from the first dataset while training on the second.

Coding Exercise 2b: Interspersed training#

Now, we will try fully interspersed training. Unlike sequential joint training, in this approach, we will generate epochs that contain both summer and autumn data, exposing the model to both sets equally and simultaneously.

In this exercise, you are tasked with completing the code snippets that correspond to creating the labels for the interspersed epochs and training the model. This method aims to integrate the data from both periods within each training epoch, which can help in achieving a more balanced and robust model performance across different seasonal datasets.
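As a hint for the array bookkeeping (using hypothetical toy arrays, not the tutorial's variables), the pattern below shows how np.append builds one mixed batch from two sources: as long as the inputs and the labels are concatenated in the same order, every input stays paired with its own label.

# Toy illustration of building a mixed batch with np.append (hypothetical arrays)
import numpy as np

x_first = np.array([1.0, 2.0, 3.0])
y_first = np.array([10.0, 20.0, 30.0])
x_second = np.array([4.0, 5.0])
y_second = np.array([40.0, 50.0])

# concatenate inputs and labels in the SAME order so (input, label) pairs stay aligned
x_mixed = np.expand_dims(np.append(x_second, x_first), 1)  # shape (5, 1), ready for an sklearn model
y_mixed = np.append(y_second, y_first)                     # shape (5,)

print(x_mixed.ravel())  # [4. 5. 1. 2. 3.]
print(y_mixed)          # [40. 50. 10. 20. 30.]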

def interspersed_training(K: int, num_epochs: int):
    """
    Perform interspersed training for a multi-layer perceptron (MLP) regression model.

    Inputs:
        - K (int): The number of training examples to sample from each season (summer and autumn) in each epoch.
        - num_epochs (int): The number of training epochs.

    Returns:
        - model (MLPRegressor): The trained MLP regression model.
        - summer_r_squared (list): A list containing the R-squared values for the summer season at each epoch.
        - autumn_r_squared (list): A list containing the R-squared values for the autumn season at each epoch.
    """

    model = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=10000, random_state = 42, solver = "lbfgs")
    summer_r_squared = []
    autumn_r_squared = []

    for _ in tqdm(range(num_epochs), desc="Training Progress"):

        # Sample random training examples from summer and autumn
        sampled_summer_indices = np.random.choice(np.arange(summer_days_train_norm.shape[0]), size=K, replace=False)
        sampled_autumn_indices = np.random.choice(np.arange(autumn_days_train_norm.shape[0]), size=K, replace=False)


        mixed_days_train = np.expand_dims(np.append(autumn_days_train_norm[sampled_autumn_indices], summer_days_train_norm[sampled_summer_indices]), 1)
        ###################################################################
        ## Fill out the following then remove
        raise NotImplementedError("Student exercise: make price labels for mixed epochs and train")
        ###################################################################
        mixed_prices_train = ...
        model.partial_fit(..., ...)

        summer_r_squared.append(model.score(summer_days_test_norm, summer_prices_test))
        autumn_r_squared.append(model.score(autumn_days_test_norm, autumn_prices_test))

    return model, summer_r_squared, autumn_r_squared

set_seed(42)
num_epochs = 50
K = 30

interspersed_training_model, summer_r_squared, autumn_r_squared = interspersed_training(K, num_epochs)

plot_performance(num_epochs, summer_r_squared, autumn_r_squared)

#predict for test sets
summer_prices_predictions = interspersed_training_model.predict(summer_days_test_norm)
autumn_prices_predictions = interspersed_training_model.predict(autumn_days_test_norm)

plot_summer_autumn_predictions(summer_prices_predictions, autumn_prices_predictions)

Click for solution

Example output:

Solution hint Solution hint

Coding Exercise 2 Discussion#

  1. Note that the number of epochs is doubled in sequential training mode compared to interspersed mode. Why is this the case?

  2. Which training schedule performed better in this particular example? Why do you think this occurred?

Click for solution


Summary#

Estimated timing of tutorial: 25 minutes

Here is a summary of what we’ve learned:

  1. Simply continuing to train on a new data distribution causes catastrophic forgetting.

  2. Joint training, wherein different datasets are interspersed to varying degrees, helps fight catastrophic forgetting.

You can explore more advanced methods of continual learning in the following resources:

The Big Picture#

What causes catastrophic forgetting? If we train tasks continually (one after the other) rather than jointly, new knowledge overwrites old knowledge: because information is distributed across all of the network’s weights, updates for a new task disturb the weights that encoded previous tasks. That doesn’t seem to happen in biological systems, so is there a lesson we can take from psychology and neuroscience to better handle catastrophic forgetting? Large language models have adopted one partial solution, the so-called Mixture-of-Experts architecture, in which a routing mechanism decides which sections of the network become active for each task (or even each input). Is that a viable solution? These models are gigantic and extremely computationally expensive. We think there is scope to be more brain-like without requiring such vast computational giants.
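To give a rough feel for what such routing looks like, here is a toy mixture-of-experts forward pass in numpy (all names, sizes, and the use of plain linear experts are illustrative assumptions, not any particular model’s architecture): a small gating function scores the experts for a given input, and only the top-scoring experts are run and combined.

# Toy sketch of mixture-of-experts routing (illustrative only)
import numpy as np

rng = np.random.default_rng(0)
n_experts, d_in, d_out, top_k = 4, 8, 3, 2

# each "expert" is just a linear map here; the gate is another linear map over experts
expert_weights = rng.normal(size=(n_experts, d_in, d_out))
gate_weights = rng.normal(size=(d_in, n_experts))

def moe_forward(x):
    """Route a single input vector x of shape (d_in,) through the top-k experts."""
    gate_logits = x @ gate_weights                 # one score per expert
    top = np.argsort(gate_logits)[-top_k:]         # indices of the top-k experts
    gate_probs = np.exp(gate_logits[top] - gate_logits[top].max())
    gate_probs /= gate_probs.sum()                 # softmax over the selected experts only
    # only the selected experts compute anything; the rest of the network stays inactive
    expert_outputs = np.stack([x @ expert_weights[i] for i in top])
    return (gate_probs[:, None] * expert_outputs).sum(axis=0)

print(moe_forward(rng.normal(size=d_in)).shape)  # (3,)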

If you’re interested in learning more, the topics we covered today are also often referred to as the Stability-Plasticity Dilemma. A search for that term will certainly bring you to many recent advances in exploring this idea.

But before you do that, let’s first prepare ourselves for the next tutorial on Meta-Learning. See you there!