Machine Learning in Jupyter Notebook

Started My Dive Into Machine Learning

Sunday, September 18, 2022

I recently started a deep study into machine learning. I’m pursuing this path right now partly because of some trends we are exploring at my work, and partly as a personal challenge to stay up to date with the quickly changing technology world.

I program mostly in Python and C# at work and personally, so this was right up my alley since so much machine learning is done in Python (at least at my introductory level). In this article, I will go over how I created my first machine learning model using a few Python frameworks like NumPy, Pandas, and PyTorch. Most of what I am learning is from Sebastian Raschka’s book, “Machine Learning with PyTorch and Scikit-Learn” (2022 edition), as well as Daniel Bourke’s online deep dive into machine learning. Both are excellent resources and fit my learning style.

Creating The Machine Learning Data

The first thing I did was import the needed frameworks. Bourke’s class goes into additional detail on making the code device agnostic, which I love. This way I can run the model locally on my laptop’s CPU or upload it to an online GPU (e.g., Google’s Colab or another service).

To start, I imported PyTorch. For those who may not know, PyTorch is a Python framework for machine learning. NumPy will be used later and is useful for dealing with arrays and vector math. Finally, Matplotlib is a graphing framework that will let me plot out the results of the machine learning model.

Normally I would do this in my IDE and run a .py Python script, but for this exercise, Jupyter Notebook made it much easier to iterate and experiment with different numbers and instantly see the results. All the graphs below were created right in Jupyter Notebook with Python’s Matplotlib framework.

I have taken multiple statistics, probability, calculus, and a few other math classes that deal with more complex learning functions, but for now, for my first model, I’m only building a single linear regression model.

Here’s the first block of code that I created in Jupyter Notebook to start things off:

import matplotlib.pyplot as plt
import torch
from torch import nn

# This line looks to see if a GPU is available. 
# If so, it sets the computation device to be the GPU, 
# otherwise, it is set to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

Next, I wanted to create some sample data. I’ve been playing with using Yahoo’s API to import stock prices, but that’s a bit too unpredictable for now. Gotta walk before I run, right?

So to do that, I followed Bourke’s lead by building a large tensor of values. What’s a tensor? It’s basically an array with any number of dimensions. The number 15 can be a tensor holding a single value. A 2-dimensional array like [[1, 2, 3], [10, 11, 12]] has a shape of 2×3 and visually looks like a grid of numbers. A 3-dimensional array like [[[1, 2, 3], [10, 11, 12], [20, 21, 22]]] has a shape of 1×3×3 and would be like a stack of grids, a cube with a value in each cell. All of these are tensors, from the single number 15 all the way up to the cube.
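Here’s a quick sketch checking those shapes in PyTorch (the variable names are mine, just for illustration):

scalar = torch.tensor(15)                # a single value: 0 dimensions
grid = torch.tensor([[1, 2, 3],
                     [10, 11, 12]])      # 2 dimensions, shape 2x3
cube = torch.tensor([[[1, 2, 3],
                      [10, 11, 12],
                      [20, 21, 22]]])    # 3 dimensions, shape 1x3x3

print(scalar.ndim, grid.shape, cube.shape)
# 0 torch.Size([2, 3]) torch.Size([1, 3, 3])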

Instead of using random values, though, I generated the data from a very methodical, linear equation to make it easier for me to see how the machine learning iterations are working.

A linear regression model for machine learning typically uses the equation y = w * x + b, where w is the weight and b is the bias. The code below steps x through values from 0 to 1 and computes y using a predetermined weight and bias that I can test against later.

# These are predetermined weights and biases.
# I made them 0 - 1 to help with the normalization I'll be doing later. 
w = 0.7  # The weight
b = 0.3  # The bias

# Now I'll build a PyTorch tensor and fill it with values
start = 0
end = 1
step = 0.02  # 50 evenly spaced values from 0 to 0.98
X = torch.arange(start, end, step).unsqueeze(dim=1)  # unsqueeze makes each value its own row
y = w * X + b

If I print out the values of X and y, I get two tensors as follows (this is a truncated list of the first 10 values only):
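tensor([[0.0000],
        [0.0200],
        [0.0400],
        [0.0600],
        [0.0800],
        [0.1000],
        [0.1200],
        [0.1400],
        [0.1600],
        [0.1800]])
tensor([[0.3000],
        [0.3140],
        [0.3280],
        [0.3420],
        [0.3560],
        [0.3700],
        [0.3840],
        [0.3980],
        [0.4120],
        [0.4260]])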

At this point I broke up the list into a larger training set and a smaller set of numbers that I’ll save until later to test with. Hopefully, after training the model, I’ll be able to feed my test values for X into the model and get values for y that are very close to the actual numbers in my y tensor above.
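Here’s a sketch of that split. I’m assuming an 80/20 ratio here; the variable names match the training loop further below:

# Use the first 80% of the data for training, hold back the rest for testing
split = int(0.8 * len(X))  # the 80/20 ratio is an assumption
xTrain, yTrain = X[:split], y[:split]
xTest, yTest = X[split:], y[split:]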

Building the PyTorch Linear Model

To keep everything as reusable as possible, I created a custom class for the linear regression model, built on PyTorch’s nn.Module, so I can train it like any other PyTorch model.

This custom Python class subclasses the PyTorch module and sets up a single linear layer that maps the x value to a predicted y. I’ll then set a manual seed (so the starting values it uses for calculations are random, but repeatable) and create an instance of the model:

class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear_layer = nn.Linear(in_features=1, out_features=1)
    
    def forward(self, x: torch.Tensor):
        return self.linear_layer(x)

# Here I set the manual seed so the random starting values are repeatable
torch.manual_seed(13)
model = LinearRegressionModel()
print(model.state_dict())  # peek at the random starting weight and bias

Training the PyTorch Model

I’m now going to train the model. Machine learning uses what is called a “loss function” and an “optimizer.” The loss function is similar in some ways to a standard deviation or variance. The goal is to minimize the loss function. If you are familiar with calculus, it is like trying to find where the derivative is zero to find a minimum or maximum of the original function.

In our case, the closer our loss is to zero, the closer our predictions will be to the actual values. Imagine a simple parabolic curve. If we randomly pick a spot on the curve and use the slope to tell us which way to nudge our next guess, eventually we would end up at the minimum.
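Here’s a toy illustration of that nudging idea on the parabola f(x) = x² (this is separate from the model, just the concept):

# Gradient descent on f(x) = x**2, whose slope is f'(x) = 2x
x = 5.0    # a randomly picked starting spot on the curve
lr = 0.1   # learning rate: the size of each nudge
for _ in range(50):
    x = x - lr * (2 * x)  # step against the slope, i.e., downhill
print(x)  # ends up very close to 0.0, the parabola's minimum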

Fortunately, PyTorch has several loss functions built in that I can choose. The same goes for the optimizer.

lossfunction = nn.L1Loss()  # L1 loss: mean absolute error between prediction and target
optimizer = torch.optim.SGD(params=model.parameters(), lr=0.1)  # stochastic gradient descent

I ran the following training loop on my machine learning model. An epoch is one full pass of the loop, each one testing a weight and bias. In theory, each epoch will get me closer to a loss of zero. Most likely, I’ll never get a loss of zero, but I want to minimize it as much as I can.

There are four sets of data in the following code block. The first two are the training sets for X and y. The second pair is a test set. I’ll use the test set to compare against my own model’s prediction to see how far off it is in each epoch. The amount that the model is off is the loss.

The model will iterate using the learning rate (lr in the optimizer line above) to find a new value for my weight and bias before running the next epoch with the new values.

epochs = 400
for epoch in range(epochs):
    model.train()
    y_prediction = model(xTrain)               # forward pass on the training data
    loss = lossfunction(y_prediction, yTrain)  # how far off is the prediction?
    optimizer.zero_grad()                      # clear gradients from the last epoch
    loss.backward()                            # backpropagation: compute new gradients
    optimizer.step()                           # nudge the weight and bias
    
    # Test the values
    model.eval()
    with torch.inference_mode():
        test_prediction = model(xTest)
        test_loss = lossfunction(test_prediction, yTest)

    if epoch % 50 == 0:
        print(f"Epoch {epoch} | loss {loss.item():.4f} | test loss {test_loss.item():.4f}")

After running for 400 epochs, I could see the loss going down.

I used Matplotlib to take all the data and plot it. The training data is blue, and the predictions, which use the new model I just trained, are red. Obviously, because this is a simple model, the values are pretty straightforward and we can see it is a linear function.
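The plotting code was along these lines; this is a sketch, and the exact styling of the original graphs may have differed:

# Plot the training data in blue and the model's test predictions in red
with torch.inference_mode():
    predictions = model(xTest)

plt.scatter(xTrain, yTrain, c="b", label="Training data")
plt.scatter(xTest, predictions, c="r", label="Predictions")
plt.legend()
plt.show()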

The first graph shows how close the predictions are after one epoch. As you can see, it isn’t very close. This is to be expected, since the values used to predict the weight and bias of our linear function were randomly chosen.

However, take a look at how much closer it gets after 400 epochs. It is much closer, almost exactly on.

At this point, I can go in and make tweaks to the model. I can change the learning rate, let it run for additional epochs, or even try different optimizer functions. When I call the function ‘state_dict()’ on my model, I can see what the actual weight and bias are for the trained model:
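print(model.state_dict())
# OrderedDict([('linear_layer.weight', tensor([[0.6936]])),
#              ('linear_layer.bias', tensor([0.2980]))])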

The weight is 0.6936, which is pretty close to the preset 0.7 in our original. And our bias of 0.2980 is, again, close to the 0.3 we set it to be for our actual function.

I’m amazed at how concise the actual optimization code is (in the ‘training’ section above). It’s basically just six lines of code, plus the loops and testing lines. Again, this was very simple since it was a prebuilt linear model where we had the known weights and biases.

I’m looking forward to expanding this knowledge with different, more complex models. In addition, I want to test this on data from ‘the wild’ by scraping some stock prices, housing data, or temperature data to see if I can get a model that can come close to matching a known reality.

Thanks for sticking with me through such a dense blog post. I’m very thankful we live in a time where we have resources like PyTorch to make machine learning code more concise and easier to optimize, and especially thankful for Raschka’s book and Bourke’s online course for going into such depth in such a fast moving field.




Ciao! I'm Scott Sullivan, a software engineer with a specialty in machine learning. I spend my time in the tranquil countryside of Lancaster, Pennsylvania, and in northern Italy visiting family, close to Cinque Terre and La Spezia. Professionally, I'm using my Master's in Data Analytics and my Bachelor's degree in Computer Science to turn code into insights with Python, PyTorch and DFE superpowers while on a quest to create AI that's smarter than your average bear.