A Simple Introduction to Deep Learning — Recognising Handwritten Digits

Published in DataSeries · 6 min read · Sep 25, 2019

Learn Deep Learning by building your very own digit recognition algorithm. NO DOWNLOADS REQUIRED.

Applied deep learning is tough for anyone! Neural networks are difficult to understand and to build, and it is usually unclear how a network arrived at the result it did!

Luckily for you this article will be a great start to learning what these artificial neural networks are all about! The best way to learn is by building one yourself!

Step #1

Google is a great company for many reasons. One of these reasons is that they provide an online platform that we can use to build our deep learning model for free! They have even installed all the modules we will need!

The platform is called Google Colab. Simply put, Colab is a platform that allows us to do machine learning in a notebook interface. Notebooks (like Jupyter) are popular in machine learning as they enable us to easily communicate and share our work!

1. Go to this link: colab.research.google.com

2. Sign in

3. Make a new notebook using Python3 (New Python3 Notebook)

**Click this or go to FILE > New Python 3 Notebook**

Step #2 - Let's start to code!

In case you get stuck anywhere along the way, here is the link to my notebook for reference: https://github.com/MatthewByra1/Machine_Learning_Projects/blob/master/IntroNueralNetworks.ipynb

You cannot do machine learning without data to learn from. We will be using a dataset called MNIST. It contains 70,000 handwritten digits!

When we teach a neural network, we first train it and then test it to see whether it is actually recognising our digits correctly.

The MNIST dataset already splits up our data for us! Most of it (60,000 images) is for training and the rest (10,000 images) is used for testing. Our data looks like this:

Better handwriting than mine!

We need to import a few tools first:

import numpy as np #A math tool we can use to deal with arrays
import torch #PyTorch is the machine learning library we will use
import torchvision #Gives us access to the dataset we need
import matplotlib.pyplot as plt #Plotting tool
from torchvision import datasets, transforms
from torch import nn, optim #Neural Network optimization tool

Copy and Paste this code into the first cell and hit SHIFT + ENTER to test if the imports worked without errors (no output).

For image recognition problems, the computer needs to be able to read the data it is given. This is done by turning every pixel into a number. MNIST images are greyscale, so each image has a single channel of brightness values rather than separate red, green and blue channels. We also want all of the pixel values on a common scale so that images can be compared with one another. This is called normalizing the data. We can easily convert and normalize our data using one of our imports!

C&P (Copy and Paste) this code into the next cell:

transform = transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5,), (0.5,))])

We can now use this ‘transform’ variable on our data!
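To make the arithmetic behind the transform concrete, here is a minimal sketch in plain Python. It assumes the usual behaviour of the two steps: ToTensor scales 8-bit pixel values (0 to 255) into the range 0 to 1, and Normalize((0.5,), (0.5,)) then subtracts 0.5 and divides by 0.5. The helper names are hypothetical and are not part of torchvision.

```python
def to_tensor_value(pixel):
    """Scale an 8-bit pixel (0-255) into the 0.0-1.0 range (hypothetical helper)."""
    return pixel / 255.0


def normalize(value, mean=0.5, std=0.5):
    """Shift and scale a value: (value - mean) / std (hypothetical helper)."""
    return (value - mean) / std


# A black pixel, a dark-grey pixel and a white pixel
pixels = [0, 51, 255]
normalized = [normalize(to_tensor_value(p)) for p in pixels]

for pixel, value in zip(pixels, normalized):
    print(pixel, "->", round(value, 3))
```

Under these assumptions, black ends up at -1 and white at 1, so every image lives on the same -1 to 1 scale.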

Here we download the training data. If you take a look, we also shuffle it, so that each batch of 64 images contains a good mix of digits instead of arriving in a fixed order!

training_set = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(training_set, batch_size=64, shuffle=True)

# Grab one batch of images and labels to inspect
dataiter = iter(trainloader)
images, labels = next(dataiter)

Make sure to test that the data downloads and runs without issues!
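If you are wondering what shuffling and batching actually do, the idea can be sketched in plain Python. This is only an illustration of the concept, not how DataLoader is implemented; make_batches and its parameters are hypothetical names, and 60,000 is the size of the MNIST training split.

```python
import math
import random


def make_batches(indices, batch_size, shuffle=True, seed=0):
    """Split sample indices into batches, optionally shuffling them first."""
    indices = list(indices)
    if shuffle:
        random.Random(seed).shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


num_training_images = 60000  # size of the MNIST training split
batches = make_batches(range(num_training_images), batch_size=64)

print("Batches per epoch:", len(batches))
print("Size of first batch:", len(batches[0]))
print("Size of last batch:", len(batches[-1]))
```

Since 60,000 is not a multiple of 64, the last batch is smaller than the rest.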

Let's take a look at our data using Matplotlib:

plt.imshow(images[1].numpy().squeeze(), cmap= "Greys_r")

Awesome! Our data has imported and we are able to visualise it!

Step #2.2 — Building your Deep Neural Network

If you have ever seen a depiction of the general architecture of an artificial neural network, you will have seen that it consists of inputs, hidden layers and then outputs. It generally looks something like this:

Src: https://www.oreilly.com/library/view/deep-learning/9781491924570/ch04.html

There are many ways to modify this basic architecture to optimise learning and improve predictive success. For our recognition problem, we will use a basic feed-forward architecture.

What does feed-forward mean and why are we using it? Great question!

Information flows through a neural network in two situations: while it is being trained and once training is complete. In both, information is fed first into the input layer, which triggers the hidden layers, and these in turn produce the output. This is what is referred to as a feed-forward network.

Not every layer fires at once. The information moves from left to right, being modified by the weights assigned to each connection. Each neuron sums the weighted information fed to it from the layer to its left and passes that sum through an activation function, which acts like a threshold: only a strong enough signal proceeds to the next layer. During training this process is iterative; it is repeated many times so that the model can learn!
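Here is a minimal sketch of that weighted-sum-then-threshold step for one tiny layer, written in plain Python so that every multiplication is visible. The weights, biases and inputs are made-up numbers for illustration, and ReLU is used as the threshold because it is the activation used in the network built later in this tutorial.

```python
def relu(x):
    """Pass positive signals through unchanged and block negative ones."""
    return max(0.0, x)


def dense_layer(inputs, weights, biases, activation):
    """Compute activation(weighted sum + bias) for every neuron in a layer."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(weighted_sum))
    return outputs


inputs = [1.0, 2.0, 3.0]        # made-up values from the previous layer
weights = [[0.5, -1.0, 0.5],    # weights of neuron 1
           [1.0, 1.0, -2.0]]    # weights of neuron 2
biases = [0.5, 0.0]

layer_output = dense_layer(inputs, weights, biases, relu)
print(layer_output)  # [0.5, 0.0]
```

Neuron 1 ends up with a positive sum, so its signal moves on. Neuron 2 sums to a negative number, so ReLU blocks it and it outputs zero.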

This parallels well with how we learn! A form of trial and error. We make a decision, recognise the problem in that decision and attempt it again and again until we get it just right!

There is quite a bit of mathematics involved when the output data is compared with the expected result and fed back into the model to improve learning. I won’t overwhelm you with it, but if you are interested take a look at gradient descent and back-propagation.
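If you want a taste of gradient descent without the heavy math, here is a toy sketch with a single weight. It assumes a made-up loss whose best weight is 3.0 and a hypothetical learning rate of 0.1; a real network does the same thing for thousands of weights at once, with back-propagation supplying the gradients.

```python
def loss(w):
    """Squared error of a one-weight 'model' whose best weight is 3.0."""
    return (w - 3.0) ** 2


def gradient(w):
    """Slope of the loss with respect to w."""
    return 2.0 * (w - 3.0)


w = 0.0               # initial guess
learning_rate = 0.1   # hypothetical step size
initial_loss = loss(w)

for step in range(50):
    # Step downhill: move the weight against the slope of the loss
    w -= learning_rate * gradient(w)

print("Learned weight:", round(w, 4))
print("Loss went from", initial_loss, "to", loss(w))
```

Each step nudges the weight in the direction that reduces the loss, which is the same trial-and-error loop described above.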

# 784 inputs, 128 hidden1 (ReLU), 64 hidden2 (ReLU), 10 outputs (LogSoftmax)
input_size = 784
hidden_sizes = [128, 64]
output_size = 10

# Build a feed-forward network
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]), nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]), nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size), nn.LogSoftmax(dim=1))

# Quick check: run a single image through the untrained network
dataiter = iter(trainloader)
images, labels = next(dataiter)
images = images.view(images.shape[0], 1, 784)
ps = torch.exp(model(images[0, :]))  # exp turns log-probabilities into probabilities

# Define the loss function (NLLLoss expects log-probabilities as input)
criterion = nn.NLLLoss()

# Grab a fresh batch
images, labels = next(iter(trainloader))
# Flatten each 28x28 image into a vector of 784 values
images = images.view(images.shape[0], -1)
# Feed forward
logps = model(images)
loss = criterion(logps, labels)

Our input size is the number of pixels in one image (28 × 28 = 784).

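Our hidden layer sizes are the number of neurons in each hidden layer: 128 in the first and 64 in the second, since we are building two hidden layers for this simple network. The 64 you see when inspecting the data (try print(images.shape)) is something different: it is the batch size, the number of images fed through the network at a time.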
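To get a feel for how big this "simple" network is, we can count the numbers it has to learn. The sketch below assumes fully connected layers in which each layer has one weight per input-output pair plus one bias per output neuron, which is how nn.Linear behaves by default.

```python
layer_sizes = [784, 128, 64, 10]  # input, hidden1, hidden2, output

total_parameters = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    weights = n_in * n_out   # one weight per connection
    biases = n_out           # one bias per neuron
    print(f"{n_in} -> {n_out}: {weights} weights + {biases} biases")
    total_parameters += weights + biases

print("Total trainable parameters:", total_parameters)  # 109386
```

Most of the parameters sit in the first layer, because every one of the 784 pixels connects to every one of the 128 neurons.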

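We have an output of 10, one for each digit from 0 to 9, and the final layer uses a softmax-style activation function, which is typical for classification tasks. Because the NLLLoss criterion expects log-probabilities, the log version (nn.LogSoftmax) is the one to pair it with.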
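Here is a small plain-Python sketch of what softmax and the negative log-likelihood loss compute. The three raw scores are made up, and a 3-class problem stands in for our 10 digits to keep the numbers readable. The function names are illustrative and are not the PyTorch API.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(scores):
    """Log of the softmax probabilities."""
    return [math.log(p) for p in softmax(scores)]


def nll_loss(log_probs, true_label):
    """Negative log-likelihood: small when the true class gets high probability."""
    return -log_probs[true_label]


scores = [1.0, 3.0, 0.5]  # made-up raw outputs for classes 0, 1 and 2
probs = softmax(scores)
log_probs = log_softmax(scores)

print("Probabilities:", [round(p, 3) for p in probs])
print("Loss if the true class is 1:", round(nll_loss(log_probs, 1), 3))
print("Loss if the true class is 0:", round(nll_loss(log_probs, 0), 3))
```

The loss is low when the network puts most of its probability on the correct class and high when it does not. That is the signal training tries to push down.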

And that's it, we have built our digit recognition model! Now we need to optimise it and feed it back the output information so that it may learn!

Step #2.3 - Let's make our model learn!

As you now know, our neural network learns by running information through it, comparing the result to what it should be, and then sending the error back while adjusting the layer weights, until the model reaches predictive success.

In our case, we run through only 2 passes over the training data (called epochs). This means that our model probably won't be as good as if we ran, say, 20. Ultimately, we want to balance training time against predictive success. If you have a powerful computer then run it as many times as you want! I opted for 2 since it still shows us how the model learns.

optimizer = optim.SGD(model.parameters(), lr=0.001)
epochs = 2
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        images = images.view(images.shape[0], -1)
        optimizer.zero_grad()
        output = model(images)
        loss = criterion(output, labels)
        #Backpropagation
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    # Report the average loss per batch for this epoch
    print(e, running_loss/len(trainloader))

If you would like to check how well your model performs, simply take the number of correctly predicted images over the total number of images that the model was trained on:

correct = 0
all_counted = 0
for images, labels in trainloader:
    for i in range(len(labels)):
        img = images[i].view(1, 784)
        # No gradients are needed when we are only predicting
        with torch.no_grad():
            logps = model(img)
        # torch.exp turns log-probabilities into probabilities
        ps = torch.exp(logps)
        probability = list(ps.numpy()[0])
        pred_label = probability.index(max(probability))
        true_label = labels.numpy()[i]
        if true_label == pred_label:
            correct += 1
        all_counted += 1

print("Model Accuracy =", (correct/all_counted))
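The accuracy calculation itself is easy to check by hand on a tiny example. The sketch below uses four made-up predictions over three classes; the accuracy helper is a hypothetical name, not a PyTorch function. The same counting works on held-out test images, which is the fairer check mentioned at the start of Step #2.

```python
def accuracy(predicted_probs, true_labels):
    """Fraction of examples whose most probable class matches the true label."""
    correct = 0
    for probs, label in zip(predicted_probs, true_labels):
        predicted_label = probs.index(max(probs))  # class with highest probability
        if predicted_label == label:
            correct += 1
    return correct / len(true_labels)


# Made-up model outputs for four images and three possible classes
predicted_probs = [
    [0.7, 0.2, 0.1],   # predicts class 0
    [0.1, 0.8, 0.1],   # predicts class 1
    [0.3, 0.3, 0.4],   # predicts class 2
    [0.6, 0.3, 0.1],   # predicts class 0
]
true_labels = [0, 1, 2, 1]  # the last prediction is wrong

model_accuracy = accuracy(predicted_probs, true_labels)
print("Model Accuracy =", model_accuracy)  # 0.75
```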

Conclusion

Now that you have a basic understanding of how neural networks work and have built one for yourself, I hope you can go on to make some awesome models in different domains such as NLP, computer vision and finance!

If you would like to learn more about how the math behind neural networks works, or see more complex tutorials about A.I. and machine learning, please feel free to follow me and give this tutorial a clap!

References:

I drew some direction for this tutorial from Amitrajit Bose. I would encourage you to look at his tutorials!