# A neural network for Fashion MNIST data

This final post in the series on chapter 4 of the fastai book builds a very simple 3-layer neural network that learns to distinguish a pullover from a dress.

```
!pip install -Uqq fastbook nbdev torch
import fastbook
fastbook.setup_book()
from fastai.vision.all import *
from fastbook import *
import torch
from torch.utils.data import Dataset
from torchvision import datasets
from torchvision.transforms import ToTensor
import matplotlib.pyplot as plt
training_data = datasets.FashionMNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)
test_data = datasets.FashionMNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)
training_dresses = [item[0][0] for item in training_data if item[1] == 3]
training_pullovers = [item[0][0] for item in training_data if item[1] == 2]
test_dresses = [item[0][0] for item in test_data if item[1] == 3]
test_pullovers = [item[0][0] for item in test_data if item[1] == 2]
training_dresses_tensor = torch.stack(training_dresses)
training_pullovers_tensor = torch.stack(training_pullovers)
test_dresses_tensor = torch.stack(test_dresses)
test_pullovers_tensor = torch.stack(test_pullovers)
train_x = torch.cat([training_dresses_tensor, training_pullovers_tensor]).view(-1, 28*28)
train_y = torch.cat([torch.ones(len(training_dresses)), torch.zeros(len(training_pullovers))]).unsqueeze(1)
valid_x = torch.cat([test_dresses_tensor, test_pullovers_tensor]).view(-1, 28*28)
valid_y = torch.cat([torch.ones(len(test_dresses)), torch.zeros(len(test_pullovers))]).unsqueeze(1)
train_dset = list(zip(train_x, train_y))
valid_dset = list(zip(valid_x, valid_y))
train_dl = DataLoader(train_dset, batch_size=256, shuffle=True)
valid_dl = DataLoader(valid_dset, batch_size=256, shuffle=True)
dls = DataLoaders(train_dl, valid_dl)
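def initialise_params(size, std=1.0):
    # Random starting values that track gradients
    return (torch.randn(size) * std).requires_grad_()

def fashion_mnist_loss(predictions, targets):
    # Squash raw outputs into (0, 1), then measure distance from the target
    predictions = predictions.sigmoid()
    return torch.where(targets==1, 1 - predictions, predictions).mean()

def batch_accuracy(x_batch, y_batch):
    # A prediction counts as a dress (label 1) when its sigmoid exceeds 0.5
    preds = x_batch.sigmoid()
    correct = (preds > 0.5) == y_batch
    return correct.float().mean()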
```
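As a quick sanity check on the two helper functions above, here is a tiny worked example. The three raw model outputs and their labels are invented purely for illustration; the function bodies are copied from the setup code so the snippet runs on its own.

```python
import torch

def fashion_mnist_loss(predictions, targets):
    predictions = predictions.sigmoid()
    return torch.where(targets == 1, 1 - predictions, predictions).mean()

def batch_accuracy(x_batch, y_batch):
    preds = x_batch.sigmoid()
    correct = (preds > 0.5) == y_batch
    return correct.float().mean()

# Made-up raw outputs (before the sigmoid) for three images, shape (3, 1)
raw_outputs = torch.tensor([[2.0], [-1.0], [0.0]])
# Made-up labels: 1 = dress, 0 = pullover
labels = torch.tensor([[1.0], [0.0], [1.0]])

loss = fashion_mnist_loss(raw_outputs, labels)
accuracy = batch_accuracy(raw_outputs, labels)

# The first two predictions land on the right side of 0.5.
# The third sits exactly at 0.5, which is not > 0.5, so it counts as wrong.
print(f"loss: {loss.item():.4f}, accuracy: {accuracy.item():.4f}")
```

The loss falls as predictions move towards their labels, while accuracy only changes when a prediction crosses the 0.5 threshold. That is why the smoother loss is used for training and accuracy is kept as a metric for humans to read.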

In the previous post we used stochastic gradient descent to train a model to fit a linear function to our Fashion MNIST data, specifically the difference between a pullover and a dress.

In this final stage, we will take the next step to creating a neural network in code that will be used to detect that same difference between a pullover and a dress. The key difference here is that we will need to 'add non-linearity' to our function. I have no mathematics background so I have very little intuitive (or learned!) understanding of specifically what that means, but my current mental model as learned during the course is that linear functions just aren't flexible enough to learn more complex patterns. In the end, what we want is a function that will fit to the patterns in our training data (as mapped to a multidimensional space). Simple linear functions aren't going to cut it.

In code, it looks like this:

```
weights1 = initialise_params((28*28, 30))
bias1 = initialise_params(30)
weights2 = initialise_params((30, 1))
bias2 = initialise_params(1)

def simple_network(x_batch):
    result = x_batch@weights1 + bias1    # first linear layer
    result = result.max(tensor(0.0))     # ReLU: replace negatives with zero
    result = result@weights2 + bias2     # second linear layer
    return result
```

You can see the three layers of our simple network pretty clearly in the code above. The middle layer is otherwise known as a ReLU, or rectified linear unit. Negative values passing through it become zero and positive values are left unchanged. When you plot the function it looks like this:

```
plot_function(F.relu)
```
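To see the same thing with numbers rather than a plot, here is a minimal sketch on a handful of hand-picked values. It uses plain PyTorch (`torch.maximum`) in place of the `result.max(tensor(0.0))` call from the network above; the input values are arbitrary.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])

# Built-in ReLU: negatives become zero, positives pass through untouched
relu_builtin = F.relu(x)

# The same operation written as an element-wise maximum with zero
relu_manual = torch.maximum(x, torch.tensor(0.0))

print(relu_builtin.tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
print(torch.equal(relu_builtin, relu_manual))  # True
```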

When we put a non-linear function in between two linear functions, the network is able to encode and express more complicated patterns. This is basically all we're doing with deep learning: we stack these layers to make the functions more and more capable of modelling and representing complex things.
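One way to convince yourself that the non-linearity matters is to leave it out. The sketch below, using random stand-in data rather than real images, shows that two linear layers stacked directly on top of each other collapse into a single linear layer, whereas the version with a ReLU in the middle does not. The variable names are my own and the tensors use `float64` only to keep rounding error out of the comparison.

```python
import torch

torch.manual_seed(0)
dtype = torch.float64

# Random stand-ins for a batch of 5 flattened images and two layers of parameters
x = torch.randn(5, 28 * 28, dtype=dtype)
w1, b1 = torch.randn(28 * 28, 30, dtype=dtype), torch.randn(30, dtype=dtype)
w2, b2 = torch.randn(30, 1, dtype=dtype), torch.randn(1, dtype=dtype)

# Two linear layers with nothing in between
stacked = (x @ w1 + b1) @ w2 + b2

# Algebraically that is just one linear layer with combined parameters
w_combined = w1 @ w2          # shape (784, 1)
b_combined = b1 @ w2 + b2     # shape (1,)
single = x @ w_combined + b_combined

collapses = torch.allclose(stacked, single)

# With a ReLU in the middle, the single combined layer no longer reproduces the output
with_relu = torch.relu(x @ w1 + b1) @ w2 + b2
relu_differs = not torch.allclose(with_relu, single)

print(collapses, relu_differs)
```

However many purely linear layers you stack, the result is still one linear function, so the extra layers only buy you anything once something like a ReLU sits between them.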

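We can express the above simple network in PyTorch-specific code. The architecture is the same, though `nn.Linear` creates and initialises its own weights and biases rather than using our `initialise_params` function: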

```
simple_net = nn.Sequential(
    nn.Linear(28*28, 30),
    nn.ReLU(),
    nn.Linear(30, 1)
)
```
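A quick way to check that this matches the hand-written version is to pass a fake batch through it and count the parameters. This is a small sketch using only plain PyTorch; the batch of four random "images" is made up for the purpose.

```python
import torch
from torch import nn

simple_net = nn.Sequential(
    nn.Linear(28*28, 30),
    nn.ReLU(),
    nn.Linear(30, 1),
)

# Four fake flattened images, just to check shapes
fake_batch = torch.rand(4, 28*28)
output = simple_net(fake_batch)

# 784*30 weights + 30 biases, then 30*1 weights + 1 bias
n_params = sum(p.numel() for p in simple_net.parameters())

print(output.shape)  # torch.Size([4, 1])
print(n_params)      # 23581
```

The parameter count lines up with `weights1`, `bias1`, `weights2` and `bias2` from the manual version: one prediction per image, and 23,581 numbers for the optimiser to adjust.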

At this point, training a model is similar to what we did last time round:

```
learn = Learner(dls, simple_net, opt_func=SGD, loss_func=fashion_mnist_loss, metrics=batch_accuracy)
learn.fit(30, 0.1)
```
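For a rough idea of what `Learner.fit` is doing on our behalf, here is a stripped-down training loop in plain PyTorch. It is only a sketch under some loud assumptions: the data is random synthetic noise with made-up labels (not Fashion MNIST), there is no shuffling or validation pass, and the real `Learner` does a good deal more (callbacks, metrics, recording).

```python
import torch
from torch import nn

torch.manual_seed(42)

# Synthetic stand-in data (an assumption for illustration, not real images)
x = torch.rand(512, 28*28)
y = (x.mean(dim=1, keepdim=True) > 0.5).float()

model = nn.Sequential(nn.Linear(28*28, 30), nn.ReLU(), nn.Linear(30, 1))
initial_weights = model[0].weight.detach().clone()

def fashion_mnist_loss(predictions, targets):
    predictions = predictions.sigmoid()
    return torch.where(targets == 1, 1 - predictions, predictions).mean()

opt = torch.optim.SGD(model.parameters(), lr=0.1)
losses = []

for epoch in range(5):
    # Walk through the data in mini-batches of 256 (no shuffling in this sketch)
    for i in range(0, len(x), 256):
        xb, yb = x[i:i+256], y[i:i+256]
        loss = fashion_mnist_loss(model(xb), yb)  # forward pass and loss
        loss.backward()                           # compute gradients
        opt.step()                                # nudge the parameters
        opt.zero_grad()                           # reset gradients for the next batch
    losses.append(loss.item())                    # loss of the last batch in the epoch

print(losses)
```

The four lines inside the inner loop (predict, measure the loss, calculate gradients, step) are the same cycle as in the earlier posts in this series; `Learner` simply packages them up together with the data loaders, the optimiser and the metrics.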

```
plt.plot(L(learn.recorder.values).itemgot(2))
```

```
learn.recorder.values[-1][2]
```

We don't actually emerge at the end of this with a vastly superior score to what we had at the end of the last notebook, but this basis (the simple neural network) gives us far more room to work with and build upon.

Finally, to round out my understanding, I put together a little diagram showing the various pieces that go into the `Learner` class when we instantiate it, adding some of the other concepts etc below it as I felt was appropriate. This isn't a complete picture by any means, but I find it helpful to visualise how things are layered and pieced together: