Creating a neural network from the ground up for classifying your own images in Keras/TensorFlow, including all the super basic things people miss

Kai Brooks
21 min read · Jan 10, 2020
Actual picture of the author cooking up this article

This article will go over training a neural network to classify images as either horses or humans. It’s also going to cover all the fundamental things that the data nerds scoff at and ignore when writing tutorials. We’ll set up your environment and note which commands to put in the command line. This project is also flexible enough that we could add any of our own images, and the network will train and classify them.

This article assumes the reader is an absolute beginner and includes some basic things (like setting up an environment) that others ignore, as well as simplifying a few complicated concepts and omitting some others.

Formalities before we begin:

I just came here to download the code

I got you:

I just want to follow some commands and not read this whole thing

Run the Docker container:

docker run -it -p 8888:8888 kaibrooks/demo-horsemans:latest

Go to the URL that displays at the end. Done.

Why another Keras/TensorFlow tutorial?

This tutorial assumes zero initial knowledge, and it covers the entire process to a greater extent than I’ve seen in one place, including:

  1. All the absolute beginner information, such as how to set up an environment for the first time
  2. Creating augmented data from your own images to train on
  3. Training the network
  4. Testing the trained network
  5. Importing and exporting the trained .h5 file both as a weights file and a compiled .h5 file
  6. Using callbacks to remember and save the best networks
  7. Using network optimizations to fix some issues with training
  8. Fancy plots of network performance

I’m going to keep the math light because for me, it was more useful to know how to create a general-purpose neural network, and then learn the mathematical nuances later.

Why should I read this from you?

I wrote this because I couldn’t find a decent tutorial that assumed no knowledge on the part of the reader, started from the ground up, and incorporated all of the above points: a tutorial for starting with nothing and ending with a fully functional network. Most ‘tutorials’ use completely arbitrary examples that work only inside the code as listed, making it impossible for a beginner to turn those examples into something usable.

Secondly, I’ve taught this to people who have zero experience in machine learning and can barely manage the command line, so I mean for this to cover everything from the standpoint of the average person, not someone well-versed in computer science or machine learning.

For the PhD-endowed data scientists and the tenured professors who started their coding career in Pascal who have stumbled on this article, note that I’m simplifying many ideas here to make it easier for beginners to understand. Sometimes we forget how complicated new ideas are when our experience distances us from learning them for the first time.

What we’re going to do

This process overall has a few parts:

  1. Create an augmented training set.
  2. Split the images into a training/validation/test group.
  3. Train the network on the augmented images.
  4. Test the network against new images.

What do I need to know that you aren’t going to explain here?

  1. How to use Git to clone a repo, or how to download a repo from GitHub
  2. Basic command line things, like navigating to a folder
  3. You don’t actually need to know Python, but it will help
  4. If you’re not sure what a library is, mentally replace “library” with “software”, except it’s not a separate program

What’s TensorFlow / Keras / Docker / whatever else we’re using?

TensorFlow is a library that we will use for creating and training our network. It has no interface, so it’s not a program that you just open and click some buttons. Because of this, we need to write some Python scripts which invoke TensorFlow to get it to run as we want. It’s also probably the biggest player in the machine learning space, and it’s from Google, so it’s got some backing.

Keras is a library that runs on top of TensorFlow, which makes TensorFlow simpler to use and faster to set up training. Think of Keras as just adding some extra commands to TensorFlow which let us use short, simple code, instead of long, obnoxious code. It’s kind of like making fire by rubbing sticks vs. using matches. The result is still the same, but the latter method is considerably easier to get there.

Docker is containerization software that makes an environment for Keras/TensorFlow to run in. Think of it like a lightweight virtual machine that only runs in the command line (so, no interface). Docker isn’t strictly necessary for running Keras/TensorFlow; it just makes it much less of a headache to set up and run. With one command, you can download and install every single file this entire project needs, instead of downloading everything individually.

Jupyter Notebook is a Python environment that runs in your browser. Instead of opening some separate program to write code and then running it from the command line, you just open your browser, and Jupyter Notebook lets you write and run the code quickly.

To recap:
TensorFlow: the library that’s going to create the neural network and train it
Keras: the library that’s going to make it easy to use TensorFlow
Docker: the software that makes it easy to install and run everything
Jupyter Notebook: where we’re going to do the coding and running and looking at what the neural network is doing

Set up to get everything up and running

GitHub houses the code for this project. I also containerized the project through Docker and uploaded it to Docker Hub, so it’s easy to begin. A few lines here and there might be different from what’s on GitHub since I’m continually developing it, so don’t sweat it if it’s a little different.

How do I download and install this project?

  1. Install Docker from here. You need a free account. If you’re running Windows, after installing Docker, click the little icon in the taskbar and go into the settings. There’s an option for choosing which drives Docker can access. Check the drive where you put your own files!
  2. Run the following:
docker pull kaibrooks/demo-horsemans:latest

This command tells Docker to download the container, which has all the demos, code, and other files in it. The download and decompressing might take about 10 minutes, but it’ll install everything automatically, and you only need to do this once.

Running Docker

Run the image with the command below. By default, Docker cannot access any files on your computer unless you give it access to the folder when you use the docker run command!

In the commands below, replace the MYSWEETFOLDER part with whatever folder you want to give Docker access to. Don’t change the :/tf/notebooks part! This command tells Docker to make the contents of your own folder available to Jupyter Notebook inside the folder ‘notebooks.’

For mounting a folder on macOS / Linux:

docker run -it -v ~/MYSWEETFOLDER/:/tf/notebooks -p 8888:8888 kaibrooks/demo-horsemans:latest

For mounting a folder on Windows:

docker run -it -v C:/MYSWEETFOLDER/:/tf/notebooks -p 8888:8888 kaibrooks/demo-horsemans:latest

Note that the :/tf/notebooks part doesn’t change!

If you don’t want to use your own files, you can cut out the -v option:

docker run -it -p 8888:8888 kaibrooks/demo-horsemans:latest

Let’s begin

After using a docker run command, the command line will output an address. Copy this entire address and paste it into your browser.

Copy one of these lines; the bottom one is probably the easiest. The number at the end changes each run.

The container runs a Jupyter Notebook server at that address. Open your browser and paste in that address to open it.

Jupyter Notebook up and running

Jupyter Notebook is where all the training/testing/coding/running/everything else will take place.

Inside the notebooks folder are the contents of the folder we specified with the docker run command (see the MYSWEETFOLDER example above).

If the notebooks folder isn’t there, it’s because you’re using the command that doesn’t include local folder access. You can still run the demo projects and save the outputs, but you can’t import your own data.

In addition to your notebooks folder, there’s the demo-horsemans folder, which contains all the code we’re going to work with.

The demo folder and its contents

demo-horsemans: Is it a Horse or a Human?

This project trains a neural network to learn the differences between horses and humans from this dataset.

Bad 3d renderings of horses and humans
Horses and humans created from what I can only assume to be the PlayStation 2

Above we see some example images from the Horse or Human image set. There are about 1000 images in total, but we can make more images for better training with some data augmentation.

imageGenerator: Augmenting the data

Data augmentation is a way to increase the diversity of the data the network sees. Sure, we could train a network on those 1000 images alone, but what if we intentionally messed with the images to make the network work harder to learn what it’s supposed to be classifying? We’ll randomly rotate images, zoom them, or maybe adjust the contrast or throw some Instagram filters on them.

By feeding the network augmented data, training forces it to learn what it’s classifying in a more general sense since it can’t rely on cheap learning shortcuts. For example, without data augmentation, the network might learn that ‘humans’ are always in the shape of thin, tall lines going straight up and down, and will spectacularly fail when trying to classify an image of a human in any other orientation.

imageGenerator.ipynb is a file that takes each of the raw images and makes some variations of it, such as rotating it, stretching it out, cropping it, or other adjustments. The reason we do this is twofold:

  1. Augmenting the data gives us many more images on which we can train. We can quickly turn 1,000 images into 100,000.
  2. Training the neural network with augmented data helps prevent overfitting, the bane of every application. More on that later.
Raw images and their augmented images

When we run imageGenerator, it creates a few augmented images for each image in the raw folder and saves the augmented images in the train folder. Your images might look different from mine or someone else’s, since the augmenter randomizes which transformations to use on each image!

Horse training folder with augmented images.

Because the augmenter randomizes the transforms for each image, it also means we can generate a different data set by rerunning imageGenerator, or by giving it a different batch size for more or fewer images with which to train.
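To make the idea concrete, here is a tiny library-free sketch of random augmentation. It is not the code inside imageGenerator; the function names (flip_horizontal, rotate_90, augment) and the 3x3 "image" are made up for illustration, and it only uses flips and quarter turns rather than the zooms and stretches described above.

```python
import random

def flip_horizontal(image):
    """Mirror the image left-to-right."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def augment(image, copies, seed=None):
    """Return `copies` randomly transformed variants of `image`."""
    rng = random.Random(seed)
    variants = []
    for _ in range(copies):
        variant = [list(row) for row in image]
        if rng.random() < 0.5:               # flip about half of the time
            variant = flip_horizontal(variant)
        for _ in range(rng.randrange(4)):    # 0 to 3 quarter turns
            variant = rotate_90(variant)
        variants.append(variant)
    return variants

# A hypothetical 3x3 grayscale "image" standing in for a real picture
raw_image = [
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8],
]
augmented = augment(raw_image, copies=5, seed=42)
print(len(augmented), "augmented images from 1 raw image")
```

Changing the seed (or leaving it as None) gives a different set of variants, which is the same reason rerunning the generator produces a different data set.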

Splitting training, validation, and testing sets

The network needs a few different sets of images to work with:

Training set — the set of images that the network will analyze to understand what makes something a horse or a human. This folder is where our augmented images go.

Validation set — the set of images that the network uses to check how well it’s learning during training. These should be raw images, not augmented ones! We want the network to compare itself against actual pictures, not warped ones.

Testing set — the set of images the network has never seen before. This part is the actual evaluation of how well the network classifies, which we run after the network finishes training.

Test images. Note they have no background and look different than what the network trained on.

In this case, the test set images have no background, only the class (horse or human). We do this because if the network learned to classify incorrectly (like it was looking at the background instead), it won’t ‘accidentally’ come up with the right answer, which gives us a better idea of how the network performs. The test set doesn’t necessarily need to be on a white background like this, just so long as they’re images the network has never seen.

We need to split these images up and put them in their appropriate folders. The test folder has the white-backgrounded images, the validation folder should have the raw images, and the train folder should have all the augmented images. There’s also a raw folder, which keeps the original images, so it’s easy to generate new augmented data sets. However, the program doesn’t use the images in it for any training/validation/testing.

If the imageGenerator file doesn’t put them in the right folders, you might need to do it by hand. Remember to keep things in horse or human subfolders, since this is how the network knows which is which for training! If we were using a different dataset, the subfolders would have that dataset’s labels on them instead (e.g., cats, bicycles, cheese).

Note that this folder ordering is entirely arbitrary, and TensorFlow/Keras can manage different folder names or layouts. This structuring is just what I picked since it works for me, and all the code looks for images in those folders.
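If you end up sorting images by hand, a short script can decide which raw images go where. The sketch below is an assumption about how one might do it, not what imageGenerator does: split_files, the 20% validation fraction, and the file names are all hypothetical. It only plans the layout (which names belong under which folder) and leaves the actual copying to you.

```python
import random

def split_files(filenames, val_fraction=0.2, seed=0):
    """Shuffle filenames and split them into (train, validation) lists."""
    shuffled = sorted(filenames)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical raw images, grouped by class subfolder
raw = {
    "horse": [f"horse_{i:03d}.png" for i in range(10)],
    "human": [f"human_{i:03d}.png" for i in range(10)],
}

layout = {}
for label, names in raw.items():
    train, val = split_files(names, val_fraction=0.2, seed=1)
    layout[f"train/{label}"] = train        # these get augmented for training
    layout[f"validation/{label}"] = val     # these stay raw for validation

for folder, names in layout.items():
    print(folder, len(names))
```

Splitting each class separately keeps the horse/human balance the same in both folders.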

horsemansTraining: training the network

Preparation over, it’s time to train.

We can just ‘Run All’ in horsemansTraining and the program will grab the images from the folders, train the network, test the network, and save the .h5 output files. However, let’s go over a few sections of the program.

The network model

This block of code defines the structure of the neural network, with the network compiling command at the end.

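## set up model
# train_images and num_classes are defined earlier in the notebook
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Conv2D, Dense, Flatten, MaxPooling2D

model = Sequential()
# input layer
model.add(Conv2D(32, (3, 3), padding='same', input_shape=train_images.shape[1:])) # filters, kernel size
# hidden layers
model.add(Conv2D(32, (3, 3))) # 2d convolutional layer
model.add(Activation('relu')) # rectified linear unit activation
model.add(MaxPooling2D(pool_size=(2, 2))) # keep the largest value in each 2x2 block
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Conv2D(64, (3, 3)))
model.add(MaxPooling2D(pool_size=(2, 2)))
# output (convergence) layer
model.add(Flatten()) # assumed step: flatten the feature maps into one vector per image
model.add(Dense(num_classes)) # one node for each class
model.add(Activation('softmax')) # assumed step: turn the outputs into probabilities that sum to 1
# compile the model (the optimizer here is an assumption; the loss is the one discussed below)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
print('Model summary:')
model.summary() # show the summary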

First, we see this is a sequential neural network; its structure looks something like this:

General representation of a sequential neural network. Our network has more layers (vertical groups) and more nodes (circles in each layer), but is otherwise similar.

On the left is the input layer, the middle is the hidden layer (though in this picture, there’s only one), and on the right is the output layer. For our network, we have one input layer, four hidden layers, and one output layer with two nodes.

The network model: input layer

The input layer has nodes equal to the number of pixels in the image. This layer is where our image gets converted into data points and starts to move through the network.

The network model: hidden layers

A hidden layer is any layer that isn’t the input or output. We have a bunch of hidden layers of different sizes, and they’re all interconnected.

This area is where most of the ‘learning’ happens. As the image data moves through these nodes and layers, their values feed into other nodes, like combining different signals to get new outputs.

The network model: output layer

We have one output layer, though it has two output nodes because we have two classes. The values of the output layer are the probability the image is either a horse or a human (according to the network). Because this is a probability, these two nodes always sum to 1 (or, 100%). For example, if the network returned the value [0.21 0.79], that means it thinks there’s a 21% chance of the image being the first class and a 79% chance of it being the second class.

No matter what image we feed into this network, it will only ever output a probability vector (basically two numbers linked together), one being the probability of the image being a horse, the other being the probability of the image being a human.
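Here is a minimal sketch of how two raw output values become a probability vector, assuming the output layer ends in a softmax activation (the usual choice for this kind of classifier). The raw scores are invented numbers, not values from this network.

```python
import math

def softmax(scores):
    """Convert raw output scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

raw_scores = [0.3, 1.6]          # hypothetical raw values of the two output nodes
probabilities = softmax(raw_scores)
print(probabilities)             # two numbers, one per class
print(sum(probabilities))        # always 1 (up to floating-point rounding)
```

The larger raw score always gets the larger probability, so picking the class with the highest probability is the same as picking the node with the highest raw value.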

What types of layers does this model have?

Here’s a brief description of what some of the layers here do:

2D Convolutional Layer — This layer performs a convolution on the image, which gives us a ‘high-level’ look at the features of the image, such as edge detection. Think of convolution as ‘sampling’ points around each pixel to find parts of the image that stand out.

Dense Layer — This is a layer where all the inputs and outputs connect to the previous/next layers, like the example above. This configuration is what people generally think of when they visualize a neural network. Dense layers also serve as the final output. Also known as a ‘fully-connected’ layer.
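To see what a convolution actually does, here is a small library-free sketch of the operation a 2D convolutional layer performs. The kernel below is a hand-picked vertical edge detector and the image is a made-up 4x6 grid; in a real network the kernel values are learned during training rather than chosen by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Dark left half, bright right half: there is a vertical edge in the middle
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]
vertical_edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = convolve2d(image, vertical_edge_kernel)
for row in feature_map:
    print(row)  # [0, 27, 27, 0]: large values only where the edge is
```

The flat areas produce zeros and the edge produces large values, which is the ‘parts of the image that stand out’ idea from the description above.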

The loss function, or, ‘how do I know this is training’?

The loss function is a general mathematical formulation for “how well is this algorithm modeling the dataset,” with smaller being better. In a way, we could think of it as “how much error is this producing”? There are a few formulations for a loss function; we’re using one called Categorical Crossentropy, which is useful for images that only have one label (i.e., no horse-human combinations).
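As a rough illustration, categorical crossentropy for a single image boils down to "minus the log of the probability the network gave to the correct class." The sketch below is a plain-Python version for one image, with made-up predictions; Keras computes the same quantity averaged over a batch.

```python
import math

def categorical_crossentropy(true_one_hot, predicted, eps=1e-7):
    """Loss for one image: -sum(true * log(predicted))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(true_one_hot, predicted))

human = [0, 1]  # one-hot label: the second class is the correct one

confident_right = categorical_crossentropy(human, [0.01, 0.99])
unsure_right = categorical_crossentropy(human, [0.21, 0.79])
confident_wrong = categorical_crossentropy(human, [0.90, 0.10])

print(confident_right, unsure_right, confident_wrong)  # smallest to largest
```

The loss is near zero when the network is confidently right and grows quickly when it is confidently wrong, which is why ‘smaller is better.’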

During training, we see the loss function in the plot, and it should go down over time. Individual runs might spike it up and down, but its overall trend should be toward zero.

Overall, we’re looking for the smallest loss we can get. This loss function is where that graph during training comes into play. As it decreases, the network performs better, though it will taper off at some point. Keep loss low!

A loss plot early on in training. Note that the loss function (blue) trends downward over time.

The blue line represents training loss. This line is the ‘main’ loss function we’re concerned with when evaluating how well the network trains. The orange line is validation loss. Optimally, we want the validation loss to stay as close to the training loss as possible, with the two converging at the end. If the validation loss ends up well above the training loss (orange > blue), and especially if it starts climbing while the training loss keeps falling, then our network is overfitting.

Ultimately, we’re also interested in accuracy, which is why we measure it after each epoch as well. Final accuracy should hit around 80% with this project as-is, though we could improve it with some adjustments (more training time, different image sizes, more source material — several things).

What’s overfitting and why is it bad?

Overfitting is when a neural network learns way too well how to do the exact thing you ask it to do during training. We want to avoid this outcome because the network completely fails when we give it anything even slightly different from what it was trained on.

Imagine an autonomous race car learning to drive, and the only track it learns on is an oval (NASCAR-style). The car will get good at driving fast and turning left because that’s all it ever needed to be successful. However, take this same, trained car and put it on a new winding track, and it will fail spectacularly as its ‘drive fast and turn left’ strategy doesn’t work anymore.

An overfit partitioning problem. Even though the (overfit) green line separates the points perfectly, it misses the overall ‘trend’, or big picture of the black line. The black line is more likely to be correct with new data.

In a way, we can think of overfitting as ‘forcing’ the model to unnaturally perform better with a very specific set of data, at the cost of being a better ‘all-purpose’ model.

Overfitting: memorization is not learning

In addition to overfitting, there’s also underfitting. Underfitting is when the model is just terrible overall, like the NASCAR car that doesn’t even know how to turn, or like trying to partition the example above with a straight line. Usually, we can solve underfitting by just training longer.

How does dropout help prevent overfitting?

Image augmentation helps prevent overfitting, but so does dropout. We use both of these in our model.

Dropout randomly ‘turns off’ some nodes in the neural network during training, picking a new random set on every training step. So, a dropout of 0.25 would turn off about a quarter of the nodes. When dropout turns off nodes, it forces the network to create connections ‘around’ them, kind of like shutting off a road and having traffic take alternate paths around it. Dropout only applies during training; when we test the network, every node is switched on.

Network with 50% dropout. Note that the ‘dropped’ nodes keep changing throughout training.

Dropout ensures that individual nodes and paths don’t dominate the entire network, like a deep rut in a dirt path that drives all the wagons into it.
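Here is a small sketch of the mechanics, written in plain Python rather than Keras. The dropout function and the eight example node values are made up for illustration. It uses ‘inverted’ dropout, which is how Keras’s Dropout layer behaves: the nodes that survive are scaled up so the layer’s overall signal stays roughly the same size.

```python
import random

def dropout(activations, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(7)                     # seeded so the run is repeatable
node_values = [1.0] * 8                    # hypothetical outputs of eight nodes

first_pass = dropout(node_values, rate=0.25, rng=rng)
second_pass = dropout(node_values, rate=0.25, rng=rng)

print(first_pass)    # some entries are 0.0, the rest are scaled up
print(second_pass)   # a fresh random choice of which nodes are dropped
```

Because each pass draws a fresh random mask, no single node can be relied on every time, which is what pushes the network to spread the work around.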

h5 files: the trained network

When the network finishes training, it saves everything as an .h5 file. This file is the finished, trained neural network. There are two kinds of h5 files, a weights-only file, or the full weights+network file. I won’t go into the differences here, but generally, you want to use the weights+network file (it will be the larger file), unless the application you’re using it in needs a weights-only h5.

We don’t open the h5 file directly, but instead, we use some commands to load it and run testing on it. At the end of the training, the network tests itself automatically. There’s also a separate test program, horsemansTesting, that only loads an h5 file and runs the testing.

As the network trains, it remembers the overall best performing model. After training completes, the network copies this best performing model and permanently saves it. This copying means that our final h5 is always the best network during training, even if performance got worse later on. This is an example of using a callback, basically something that runs after each epoch. Callbacks are also how we have a plot update during training, and there’s even a callback to stop training early if the network isn’t making much progress.
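The logic behind those two callbacks is simple enough to sketch without any library. The function below is a simulation under stated assumptions: it walks through a made-up list of per-epoch validation losses, remembers the best epoch, and stops once there has been no improvement for `patience` epochs. Keras ships ready-made callbacks for this (ModelCheckpoint and EarlyStopping); whether the notebook uses exactly those, and which quantity it monitors, is an assumption here.

```python
def train_with_callbacks(val_losses, patience=3):
    """Simulate 'save the best model' and 'stop early' over per-epoch losses.

    Returns (best_epoch, best_loss, last_epoch_run), with epochs counted from 1.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # A real checkpoint callback would write the model to disk here
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, best_loss, epoch   # early stop
    return best_epoch, best_loss, len(val_losses)

# Hypothetical validation loss after each epoch
history = [0.90, 0.70, 0.55, 0.60, 0.58, 0.61, 0.59, 0.40]
best_epoch, best_loss, stopped_at = train_with_callbacks(history, patience=3)
print(best_epoch, best_loss, stopped_at)  # 3 0.55 6
```

Note the trade-off: with a patience of 3, training stops at epoch 6 and never sees the better result at epoch 8, so a small patience saves time but can quit too soon.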

horsemansTesting: loading and testing models

Use this if you only want to test a model you previously trained. It runs the same testing as horsemansTraining, but you can skip the setup and image loading and network creation parts and go right to the testing.

Generally, you want to load the full h5 file (it’s the larger one with ‘best’ in the name). If you try to load a weights-only h5 file, you will first need to rebuild and compile the same network model that was used for training! I didn’t include the network compiling in the horsemansTesting code, but it’s in the Network Model section above.

You might want to use one or the other h5 file if, for instance, you hooked this up to a camera or incorporated it into a mobile app, and your program precisely needed one or the other type of file.

The network tries to classify every image in the test folder and then logs the results. The program shows us a sampling of the images it looked at, its classification, and its confidence level for that classification.

Some majestic test output

In the above, the network classified everything correctly. For most images, the network was certain of its classification, though, for one, it was less confident. The horse in the middle had a 58% classification confidence, and the two bars next to it show the confidence it had of each class. This confidence is exactly the same as the final values of the two nodes in the output layer of the network. If the network classified something incorrectly, the bar would be red, as an easy way to see the big picture.

There are a few other blocks of code that run the same tests, but more abstractly. One gives us the classification of a single image as a number, and another gives us the total percentage of correct classifications. These might be useful for other applications, but this block of images makes it easy to visualize the network’s performance.
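Those two ‘more abstract’ tests can be sketched in a few lines of plain Python. The predictions and labels below are invented, and the class order in class_names is an assumption for illustration; the functions are not the ones in the notebook.

```python
def classify(probabilities):
    """Return the index of the most likely class."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])

def accuracy(predictions, true_labels):
    """Percentage of predictions whose most likely class matches the label."""
    correct = sum(1 for p, t in zip(predictions, true_labels) if classify(p) == t)
    return 100.0 * correct / len(true_labels)

class_names = ["horse", "human"]     # assumed order: index 0 is horse
predictions = [                      # hypothetical network outputs for four test images
    [0.58, 0.42],
    [0.03, 0.97],
    [0.91, 0.09],
    [0.45, 0.55],
]
true_labels = [0, 1, 0, 0]           # what the images really are

first = classify(predictions[0])
print(first, class_names[first], max(predictions[0]))  # 0 horse 0.58
print(accuracy(predictions, true_labels))              # 75.0
```

The last image is a horse that the network narrowly called a human, so three of four are right.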

List of errors that you’ll probably encounter

docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/create: dial unix /var/run/docker.sock: connect: permission denied.
See ‘docker run --help’.

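This is a Linux permissions error: your user isn’t allowed to talk to the Docker daemon. Run this first (note that it opens the Docker socket to every user on the machine, so adding your user to the docker group is the safer long-term fix):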

sudo chmod 777 /var/run/docker.sock

docker: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?.
See ‘docker run --help’.

Make sure Docker is running.

docker: Pulling from library/node image operating system “linux” cannot be used on this platform

Try switching Docker to Linux containers, by right-clicking on the little icon in the taskbar and selecting “Switch to Linux containers…”

Fix for some Windows users. Sometimes Docker uses Windows containers by default.

As another option, I built a container on Windows 10 for this, though this fix isn’t as reliable:

docker pull kaibrooks/demo-horsemans:windows

List of network problems you’ll probably have

I wrote the following list because these were all problems I had, and this is how I managed them. There are probably more methods than I list here, but this is my experience with it.

My network performs spectacularly during training and then fails spectacularly when I try to use it to do anything.

This is most likely overfitting (see ‘What’s overfitting and why is it bad?’ above). Train on more, or more varied, augmented images, or increase the dropout rate.


My network can’t figure out the difference between two images in this one case, even though it otherwise classifies them correctly.

Usually something in the background messing with it. Try cropping the images closer, or just removing those images from the pool.

My network takes forever to get to a point where it becomes accurate.

  • Reduce your image (picture) size to make the network smaller.
  • Make your network less complex (like fewer layers).
  • Increase your learning rate.
  • Alternately, throw money at the problem and buy a server cluster to run it on.

The graph of my network performance is jittering all over the place.

Reduce your learning rate. The network is making adjustments that are too big, so it keeps overshooting and undershooting where it’s trying to reach.

My loss graph looks real good at first; it decreases for a while but then suddenly shoots up and never recovers.

Learning rate probably too high. Reduce it.
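A toy example shows why the learning rate causes both the jitter and the blow-up. The sketch below is not the network’s optimizer; it runs plain gradient descent on the simple curve f(x) = x², whose lowest point is at x = 0, with three made-up learning rates.

```python
def gradient_descent(learning_rate, steps=20, start=5.0):
    """Minimize f(x) = x**2 and return the value of x after every step."""
    x = start
    path = [x]
    for _ in range(steps):
        gradient = 2 * x                  # derivative of x**2
        x = x - learning_rate * gradient  # step downhill
        path.append(x)
    return path

steady = gradient_descent(learning_rate=0.10)   # smooth approach to 0
jittery = gradient_descent(learning_rate=0.95)  # overshoots back and forth across 0
runaway = gradient_descent(learning_rate=1.10)  # every step lands farther away

for name, path in [("steady", steady), ("jittery", jittery), ("runaway", runaway)]:
    print(name, [round(x, 2) for x in path[:5]], "... final:", round(path[-1], 2))
```

The jittery run flips sign on every step, which is the bouncing graph; the runaway run is the loss that shoots up and never recovers. In both cases a smaller learning rate is the fix.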

Other questions

How do I know when the network is making progress in training?

Look at the plots for the loss function. Loss should decrease over time. A good time to stop is when the network stops making ‘reasonable’ progress (see below). There’s also an early stopping function which can terminate training early if the network isn’t improving at a ‘reasonable’ rate.

When’s a good time to stop training?

This one’s on you, and it depends on how much your time is worth, how fast you train, and how confident you are in your network structure and training parameters. There’s no use training for three days if the loss function is barely moving, and you’re not confident in the network configuration anyway. Keep in mind the network saves best_model.h5 each epoch, so stopping early doesn’t ‘waste’ your work!

Why is my network accuracy really good during training, but much worse during testing?

This is most likely due to overfitting. However, another essential thing to note is that the test set contains images with just the class (horse or human) on a white background. If the network wasn’t picking up the class in the image and was, say, looking too much at the background, having no background during testing can throw it off. Perhaps try adding more convolution layers to pick out features more, or try changing the size of the images to allow more resolution for finding those features.

Why can’t we use images the network has seen before for the test set? Won’t that improve the network’s classification score?

Yes, it will, but it’s all a lie. The network will seemingly perform very well during testing, and then perform much worse during any ‘real-world’ application. The point of the test set isn’t to make the numbers better arbitrarily; it’s so we get an idea of how this thing will actually perform in a new environment with unseen data.

What are the main parts of the program we can tweak, without making any major changes to the code?

  • epochs — number of training ‘runs’. More is generally better, but after a certain point the network stops improving.
  • imsize — resize the images
  • learning_rate — the higher it goes, the faster the network improves, but also means the network might end up ‘bouncing’ up and down as it can’t converge on a specific value.
  • dropout — dropout rate on a per-layer basis
  • imageGenerator.ipynb — change the number of augmented images, and how many get split into train/validate sets.

I have an Nvidia GPU and want to use it to speed up training

This is the Docker run command. Older articles incorrectly list nvidia-docker and other weird builds that Docker needed in older versions. Note that you’ll probably need some additional configuration/drivers/etc. along with this, so it probably won’t ‘just work’:

docker run -it --rm --runtime=nvidia --gpus all -v ~/MYSWEETFOLDER/:/tf/notebooks -p 8888:8888 kaibrooks/demo-horsemans:latest

My cat unplugged my computer mid-training and I lost it all!

Not quite. Each epoch, the network saves last_model.h5 and best_model.h5 in the trained_models folder. Grab one of them and rename it and you’ve got the model where it was last. You’ll have to start training over to produce a different model, but at least it’s not a total loss.

Can this classify images with more than one object?

Not as written, though you could modify it. Look for ‘multi label image classification’.

Using your own images

Let’s assume we have various images of three classes: Bicycles, Cats, Cheese

  1. Put the images you want to use in the raw folder, separated into folders by class (/raw/bicycles, /raw/cats, /raw/cheese).
  2. Run the imageGenerator program to create augmented images.
  3. Split the images into training / validation / testing folders, keeping them in their respective subfolders (like in #1).
  4. Update num_classes in horsemansTraining to the number of classes you’re using (in this example, 3).
  5. Change source to be whatever you’re naming this experiment as. This is just so the output files get a different name, which makes it easier to figure out which network was trained on which data.
  6. Update class_names with the labels for the new classes (e.g., ['bicycles', 'cats', 'cheese']). Keep in mind this needs to be ordered correctly, so if the labels are backward here, they’ll be backward when the network runs its test! Just change the order in class_names if that happens.
  7. Run it!
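One way to avoid getting the label order wrong in steps 4 and 6 is to read it from the folder names. The sketch below assumes labels are assigned in alphabetical order of the class subfolders, which is what Keras’s directory-based image loaders do; whether this notebook loads images that way is an assumption, so treat the result as a starting point and check it against the test output. The helper name and the temporary folders are made up for the demo.

```python
import tempfile
from pathlib import Path

def class_names_from_folders(root):
    """Return the class subfolder names under `root`, sorted alphabetically."""
    return sorted(p.name for p in Path(root).iterdir() if p.is_dir())

# Build a throwaway folder tree so the example runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    for label in ["cheese", "bicycles", "cats"]:
        (Path(tmp) / label).mkdir()
    class_names = class_names_from_folders(tmp)
    num_classes = len(class_names)

print(class_names, num_classes)  # ['bicycles', 'cats', 'cheese'] 3
```

Pointing the function at your own train folder instead of the temporary one gives you both values for your data set.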
Now we celebrate