Neuron

How to Make an Image Recognition Model — Data Science

Nao Kawakami
5 min read · Jan 6, 2021

Summary

I will build a neural network model that detects a particular object by training it on images of that object.

Process

  1. Prepare images that contain the object you want to detect, and images that do not
  2. Get mode information of the images
  3. Reshape the images to squares
  4. Resize the images
  5. Reshape dimensionality
  6. Make a Neural Network model

Prepare images that contain the object you want to detect, and images that do not

Collect images of the object to detect and mix them with other images, which must not contain the object. I will train a model that detects the object.

This is Notre Dame. I will detect this.

There are many images taken from various angles and colored with different filters. I picked pictures that look similar in terms of color and angle. I used 300 images for this model.

The other images can be anything, but their quantity should be close to the number of Notre Dame images. I will collect 200–400 pictures.

Get mode information of the images

Clean and format the images. Some pictures are in grayscale, some are in color, and some have an alpha channel. To train a model, I need to feed it uniformly formatted images, so I will keep only RGB images.

If I do not format the images, their shapes will differ, and I cannot train a model on images with different shapes.

print(img1.shape) # This is color image
print(img2.shape) # This is gray scale image

Result

(354, 655, 3) # A color image has `3` in the 3rd dimension
(443, 235, 1) # A grayscale image has `1` in the 3rd dimension

I will eventually format all three dimensions, but first I want to keep only color images, i.e., images whose shape has 3 at the third index.

Import

# Import modules
import requests
from PIL import Image
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os

I can get the mode (color information) of an image using the PIL.Image module. The .mode attribute gives the mode of the image.

img = Image.open('path'+image) # Get image data of the `path`
print('Color mode: ', img.mode)

Result

Color mode:  L

I found the results below:

  • Color mode: RGB
  • Color mode: L
  • Color mode: P
  • Color mode: RGBA

I want to keep only RGB images, so I will filter out the others.

mode = [] # List to store image data
for image in os.listdir('path'): # Get file names in the `path`
    img = Image.open('path' + image) # Get image data of each file
    if img.mode == 'RGB': # Filter
        mode.append(img) # Store `RGB` images
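As an alternative to discarding non-RGB images, `Image.convert` could map them to RGB instead. This is a sketch of that option, not part of the original pipeline, using a tiny in-memory image:

```python
from PIL import Image

# Hypothetical alternative: instead of filtering out non-RGB images,
# convert grayscale ('L'), palette ('P'), and 'RGBA' images to 'RGB'.
gray = Image.new('L', (4, 4), color=128)  # a tiny in-memory grayscale image
rgb = gray.convert('RGB')                 # now has 3 channels

print(gray.mode)  # L
print(rgb.mode)   # RGB
```

Whether to convert or discard is a judgment call: converting keeps more training data, but converted grayscale images may not match the color distribution of the rest of the set.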

Reshape the images to squares

Now I have only color images, but their heights and widths still vary.

(233, 643, 3) # Shape of first image
(466, 877, 3) # Shape of second image
(79, 55, 3) # Shape of third image
(1256, 1560, 3) # Shape of fourth image

I formatted the 3rd dimension to 3 in the section above. Now I want all images to have the same height and width, but without losing the ratio between height and width. However, if I shrink or expand images while keeping that ratio, I cannot get consistent heights and widths across all images.

# Expand/shrink with the same ratio between height and width
# First image x 2 != second image
(233, 643, 3) x 2 = (466, 1286, 3) != (466, 877, 3)

So I will add a background to each image, converting all images to squares. Square images can then be expanded or shrunk to the same scale while keeping the height-to-width ratio of the original content.

Original image. Its width is bigger than its height, so I added a background to give it a size of width x width.

After adding background. Now this image is square.

I applied this process to all of the images and obtained squares.

img = Image.open('file path')  # Get image data
# Convert image data to `nparray` to get the image shape
# (in a NumPy array, shape[0] is the height and shape[1] is the width)
height = np.array(img).shape[0]  # height of the image
width = np.array(img).shape[1]   # width of the image
# If height is bigger than width, make a height x height square
if height > width:
    # Make a background shaped height x height
    result = Image.new(img.mode, (height, height))
    result.paste(img)  # Combine background and original image
# If width is bigger than height, make a width x width square
elif height < width:
    result = Image.new(img.mode, (width, width))
    result.paste(img)
else:  # If an image is originally square, do nothing
    result = img
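A minimal sanity check of this padding step, using a tiny hypothetical 4x2 image in place of a real photo:

```python
import numpy as np
from PIL import Image

# Hypothetical tiny image: 2 pixels high, 4 pixels wide (width > height)
img = Image.new('RGB', (4, 2), color=(255, 0, 0))

# Paste onto a width x width black background, as in the padding step
result = Image.new(img.mode, (4, 4))
result.paste(img)

print(np.array(result).shape)  # (4, 4, 3) — now square
```

The original pixels end up in the top-left corner; the rest of the square is the background color of `Image.new` (black by default).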

Resize the images

Now I resize all images to a consistent height and width. I chose 256 x 256.

img = img.resize((256, 256)) # `resize` returns a new image, so reassign it
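Note that `Image.resize` returns a new image rather than modifying the original in place, so the result must be collected. A sketch resizing a list of images (the `images` list name is an assumption, standing in for the filtered image list):

```python
from PIL import Image

# Hypothetical list of square images of various sizes
images = [Image.new('RGB', (s, s)) for s in (100, 300, 512)]

# `resize` returns a new image, so collect the results in a new list
resized = [img.resize((256, 256)) for img in images]

print({img.size for img in resized})  # {(256, 256)}
```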

Reshape dimensionality

I will set the X and y variables. I now have a list of images; let's look at its shape.

print(X.shape)
print(X[0].shape)

Result

(456,) # Number of images
(256, 256, 3) # Shape of each image

This means X has 456 rows, and each row has shape (256, 256, 3). In order to fit a model with this X variable, I want to combine the dimensions into a single array.

# So I want to convert this
(456,)
(256, 256, 3)
# to this
(456, 256, 256, 3)

I will make a 4-dimensional array first, then add the 3-dimensional images to it.

# Make the first row of the `X` variable. This has 4 dimensions
X = np.array(image_list[0]).reshape(1, 256, 256, 3)
# Then add each image to X, preserving the original order
for i in range(1, len(image_list)):
    X = np.insert(X, i, image_list[i], axis=0)
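As an aside, repeated `np.insert` copies the whole array on every iteration; `np.stack` builds the same 4-D array in one call. A sketch, with zero-filled arrays standing in for the real images:

```python
import numpy as np

# Hypothetical stand-in for the list of 456 images
image_list = [np.zeros((256, 256, 3), dtype=np.uint8) for _ in range(5)]

# Stack all 3-D images along a new first axis in one call
X = np.stack(image_list)

print(X.shape)  # (5, 256, 256, 3)
```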

Let’s check the shape.

X.shape

Result

(456, 256, 256, 3)

Looks good; I got a formatted X variable.

Now I will prepare the y variable: 1 for Notre Dame images and 0 for other images.

# Add 1 to the y variable with the length of `notredame`
y = [1 for _ in range(len(notredame))]
# Add 0 to the y variable with the length of `other`
for _ in range(len(other)):
    y.append(0)
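Equivalently, the labels could be built with NumPy. A sketch, with hypothetical counts standing in for `len(notredame)` and `len(other)`:

```python
import numpy as np

# Hypothetical counts standing in for len(notredame) and len(other)
n_notredame, n_other = 300, 156

# 1 for Notre Dame images, 0 for the rest, in the same order as X
y = np.concatenate([np.ones(n_notredame), np.zeros(n_other)])

print(y.shape)       # (456,)
print(int(y.sum()))  # 300
```

The order matters: the labels must line up with the rows of X, so the Notre Dame images must come first in both.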

Make a Neural Network model

I will use Sequential in Keras to assemble a model.

# Import Keras and scikit-learn components
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, Flatten

# Get train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)

# Neural Network model
model = Sequential()
model.add(Flatten(input_shape=(256, 256, 3))) # Input shape of image
model.add(Dense(256, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mse', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=256, validation_data=(X_test, y_test), epochs=10, verbose=1)

Result

Train accuracy: 0.73
Test accuracy: 0.73

So this model recognizes Notre Dame in pictures with 73% accuracy.
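To turn the model's sigmoid outputs into class labels, predictions are typically thresholded at 0.5. A sketch with made-up probabilities standing in for `model.predict(X_test)`:

```python
import numpy as np

# Hypothetical sigmoid outputs standing in for model.predict(X_test)
probs = np.array([[0.91], [0.12], [0.55], [0.49]])

# Probability > 0.5 -> Notre Dame (1), otherwise other (0)
labels = (probs > 0.5).astype(int).ravel()

print(labels)  # [1 0 1 0]
```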
