Image recognition

Image recognition refers to technologies that identify places, logos, people, objects, buildings, and other features in digital images. Humans recognise images easily: we can tell a picture of a cat from a picture of a horse at a glance. For a computer, the task is far less simple.

A digital image is composed of picture elements, also known as pixels, each of which stores its intensity or grey level as a finite, discrete numeric value. The computer therefore sees an image as a grid of numbers, and to recognise a certain image it has to recognise the patterns and regularities in this numerical data. Image recognition should not be confused with object detection. Object detection analyses an image to find and locate the different objects in it, while image recognition classifies the image as a whole into one of several categories.
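To make the "image as numbers" idea concrete, here is a minimal sketch in plain Python. The 4x4 pixel values are made up for illustration, and the names `grey_image`, `rgb_image` and `normalised` are hypothetical; real pipelines would normally load pixels from a file with an imaging library.

```python
# A 4x4 greyscale image: each entry is one pixel's intensity
# (0 = black, 255 = white). The values are invented for illustration.
grey_image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

height = len(grey_image)
width = len(grey_image[0])

# An RGB image stores three intensities (red, green, blue) per pixel.
# Here every channel simply repeats the grey value.
rgb_image = [[(value, value, value) for value in row] for row in grey_image]

# Pixel values are often rescaled to the range 0..1 before training.
normalised = [[value / 255 for value in row] for row in grey_image]

print(height, width)     # image size in pixels
print(grey_image[0][2])  # intensity of the pixel in row 0, column 2
print(rgb_image[0][2])   # the same pixel as an (R, G, B) triple
```

To the computer, the left half of this image is nothing more than a block of zeros and the right half a block of 255s; "recognising" it means finding structure in those numbers.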

How does Image recognition work?

Typically, image recognition involves building a neural network that processes the individual pixels of an image. The network is fed as many pre-labelled images as possible in order to “teach” it how to recognise similar images.

Let me break the process down into a few simple steps:

  1. We need a dataset containing images with their respective labels. For example, an image of a dog must be labelled "dog", or with some other label that we can interpret.
  2. Next, these images are fed into a neural network, which is trained on them. For image tasks we usually use convolutional neural networks (CNNs). These networks consist of convolutional layers and pooling layers in addition to multilayer perceptron (MLP) layers. How convolutional and pooling layers work is explained below.
  3. We feed in an image that is not in the training set and get a prediction.
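The three steps above can be sketched end to end in plain Python. To keep the example dependency-free, the "training" step below is a simple nearest-centroid stand-in, not a neural network: it only illustrates the workflow of labelled data, training, and prediction on an unseen image. The dataset, labels, and function names (`train`, `predict`) are all hypothetical.

```python
# Step 1: a tiny labelled dataset. Each "image" is a flat list of pixel
# intensities in 0..1; the values and labels are invented for illustration.
training_set = [
    ([0.9, 0.8, 0.9, 0.7], "bright"),
    ([0.8, 0.9, 0.7, 0.9], "bright"),
    ([0.1, 0.2, 0.1, 0.3], "dark"),
    ([0.2, 0.1, 0.3, 0.1], "dark"),
]


def train(examples):
    """Step 2 (stand-in): learn one average image (centroid) per label.

    A real image classifier would train a convolutional network here.
    """
    sums, counts = {}, {}
    for pixels, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(pixels)
            counts[label] = 0
        sums[label] = [s + p for s, p in zip(sums[label], pixels)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(model, pixels):
    """Step 3: return the label whose centroid is closest to the new image."""
    def squared_distance(centroid):
        return sum((c - p) ** 2 for c, p in zip(centroid, pixels))

    return min(model, key=lambda label: squared_distance(model[label]))


model = train(training_set)
unseen_image = [0.85, 0.75, 0.8, 0.9]  # not part of the training set
print(predict(model, unseen_image))
```

The shape of the program stays the same when the stand-in is replaced by a convolutional network: only `train` and `predict` change.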

In the coming sections, we will follow these steps to build a classifier that can recognise RGB images of 10 different kinds of animals.

Working of Convolutional and Pooling layers

Convolutional layers and pooling layers are the major building blocks of convolutional neural networks. Let us look at them in detail.

How does a Convolutional Layer work?

The convolutional layer’s parameters consist of a set of learnable filters (or kernels), each with a small receptive field. These filters slide across the pixels of every image in the batch and extract local features such as edges. The layer convolves its input with the filters and passes the result to the next layer. This is similar to the response of a neuron in the visual cortex to a specific stimulus.
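As a minimal sketch of what a single filter does, the function below slides a 3x3 kernel over a small greyscale image with no padding and a stride of 1. The name `convolve2d`, the image, and the hand-picked vertical-edge kernel are assumptions for illustration; in a real network the kernel values are learned during training. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1); return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the patch under it and sum the products.
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# A 5x5 image that is dark (0) on the left and bright (1) on the right.
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# A hand-picked filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0, 3, 3]
```

The output is zero where the patch is uniformly dark and large where the patch straddles the edge, which is the sense in which a filter "gathers information" about a local pattern.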

How does a Pooling Layer work?

The pooling operation involves sliding a two-dimensional filter over each channel of the feature map and summarising the features lying within the region covered by the filter. A pooling layer is usually placed between two successive convolutional layers. By down-sampling the representation, it reduces the amount of computation and the number of parameters needed in later layers. The pooling function can be either max or average. Max pooling is the more common choice, as it often works better in practice.
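The following sketch shows both pooling functions on a single 4x4 channel, assuming a 2x2 window with a stride equal to the window size (a common but not universal choice). The function name `pool2d` and the feature-map values are hypothetical.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Summarise each non-overlapping size x size window (stride = size)."""
    if mode == "max":
        reduce = max
    elif mode == "average":
        def reduce(values):
            return sum(values) / len(values)
    else:
        raise ValueError("mode must be 'max' or 'average'")

    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            # Collect the values covered by the window, then summarise them.
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(reduce(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

print(pool2d(feature_map, mode="max"))      # [[6, 2], [2, 9]]
print(pool2d(feature_map, mode="average"))  # [[3.5, 1.0], [1.0, 6.0]]
```

Either way, the 4x4 input shrinks to 2x2, so every later layer has a quarter as many values to process.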