Day 2: What are Neural Networks?

What are the main components of a Neural Network? How does a neural network work?

By Nandeshwar

Jun 02, 2021

What is a Neural Network?

As the name suggests, a neural network is modeled on the biological neural networks in our brain. Our brain is made up of neurons that interact with each other by sending signals through connections known as synapses. Each neuron sends signals to other neurons based on the signals it receives from other neurons.

Similarly, an artificial neural network is made of neurons (or nodes) that are organized into layers and highly interconnected, and each neuron typically holds some information. Training data is introduced to the network through the input layer and is then passed on to the hidden layers. A neural network can have one or more hidden layers, and this is where the processing actually happens. The last hidden layer is linked to the output layer, which has one or more neurons for each possible desired output.

How do the components of a Neural Network work?

Layers

A neural network has three kinds of layers:

Input Layer - There is one input layer in a neural network. Training data is introduced through this layer.
Hidden Layers - There can be one or more hidden layers in a neural network depending upon how deep or shallow it is. This is where the processing actually happens.
Output Layer - There is one output layer in a neural network. This layer is connected to the last hidden layer. This layer provides the result.
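As a rough sketch of how these layers fit together, the snippet below describes a small network purely by its layer sizes and counts the weights and biases between consecutive layers. The sizes and the helper name `count_parameters` are assumptions chosen for illustration, and the layers are assumed to be fully connected.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of neurons per layer, from input to output.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = n_in * n_out  # one weight per connection between the two layers
        biases = n_out          # one bias per neuron in the receiving layer
        total += weights + biases
    return total


# Hypothetical network: input layer (4), two hidden layers (5 and 3), output layer (1)
layer_sizes = [4, 5, 3, 1]
print(count_parameters(layer_sizes))  # (4*5 + 5) + (5*3 + 3) + (3*1 + 1) = 47
```

Even this tiny network has 47 adjustable numbers, which hints at why deeper networks quickly reach millions of parameters.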

Neuron

As stated above a neuron holds some information and acts as a mathematical function. For the input layer, the neurons are fed information from the input data. This data is further processed at each neuron level and then passed to all the neurons of the next layer. In the case of data like images and text, they are first encoded into numerical values then fed to the neural network.
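One common way to turn categorical data such as words or labels into numbers is one-hot encoding. The post does not say which encoding it has in mind, so treat this as one possible illustration; the function name `one_hot` and the example vocabulary are assumptions.

```python
def one_hot(label, vocabulary):
    """Return a list with 1.0 at the position of `label` and 0.0 elsewhere."""
    return [1.0 if item == label else 0.0 for item in vocabulary]


vocabulary = ["cat", "dog", "bird"]  # hypothetical categories
print(one_hot("dog", vocabulary))    # [0.0, 1.0, 0.0]
```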

Each input connection of a neuron is also assigned a weight W_k. These weights are what make each neuron unique. They are fixed during testing, but during training they are the numbers we change in order to ‘tune’ our network.

Usually, the following steps happen to the inputs arriving at a neuron:

A weighted sum is calculated from the inputs.
A bias is added to it.
This value is passed through an activation function (read below).
The output of the activation function determines whether, and how strongly, this data is passed on to the next layer.
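The steps above can be sketched for a single neuron as follows. This is a minimal illustration, not a full implementation: the names `neuron_output` and `relu` and all the numbers are assumptions, and ReLU (introduced in the Activation Functions section below) is used as the activation.

```python
def relu(z):
    """Activation function: pass positive values through, clamp the rest to 0."""
    return max(0.0, z)


def neuron_output(inputs, weights, bias, activation=relu):
    """Compute one neuron's output from its inputs."""
    # Step 1: weighted sum of the inputs
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    # Step 2: add the bias
    z = weighted_sum + bias
    # Steps 3 and 4: the activation function decides what is passed on
    return activation(z)


# Hypothetical values for illustration
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 0.25]
print(neuron_output(inputs, weights, bias=1.0))   # 0.5 - 2.0 + 0.75 + 1.0 = 0.25
print(neuron_output(inputs, weights, bias=-1.0))  # z = -1.75, so ReLU gives 0.0
```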

Activation Functions

The activation function regulates which neurons should be activated (or fired) to send data to the next layer. When we say a neuron is activated, triggered, or fired, we simply mean that the neuron passes the data it holds to the next layer of neurons. The input to an activation function can be anything between -inf and +inf. Some activation functions, such as the sigmoid, bound this value between 0 and 1; others, such as ReLU (described below), only cut off negative values. Based on the resulting value, the neuron is either activated or not.

Activation functions introduce non-linearity to the model.
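As a sketch of an activation that bounds its output between 0 and 1, here is the sigmoid function. The post does not name a specific bounding function, so choosing the sigmoid is an assumption; it is simply a common example.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an equivalent form that avoids overflow in exp()
    e = math.exp(z)
    return e / (1.0 + e)


for z in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    print(z, sigmoid(z))  # outputs rise from near 0 to near 1; sigmoid(0.0) is 0.5
```

Because the curve bends, the output is not a straight-line function of the input, which is what "non-linearity" refers to.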

Why do we need non-linearity in a model? [Why do we actually need activation functions?]

A neural network without an activation function is essentially just a linear regression model, because stacking layers that only compute weighted sums and add biases still yields a linear function of the input. The activation function provides the non-linear transformations that make neural networks more capable and better able to make informed decisions.
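To see why, consider a short derivation for two layers with no activation function. The symbols $W_1, b_1$ (first layer) and $W_2, b_2$ (second layer) are notation assumed here for illustration:

$$ h = W_1 x + b_1 $$

$$ y = W_2 h + b_2 = W_2 (W_1 x + b_1) + b_2 = (W_2 W_1) x + (W_2 b_1 + b_2) $$

The result has the same form as a single layer with weights $W_2 W_1$ and bias $W_2 b_1 + b_2$, so adding more such layers gains nothing. Inserting a non-linear function between the layers breaks this collapse.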

Let's take the ReLU (rectified linear unit) function as an example. It is widely used as an activation function in neural networks.

$$ g(z) = \max\lbrace 0, z \rbrace $$

In Python, this can be written as:

# rectified linear function (ReLU): returns x if x is positive, otherwise 0
def rectified(x):
    return max(0, x)

For different values of x, it gives:

$$ rectified(10) = 10 $$

$$ rectified(-1) = 0 $$

$$ rectified(-10) = 0 $$

$$ rectified(1) = 1 $$

In this manner, the ReLU function outputs 0 for neurons whose value is less than 0, so they are not activated, while values above 0 pass through unchanged and those neurons are activated.

Loss Function

The only thing left to define before we start talking about training is the loss function.

As defined on Wikipedia:

A loss function or cost function (sometimes also called an error function) is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost" associated with the event.

The objective of this function is to evaluate how well an algorithm works. In the case of a neural network, if the predictions are poor, the loss function outputs a higher value. This gives us a single number to guide the changes we make to improve the model. If the loss is large, our network is not performing well, so we want it to be as small as possible.

Loss functions are broadly divided into three categories:

Regression Loss - Used when we are predicting continuous values, like the price of a house or the sales of a company.
Binary Classification Loss - Used when we are dealing with a Yes/No situation, like “a person has diabetes or not”.
Multi-Class Classification Loss - Used when the target variable has more than two classes, as in the Iris dataset, where the prediction is one of three class labels: Setosa, Versicolor, and Virginica.

Some of the commonly used loss functions are:

Mean Squared Error (MSE)
Hinge Loss
Binary Cross Entropy Loss
Categorical Cross Entropy Loss
Kullback-Leibler Divergence Loss
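To make two of these concrete ahead of the dedicated post, here is a minimal sketch of Mean Squared Error and Binary Cross Entropy in plain Python. The function names, the clipping constant `eps`, and the example numbers are assumptions for illustration.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Regression loss: average of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary classification loss; y_pred holds predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical targets and predictions
print(mean_squared_error([3.0, 5.0], [2.0, 7.0]))  # (1 + 4) / 2 = 2.5

good = binary_cross_entropy([1, 0], [0.9, 0.1])  # confident and correct
bad = binary_cross_entropy([1, 0], [0.1, 0.9])   # confident and wrong
print(good < bad)  # True: worse predictions give a higher loss
```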

We will cover them in another blog post.

How does training a Neural Network work?

Initialize weights

When a neural network is initialized, random weights and biases are assigned to each neuron. Naturally, such a network won’t give very good results. In the process of training, we start with a badly performing neural network and want to end up with a network with high accuracy. In terms of the loss function, we want the loss to be much lower at the end of training.
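A minimal sketch of random initialization for one fully connected layer is shown below. The helper name `init_layer`, the range of -0.5 to 0.5, and the fixed seed are assumptions; real libraries use more carefully chosen initialization schemes.

```python
import random


def init_layer(n_in, n_out, rng):
    """Create random weights and biases for a layer with n_in inputs and n_out neurons."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


rng = random.Random(0)  # fixed seed so the run is repeatable
weights, biases = init_layer(n_in=3, n_out=2, rng=rng)
print(len(weights), len(weights[0]), len(biases))  # 2 3 2
```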

Training and Optimization

Improving the network is possible because we can change its function by adjusting weights. We want to find another set of weights that performs better than the initial one.

From here, training a neural network becomes a problem of minimizing the loss function.

A range of optimization algorithms is available for this purpose. Most of them are based on gradient descent, a mathematical method that deserves a blog post of its own.

What we are looking for is a set of weights with which the neural network makes accurate predictions, which in turn leads to a lower value of the loss function.
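As a small preview of gradient descent, the sketch below fits a model with a single weight, y = w * x, by repeatedly stepping the weight against the gradient of the loss. The model, the data, the learning rate, and the number of steps are all assumptions chosen to keep the example tiny.

```python
def loss(w, xs, ys):
    """Mean squared error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w, xs, ys):
    """Derivative of the loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def gradient_descent(w, xs, ys, learning_rate=0.05, steps=100):
    """Repeatedly move w a small step in the direction that lowers the loss."""
    for _ in range(steps):
        w -= learning_rate * gradient(w, xs, ys)
    return w


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by y = 2x, so the best weight is 2
w_start = 0.0          # a deliberately bad starting weight
w_end = gradient_descent(w_start, xs, ys)

print(loss(w_start, xs, ys) > loss(w_end, xs, ys))  # True: the loss went down
print(round(w_end, 3))                              # 2.0
```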

Backpropagation

The remaining question for gradient-based algorithms is how to compute the gradient. The fastest method would be to work out the derivative analytically for each neural network architecture. However, this is already difficult for simple models with only 6 parameters, and modern architectures have millions of them. This is the problem that backpropagation solves.

Given an artificial neural network and a loss function, backpropagation calculates the gradient of the loss function with respect to the neural network's weights by applying the chain rule layer by layer, from the output back to the input. This makes it feasible to use gradient methods for training multilayer networks, updating weights to minimize loss.
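The sketch below applies the chain rule by hand to a tiny network with one hidden neuron and compares the result with a numerical estimate of the gradient. The network shape, the sigmoid hidden activation, the squared-error loss, and all names and numbers are assumptions for illustration; real frameworks automate this process.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Forward pass; returns the hidden activation and the prediction."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)  # hidden neuron
    y = w2 * h + b2           # output neuron (no activation)
    return h, y


def loss(params, x, target):
    _, y = forward(params, x)
    return (y - target) ** 2


def backprop(params, x, target):
    """Gradient of the loss with respect to (w1, b1, w2, b2) via the chain rule."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    d_y = 2.0 * (y - target)   # dL/dy
    d_w2 = d_y * h             # dL/dw2
    d_b2 = d_y                 # dL/db2
    d_h = d_y * w2             # error passed back to the hidden neuron
    d_z = d_h * h * (1.0 - h)  # through the sigmoid, whose derivative is h * (1 - h)
    d_w1 = d_z * x             # dL/dw1
    d_b1 = d_z                 # dL/db1
    return [d_w1, d_b1, d_w2, d_b2]


def numerical_gradient(params, x, target, eps=1e-6):
    """Estimate each partial derivative with a central finite difference."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grads.append((loss(up, x, target) - loss(down, x, target)) / (2 * eps))
    return grads


params = [0.5, -0.3, 0.8, 0.1]  # hypothetical w1, b1, w2, b2
x, target = 1.5, 1.0
analytic = backprop(params, x, target)
numeric = numerical_gradient(params, x, target)
print(all(abs(a - n) < 1e-6 for a, n in zip(analytic, numeric)))  # True
```

The numerical estimate needs two extra forward passes per parameter, while backpropagation gets all four gradients from a single forward and backward pass, which is why it scales to networks with millions of weights.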

When the model gives high accuracy after many iterations of this process, its weights are saved and then used for prediction.

What are the limitations of a neural network?

Unexplained functioning of the network - Deep Learning algorithms work as a black box and do not allow the user to understand the actual underlying pattern. Simply put, you don’t know how or why your NN came up with a certain output.
Hardware Dependence - Modern neural networks have millions of parameters. Running such models requires very high processing power.
Data - Neural networks usually require much more data than traditional machine learning algorithms, often at least thousands, if not millions, of labeled samples.

The next blog post will cover the mathematics behind neural networks in more detail.

References

https://towardsdatascience.com/artificial-neural-networks-for-total-beginners-d8cd07abaae4