Day 4: What are Perceptrons?
Perceptrons are the building blocks of Neural Networks. How do Perceptrons work?
Jun 03, 2021

Perceptron
The Perceptron was introduced by Frank Rosenblatt in 1958, who described it as "the first machine which is capable of having an original idea". It was inspired by an earlier model by McCulloch and Pitts.
Later it was carefully analyzed and refined by Minsky and Papert in 1969.
A perceptron can have $n$ inputs, $ x_{1}, x_{2}, x_{3}, \ldots, x_{n} $. In the model proposed by McCulloch and Pitts, the inputs cannot be weighted. Rosenblatt introduced weights, $ w_{1}, w_{2}, w_{3}, \ldots, w_{n} $, one for each input. Each weight expresses the importance of its input.

The neuron's output, 0 or 1, is determined by whether the weighted sum of the inputs, $\sum_{i=1}^{n} w_{i} x_{i}$, reaches a certain threshold value $\theta$:
$$
\begin{aligned}
y &=1 \quad \text { if } \sum_{i=1}^{n} w_{i} * x_{i} \geq \theta \\
&=0 \quad \text { if } \sum_{i=1}^{n} w_{i} * x_{i}<\theta
\end{aligned}
$$
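The threshold rule can be sketched in a few lines of Python. This is a minimal illustration, not Rosenblatt's original formulation; the function name `perceptron_output` and the example weights and threshold are assumptions chosen for the demo.

```python
def perceptron_output(inputs, weights, theta):
    """Return 1 if the weighted sum of the inputs reaches theta, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum >= theta else 0


# Hypothetical example: two inputs, the first weighted more heavily.
weights = [0.7, 0.3]
theta = 0.5

print(perceptron_output([1, 0], weights, theta))  # 0.7 >= 0.5, so 1
print(perceptron_output([0, 1], weights, theta))  # 0.3 <  0.5, so 0
```

With these values the first input alone is enough to fire the neuron, while the second is not, which is what "importance of an input" means here.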
Moving $\theta$ to the left-hand side of the inequality, the rule can be rewritten as
$$
\begin{aligned}
y &=1 \quad \text { if } \sum_{i=1}^{n} w_{i} * x_{i}-\theta \geq 0 \\
&=0 \quad \text { if } \sum_{i=1}^{n} w_{i} * x_{i}-\theta<0
\end{aligned}
$$
Now, instead of handling $\theta$ separately, we treat it as the weight of an extra input that is always 1. The commonly used convention is given below. Notice that the index $i$ now starts at 0 instead of 1.
$$
\begin{aligned}
y &=1 \quad \text { if } \sum_{i=0}^{n} w_{i} * x_{i} \geq 0 \\
&=0 \quad \text { if } \sum_{i=0}^{n} w_{i} * x_{i}<0 \\
\text { where, } \quad x_{0} &=1 \quad \text { and } \quad w_{0}=-\theta
\end{aligned}
$$
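To see that folding $\theta$ into the weights changes nothing, the sketch below compares the two forms on every boolean input pair. The function names and the integer weights are assumptions picked for illustration. Integers are used so the comparison is exact and not affected by floating-point rounding.

```python
from itertools import product


def with_threshold(x, w, theta):
    """Original form: fire when sum(w_i * x_i) >= theta, for i = 1..n."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0


def with_bias(x, w, theta):
    """Bias form: prepend x_0 = 1 and w_0 = -theta, then compare with 0."""
    x_aug = [1] + list(x)
    w_aug = [-theta] + list(w)
    return 1 if sum(wi * xi for wi, xi in zip(w_aug, x_aug)) >= 0 else 0


# Hypothetical weights and threshold (integers keep the arithmetic exact).
w, theta = [2, 1], 2

for x in product([0, 1], repeat=2):
    assert with_threshold(x, w, theta) == with_bias(x, w, theta)
    print(x, with_bias(x, w, theta))
```

The practical benefit of this convention is that the threshold becomes just another weight, so it can be stored and adjusted together with the others.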
Advantages over the McCulloch and Pitts model
- We can introduce non-boolean inputs.
- The weights are not restricted to unity. Thus, by changing the weights, we can assign importance to the inputs.
- Unlike in the McCulloch and Pitts model, inputs are not divided into inhibitory and excitatory types. The sign and size of each weight play that role instead.
The XOR affair and the AI winter!
A single Perceptron is incapable of implementing the XOR function. More generally, a single perceptron cannot solve any problem whose data is not linearly separable, because its decision boundary is a straight line (a hyperplane in higher dimensions). Multi-layer perceptrons, however, can represent such functions, which is why they are used heavily in industry.
In his book, Perceptrons Minsky said certain functions can not be represented by Perceptrons (XOR function). This literally killed the field for decades. Later research with multi layer perceptrons showed how can such functions be implemented thus saving the field.
- Wikipedia
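The XOR limitation can be illustrated with a small Python sketch that uses the bias convention ($x_{0}=1$, $w_{0}=-\theta$). The weight values, the helper names, and the search range are assumptions made for this demo. The grid search over integer weights is only an illustration, not a proof. The proof is short: with $w_{0}$ as the bias weight, XOR would require $w_{0}<0$, $w_{0}+w_{1} \geq 0$, $w_{0}+w_{2} \geq 0$ and $w_{0}+w_{1}+w_{2}<0$. Adding the two middle inequalities gives $w_{0}+w_{1}+w_{2} \geq -w_{0} > 0$, which contradicts the last one.

```python
from itertools import product


def perceptron(x, w):
    """w[0] is the bias weight w_0 = -theta; x_0 = 1 is implied."""
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if s >= 0 else 0


INPUTS = list(product([0, 1], repeat=2))
XOR_TABLE = {x: x[0] ^ x[1] for x in INPUTS}


def realizes_xor(w):
    """True if a single perceptron with weights w reproduces XOR."""
    return all(perceptron(x, w) == XOR_TABLE[x] for x in INPUTS)


# Try every integer weight triple in a small grid: none reproduces XOR.
grid = range(-5, 6)
hits = [w for w in product(grid, repeat=3) if realizes_xor(w)]
print(len(hits))  # 0

# Two layers are enough: XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)).
OR_W = [-1, 1, 1]     # fires when x1 + x2 >= 1
NAND_W = [1, -1, -1]  # fires when x1 + x2 <= 1
AND_W = [-2, 1, 1]    # fires when both hidden units fire


def xor(x1, x2):
    h1 = perceptron([x1, x2], OR_W)
    h2 = perceptron([x1, x2], NAND_W)
    return perceptron([h1, h2], AND_W)


print([xor(*x) for x in INPUTS])  # [0, 1, 1, 0]
```

The hidden layer here is hand-wired for clarity. In practice the weights of a multi-layer perceptron are learned from data, not set by hand.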