Stage 1: The Single Perceptron
The simplest neural network: a single computational unit that computes a weighted sum of its inputs, adds a bias, and applies a step activation function:

y = θ(w · x + b)

where θ(z) = 1 if z ≥ 0, else 0 (the Heaviside step function).
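A minimal sketch of this unit in plain Python; the weight and bias values below are illustrative, not prescribed by the text:

```python
def heaviside(z):
    """Step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def perceptron(x, w, b):
    """Weighted sum of inputs plus bias, passed through the step function."""
    return heaviside(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Illustrative weights: with w = [1, 1] and b = -1.5 this unit fires
# only when both inputs are 1 (an AND gate, see Stage 2).
print(perceptron([1, 1], [1.0, 1.0], -1.5))  # 1
print(perceptron([0, 1], [1.0, 1.0], -1.5))  # 0
```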
Perceptron Diagram
Decision Boundary
Stage 2: Logic Gates with Perceptrons
A single perceptron can implement linearly separable Boolean functions.
Perceptron Weights
Decision Boundary
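For example, AND, OR, and NAND are all linearly separable and can each be realized by fixing the weights by hand. The exact values below are one illustrative choice; any weights producing the same sign pattern on the four inputs work:

```python
def step(z):
    return 1 if z >= 0 else 0

def gate(w0, w1, b):
    """Build a Boolean function from a perceptron with fixed weights."""
    return lambda x0, x1: step(w0 * x0 + w1 * x1 + b)

AND  = gate(1.0, 1.0, -1.5)   # fires only when both inputs are 1
OR   = gate(1.0, 1.0, -0.5)   # fires when at least one input is 1
NAND = gate(-1.0, -1.0, 1.5)  # negation of AND

for x0 in (0, 1):
    for x1 in (0, 1):
        print(x0, x1, AND(x0, x1), OR(x0, x1), NAND(x0, x1))
```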
Stage 3: The XOR Problem
Minsky & Papert (1969) proved that XOR cannot be solved by a single perceptron. This limitation nearly killed the field of neural networks for over a decade.
XOR Truth Table
| x₀ | x₁ | XOR |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
No single hyperplane (line in 2D) can separate the class-0 and class-1 points. Any line will misclassify at least one point.
Impossible Separation
The dashed lines show attempted decision boundaries — every attempt fails to correctly separate all four points.
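A small brute-force search illustrates the point. Scanning a finite grid of weights is not a proof (Minsky & Papert's argument covers all real-valued weights), but no setting on the grid reproduces XOR:

```python
import itertools

def step(z):
    return 1 if z >= 0 else 0

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Try every (w0, w1, b) on a coarse grid from -4.0 to 4.0 in steps of 0.5.
grid = [x / 2 for x in range(-8, 9)]
solutions = [
    (w0, w1, b)
    for w0, w1, b in itertools.product(grid, repeat=3)
    if all(step(w0 * x0 + w1 * x1 + b) == y for (x0, x1), y in XOR.items())
]
print(len(solutions))  # 0: no single perceptron matches all four rows
```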
Stage 4: Multi-Layer Perceptron Solves XOR
By adding a hidden layer with 2 neurons, we can solve XOR. The hidden layer transforms the input space so that the classes become linearly separable.
MLP Architecture (2→2→1)
Step-Through Computation
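One classic hand-wired solution has the two hidden units compute OR and NAND; their conjunction is exactly XOR. The specific weights below are illustrative:

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_mlp(x0, x1):
    """2-2-1 network: hidden layer computes OR and NAND, output ANDs them."""
    h0 = step(1.0 * x0 + 1.0 * x1 - 0.5)    # hidden unit 0: OR
    h1 = step(-1.0 * x0 - 1.0 * x1 + 1.5)   # hidden unit 1: NAND
    return step(1.0 * h0 + 1.0 * h1 - 1.5)  # output: AND of h0, h1

for x0 in (0, 1):
    for x1 in (0, 1):
        print(x0, x1, "->", xor_mlp(x0, x1))
```

The hidden layer remaps the four inputs into a space where (h0, h1) = (1, 1) occurs only for the two class-1 points, so a single linear boundary at the output suffices.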
Stage 5: Cybenko's Universal Approximation
Cybenko (1989) proved that a neural network with a single hidden layer and sigmoid activations can approximate any continuous function on a compact set to arbitrary precision.
Building a Tower from Step Functions
θ(x + s): steps up at x = −s
θ(−x + s): steps down at x = s
Tower = step up + step down − 1, which equals 1 on [−s, s] and 0 elsewhere
Approximating a Function with Towers
More towers → better approximation of the target function (shown in orange).
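A sketch of this construction, using steep sigmoids in place of exact step functions (as in Cybenko's setting). The steepness factor, interval, and sine target are illustrative choices:

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def tower(x, left, right, k):
    """Bump of height ~1 on (left, right): steep step up + steep step down - 1."""
    return sigmoid(k * (x - left)) + sigmoid(-k * (x - right)) - 1.0

def approximate(f, n_towers, lo=0.0, hi=1.0):
    """Approximate f on [lo, hi] by a weighted sum of narrow towers."""
    width = (hi - lo) / n_towers
    k = 40.0 / width  # steepness large relative to tower width
    edges = [lo + i * width for i in range(n_towers)]
    return lambda x: sum(
        f(e + width / 2) * tower(x, e, e + width, k) for e in edges
    )

target = lambda x: math.sin(2 * math.pi * x)
approx = approximate(target, n_towers=50)
err = max(abs(target(i / 200) - approx(i / 200)) for i in range(201))
print(err)  # worst-case error on the grid; shrinks as n_towers grows
```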
Stage 6: The Rise of Deep Networks
While a single hidden layer is theoretically sufficient (Cybenko), it may require an enormous number of neurons; deep networks can represent certain function families with exponentially fewer parameters than shallow ones. Depth also enables hierarchical feature learning, with early layers detecting simple patterns that later layers compose into more abstract ones.
Deep Network Architecture
Historical Timeline
Rosenblatt (1958): the perceptron
Minsky & Papert (1969): Perceptrons and the XOR critique
Rumelhart et al. (1986): backpropagation popularized
Cybenko (1989): universal approximation theorem
Krizhevsky et al. (2012): AlexNet
Vaswani et al. (2017): the Transformer