Common regularization techniques in machine learning - Dropout
Dropout is a regularization technique that helps improve the generalization of neural networks by preventing overfitting. Overfitting occurs when a model performs exceptionally well on the training data but struggles to generalize to unseen data.
Suppose we have a neural network with one input layer, two hidden layers, and one output layer. The input layer has 10 neurons, the first hidden layer has 20 neurons, the second hidden layer has 15 neurons, and the output layer has 5 neurons.
During training, dropout randomly sets a fraction of the neuron activations to zero at each update. Let's assume we set a dropout rate of 0.2 (20%). This means that at each update, every neuron in the hidden layers is dropped independently with probability 0.2, so roughly 20% of them are set to zero.
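As a rough sketch of this setup, the architecture above could be written in PyTorch as follows. The ReLU activations and the placement of dropout after each hidden layer are assumptions made for illustration, since the text does not specify them.

```python
import torch
import torch.nn as nn

# Hypothetical version of the network described above:
# 10 -> 20 -> 15 -> 5, with dropout (p = 0.2) on the hidden layers.
model = nn.Sequential(
    nn.Linear(10, 20),   # input layer -> first hidden layer
    nn.ReLU(),           # activation is an assumption; the text does not specify one
    nn.Dropout(p=0.2),   # randomly zeroes 20% of the first hidden layer's activations
    nn.Linear(20, 15),   # first hidden layer -> second hidden layer
    nn.ReLU(),
    nn.Dropout(p=0.2),   # randomly zeroes 20% of the second hidden layer's activations
    nn.Linear(15, 5),    # second hidden layer -> output layer
)

model.train()            # dropout is only active in training mode
x = torch.randn(32, 10)  # a batch of 32 example inputs
out = model(x)           # each forward pass samples a fresh dropout mask
```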
For example, during the first update, the dropout algorithm randomly selects which neurons to drop out. Let's say it chooses to drop out neurons 3, 6, 12, and 17 in the first hidden layer, and neurons 4 and 9 in the second hidden layer.
As a result, the outputs (activations) of these dropped-out neurons become zero, and their outgoing connections are temporarily removed. The remaining active neurons in the hidden layers, along with the neurons in the input and output layers, continue to process information.
During the forward pass, the active neurons use their weighted connections to generate outputs based on the inputs they receive. The dropped out neurons are effectively ignored during this pass.
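A minimal NumPy sketch of one such forward pass through a single hidden layer might look like the following. The weight matrix, the input values, and the Bernoulli sampling of the mask are illustrative assumptions; a fresh mask is drawn every time this code runs, just as a fresh mask is drawn at each training update.

```python
import numpy as np

rng = np.random.default_rng(0)
p_drop = 0.2                          # dropout rate from the example above

x = rng.standard_normal(10)           # activations entering the first hidden layer
W = rng.standard_normal((20, 10))     # hypothetical weights: 10 inputs -> 20 hidden neurons
b = np.zeros(20)

h = np.maximum(0, W @ x + b)          # hidden activations (ReLU assumed for illustration)

# Sample a binary mask: each neuron is kept with probability 1 - p_drop.
mask = rng.random(20) >= p_drop
h_dropped = h * mask                  # dropped neurons output exactly zero

# Neurons where mask is False contribute nothing to the next layer,
# so the network cannot rely on any single hidden unit.
print("dropped neuron indices:", np.where(~mask)[0])
```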
This process of randomly dropping out neurons is repeated at every update during training. By doing so, the network is forced to learn more robust representations, because it cannot rely too heavily on any single neuron or feature. In effect, each forward pass trains a slightly different subset of the network, and the full network learns to make predictions that hold up across these different subsets.
It is important to mention that during testing or inference, dropout is turned off and all the neurons are active again. However, to keep the scale of the activations consistent with what the network saw during training, the outgoing weights (or, equivalently, the activations) of the layers that used dropout are multiplied by the keep probability, i.e. 1 minus the dropout rate. This scaling ensures that the predictions made during inference reflect the average influence of all neurons in the network.
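Concretely, with a dropout rate of 0.2 the keep probability is 0.8, so each affected activation is scaled by 0.8 at inference time. The NumPy sketch below illustrates this, along with the alternative "inverted dropout" convention used by most modern frameworks, which scales activations up by 1/0.8 during training so that nothing needs to change at test time. The specific values here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
p_drop = 0.2
p_keep = 1.0 - p_drop                 # 0.8

h = rng.standard_normal(20)           # hidden activations at inference time

# Standard dropout: train with raw masked activations, then scale at test time
# so the expected magnitude matches what the next layer saw during training.
h_test = h * p_keep

# Inverted dropout: scale up by 1 / p_keep during training instead,
# and leave inference-time activations untouched.
mask = rng.random(20) >= p_drop
h_train_inverted = h * mask / p_keep
h_test_inverted = h                   # no scaling needed at inference
```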