TL;DR: One way to obtain uncertainty estimates in ML is to train multiple models, such as a neural network (NN) ensemble, and use the disagreement between their predictions as an estimate of uncertainty. This is computationally expensive, as it requires training and evaluating multiple models. Because NNs tend to be heavily overparametrized, we hypothesize that a single network's excess capacity can be used to make diverse predictions. Specifically, we perform probabilistic reasoning over the depth of a neural network. Different depths correspond to subnetworks that share weights, and disagreement among their predictions yields model uncertainty. By exploiting the sequential structure of feed-forward networks, we can both evaluate our training objective and make predictions with a single forward pass.
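To make the single-forward-pass idea concrete, here is a minimal NumPy sketch (not the paper's implementation; layer sizes, the shared output head `W_out`, and the uniform depth distribution `q` are illustrative assumptions). Each hidden layer's activation is mapped to a prediction by a shared output head, so one pass yields a prediction per depth; averaging under `q` gives the marginal prediction, and the spread across depths gives the uncertainty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical sizes): a 4-hidden-layer MLP whose intermediate
# activations are each mapped to a prediction by one shared output head.
D, H, n_layers = 3, 8, 4
Ws = [rng.normal(size=(D, H))] + [rng.normal(size=(H, H)) for _ in range(n_layers - 1)]
W_out = rng.normal(size=(H, 1))        # output head shared across all depths
q = np.full(n_layers, 1.0 / n_layers)  # distribution over depth (uniform here)

def dun_forward(x):
    """A single forward pass yields one prediction per depth."""
    preds = []
    h = x
    for W in Ws:
        h = np.tanh(h @ W)
        preds.append(h @ W_out)        # predict from this depth's activation
    return np.stack(preds)             # shape: (n_layers, 1)

x = rng.normal(size=(D,))
per_depth = dun_forward(x)
# Marginal prediction: average over depths, weighted by q.
mean_pred = (q[:, None] * per_depth).sum(axis=0)
# Model uncertainty: disagreement (variance) of the per-depth predictions.
uncertainty = (q[:, None] * (per_depth - mean_pred) ** 2).sum(axis=0)
```

Because every subnetwork's prediction falls out of the same pass, the cost of this ensemble-like behaviour is essentially that of evaluating the deepest network once.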
Here is a cool GIF of a Depth Uncertainty Network training: