The autoencoder that we saw in the previous recipe worked more like an identity network--they simply reconstruct the input. The emphasis is to reconstruct the image at the pixel level, and the only constraint is the number of units in the bottleneck layer; while it is interesting, pixel-level reconstruction does not ensure that the network will learn abstract features from the dataset. We can ensure that the network learns abstract features from the dataset by adding further constraints.
In sparse autoencoders, a sparse penalty term is added to the reconstruction error, which tries to ensure that fewer units in the bottleneck layer will fire at any given time. If m is the total number of input patterns, then we can define ...