The embedding can be trained with a model the one shown here:
As shown in the preceding image, the target word is predicted based on context or history. The prediction is based on the Softmax classifier. The hidden layer learns the embedding as a compact representation. Note that this is not a full deep learning model, but it still works well. Here is a low dimensional visualization of the embedding:
This visualization is generated using TensorBoard. ...