Definition

A statistical language model is a probability distribution over sequences of words. Given such a sequence, say of length m, it assigns a probability \(P(w_{1},\ldots ,w_{m})\) to the whole sequence.
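
Concretely, the joint probability is usually factored with the chain rule, so the model only ever has to predict one next word given its history:

\[
P(w_{1},\ldots ,w_{m}) = \prod_{i=1}^{m} P(w_{i} \mid w_{1},\ldots ,w_{i-1})
\]

An n-gram model approximates each conditional by truncating the history to the previous \(n-1\) words; neural models instead learn the conditional from distributed representations of the history.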

As a core component of Natural Language Processing (NLP) systems, a Language Model (LM) provides word representations and probability estimates for word sequences.

Neural Network Language Models (NNLMs) overcome the curse of dimensionality and improve on traditional LMs: a count-based n-gram model needs statistics for exponentially many word sequences, whereas an NNLM generalizes across similar contexts through learned distributed word representations.

Classic Neural Network Language Models

One-hot encoder

Bag-of-Words Model

  • TF-IDF (both representations are sketched below)
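
Below is a minimal sketch of both representations in plain Python; the toy corpus and helper names are illustrative, not drawn from any particular library.

```python
import math
from collections import Counter

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]
vocab = sorted({w for doc in corpus for w in doc})
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # |V|-dimensional vector with a single 1 at the word's index
    v = [0] * len(vocab)
    v[index[word]] = 1
    return v

def tf_idf(doc):
    # term frequency weighted by inverse document frequency
    tf = Counter(doc)
    vec = [0.0] * len(vocab)
    for w, count in tf.items():
        df = sum(1 for d in corpus if w in d)
        vec[index[w]] = (count / len(doc)) * math.log(len(corpus) / df)
    return vec

print(one_hot("cat"))
print(tf_idf(corpus[0]))
```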

FFNN Language Models

  • Autoencoder networks
  • Feedforward Neural Network Language Models (a forward-pass sketch follows this list)
  • word2vec
    • CBOW
    • Skip-gram
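
A minimal numpy sketch, assuming toy layer sizes, of the forward pass of a Bengio-style feedforward NNLM: the previous n words are embedded, concatenated, passed through a tanh hidden layer, and projected to a softmax over the vocabulary. CBOW and Skip-gram simplify this architecture further by dropping the non-linear hidden layer (and, for CBOW, word order).

```python
import numpy as np

V, d, h, n = 10_000, 64, 128, 3   # vocab size, embedding dim, hidden dim, context length
rng = np.random.default_rng(0)

C = rng.normal(0, 0.1, (V, d))        # shared word-embedding table
H = rng.normal(0, 0.1, (n * d, h))    # hidden-layer weights
U = rng.normal(0, 0.1, (h, V))        # output projection to the vocabulary

def next_word_probs(context_ids):
    """P(w | previous n words): embed, concatenate, tanh, softmax."""
    x = C[context_ids].reshape(-1)    # concatenated context embeddings
    a = np.tanh(x @ H)                # hidden representation
    logits = a @ U
    e = np.exp(logits - logits.max()) # numerically stable softmax
    return e / e.sum()

p = next_word_probs([12, 7, 431])     # arbitrary toy word ids
print(p.shape, p.sum())               # (10000,) 1.0
```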

RNN Language Models

LSTM-RNN Language Models
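
A minimal PyTorch sketch of an LSTM language model; the class name, layer sizes, and toy batch are illustrative assumptions. Each token is embedded, run through the LSTM, and each hidden state is projected to logits over the next word. A plain RNN LM has the same shape with nn.RNN in place of nn.LSTM; the LSTM's gates are what let it carry longer-range context.

```python
import torch
import torch.nn as nn

class LSTMLM(nn.Module):
    def __init__(self, vocab_size=10_000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> logits over the next word at each step
        h, _ = self.lstm(self.embed(token_ids))
        return self.proj(h)

model = LSTMLM()
tokens = torch.randint(0, 10_000, (2, 12))   # toy batch of token ids
logits = model(tokens)                       # (2, 12, 10000)
loss = nn.functional.cross_entropy(          # predict token t+1 from step t
    logits[:, :-1].reshape(-1, 10_000),
    tokens[:, 1:].reshape(-1),
)
print(logits.shape, float(loss))
```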

Improved Techniques

Techniques for Reducing Perplexity

  • Character-Aware Models
  • Factored Models
  • Bidirectional Models
  • Caching
  • Attention
    • Transformer: the first transduction model relying entirely on self-attention to compute representations of its input and output, without sequence-aligned RNNs or convolution (a self-attention sketch follows this list).
    • BERT
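
A minimal numpy sketch of scaled dot-product self-attention, the operation at the heart of the Transformer; the shapes and weight matrices are illustrative assumptions, and multi-head attention, masking, and positional encodings are omitted.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)            # row-wise softmax
    return weights @ V                                        # each position mixes all others

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))                       # toy input sequence
out = self_attention(
    X,
    rng.normal(size=(d_model, d_k)),
    rng.normal(size=(d_model, d_k)),
    rng.normal(size=(d_model, d_k)),
)
print(out.shape)  # (5, 8)
```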

Speed-up Techniques on Large Corpora

  • Hierarchical Softmax (sketched after this list)
  • Sampling-based Approximations
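
A minimal sketch of the hierarchical-softmax idea; the tree path and parameters here are toy assumptions. Instead of normalizing over the full vocabulary, the probability of a word is a product of binary (sigmoid) decisions along its path in a tree over the vocabulary, cutting the per-word cost from O(|V|) to roughly O(log |V|). Sampling-based approximations (e.g., negative sampling, noise-contrastive estimation) instead sidestep full normalization by contrasting the target word with a few sampled noise words.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def word_probability(hidden, path):
    """Product of binary decisions along the word's tree path.

    path is a list of (node_vector, direction) pairs, where direction
    is +1 for "go left" and -1 for "go right" at that inner node.
    """
    p = 1.0
    for node_vec, direction in path:
        p *= sigmoid(direction * (node_vec @ hidden))
    return p

rng = np.random.default_rng(0)
h = rng.normal(size=8)                      # hidden state from the LM
path = [(rng.normal(size=8), +1),           # depth ~ log2(|V|) decisions
        (rng.normal(size=8), -1),
        (rng.normal(size=8), +1)]
print(word_probability(h, path))
```

Because \(\sigma(x) + \sigma(-x) = 1\) at every inner node, the probabilities of all leaves still sum to 1, so this remains a proper distribution over the vocabulary.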

Reference

  • Wikipedia: language model
  • Jing, Kun, and Jungang Xu. “A Survey on Neural Network Language Models.” arXiv preprint arXiv:1906.03591, 2019.
  • Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30 (2017).
  • A Quick Guide to the Google BERT Model + Word Embedding (in Chinese)