【Method】Word Embedding
Definition
A statistical language model is a probability distribution over sequences of words. Given such a sequence, say of length m, it assigns a probability \(P(w_{1},\ldots ,w_{m})\) to the whole sequence.
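In practice this joint probability is factorized with the chain rule, so the model only ever has to predict the next word given its history:
\[
P(w_{1},\ldots ,w_{m}) \;=\; \prod_{i=1}^{m} P\!\left(w_{i} \mid w_{1},\ldots ,w_{i-1}\right).
\]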
As a core component of Natural Language Processing (NLP) systems, a Language Model (LM) provides word representations and probability estimates for word sequences.
Neural Network Language Models (NNLMs) overcome the curse of dimensionality and improve on the performance of traditional LMs.
Classic Neural Network Language Models
One-hot encoder
Bag-of-words Model
- TF-IDF
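A minimal sketch of these classic representations in plain Python/NumPy (the toy corpus and all names are illustrative assumptions, not taken from any particular library):

```python
import numpy as np

# Toy corpus: three "documents" (illustrative assumption).
docs = ["the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs"]
vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# One-hot encoding: each word becomes a |V|-dimensional indicator vector.
def one_hot(word):
    vec = np.zeros(len(vocab))
    vec[index[word]] = 1.0
    return vec

# Bag-of-words: a document is the (orderless) sum of its words' one-hot vectors.
def bag_of_words(doc):
    return sum(one_hot(w) for w in doc.split())

# TF-IDF: term frequency reweighted by how rare the term is across documents.
def tf_idf(doc):
    tf = bag_of_words(doc) / len(doc.split())
    df = np.array([sum(w in d.split() for d in docs) for w in vocab])
    idf = np.log(len(docs) / df)   # every vocab word occurs in >= 1 doc, so df >= 1
    return tf * idf

print(tf_idf(docs[0]).round(3))
```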
FFNN Language Models
- Autoencoder and generative adversarial networks (GANs)
- Feedforward Neural Network Language Models
- word2vec
- CBOW
- Skip-gram
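As an illustration of the word2vec idea, here is a tiny skip-gram model trained with negative sampling in NumPy (corpus, dimensions, and hyperparameters are toy assumptions; real implementations such as gensim are far more optimized):

```python
import numpy as np

# Toy corpus and vocabulary (illustrative assumption).
corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
w2i = {w: i for i, w in enumerate(vocab)}
V, D, WINDOW, NEG, LR = len(vocab), 16, 2, 5, 0.05

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))    # center-word ("input") vectors
W_out = rng.normal(scale=0.1, size=(V, D))   # context-word ("output") vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(200):                          # a few passes over the toy corpus
    for pos, word in enumerate(corpus):
        center = w2i[word]
        for ctx_pos in range(max(0, pos - WINDOW), min(len(corpus), pos + WINDOW + 1)):
            if ctx_pos == pos:
                continue
            # one true (center, context) pair plus NEG random "noise" words
            targets = [w2i[corpus[ctx_pos]]] + list(rng.integers(0, V, size=NEG))
            labels = [1.0] + [0.0] * NEG
            for tgt, label in zip(targets, labels):
                score = sigmoid(W_in[center] @ W_out[tgt])
                grad = score - label          # gradient of the logistic loss
                g_in = grad * W_out[tgt]
                W_out[tgt] -= LR * grad * W_in[center]
                W_in[center] -= LR * g_in

# After training, the rows of W_in are the word embeddings, e.g. W_in[w2i["fox"]].
```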
RNN Language Models
LSTM-RNN Language Models
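A minimal sketch of an LSTM-RNN language model, written here with PyTorch purely for illustration (layer sizes and the toy batch are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMLanguageModel(nn.Module):
    """Minimal LSTM language model: embed -> LSTM -> project back to the vocabulary."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids, hidden=None):
        # token_ids: (batch, seq_len) integer word indices
        emb = self.embed(token_ids)
        out, hidden = self.lstm(emb, hidden)
        return self.proj(out), hidden        # logits: (batch, seq_len, vocab_size)

# Usage sketch: predict the next word at every position of a random toy batch.
vocab_size = 1000                            # assumed toy vocabulary size
model = LSTMLanguageModel(vocab_size)
tokens = torch.randint(0, vocab_size, (4, 12))            # (batch=4, seq_len=12)
logits, _ = model(tokens[:, :-1])
loss = F.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
```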
Improved Techniques
Techniques for Reducing Perplexity
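Perplexity is the standard intrinsic metric here: the inverse probability of a held-out word sequence, normalized per word, so lower is better:
\[
\mathrm{PPL}(w_{1},\ldots ,w_{m}) \;=\; P(w_{1},\ldots ,w_{m})^{-\frac{1}{m}} \;=\; \exp\!\left(-\frac{1}{m}\sum_{i=1}^{m}\log P\!\left(w_{i}\mid w_{1},\ldots ,w_{i-1}\right)\right).
\]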
- Character-Aware Models
- Factored Models
- Bidirectional Models
- Caching
- Attention
- Transformer: the first transduction model relying entirely on self-attention to compute representations of its input and output, without using sequence-aligned RNNs or convolution (Vaswani et al., 2017); its scaled dot-product attention is shown after this list.
- BERT
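The self-attention that the Transformer (and BERT, which stacks Transformer encoders) relies on is the scaled dot-product attention of Vaswani et al. (2017):
\[
\mathrm{Attention}(Q, K, V) \;=\; \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_{k}}}\right) V,
\]
where \(Q\), \(K\), and \(V\) are the query, key, and value matrices and \(d_{k}\) is the key dimension.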
Speed-up Techniques on Large Corpora
- Hierarchical Softmax
- Sampling-based Approximations
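To see why these speed-ups matter: a full softmax costs \(O(|V|)\) per predicted word. Hierarchical softmax instead arranges the vocabulary as the leaves of a binary tree and writes the word probability as a product of binary decisions along the root-to-leaf path (a rough sketch of the common formulation):
\[
P(w \mid h) \;=\; \prod_{n \in \mathrm{path}(w)} \sigma\!\left(s_{n}\, v_{n}^{\top} h\right), \qquad s_{n} \in \{+1, -1\},
\]
so only \(O(\log |V|)\) inner-node vectors \(v_{n}\) are evaluated per word. Sampling-based approximations (e.g. negative sampling, as in the word2vec sketch above, or noise-contrastive estimation) instead avoid normalizing over the full vocabulary altogether.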
References
- Wikipedia: language model
- Jing, Kun, and Jungang Xu. “A Survey on Neural Network Language Models,” 2019.
- Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30 (2017): 5999–6009.
- A Quick Guide to the Google BERT Model + Word Embedding (Chinese-language article)