【Method】NLP Preprocessing
- Preprocessing
- Tokenization
- Normalization
- Stop words
- Stemming
- Lemmatization
- N-grams (optional)
- Parsing
- POS tagging
- Syntactical analysis
- Abstract meaning representation
- Word sense
- Knowledge graph
- Word sense disambiguation
- Named Entity Recognition
- Embeddings
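
The preprocessing steps listed above (tokenization, normalization, stop-word removal, stemming, optional n-grams) can be sketched with the Python standard library alone. This is a minimal illustration, not a reference implementation: the stop list `STOP_WORDS`, the suffix list `SUFFIXES`, and all function names are assumptions made for this example. The stemmer is a crude suffix stripper rather than a real algorithm such as Porter's, and lemmatization is left out because it typically needs a dictionary or a trained model.

```python
import re
from typing import List, Tuple

# Illustrative stop list (an assumption; real stop lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}

# Suffixes for a deliberately crude stemmer (hypothetical, not Porter's algorithm).
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text: str) -> List[str]:
    """Tokenization: split text into runs of letters, digits, and apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", text)


def normalize(tokens: List[str]) -> List[str]:
    """Normalization: here, only case folding."""
    return [t.lower() for t in tokens]


def remove_stop_words(tokens: List[str]) -> List[str]:
    """Drop tokens that appear in the stop list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text: str, n: int = 1) -> List[Tuple[str, ...]]:
    """Run the steps in the order they appear in the outline."""
    tokens = remove_stop_words(normalize(tokenize(text)))
    stems = [naive_stem(t) for t in tokens]
    return ngrams(stems, n)


if __name__ == "__main__":
    print(preprocess("The cats are chasing the mice", n=2))
    # [('cat', 'chas'), ('chas', 'mice')]
```

The output shows why stemming and lemmatization are listed separately: the suffix stripper turns "chasing" into the non-word "chas" and leaves "mice" untouched, whereas a lemmatizer would be expected to map them to "chase" and "mouse".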
Reference:
- Jon Ezeiza Alvarez and Hannah Bast. “A Review of Word Embedding and Document Similarity Algorithms Applied to Academic Text.” Bachelor’s Thesis, 2017. http://ad-publications.informatik.uni-freiburg.de/theses/Bachelor_Jon_Ezeiza_2017.pdf.