  1. Preprocessing
    1. Tokenization
    2. Normalization
      • Stop-word removal
      • Stemming
      • Lemmatization
    3. N-grams (optional)
  2. Parsing
    • POS tagging
    • Syntactic analysis
    • Abstract meaning representation
  3. Word sense
    • Knowledge graph
    • Word sense disambiguation
    • Named entity recognition
    • Embeddings
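
The preprocessing stage (step 1) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a reference implementation: the stop-word list, the suffix list, and the function names (`tokenize`, `remove_stop_words`, `naive_stem`, `ngrams`, `preprocess`) are all assumptions made for the example. The toy suffix stripper stands in for a real stemmer such as Porter's, and lemmatization is left out because it needs a dictionary or a trained model.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on"}

# Suffixes stripped by the toy stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Tokenization: split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Normalization: drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Normalization: strip one common suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def ngrams(tokens, n=2):
    """Optional step: build contiguous n-grams as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def preprocess(text, n=2):
    """Run tokenization, stop-word removal, stemming, then n-gram extraction."""
    tokens = remove_stop_words(tokenize(text))
    stems = [naive_stem(t) for t in tokens]
    return stems, ngrams(stems, n)


stems, bigrams = preprocess("The cats are chasing the mice in the garden")
print(stems)    # ['cat', 'chas', 'mice', 'garden']
print(bigrams)  # [('cat', 'chas'), ('chas', 'mice'), ('mice', 'garden')]
```

The output shows why stemming and lemmatization are listed separately: the stemmer truncates "chasing" to the non-word "chas" and leaves the irregular plural "mice" untouched, whereas a lemmatizer would be expected to return "chase" and "mouse".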

Reference:

  • Jon Ezeiza Alvarez and Hannah Bast. “A Review of Word Embedding and Document Similarity Algorithms Applied to Academic Text.” Bachelor’s Thesis, 2017. http://ad-publications.informatik.uni-freiburg.de/theses/Bachelor_Jon_Ezeiza_2017.pdf.