Introduction to Topic Modeling

If you do not have labels, it is unsupervised learning.
If you have labels, it is supervised learning.

A topic model is a type of statistical model that discovers the topics in a collection of documents, where each topic is a cluster of words that frequently occur together.

Functions of Topic Modeling

  • Find latent variables that describe the structure of a document
  • Cluster words that occur together

Topic modeling is a form of fuzzy (soft) clustering: each document can belong to several clusters, each to a different degree.
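
To make the difference between hard and fuzzy clustering concrete, here is a minimal sketch. The topic names and weights are made-up values for illustration, not the output of a real model.

```python
# Hypothetical topic weights for one document (made-up numbers that sum to 1).
memberships = {"sports": 0.6, "politics": 0.3, "technology": 0.1}

# Hard clustering keeps only the single best cluster for the document.
hard_label = max(memberships, key=memberships.get)

# Fuzzy (soft) clustering keeps every cluster with its degree of membership.
soft_labels = sorted(memberships.items(), key=lambda kv: kv[1], reverse=True)

print(hard_label)   # sports
print(soft_labels)  # [('sports', 0.6), ('politics', 0.3), ('technology', 0.1)]
```

A topic model reports the second form, so a document that is mostly about sports can still be partly about politics.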

Workflow of Topic Models
1. Input the documents
2. Pass them through the topic model
3. Get the topics (clusters of words; a word can belong to more than one topic)

Topic: a distribution over words, in which the words that are core to the topic have the highest frequencies
Document: a distribution over topics
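
The two definitions combine into a simple rule: the probability of a word in a document is the sum, over all topics, of the topic's weight in the document times the word's probability in that topic. Below is a small sketch of that rule. The topics, words, and numbers are hypothetical and chosen only for illustration.

```python
# Hypothetical topics: each one is a probability distribution over words.
topics = {
    "pets": {"cat": 0.5, "dog": 0.4, "market": 0.1},
    "finance": {"cat": 0.0, "dog": 0.1, "market": 0.9},
}

# Hypothetical document: a probability distribution over topics.
doc_topics = {"pets": 0.7, "finance": 0.3}


def word_probability(word, doc_topics, topics):
    """P(word | document) = sum over topics of P(topic | document) * P(word | topic)."""
    return sum(
        weight * topics[name].get(word, 0.0)
        for name, weight in doc_topics.items()
    )


# Word distribution implied for this document by its topic mixture.
doc_word_dist = {
    w: word_probability(w, doc_topics, topics) for w in ["cat", "dog", "market"]
}
print(doc_word_dist)
```

Note that "market" has a non-zero probability in both topics, which is what it means for a word to belong to more than one topic.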

Different types of algorithms: LSA (Latent Semantic Analysis), LDA (Latent Dirichlet Allocation), HDP (Hierarchical Dirichlet Process), etc.
Most important one: LDA (unsupervised)

Inconvenience of LDA: you have to choose two values in advance

  1. The number of topics you want to find (k)
  2. The number of iterations to run

The results of LDA answer two questions

  1. Which topics occur in this document?
  2. Which topic should word X in that document be assigned to?
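
The two inputs (k and the number of iterations) and the two questions can be seen together in a small sketch. It uses collapsed Gibbs sampling, which is one common way to fit LDA and is an assumption here, not something these notes prescribe. The function name `lda_gibbs`, the smoothing values `alpha` and `beta`, and the toy documents are all hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, k, iterations, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling (sketch)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables: topics per document, words per topic, total words per topic.
    doc_topic = [[0] * k for _ in docs]
    topic_word = [Counter() for _ in range(k)]
    topic_total = [0] * k

    # Start from a random topic assignment for every word occurrence.
    assign = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(k)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment of this word from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Weight of each topic given all the other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(k)
                ]
                # Draw a new topic for this word and put it back in the counts.
                z = rng.choices(range(k), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic proportions per document; each row sums to 1.
    theta = [
        [(c + alpha) / (len(doc) + k * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, assign, topic_word


docs = [
    "cat dog pet cat dog".split(),
    "stock market trade stock price".split(),
    "dog pet cat market".split(),
]
theta, assign, topic_word = lda_gibbs(docs, k=2, iterations=200)

# Question 1: which topics occur in each document?
for d, row in enumerate(theta):
    print(d, [round(p, 2) for p in row])

# Question 2: which topic was chosen for each word of the last document?
print(list(zip(docs[2], assign[2])))
```

The topic numbers themselves carry no meaning, and a different seed, k, or iteration count can give a different result, which is why choosing those values is the inconvenient part.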



