Overview of Data Analytics

Machine learning is the primary focus. The overview also covers concepts from probability, statistics and regression analysis.

Related topics

  • Data mining
  • Statistical Learning
  • Pattern Recognition
The focus is more on the algorithms used to channel data into useful patterns.

Broad Classification
  • Supervised Learning
  • Unsupervised Learning
These have already been defined in Introduction to Machine Learning.

Further details:

Supervised Learning: Essentially the task of learning a function or relationship from training data (historical data). The data is labeled, meaning it has an explicit output variable. The result is a functional or algorithmic mapping between the input and output variables.

Further classified into:
  • Classification Problem:
    • The output variable is a discrete, categorical variable rather than a continuous one.
    • The goal is to predict which class an observation belongs to.
  • Regression Problem:
    • The output variable is continuous.
    • Only marginally different from classical regression analysis.
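The difference between the two problem types can be sketched with toy data. The example below is only an illustration under stated assumptions: the data is invented, and the helper names `fit_line` and `knn_classify` are hypothetical. It fits a least-squares line for the regression case and uses a small k-NN vote for the classification case.

```python
# Toy illustration of the two supervised settings; all data below is made up.

def fit_line(xs, ys):
    """Regression: least-squares fit of y = a + b*x (continuous output)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def knn_classify(train, query, k=3):
    """Classification: majority label among the k nearest training points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

# Regression: labeled pairs (x, continuous y) lying exactly on y = 1 + 2x.
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
predicted_y = a + b * 4.0

# Classification: labeled pairs (x, discrete class).
train = [(1.0, "low"), (1.5, "low"), (2.0, "low"),
         (8.0, "high"), (8.5, "high"), (9.0, "high")]
predicted_class = knn_classify(train, 7.0)
```

In both cases the model is learned from labeled examples. The only change is the type of the output: a number for regression, a class label for classification.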
Unsupervised Learning: The task of finding patterns in data when no explicit measure or signal guides us. The data is not labeled; we have only the x (input) variables.
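To make "only x variables" concrete, here is a minimal sketch of k-means, one common clustering method, on one-dimensional data. The function name `k_means_1d`, the data, and the choice of the first k points as initial centers are all assumptions made for illustration; practical implementations usually pick initial centers more carefully.

```python
def k_means_1d(points, k, iterations=10):
    """Very small k-means for 1-D data; initial centers are the first k points."""
    centers = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Unlabeled data: no output variable, only x values.
points = [1.0, 9.0, 1.2, 0.8, 9.5, 8.5]
centers, clusters = k_means_1d(points, k=2)
```

No label tells the algorithm which group a point belongs to. The groups emerge only from how close the x values are to each other.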

Tools and Techniques (Supervised Learning)
  • Advanced regression (going a step beyond basic regression analysis)
  • Logistic Regression (Categorical)
  • k-NN
  • Classification and Regression Trees - CART
  • Support Vector Machines (SVM)
  • LDA/QDA
  • ANN
  • Ensemble Methods
Major Topics in Unsupervised Learning:
  • Clustering: 
    • Task of grouping a set of objects based on their similarities (the common set of attributes/features that these objects possess)
      • k-means
      • Hierarchical
      • Graph based
      • Density based
  • Association Rule Mining (Market Basket Rule Mining)
    • Task of identifying relationships between features across the set of objects.
    • Any variable can be part of a rule.
    • There is no specific output variable.
    • We also need a way to judge how good a rule is.
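One common way to judge how good a rule is uses support, confidence and lift. The sketch below computes these for a single rule. The transactions are invented, and the function names are hypothetical helpers rather than any library's API.

```python
# Hypothetical market-basket transactions (invented for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline frequency."""
    return confidence(antecedent, consequent) / support(consequent)

# Evaluate the rule "bread implies milk".
rule_support = support({"bread", "milk"})
rule_confidence = confidence({"bread"}, {"milk"})
rule_lift = lift({"bread"}, {"milk"})
```

For this toy data the rule has support 3/5, confidence 0.6/0.8 = 0.75, and lift 0.75/0.8, which is slightly below 1. A lift below 1 suggests that buying bread does not make milk more likely than its baseline, so the rule is weak despite its fairly high confidence.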
Major Topics for Data Driven Decision Making

Design of Experiments:
  • No data is available yet, but data is needed.
  • Conduct an experiment to generate it.
  • Explicitly change the input variables and record the resulting data.
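As a sketch of "change the inputs and record the data", the example below runs a full factorial design over two factors with two levels each. Everything here is assumed for illustration: `run_experiment` is a hypothetical stand-in for a real measurement, and the factor names and levels are made up.

```python
from itertools import product

def run_experiment(temperature, pressure):
    """Hypothetical system under study; stands in for a real measurement."""
    return 2.0 * temperature + 0.5 * pressure

# Full factorial design: every combination of the chosen factor levels.
levels = {"temperature": [20, 40], "pressure": [1, 3]}
design = list(product(levels["temperature"], levels["pressure"]))

# Deliberately set the inputs, run the experiment, and record the output.
records = [(t, p, run_experiment(t, p)) for t, p in design]

def main_effect(records, index, low, high):
    """Mean response at the high level minus mean response at the low level."""
    high_vals = [r[2] for r in records if r[index] == high]
    low_vals = [r[2] for r in records if r[index] == low]
    return sum(high_vals) / len(high_vals) - sum(low_vals) / len(low_vals)

temperature_effect = main_effect(records, 0, 20, 40)
pressure_effect = main_effect(records, 1, 1, 3)
```

Because the inputs are set on purpose rather than merely observed, the difference in average response can be attributed to each factor in turn.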
Active Learning: 
  • Already have some data. 
  • Fairly expensive to gather this data. 
  • Partial knowledge. 
  • Sequentially query the system. 
  • Decide which data we most need to gather.
  • Gather only that critical data and process it for insights.
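The sequential-query idea can be sketched on a deliberately simple problem: finding the point at which labels switch from 0 to 1 along a sorted pool. The setup is an assumption for illustration. The `oracle` function plays the role of the expensive labeling step, and the labels of the two end points are taken as the data we already have.

```python
def active_learn_threshold(pool, oracle):
    """Find where labels flip from 0 to 1 in a sorted pool, querying sparingly.

    Assumes labels are 0 below some unknown threshold and 1 at or above it,
    and that pool[0] is already known to be 0 and pool[-1] to be 1.
    """
    lo, hi = 0, len(pool) - 1   # pool[lo] is labeled 0, pool[hi] is labeled 1
    queries = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2    # the point whose label we are least sure about
        queries += 1            # each oracle call is one expensive label
        if oracle(pool[mid]) == 0:
            lo = mid
        else:
            hi = mid
    return pool[hi], queries

# Hypothetical expensive labeler with a hidden threshold at 37.
def oracle(x):
    return 1 if x >= 37 else 0

pool = list(range(100))
first_positive, queries = active_learn_threshold(pool, oracle)
```

Labeling all 100 points would recover the same boundary, but choosing each query based on what is already known needs only a handful of labels here.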
Reinforcement Learning:
  • No data or only partial data.
  • No experiments can be run to create data.
  • This arises when experiments are not possible because they would affect the end users (for example, by taking the system offline).
  • These are called bandit problems.
  • A form of experimentation in which you perform as well as you can while the system is online (live).
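A minimal sketch of a bandit problem follows, using epsilon-greedy, which is one simple strategy among many and is chosen here only for illustration. The arm success rates in `true_means`, the value of `epsilon`, and the seed are all assumptions. The learner never sees `true_means` directly; it only observes rewards while it plays live.

```python
import random

def epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy play on a Bernoulli multi-armed bandit."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # pulls per arm
    estimates = [0.0] * k   # running mean reward per arm
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = int(rng.random() * k)                      # explore
        else:
            arm = max(range(k), key=lambda i: estimates[i])  # exploit
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        # Incremental update of the running mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward

counts, estimates, total_reward = epsilon_greedy([0.2, 0.5, 0.8])
best_arm = max(range(len(estimates)), key=lambda i: estimates[i])
```

Most pulls go to the arm that currently looks best, so performance stays reasonable while learning continues. The small exploration rate keeps the other arms from being written off too early.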
References:
Introduction to Data Analytics, NPTEL (www.nptel.iitm.ac.in)
