An Overview of Data Analytics

Machine learning is the primary focus. The material also covers the concepts of probability, statistics, and regression analysis.

Related topics

  • Data mining
  • Statistical Learning
  • Pattern Recognition
The focus is on the algorithms used to organize data into useful patterns.

Broad Classification
  • Supervised Learning
  • Unsupervised Learning
These have already been defined in Introduction to Machine Learning.

Further Details:

Supervised Learning: Essentially the task of learning a function or relationship from training data (historic data). The data is labeled, meaning each record has an explicit output variable.

The result is a functional or algorithmic mapping between the input variables and the output variable.

Further classified into:
  • Classification Problem
    • Output variable is a discrete, categorical variable rather than a continuous one.
    • Predict which class an observation should belong to.
  • Regression Problem
    • Output variable is a continuous variable.
    • Only marginally different from classical regression analysis.
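
To make the split concrete, here is a minimal sketch using k-NN (one of the techniques listed in these notes) on one-dimensional inputs. The same neighbor search serves both problems, and only the treatment of the output variable changes. The data and the function names are hypothetical, made up for illustration.

```python
from collections import Counter

def k_nearest(train, x, k):
    """Return the k training pairs (x_i, y_i) whose x_i is closest to x."""
    return sorted(train, key=lambda pair: abs(pair[0] - x))[:k]

def knn_classify(train, x, k=3):
    """Classification: the output is a discrete class, so take a majority vote."""
    labels = [y for _, y in k_nearest(train, x, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, x, k=3):
    """Regression: the output is continuous, so average the neighbors' values."""
    values = [y for _, y in k_nearest(train, x, k)]
    return sum(values) / len(values)

# Hypothetical labeled training data: (hours studied, outcome).
pass_fail = [(1, "fail"), (2, "fail"), (3, "fail"), (6, "pass"), (7, "pass"), (8, "pass")]
exam_score = [(1, 35.0), (2, 42.0), (3, 50.0), (6, 71.0), (7, 78.0), (8, 85.0)]

predicted_class = knn_classify(pass_fail, 6.5)   # a class label
predicted_score = knn_regress(exam_score, 6.5)   # a real number
```

In both cases the model is built from labeled historic data. The classifier returns one of the known categories, while the regressor returns a number that need not appear anywhere in the training set.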
Unsupervised Learning: The task of finding patterns in data when no explicit measure or signal guides us. The data is not labeled: we have only the input (x) variables and no output variable.
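
As a rough illustration of learning from x variables alone, the sketch below runs a plain k-means loop on one-dimensional data. k-means is used here only as a convenient example of a clustering method. The data, the starting centers, and the function name are assumptions for illustration.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain k-means on 1-D data: alternate assignment and update steps."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled data: only x values, no output variable.
x = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centers, clusters = kmeans_1d(x, centers=[0.0, 5.0])
```

No label tells the algorithm which group a point belongs to. The two groups emerge purely from how close the x values are to each other.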

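Tools and Techniques in Supervised Learning:
  • A more advanced look at regression
  • Logistic Regression (categorical output)
  • k-Nearest Neighbors (k-NN)
  • Classification and Regression Trees (CART)
  • Support Vector Machines (SVM)
  • Linear and Quadratic Discriminant Analysis (LDA/QDA)
  • Artificial Neural Networks (ANN)
  • Ensemble Methods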
Major Topics in Unsupervised Learning:
  • Clustering: 
    • Task of grouping a set of objects based on their similarities (the common attributes/features that these objects possess)
      • k-means
      • Hierarchical
      • Graph-based
      • Density-based
  • Association Rule Mining (Market Basket Rule Mining)
    • Task of identifying relationships between features across the set of objects.
    • Any variable can be part of a rule.
    • There is no specific output variable.
    • The goal is to judge how good a rule is.
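
One common way to judge how good a rule is uses support, confidence, and lift. The sketch below computes all three for a single rule. These measures are standard in association rule mining, but the notes do not say which ones the course uses, so treat the choice as an assumption. The baskets and the function name are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.

    Assumes the antecedent and the consequent each occur in at least one
    transaction; otherwise the divisions below would fail.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n             # how often the whole rule occurs
    confidence = n_both / n_a        # how often the consequent follows the antecedent
    lift = confidence / (n_c / n)    # confidence relative to the consequent's base rate
    return support, confidence, lift

# Hypothetical market baskets; any item may appear on either side of a rule.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
support, confidence, lift = rule_metrics(baskets, {"bread"}, {"milk"})
```

A lift below 1, as in this toy data, suggests that buying bread does not make milk more likely than it already is, so the rule is weak even though its confidence looks high.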
Major Topics for Data-Driven Decision Making

Design of Experiments:
  • No data is available yet.
  • Data is needed.
  • Conduct an experiment.
  • Explicitly change the input variables and record the resulting data.
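
The steps above can be sketched as a full factorial design, where every combination of input levels is tried once and the response is recorded. A full factorial is only one possible design and is assumed here for illustration. The factor names, the levels, and the `measure` function are hypothetical stand-ins for a real system.

```python
from itertools import product

def full_factorial(factors):
    """Every combination of factor levels, one dict per experimental run."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]

def run_experiment(design, measure):
    """Set the inputs for each run, measure the response, and record both."""
    return [{**run, "response": measure(**run)} for run in design]

# Hypothetical factors and levels chosen by the experimenter.
factors = {"temperature": [150, 200], "time": [10, 20, 30]}

# Hypothetical stand-in for the real system; in practice this is a measurement.
def measure(temperature, time):
    return 2 * temperature + 3 * time

design = full_factorial(factors)
records = run_experiment(design, measure)
```

The resulting `records` list is the data set that did not exist before the experiment: one row per run, with the chosen inputs and the observed response side by side.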
Active Learning: 
  • Already have some data. 
  • Fairly expensive to gather this data. 
  • Partial knowledge. 
  • Sequentially query the system.
  • Decide which data we need to gather next.
  • Gather only the critical data, then process it for insights.
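
A simplified sketch of the idea: suppose labels are expensive and the classes are separated by a single unknown threshold on a one-dimensional input. Instead of labeling many points, the learner queries only the point it is least sure about and narrows the uncertain region each time. The noise-free threshold setting, the `oracle` function, and the value 0.37 are assumptions for illustration.

```python
def learn_threshold(oracle, low=0.0, high=1.0, tolerance=0.01):
    """Locate a decision threshold by querying labels only where uncertain.

    Points at or below `low` are taken to be labeled 0 and points at or
    above `high` to be labeled 1, so the only informative query lies between.
    """
    queries = 0
    while high - low > tolerance:
        candidate = (low + high) / 2   # the most informative point to label next
        queries += 1
        if oracle(candidate) == 1:
            high = candidate           # threshold is at or below candidate
        else:
            low = candidate            # threshold is above candidate
    return (low + high) / 2, queries

# Hypothetical expensive labeling step; the true threshold is hidden from the learner.
def oracle(x):
    return 1 if x >= 0.37 else 0

estimate, queries = learn_threshold(oracle)
```

Labeling an evenly spaced grid to reach the same precision would take on the order of a hundred queries, whereas the sequential strategy needs only a handful, which is the point of gathering only the critical data.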
Reinforcement Learning:
  • No data or only partial data.
  • No experiments are run to create data.
  • This arises when we cannot run experiments because doing so would affect the end users (e.g., going offline).
  • These are called bandit problems.
  • A form of experimentation in which you perform as well as you can on the data while online (live).
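
A minimal sketch of a bandit problem, using the epsilon-greedy strategy. This strategy is one simple option picked for illustration and is not named in the notes. The learner has no prior data and cannot run a separate experiment, so it has to balance trying options against earning reward while live. The reward rates, the `pull` function, and the parameter values are hypothetical.

```python
import random

def epsilon_greedy(pull, n_arms, steps=10_000, epsilon=0.1, seed=0):
    """Learn online: mostly play the best-looking arm, sometimes explore."""
    rng = random.Random(seed)
    counts = [0] * n_arms        # times each arm was played
    values = [0.0] * n_arms      # running mean reward of each arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = pull(arm, rng)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
    return counts, values

# Hypothetical live system: each arm pays 1 with a rate the learner never sees.
true_rates = [0.2, 0.5, 0.8]

def pull(arm, rng):
    return 1 if rng.random() < true_rates[arm] else 0

counts, values = epsilon_greedy(pull, n_arms=len(true_rates))
```

Every pull both gathers data and affects the outcome for a real user, which is why the learner spends most of its pulls on the arm that currently looks best and only a small fraction on exploration.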
References:
Introduction to Data Analytics, NPTEL (www.nptel.iitm.ac.in)
