Understanding Data Analytics: An Overview

July 15, 2015

Machine Learning is the primary focus. Concepts covered: probability, statistics and regression analysis.

Related topics

Data mining
Statistical Learning
Pattern Recognition

The focus is on the algorithms used to organize data into useful patterns.

Broad Classification

Supervised Learning
Unsupervised Learning

These have already been defined in Introduction to Machine Learning.

Further Details:

Supervised Learning: essentially the task of learning a function or relationship from training data (historical data), i.e. labeled data that has an explicit output variable.

The result is a functional or algorithmic mapping between the input variables and the output variable.

Further classified into:

Classification Problem:

The output variable is a discrete, categorical variable rather than a continuous one.
Predict which class an observation belongs to.
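To make the classification task concrete, here is a minimal sketch of a k-nearest-neighbours (k-NN) classifier in Python. The toy data, the use of Euclidean distance and the choice `k=3` are assumptions for illustration only; `knn_classify` is a hypothetical helper name.

```python
from collections import Counter
from math import dist


def knn_classify(train, query, k=3):
    """Predict the class of `query` by majority vote among its k nearest
    labeled training points (Euclidean distance)."""
    # Sort the labeled examples by their distance from the query point
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # Majority vote over the labels of the k closest points
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labeled training data: (features, class label)
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 5.0), "B"), ((7.0, 7.0), "B"), ((6.5, 6.0), "B"),
]

print(knn_classify(train, (1.2, 1.4)))  # "A"
print(knn_classify(train, (6.8, 6.1)))  # "B"
```

The output is one of a fixed set of discrete labels, which is what distinguishes this from the regression case.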

Regression Problem:

The output variable is a continuous variable.
Only marginally different from classical Regression Analysis.
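For a regression problem, the simplest case is fitting a straight line by ordinary least squares. The sketch below assumes a single input variable and uses made-up data; `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the common 1/n factor cancels
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # The fitted line passes through the point of means
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: a noisy, roughly linear relationship
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))  # 1.09 1.97
print(round(a + b * 6, 2))       # prediction for x = 6: 12.91
```

Unlike the classifier, the prediction here is a real number on a continuous scale.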

Unsupervised Learning: the task of finding patterns in data that has no explicit output measure or signal guiding us; the data is not labeled. We only have the input (x) variables.

Tools and Techniques (Supervised Learning):

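Advanced steps beyond basic regression:
Logistic Regression (categorical output)
k-Nearest Neighbours (k-NN)
Classification and Regression Trees (CART)
Support Vector Machine (SVM)
Linear/Quadratic Discriminant Analysis (LDA/QDA)
Artificial Neural Networks (ANN)
Ensemble Methods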
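Logistic regression models the probability of a categorical (here binary) output. A minimal sketch, assuming one input feature and plain batch gradient descent on the mean log loss (practical libraries use more robust optimizers); the data, learning rate and epoch count are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log loss with respect to w and b
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(err * x for err, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: hours studied vs. fail (0) / pass (1)
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 5.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)

# Classify by thresholding the predicted probability at 0.5
print(sigmoid(w * 0.5 + b) < 0.5, sigmoid(w * 5.0 + b) > 0.5)  # True True
```

The data is deliberately not perfectly separable, so the fitted weights stay finite.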

Major Topics in Unsupervised Learning:

Clustering:

Task of grouping a set of objects based on their similarity (a common set of attributes/features that the objects share).

k-means
Hierarchical
Graph-based
Density-based
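A minimal sketch of k-means clustering (Lloyd's algorithm), the standard centroid-based clustering method, assuming Euclidean distance, a fixed number of clusters `k`, and initial centroids sampled from the data. The point set is made up and `k_means` is a hypothetical helper name.

```python
import random
from math import dist


def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate between assigning each point to its nearest
    centroid and moving each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: centroid = coordinate-wise mean (keep old one if empty)
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: centroids (and hence assignments) no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabeled data with two visible groups
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, k=2)
print(sorted(centroids))
```

No labels are used anywhere: the groups emerge only from distances between the points.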

Association Rule Mining (Market Basket Analysis)

Task of identifying relationships between features across a set of objects.
Any variable can be part of a rule.
There is no specific output variable.
Judge how good each rule is (for example, by its support and confidence).
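One common way to judge how good a rule is uses support and confidence. The sketch below assumes a tiny hypothetical transaction list and only considers rules between single items; it is a brute-force illustration, not an efficient miner such as Apriori. The helper names are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(A | C) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def pair_rules(transactions, min_support, min_confidence):
    """Brute-force search for rules {x} -> {y} that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        if support({a, b}, transactions) < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            conf = confidence({lhs}, {rhs}, transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, conf))
    return rules


# Hypothetical shopping baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
rules = pair_rules(transactions, min_support=0.4, min_confidence=0.7)
for lhs, rhs, conf in rules:
    print(f"{lhs} -> {rhs} (confidence {conf:.2f})")
```

Here "butter -> bread" holds in every basket containing butter, while "bread -> butter" holds in three out of four.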

Major Topics for Data-Driven Decision Making

Design of Experiments:

No data is available.
Data is needed.
Conduct an experiment.
Explicitly change the input variables and record the resulting data.
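A minimal sketch of a full factorial design, where every combination of input levels is run once and the output recorded. The factors, levels and `system_response` function are hypothetical stand-ins for the real system under study; the main-effect calculation is one simple way to read the recorded data.

```python
from itertools import product
from statistics import mean

# Hypothetical input variables (factors) and the levels we choose to set
factors = {
    "temperature": [150, 180],
    "time_min": [10, 20],
    "catalyst": ["A", "B"],
}


def full_factorial(factors):
    """Every combination of factor levels: one experimental run per entry."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def system_response(run):
    """Hypothetical stand-in for measuring the real process at these settings."""
    return 2 * run["temperature"] + 3 * run["time_min"] + (10 if run["catalyst"] == "B" else 0)


# Conduct the experiment: set each combination of inputs and record the output
records = [(run, system_response(run)) for run in full_factorial(factors)]


def main_effect(records, factor, low, high):
    """Mean response at the high level minus mean response at the low level."""
    at_high = mean(y for run, y in records if run[factor] == high)
    at_low = mean(y for run, y in records if run[factor] == low)
    return at_high - at_low


print(len(records))  # 8 runs (2 x 2 x 2)
print(main_effect(records, "temperature", 150, 180))
```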

Active Learning:

We already have some data.
Gathering more data is fairly expensive.
Only partial knowledge is available.
Sequentially query the system.
Decide which data points need to be gathered.
Gather only the critical data needed to extract insights.
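A toy illustration of sequential querying: if the unknown rule is a single threshold on one variable, repeatedly querying the midpoint of the still-uncertain region (a binary search) locates it with very few labels. The pool, the `label` oracle and the threshold value are assumptions made for this sketch, and `active_learn_threshold` is a hypothetical helper name.

```python
def label(x):
    """Hypothetical expensive oracle: the true (unknown to us) rule is x >= 0.615."""
    return int(x >= 0.615)


def active_learn_threshold(pool, oracle, budget):
    """Query only the most informative point: the middle of the uncertain region."""
    xs = sorted(pool)
    # Invariant: xs[:lo + 1] are known to be label 0, xs[hi:] are known to be label 1
    lo, hi = -1, len(xs)
    queries = 0
    while hi - lo > 1 and queries < budget:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(xs[mid]) == 1:
            hi = mid
        else:
            lo = mid
    # Estimated threshold: smallest pool point known to be label 1 (None if none found)
    estimate = xs[hi] if hi < len(xs) else None
    return estimate, queries


pool = [i / 100 for i in range(100)]  # 100 unlabeled candidate points
estimate, queries = active_learn_threshold(pool, label, budget=10)
print(estimate, queries)  # 0.62 after 6 queries, instead of labeling all 100 points
```

Labeling the whole pool passively would cost 100 queries; the sequential strategy needs about log2(100).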

Area of Reinforcement Learning:

No or partial data.
No experiments can be run beforehand to create data.
Dedicated offline experiments are not possible, because they would affect the end users.
Called bandit problems (multi-armed bandits).
A form of experimentation in which you learn from the data while performing as well as you can online (live).
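A minimal sketch of the epsilon-greedy strategy for a multi-armed bandit: with probability `epsilon` try a random option (explore), otherwise play the best option found so far (exploit). The arm reward probabilities are hypothetical, and the rewards are simulated rather than coming from live users.

```python
import random


def epsilon_greedy(arm_probs, rounds=10_000, epsilon=0.1, seed=42):
    """Play a simulated Bernoulli multi-armed bandit online with epsilon-greedy."""
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms    # how often each arm has been played
    values = [0.0] * n_arms  # running mean reward of each arm
    total_reward = 0
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick any arm at random
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit the best so far
        # Simulated user response (e.g. a click) with the arm's hidden probability
        reward = 1 if rng.random() < arm_probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        total_reward += reward
    return counts, values, total_reward


# Hypothetical click-through rates of three page variants
counts, values, total = epsilon_greedy([0.05, 0.10, 0.20])
print(counts, [round(v, 3) for v in values], total)
```

Every round both gathers data and earns reward, which is the "experiment while performing live" trade-off described above.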

References:

Introduction to Data Analytics, NPTEL, IIT Madras (www.nptel.iitm.ac.in)

e Study India