Overview of Data Analytics
Machine learning is the primary focus. The course covers concepts of probability, statistics, and regression analysis.
Related topics
- Data mining
- Statistical Learning
- Pattern Recognition
The focus is on algorithms used to organize data into useful patterns.
Broad Classification
- Supervised Learning
- Unsupervised Learning
These have already been defined in Introduction to Machine Learning.
Further details:
Supervised Learning: essentially the task of learning a function (a relationship) from training data (historical data), i.e., labeled data that has an explicit output variable.
The goal is a functional, algorithmic mapping between the input variables and the output variable.
Further classified into:
- Classification problem:
  - The output variable is a discrete, categorical variable rather than a continuous one.
  - The task is to predict which class an observation belongs to.
- Regression problem:
  - The output variable is continuous.
  - Only marginally different from classical regression analysis.
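To make the distinction concrete, here is a minimal sketch in plain Python using made-up toy data (the variable names and values are hypothetical). The same labeled inputs are paired once with a continuous output (fit with ordinary least squares, a regression problem) and once with a categorical output (fit with a simple nearest-class-mean rule, a classification problem). Neither model is claimed to be the one used in the course.

```python
# Hypothetical toy data: hours studied (input x) for five students
hours = [1, 2, 3, 4, 5]
scores = [50, 55, 60, 65, 70]                        # continuous output -> regression
results = ["fail", "fail", "pass", "pass", "pass"]   # categorical output -> classification


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (regression: continuous output)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def fit_class_means(xs, labels):
    """Classification: remember the mean input value of each class."""
    means = {}
    for c in set(labels):
        vals = [x for x, lab in zip(xs, labels) if lab == c]
        means[c] = sum(vals) / len(vals)
    return means


def predict_class(means, x):
    """Assign x to the class whose mean is closest."""
    return min(means, key=lambda c: abs(x - means[c]))


a, b = fit_line(hours, scores)
print(a + b * 6)                   # predicted score for 6 hours: 75.0

means = fit_class_means(hours, results)
print(predict_class(means, 3.0))   # pass
```

Both functions learn from labeled (x, y) pairs; only the type of the output variable changes which kind of supervised problem it is.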
Unsupervised Learning: the task of finding patterns in data when there is no explicit output measure or signal to guide us. The data is unlabeled; we have only the input (x) variables.
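Tools and techniques (supervised learning):
- Extensions of regression:
  - Logistic regression (categorical output)
- k-NN (k-nearest neighbours)
- Classification and Regression Trees (CART)
- Support Vector Machines (SVM)
- LDA/QDA (linear/quadratic discriminant analysis)
- ANN (artificial neural networks)
- Ensemble methods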
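As an illustration of one tool from the list, the following is a minimal k-NN classifier sketch in plain Python. The function name `knn_predict`, the Euclidean distance, and the two-cluster toy data are assumptions made for the example; real libraries add efficient neighbour search and tie handling.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest training points."""
    # Euclidean distance from the query to every labeled training point
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority vote among the k closest points
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled training data: two well-separated groups
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (2, 2)))   # A
print(knn_predict(train_X, train_y, (6, 5)))   # B
```

k-NN has no explicit training step: the labeled data itself is the model, and the choice of k trades off noise sensitivity (small k) against smoothing over class boundaries (large k).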
Major Topics in Unsupervised Learning:
- Clustering:
  - The task of grouping a set of objects based on their similarity (the common set of attributes/features that the objects possess).
  - k-means
  - Hierarchical clustering
  - Graph-based clustering
  - Density-based clustering
- Association Rule Mining (Market Basket Analysis):
  - The task of identifying relationships between features across a set of objects.
  - Any variable can be part of a rule.
  - There is no specific output variable.
  - We need a way to judge how good a rule is (for example, by its support and confidence).
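The two unsupervised tasks above can be sketched in plain Python as follows. The first part is a basic k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its cluster); initializing with the first k points is a simplifying assumption, since practical implementations usually use random or k-means++ initialization. The second part computes support and confidence, two common measures for judging an association rule. All data and function names here are hypothetical.

```python
import math


def kmeans(points, k, iters=100):
    """Basic k-means: alternate nearest-centroid assignment and mean update."""
    centroids = list(points[:k])  # naive initialization (assumption): first k points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # New centroid = coordinate-wise mean of its cluster (keep old one if empty)
        new = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:  # converged: assignments no longer change
            break
        centroids = new
    return centroids, clusters


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
print(support(baskets, {"bread", "milk"}))         # 0.5
print(confidence(baskets, {"bread"}, {"milk"}))    # about 0.667
```

Note that neither function uses an output variable: k-means groups objects by feature similarity, while the rule measures relate any features to each other.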
Design of Experiments:
- No data is available.
- We need data, so we conduct an experiment.
- We explicitly change the input variables and record the resulting data.
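A minimal sketch of the idea, assuming a full factorial design (every combination of chosen input levels is tried once). The system under study, `hypothetical_yield`, and its input levels are invented stand-ins for a real process or measurement; the notes do not specify a particular design.

```python
from itertools import product


def full_factorial(levels):
    """Enumerate every combination of input levels (a full factorial design)."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*(levels[n] for n in names))]


def run_experiment(system, levels):
    """Deliberately set each input combination, observe the system, record the result."""
    return [(setting, system(**setting)) for setting in full_factorial(levels)]


def hypothetical_yield(temperature, pressure):
    # Stand-in for the real system; in practice this would be a measurement
    return 0.5 * temperature + 2 * pressure


levels = {"temperature": [150, 200], "pressure": [1, 2]}
records = run_experiment(hypothetical_yield, levels)
for setting, outcome in records:
    print(setting, outcome)
```

Because the inputs are set by the experimenter rather than merely observed, the recorded data covers every combination of levels, which is what makes the effect of each input easier to separate.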
Active Learning:
- We already have some data.
- Gathering more data is fairly expensive.
- We have only partial knowledge.
- We sequentially query the system, checking which data we need to gather next.
- We gather only the critical data needed to extract insights.
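One common way to decide which data to gather is uncertainty sampling: query the point the current model is least sure about. The sketch below assumes a one-dimensional threshold classifier and a hypothetical, expensive labeling oracle (`true_label`, with an invented boundary at 0.637); the notes do not prescribe this particular strategy.

```python
def true_label(x):
    # Hypothetical expensive oracle (e.g., a lab test or a human annotator)
    return x >= 0.637


def fit_threshold(labeled):
    """Midpoint between the largest known negative and smallest known positive."""
    neg = [x for x, y in labeled if not y]
    pos = [x for x, y in labeled if y]
    lo = max(neg) if neg else 0.0
    hi = min(pos) if pos else 1.0
    return (lo + hi) / 2


def active_learn(pool, budget):
    """Sequentially query the oracle, spending the labeling budget on informative points."""
    labeled = []
    unlabeled = list(pool)
    for _ in range(budget):
        t = fit_threshold(labeled)
        # Uncertainty sampling: the point closest to the current boundary
        x = min(unlabeled, key=lambda u: abs(u - t))
        unlabeled.remove(x)
        labeled.append((x, true_label(x)))
    return fit_threshold(labeled), labeled


pool = [i / 100 for i in range(101)]   # 101 unlabeled candidates
threshold, queried = active_learn(pool, budget=8)
print(threshold, len(queried))
```

With only 8 labels out of 101 candidates, the queries behave like a bisection search and concentrate near the boundary, instead of labeling the whole pool.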
Reinforcement Learning:
- No data or only partial data.
- No experiments can be run to create data.
- We cannot run offline experiments because doing so would affect end users (e.g., taking the service offline).
- These are called bandit problems (multi-armed bandits).
- A form of experimentation in which we learn from the data while trying to perform as well as possible online (live).
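The notes do not name a specific bandit algorithm; the sketch below uses epsilon-greedy, one simple strategy, to show the idea of learning while performing live. The arm reward probabilities are hypothetical, and rewards are simulated with a seeded random generator in place of real user feedback.

```python
import random


def epsilon_greedy(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Multi-armed bandit: explore with probability epsilon, otherwise exploit the best arm."""
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running mean reward observed for each arm
    total = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        # Simulated live feedback (stands in for a real user's response)
        reward = 1 if rng.random() < arm_probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
        total += reward
    return counts, values, total


counts, values, total = epsilon_greedy([0.2, 0.5, 0.8])
print(counts, [round(v, 2) for v in values], total)
```

No separate data-collection phase is needed: every pull both earns reward and refines the estimates, which is the explore/exploit trade-off that distinguishes bandit problems from offline experimentation.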
References:
Introduction to Data Analytics, NPTEL (www.nptel.iitm.ac.in)