Brief Introduction to Data Science and Hadoop

Data science involves getting data and analyzing it to produce visualizations of that data. The results should be comparable against some ground truth.

Data Science refers to an emerging area of work concerned with the collection, preparation, analysis, visualization, management, and preservation of large collections of information [1]. Although a data scientist typically starts with a question and big data, Data Science is about building data products, not just answering questions.

Data Science is related to:
·         Business intelligence
·         Statistics
·         Database management
·         Visualization
·         Machine learning


Future Scope for Data Science:
  1. Databases: Unstructured data
  2. Statistics: To fit data in memory
  3. Computers: Statistical modelling and communication of results
  4. Business Analysis: Algorithms and tradeoffs at scale

Hadoop is an implementation of an abstraction called MapReduce. The table below pairs each tool with the abstraction it implements:

Tool          Abstraction
Hadoop        MapReduce
PostgreSQL    Relational Algebra
R             Logistic Regression
Tableau       Visualization
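
To make the MapReduce abstraction a little more concrete, here is a minimal word-count sketch in plain Python. It is not Hadoop's API and runs on a single machine; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration. It only shows the three steps that a framework like Hadoop would distribute across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence (hypothetical helper)."""
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data big ideas", "data science builds data products"]
    print(word_count(docs))
```

The map and reduce functions never share state, which is what would let a real framework run them in parallel on different parts of the data.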

eScience is data science applied to the analysis of real-time data, such as data from the web, astronomy, oceanography, and the life sciences.

Data comes from:
·         Customers: orders, clickstreams, advertisement clicks


References

[1] Jeffrey Stanton, "An Introduction to Data Science"
