Brief Introduction to Data Science and Hadoop
Data science involves getting data and analyzing it to produce
visualizations of that data. The analysis should be accompanied by results that
can be compared against some ground truth.
Data Science refers to an emerging area of work concerned
with the collection, preparation, analysis, visualization, management and
preservation of large collections of information [1]. Although we learned that a data scientist starts with a question and big data, Data Science is about building data products and not just
answering questions.
Data Science is related to:
· Statistics
· Database management
· Visualization
· Machine learning
Future Scope for Data Science:
1. Databases: Unstructured Data
2. Statistics: To fit data in memory
3. Computers: Statistical modelling and communication of results
4. Business Analysis: Algorithms and tradeoffs at scale
Hadoop is an implementation of an abstraction called MapReduce.
Tool | Abstraction
Hadoop | MapReduce
PostgreSQL | Relational Algebra
R | Logistic Regression
Tableau | Visualization
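To make the first pairing of tool and abstraction concrete, here is a minimal sketch of the MapReduce idea as a word count in plain Python. It is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, map_reduce) are made up for illustration, and everything runs in a single process, whereas Hadoop distributes the same three steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run map, shuffle and reduce in sequence (hypothetical driver)."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["data science builds data products", "Hadoop implements MapReduce"]
    print(map_reduce(docs))
```

The programmer only writes the map and reduce steps; the grouping in between, and in a real system the distribution and fault tolerance, are the framework's job. That split is what makes MapReduce an abstraction and Hadoop one tool that implements it.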
eScience is data science applied to the analysis of real-time data, such as data from the web, astronomy,
oceanography and the life sciences.
Data comes from
References
[1] Jeffrey Stanton, "An Introduction to Data Science"