Difference between Hadoop, HDFS, Big Data, MapReduce, Data Science | Who is a data scientist

Data Science

Data Science is the study of data with the aim of converting it into an actionable form. A number of frameworks are available for this. Data Science often deals with huge amounts of data, which these frameworks analyze.

 

          Terminologies of Data Science


Data Science is the art of converting Big Data into an actionable form. Big Data, meaning data sets too large for conventional single-machine tools, is the input to this processing. Hadoop and Spark are frameworks for working with Big Data and performing Data Science on it. HDFS (Hadoop Distributed File System) is Hadoop's storage layer: it splits data into blocks, distributes them across multiple nodes, and keeps replicas so the data survives node failures. MapReduce is a programming model for distributed processing, not a machine learning concept: in the map step, records are turned into key-value pairs and grouped by key (one criterion), and in the reduce step, the values for each key are combined into a summary result (another criterion).
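The map and reduce steps can be sketched in plain Python with a word count, the usual introductory example. This is only a single-machine illustration of the idea, not Hadoop's actual API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this sketch, and a real framework would run each phase in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group the emitted values by key (done by the framework between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each key's values into one result (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run the three phases in order on an in-memory list of strings."""
    return reduce_phase(shuffle(map_phase(documents)))


if __name__ == "__main__":
    docs = ["big data is big", "data science uses data"]
    print(word_count(docs))
```

Here the mapping criterion is the word itself and the reducing criterion is the sum of its counts. Changing either function gives a different job, for example emitting (user, amount) pairs and reducing with a maximum.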

 

         Data Scientist


A data scientist is someone who makes new discoveries with data. They look for meaning, knowledge, and patterns in data. Knowledge of mathematics, statistics, and computing is essential for any data scientist. To be a good data scientist, choosing and implementing the most suitable algorithm for the problem is the need of the hour.
A data scientist is given big data and a question to answer. They analyze patterns in the input data using different algorithms and pick the ones that give the best results.

The knowledge of a data scientist lies at the intersection of

  1. Programming
  2. Business Intelligence
  3. Algorithms
This is probably a distinct skill set. It deals not only with programming and implementation but also with discovering new algorithms and deriving predictive knowledge.




