Understanding the Basics of Data Extraction

Data science is the science behind business decisions, and each new business start-up brings new concepts with it.

Hadoop (MapReduce) is a batch-processing framework rather than a real-time one: it works only on data that has already been stored. Apache Spark, by contrast, also supports real-time (streaming) processing. A few applications of big data analysis are:
  1. Prediction of election results
  2. Sentiment Analysis
  3. Page Rank
  4. Bibliometrics
  5. Food pairing
  6. Popularity over time
  7. Flu trends
  8. Traffic analysis
  9. Side effects associated with a particular drug
  10. Earthquake prediction
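The difference between batch processing of stored data and real-time processing can be sketched in plain Python, with no Hadoop or Spark involved. A batch job reads the whole stored data set and produces one result at the end; a streaming job updates its result as each record arrives. The word-count task, the function names and the sample lines are hypothetical illustrations, not the API of either framework.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_word_count(stored_lines: list[str]) -> Counter:
    """Batch style (hypothetical helper): all data is already stored,
    so process it in one pass and return a single final result."""
    counts: Counter = Counter()
    for line in stored_lines:
        counts.update(line.lower().split())
    return counts


def streaming_word_count(incoming: Iterable[str]) -> Iterator[Counter]:
    """Streaming style (hypothetical helper): records arrive one at a time,
    and an updated result is emitted after every record."""
    counts: Counter = Counter()
    for line in incoming:
        counts.update(line.lower().split())
        yield counts.copy()  # snapshot of the running totals so far


if __name__ == "__main__":
    posts = ["Spark is fast", "Hadoop stores data", "spark streams data"]
    print(batch_word_count(posts)["spark"])  # 2
    for snapshot in streaming_word_count(posts):
        print(snapshot["data"])  # 0, then 1, then 2
```

The batch version can only answer once the input is complete, while the streaming version has a usable (partial) answer after every record, which is the property that real-time applications such as traffic analysis or flu-trend tracking depend on.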

Data extraction involves three steps:
  • Collecting data
  • Cleaning data
  • Integrating data
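The three steps above can be sketched as a small pipeline using only the Python standard library. This is a minimal sketch under stated assumptions: the two sources (a CSV string of sales and a list of customer records), their field names and the sample values are all hypothetical stand-ins for real files, databases or scraped pages.

```python
import csv
import io

# Hypothetical raw inputs from two different sources.
SALES_CSV = """customer_id,amount
1, 250
2,
1,100
3,abc
"""
CUSTOMERS = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ravi"},
    {"customer_id": 3, "name": "Meera"},
]


def collect(raw_csv: str) -> list[dict]:
    """Collect: read raw rows from a source (here an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def clean(rows: list[dict]) -> list[dict]:
    """Clean: trim whitespace, convert types, drop rows that cannot be parsed."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({
                "customer_id": int(row["customer_id"].strip()),
                "amount": float(row["amount"].strip()),
            })
        except (ValueError, AttributeError):
            continue  # skip empty or malformed values
    return cleaned


def integrate(sales: list[dict], customers: list[dict]) -> dict[str, float]:
    """Integrate: join cleaned sales with customer records on customer_id."""
    names = {c["customer_id"]: c["name"] for c in customers}
    totals: dict[str, float] = {}
    for sale in sales:
        name = names.get(sale["customer_id"], "unknown")
        totals[name] = totals.get(name, 0.0) + sale["amount"]
    return totals


if __name__ == "__main__":
    print(integrate(clean(collect(SALES_CSV)), CUSTOMERS))
```

Keeping each step as its own function makes it easy to swap the source in `collect` (a file, an API, a scraper) without touching the cleaning or integration logic.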

These steps can be performed using:
  • Python programming
  • R programming
  • MSBI – SSIS package
  • XML import in MS Excel (or the IMPORTXML function in Google Sheets)
  • Web scraping tools
  • Apache Flume in the Hadoop ecosystem
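As an illustration of the scraping approach in the list above, here is a minimal Python sketch that pulls every link out of an HTML page using the standard-library `html.parser` module. It assumes the page has already been downloaded (for example with `urllib.request.urlopen`); the `LinkExtractor` class and `extract_links` function are hypothetical names, and real scraping tools add much more (politeness delays, robots.txt checks, handling of broken markup).

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # Called by HTMLParser for each opening tag; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    """Feed an HTML string to the parser and return the links it found."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = ('<html><body><a href="/post/1">One</a> <a name="top">Top</a> '
            '<a href="https://example.com">Ext</a></body></html>')
    print(extract_links(page))  # ['/post/1', 'https://example.com']
```

Anchors without an `href` (such as the `name="top"` anchor) are skipped, so only real links are extracted.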

Common formats for extracted data:
  • JSON
  • XML
  • Plain strings (within a program)
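To show how the three formats listed above differ in practice, the sketch below parses the same record from JSON, XML and a plain string into one Python dictionary. The record itself and the `key=value;...` layout of the string are hypothetical assumptions; plain strings have no standard structure, so their parsing rule always has to be agreed on in advance.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in three formats.
AS_JSON = '{"name": "Asha", "city": "Pune", "score": 91}'
AS_XML = "<person><name>Asha</name><city>Pune</city><score>91</score></person>"
AS_STRING = "name=Asha;city=Pune;score=91"


def from_json(text: str) -> dict:
    # JSON carries its own types, so the number arrives as an int.
    return json.loads(text)


def from_xml(text: str) -> dict:
    root = ET.fromstring(text)
    record = {child.tag: child.text for child in root}
    record["score"] = int(record["score"])  # XML element text is always a string
    return record


def from_string(text: str) -> dict:
    # Assumed layout: key=value pairs separated by semicolons.
    record = dict(pair.split("=", 1) for pair in text.split(";"))
    record["score"] = int(record["score"])
    return record


if __name__ == "__main__":
    print(from_json(AS_JSON) == from_xml(AS_XML) == from_string(AS_STRING))  # True
```

JSON needs no extra work, XML needs the tree walked and types converted, and a plain string needs a hand-written parsing rule, which is why structured formats are usually preferred for extraction.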
