Understanding the Basics of Data Extraction
Data science is the science behind business decisions. New business start-ups bring new concepts with them.
Hadoop is not a real-time framework: it works in batch mode, on stored data only. Apache Spark, by contrast, can process data in near real time. A few applications of big data analysis are:
- Prediction of election results
- Sentiment Analysis
- Page Rank
- Bibliometrics
- Food pairing
- Popularity over time
- Flu trends
- Traffic analysis
- Side effects associated with a particular drug
- Earthquake prediction
Data extraction involves the following steps:
- Collecting data
- Cleaning data
- Integrating data
This can be performed using:
- Python programming
- R programming
- MSBI – SSIS package
- Using the IMPORTXML function (available in Google Sheets; MS Excel offers WEBSERVICE and FILTERXML instead)
- Using scraping tools
- Flume tool in Hadoop
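As a rough illustration of the collecting, cleaning, and integrating steps listed earlier, here is a minimal Python sketch that uses only the standard library. The two inline sources, the field names (`city`, `temp_c`, `population_m`), and all the values are made up for the example; in practice the collected data would come from files, APIs, or scraped pages.

```python
import csv
import io
import json

# Hypothetical sample inputs standing in for two collected sources.
# All names and numbers are invented for illustration.
CSV_SOURCE = """city,temp_c
Alpha, 31
Beta,
 Gamma ,29
"""
JSON_SOURCE = (
    '[{"city": "Alpha", "population_m": 3.1},'
    ' {"city": "Gamma", "population_m": 12.4}]'
)


def collect():
    """Step 1 (collecting): read raw records from each source."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    extra = json.loads(JSON_SOURCE)
    return rows, extra


def clean(rows):
    """Step 2 (cleaning): trim whitespace, drop rows with missing values."""
    cleaned = []
    for row in rows:
        city = (row["city"] or "").strip()
        temp = (row["temp_c"] or "").strip()
        if not city or not temp:
            continue  # skip incomplete records
        cleaned.append({"city": city, "temp_c": float(temp)})
    return cleaned


def integrate(rows, extra):
    """Step 3 (integrating): join both sources on the city name."""
    by_city = {item["city"]: item for item in extra}
    return [{**row, **by_city.get(row["city"], {})} for row in rows]


def run_pipeline():
    rows, extra = collect()
    return integrate(clean(rows), extra)


if __name__ == "__main__":
    for record in run_pipeline():
        print(record)
```

The row with the missing temperature is dropped during cleaning, and the remaining rows pick up the extra field from the second source during integration.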
Extracted data commonly comes in these formats:
- JSON
- XML
- Plain strings (parsed programmatically)
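To make the three formats concrete, the sketch below reads the same record from JSON, from XML, and from a plain delimited string, using Python's standard `json` and `xml.etree.ElementTree` modules. The record itself (`sensor-1`, `42`), the XML tag names, and the `key=value;key=value` string layout are assumptions chosen for the example, not formats prescribed by any particular tool.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in three formats (values are made up).
JSON_TEXT = '{"name": "sensor-1", "value": 42}'
XML_TEXT = "<reading><name>sensor-1</name><value>42</value></reading>"
STRING_TEXT = "name=sensor-1;value=42"


def from_json(text):
    """Parse a JSON object into a plain dict."""
    data = json.loads(text)
    return {"name": data["name"], "value": int(data["value"])}


def from_xml(text):
    """Parse an XML element and read its child elements as fields."""
    root = ET.fromstring(text)
    return {
        "name": root.findtext("name"),
        "value": int(root.findtext("value")),
    }


def from_string(text):
    """Parse an assumed 'key=value;key=value' string by splitting."""
    pairs = dict(part.split("=", 1) for part in text.split(";"))
    return {"name": pairs["name"], "value": int(pairs["value"])}


if __name__ == "__main__":
    print(from_json(JSON_TEXT))
    print(from_xml(XML_TEXT))
    print(from_string(STRING_TEXT))
```

JSON and XML come with ready-made parsers, while a plain string needs its layout to be known in advance, which is why string extraction usually means writing the splitting logic by hand.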