Different Fields for Data Science

Data science, also known as data-driven science, is an interdisciplinary field of scientific methods, processes, and systems for extracting knowledge or insights from data in various forms, either structured or unstructured, similar to Knowledge Discovery in Databases (KDD).
Data science is a "concept to unify statistics, data analysis and their related methods" in order to "understand and analyze actual phenomena" with data. It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science, in particular from the subdomains of machine learning, classification, cluster analysis, data mining, databases, and visualization.

Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge.
When Harvard Business Review called data scientist "The Sexiest Job of the 21st Century", the term became a buzzword. It is now often applied to business analytics, or even to arbitrary use of data, or used as a sexed-up term for statistics. While many university programs now offer a data science degree, there is no consensus on a definition or on curriculum contents. Because of the term's current popularity, there are many "advocacy efforts" surrounding it.



Fig 1: Different fields for Data Science


1. Statistics: Statistical techniques are needed to explore data and answer questions about it by finding common patterns. Examples include creating predictive models (e.g., who is most likely to win the NCAA basketball championship?), clustering groups of information (e.g., what types of shoppers are there?), and detecting anomalies (e.g., which bank transactions are atypical?).


2. Computer Science: Computer Science techniques are needed for most backend processes involved in Data Science. This includes knowing which technologies to use to ingest, process, and manage data. This also includes understanding how to scale systems as data grows and where to apply various algorithms for the best computational efficiency. For example, say you work for a bank and have identified a common pattern in the data for fraudulent transactions using statistical techniques. Now you want to take those techniques and apply them to all your customers (future and current) on an ongoing basis. Computer Science techniques will help you do this. 


3. Communication: Getting to results is one thing, but if you cannot show stakeholders why your results are important or what they mean, your analysis will do little to nothing for the business. Communicating results goes beyond basic communication skills and public speaking. It includes building practical models (even if they are not the 'best' models) and visualizing your results. A sub-category of Communication might also be Business: you need to understand the bigger picture of why you are doing what you are doing. There are many Data Scientists who can do really cool, complex things, but few who recognize the simple, practical solution that may be sitting right in front of them.
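
To make the bank example from items 1 and 2 a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the transaction amounts are made up, the function names (fit_baseline, is_atypical, flag_stream) are hypothetical, and the modified z-score (median and median absolute deviation, with the commonly used 3.5 cutoff) is just one simple way to flag atypical values, not necessarily what a real fraud system would use. The first function is the "statistics" step (learn what typical looks like), and the generator is the "computer science" step (apply that rule to new transactions as they arrive).

```python
from statistics import median


def fit_baseline(amounts):
    """Statistics step: summarize historical amounts with median and MAD."""
    med = median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(a - med) for a in amounts)
    return med, mad


def is_atypical(amount, med, mad, cutoff=3.5):
    """Return True if the modified z-score of `amount` exceeds `cutoff`."""
    if mad == 0:
        # All historical values were identical; anything different stands out.
        return amount != med
    # 0.6745 rescales MAD so the score is comparable to a standard z-score.
    score = 0.6745 * abs(amount - med) / mad
    return score > cutoff


def flag_stream(transactions, med, mad, cutoff=3.5):
    """Computer science step: lazily check (id, amount) pairs as they arrive."""
    for tx_id, amount in transactions:
        if is_atypical(amount, med, mad, cutoff):
            yield tx_id


if __name__ == "__main__":
    # Hypothetical historical amounts for one customer.
    history = [20, 25, 22, 30, 27, 24, 26, 23]
    med, mad = fit_baseline(history)

    # Hypothetical new transactions arriving later.
    incoming = [("t1", 28), ("t2", 950), ("t3", 21)]
    print(list(flag_stream(incoming, med, mad)))  # prints ['t2']
```

Because flag_stream is a generator, it never needs the whole list of transactions in memory, which hints at how the same rule could be applied to all customers on an ongoing basis. Explaining why "t2" was flagged in plain terms is where the Communication item comes in.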
