Different Fields for Data Science
Data science, also known as data-driven science, is an
interdisciplinary field concerned with scientific methods, processes, and systems for
extracting knowledge or insights from data in various forms, either structured or unstructured, similar to Knowledge
Discovery in Databases (KDD).
Data science is a "concept to
unify statistics, data analysis and their related methods" in order to
"understand and analyze actual phenomena" with data. It
employs techniques and theories drawn from many fields within the broad areas
of mathematics, statistics, information science, and computer science, in particular from the subdomains of machine
learning, classification, cluster analysis, data mining, databases, and visualization.
Turing Award winner Jim
Gray imagined data science as a "fourth paradigm" of
science (empirical, theoretical, computational and now data-driven) and asserted that
"everything about science is changing because of the impact of information
technology" and the data deluge.
When Harvard
Business Review called data scientist "The Sexiest Job of the 21st Century", the
term became a buzzword. It is now often applied to business
analytics, or even to arbitrary use of data, or used as a sexed-up term for
statistics. While many university programs now offer a data science degree,
there is no consensus on a definition or on curriculum contents. Because
of the current popularity of the term, there are many "advocacy
efforts" surrounding it.
Fig 1: Different fields for Data Science
1. Statistics: Statistical techniques are needed to explore data and answer
questions about it by finding common patterns. Examples include the
ability to create predictive models (e.g., who is most likely to win the NCAA
basketball championship?), cluster groups of information (e.g., what types of
shoppers are there?), and detect anomalies (e.g., which bank transactions are
atypical?).
2. Computer Science: Computer Science techniques are needed for most
backend processes involved in Data Science. This includes knowing which technologies
to use to ingest, process, and manage data. This also includes understanding
how to scale systems as data grows and where to apply various algorithms for
the best computational efficiency. For example, say you work for a bank and
have identified a common pattern in the data for fraudulent transactions using
statistical techniques. Now you want to take those techniques and apply them to
all your customers (future and current) on an ongoing basis. Computer Science
techniques will help you do this.
3. Communication: Being able to get to results is one thing. If you are
unable to demonstrate to stakeholders why your results are important or what
they mean, your analysis will do little to nothing for the business.
Communicating results goes beyond basic communication skills and public
speaking. It includes building practical models (even if they are not the 'best'
models) and visualizing your results. A sub-category of Communication might
also be Business: you need to understand the bigger picture of why you are
doing what you are doing. There are many Data Scientists who can do really
cool, complex things, but few who recognize the simple, practical solution that
may be sitting right in front of them.
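
To make the bank example from items 1 and 2 a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a recommended fraud model: the statistical step is a robust z-score built from the median and the median absolute deviation (MAD), the cutoff of 3.5 is a common rule of thumb rather than anything from this post, and the function names, transaction fields, and amounts are all made up for the example.

```python
from statistics import median


def fit_baseline(amounts):
    """Statistics step: learn what a 'typical' amount looks like.

    Returns the median and the median absolute deviation (MAD) of the
    historical amounts. Both are robust to a few extreme values.
    """
    amounts = list(amounts)
    med = median(amounts)
    mad = median([abs(a - med) for a in amounts])
    return med, mad


def is_atypical(amount, med, mad, cutoff=3.5):
    """Return True if the amount is far from the historical median.

    The 0.6745 factor rescales MAD so the score is roughly comparable to
    a standard z-score for normally distributed data. The cutoff of 3.5
    is an assumed rule of thumb, not a tuned value.
    """
    if mad == 0:
        # All historical amounts were nearly identical; anything else stands out.
        return amount != med
    score = 0.6745 * abs(amount - med) / mad
    return score > cutoff


def flag_stream(transactions, med, mad, cutoff=3.5):
    """Computer Science step: apply the rule to transactions as they arrive.

    `transactions` can be any iterable (a list, a file reader, a queue
    consumer), so the same rule works on an ongoing basis without loading
    everything into memory.
    """
    for tx in transactions:
        if is_atypical(tx["amount"], med, mad, cutoff):
            yield tx


# Hypothetical data: past amounts for one customer, then new transactions.
history = [20, 25, 22, 30, 27, 24, 26, 23]
incoming = [
    {"id": 1, "amount": 28},
    {"id": 2, "amount": 950},
]

med, mad = fit_baseline(history)
for tx in flag_stream(incoming, med, mad):
    # Communication step: report the result in plain terms.
    print(f"Transaction {tx['id']} for {tx['amount']} looks atypical "
          f"(typical amount is around {med}).")
```

The split mirrors the three fields: `fit_baseline` and `is_atypical` are the statistics, `flag_stream` is the engineering that lets the rule run continuously, and the printed message is a small nod to communication. A real system would likely need per-customer baselines, more features than the amount alone, and a model that is evaluated against known fraud cases.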