What is data science? Transforming data into value

Data science is a method for transforming business data into assets that help organizations improve revenue, reduce costs, seize business opportunities, improve customer experience, and more.


What is data science?

Data science is a method for gleaning insights from structured and unstructured data using approaches ranging from statistical analysis to machine learning. For most organizations, it is employed to transform data into value in the form of improved revenue, reduced costs, business agility, improved customer experience, the development of new products, and the like. Data science gives the data collected by an organization a purpose.

Data science vs. data analytics

While the two are closely related, data analytics is a component of data science, used to understand what an organization’s data looks like. Data science takes the output of analytics and uses it to solve problems. Data scientists say that investigating something with data is simply analysis; data science takes that analysis a step further to explain and solve problems. The two also differ in timescale: data analytics describes the current state of reality, whereas data science uses that data to predict and/or understand the future.
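A minimal sketch can make the timescale difference concrete. It assumes hypothetical monthly sales figures and two illustrative helper functions, `describe` and `forecast_next`, written with only Python’s standard library. The first summarizes what has already happened, in the spirit of analytics. The second fits a least-squares trend line, just one of many possible predictive models, to estimate the next month.

```python
from statistics import mean, median


def describe(values):
    """Analytics: summarize the current state of the data."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


def forecast_next(values):
    """Data science (simplified): fit a least-squares line and project one step ahead.

    Assumes at least two observations, evenly spaced in time.
    """
    n = len(values)
    xs = range(n)  # time index 0, 1, ..., n-1
    x_bar = mean(xs)
    y_bar = mean(values)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n  # predicted value at the next time step


monthly_sales = [100, 110, 120, 130, 140]  # hypothetical data
print(describe(monthly_sales))
print(forecast_next(monthly_sales))
```

A straight-line trend is a deliberately simple stand-in; it shows the shift from describing data to predicting from it, not how a production forecast would be built.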

The benefits of data science

The business value of data science depends on organizational needs. Data science could help an organization build tools to predict hardware failures, enabling the organization to perform maintenance and prevent unplanned downtime. It could help predict what to put on supermarket shelves, or how popular a product will be based on its attributes.

For further insight into the business value of data science, see “The unexpected benefits of data analytics” and “Demystifying the dark science of data analytics.”

Data science jobs

While the number of data science degree programs is increasing at a rapid clip, they aren’t necessarily what organizations look for when seeking data scientists. Candidates with a statistics background are popular, especially if they can demonstrate that they can tell whether they are looking at real results, have the domain knowledge to put results in context, and have the communication skills to convey results to business users.

Many organizations look for candidates with PhDs, especially in physics, math, computer science, economics, or even social science. A PhD demonstrates that a candidate is capable of doing deep research on a topic and disseminating the findings to others.

Some of the best data scientists and leaders of data science groups have non-traditional backgrounds, some with very little formal computer training. In many cases, the key ability is being able to look at something from a non-traditional perspective and understand it.

For further information about data scientist skills, see “What is a data scientist? A key data analytics role and a lucrative career,” and “Essential skills and traits of elite data scientists.”

Data science salaries

Here are some of the most popular job titles related to data science and the salary range for each position, according to data from PayScale:

Analytics manager: $71K-$131K
Associate data scientist: $61K-$101K
Business intelligence analyst: $52K-$97K
Data analyst: $45K-$87K
Data architect: $79K-$159K
Data engineer: $66K-$132K
Data scientist: $60K-$159K
Data scientist, IT: $60K-$159K
Lead data scientist: $98K-$178K
Research analyst: $43K-$82K
Research scientist: $52K-$123K
Senior data scientist: $96K-$162K
Statistician: $55K-$117K

Data science degrees

According to Fortune, these are the top graduate degree programs in data science:

University of Illinois at Urbana-Champaign
University of California—Berkeley
Texas Tech University
Bay Path University
Worcester Polytechnic Institute
Loyola University Maryland
University of Missouri—Columbia
New Jersey Institute of Technology
CUNY School of Professional Studies
Syracuse University

Data science training and bootcamps

Given the current shortage of data science talent, many organizations are building out programs to develop internal data science talent.

Bootcamps are another fast-growing avenue for training workers to take on data science roles. For more details on data science bootcamps, see “15 best data science bootcamps for boosting your career.”

Data science certifications

Organizations need data scientists and analysts with expertise in techniques for analyzing data. They also need big data architects to translate requirements into systems, data engineers to build and maintain data pipelines, developers who know their way around Hadoop clusters and other technologies, and system administrators and managers to tie everything together. Certifications are one way for candidates to show they have the right skillset.

Some of the top big data and data analytics certifications include:

Certified Analytics Professional (CAP)
Cloudera Data Platform Generalist Certification
Data Science Council of America (DASCA) Senior Data Scientist (SDS)
Data Science Council of America (DASCA) Principal Data Scientist (PDS)
IBM Data Science Professional Certificate
Microsoft Certified: Azure Data Scientist Associate
Open Certified Data Scientist (Open CDS)
SAS Certified Data Scientist

For more information about big data and data analytics certifications, see “The top 11 big data and data analytics certifications,” and “12 data science certifications that will pay off.”

Data science teams

Data science is generally a team discipline. Data scientists are the core of most data science teams, but moving from data to analysis to production value requires a range of skills and roles. For example, data analysts should be on board to investigate the data before presenting it to the team and to maintain data models. Data engineers are necessary to build data pipelines to enrich data sets and make the data available to the rest of the company.
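As a rough illustration of what “enriching” a data set can mean, the sketch below joins hypothetical transaction records with a hypothetical customer-to-region lookup table so that downstream users see each purchase alongside the customer’s region. The record types, field names, and the `enrich` helper are assumptions made for this example; real pipelines typically run on tools such as Spark or a SQL warehouse, and plain Python is used here only to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    customer_id: str
    amount: float


@dataclass(frozen=True)
class EnrichedTransaction:
    customer_id: str
    amount: float
    region: str


def enrich(transactions, regions_by_customer, default_region="unknown"):
    """Attach a region to each transaction (a left join on customer_id).

    Transactions whose customer is missing from the lookup keep a default
    region instead of being dropped, so no records silently disappear.
    """
    return [
        EnrichedTransaction(
            t.customer_id,
            t.amount,
            regions_by_customer.get(t.customer_id, default_region),
        )
        for t in transactions
    ]


transactions = [Transaction("c1", 25.0), Transaction("c2", 40.0), Transaction("c3", 15.0)]
regions = {"c1": "EMEA", "c2": "APAC"}  # c3 is deliberately missing
for row in enrich(transactions, regions):
    print(row)
```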

For further insight into building data science teams, see “How to assemble a highly effective analytics team” and “The secrets of highly successful data analytics teams.”

Data science goals and deliverables

The goal of data science is to construct the means for extracting business-focused insights from data. This requires an understanding of how value and information flow in a business, and the ability to use that understanding to identify business opportunities. While that may involve one-off projects, more typically data science teams seek to identify key data assets that can be turned into data pipelines that feed maintainable tools and solutions. Examples include credit card fraud monitoring solutions used by banks, or tools used to optimize the placement of wind turbines in wind farms.
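Production fraud systems are far more sophisticated, but the idea of turning a data asset into a repeatable monitoring tool can be sketched with a simple statistical rule: flag a new card transaction whose amount sits far from that cardholder’s historical mean. The `flag_unusual` function, the 3-standard-deviation threshold, and the sample amounts are assumptions chosen for illustration, not a recommended fraud rule.

```python
from statistics import mean, stdev


def flag_unusual(history, new_amounts, threshold=3.0):
    """Return the new amounts whose z-score against `history` exceeds `threshold`.

    `history` needs at least two values so a sample standard deviation exists.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past amount was identical, so any different amount stands out.
        return [a for a in new_amounts if a != mu]
    return [a for a in new_amounts if abs(a - mu) / sigma > threshold]


past_purchases = [20.0, 35.0, 25.0, 30.0, 40.0, 22.0, 28.0]  # hypothetical history
incoming = [27.0, 33.0, 950.0]
print(flag_unusual(past_purchases, incoming))
```

Wrapping a rule like this in a scheduled job that reads fresh transactions is what turns a one-off analysis into a maintainable tool.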

Along the way, presentations that communicate what the team is working on are also important deliverables.

Data science processes and methodologies

Production engineering teams work on sprint cycles, with projected timelines. That’s often difficult for data science teams to do because a lot of time upfront can be spent just determining whether a project is feasible. Data must be collected and cleaned. Then the team must determine whether it can answer the question efficiently.
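The collect, clean, and assess loop is where much of that upfront time goes. The sketch below uses hypothetical sensor records and a hypothetical minimum row count: the illustrative `clean` function drops incomplete and duplicate records, and `is_feasible` then reports whether enough clean data remains to attempt the analysis at all. Real feasibility checks also look at coverage, label quality, and whether the data can actually answer the question.

```python
def clean(records):
    """Drop records with missing fields and exact duplicates, preserving order."""
    seen = set()
    cleaned = []
    for rec in records:
        if any(value is None for value in rec.values()):
            continue  # incomplete record
        key = tuple(sorted(rec.items()))  # order-independent fingerprint of the record
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        cleaned.append(rec)
    return cleaned


def is_feasible(cleaned, min_rows):
    """Crude feasibility gate: is there enough clean data to proceed?"""
    return len(cleaned) >= min_rows


raw = [
    {"machine": "m1", "temp_c": 71.2},
    {"machine": "m1", "temp_c": 71.2},  # duplicate
    {"machine": "m2", "temp_c": None},  # missing reading
    {"machine": "m3", "temp_c": 68.9},
]
usable = clean(raw)
print(len(usable), is_feasible(usable, min_rows=100))
```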

Data science ideally should follow the scientific method, though that is not always the case, or even feasible. Real science takes time. You spend a little bit of time confirming your hypothesis and then a lot of time trying to disprove yourself. In business, time-to-answer is important. As a result, data science can often mean going with the “good enough” answer rather than the best answer. The danger, though, is that results can fall victim to confirmation bias or overfitting.
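One inexpensive guard against overfitting, even under time pressure, is to judge a model on data it did not see during fitting. The sketch below compares a model that simply memorizes its training points (a 1-nearest-neighbor lookup) with a straight-line fit. The data set, the train/holdout split, and the helper names are hypothetical, chosen so the effect is easy to see.

```python
from statistics import mean


def fit_line(points):
    """Least-squares straight line; returns a predict(x) function."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def fit_memorizer(points):
    """1-nearest-neighbor lookup: reproduces every training point exactly."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def mse(model, points):
    """Mean squared error of `model` on (x, y) pairs."""
    return mean((model(x) - y) ** 2 for x, y in points)


train = [(0, 1), (2, 3), (4, 9), (6, 11), (8, 17)]  # noisy samples of roughly y = 2x
holdout = [(0.9, 1.8), (3.1, 6.2), (4.9, 9.8), (7.1, 14.2)]  # points on y = 2x

for name, model in [("line", fit_line(train)), ("memorizer", fit_memorizer(train))]:
    print(f"{name}: train MSE={mse(model, train):.2f}, holdout MSE={mse(model, holdout):.2f}")
```

The memorizer achieves zero training error yet does clearly worse than the line on the held-out points; judging it on training data alone would have been a textbook case of overfitting.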

Data science tools

Data science teams make use of a wide range of tools, including SQL, Python, R, Java, and a cornucopia of open source projects such as Hive, Oozie, and TensorFlow. These tools are used for a variety of data-related tasks, ranging from extracting and cleaning data to subjecting data to algorithmic analysis via statistical methods or machine learning. Some common tools include:

SAS: This proprietary statistical tool is used for data mining, statistical analysis, business intelligence, clinical trial analysis, and time-series analysis.
Tableau: Now owned by Salesforce, Tableau is a data visualization tool.
TensorFlow: Developed by Google and licensed under Apache License 2.0, TensorFlow is a software library for machine learning used for training and inference of deep neural networks.
DataRobot: This automated machine learning platform is used for building, deploying, and maintaining AI.
BigML: BigML is a machine learning platform focused on simplifying the building and sharing of datasets and models.
Knime: Knime is an open source data analytics, reporting, and integration platform.
Apache Spark: This unified analytics engine is designed for processing large-scale data, with support for data cleansing, transformation, model building, and evaluation.
RapidMiner: This data science platform is geared toward teams, with support for data prep, machine learning, and predictive model deployment.
Matplotlib: This open source plotting library for Python offers tools for creating static, animated, and interactive visualizations.
Excel: Microsoft’s spreadsheet software is perhaps the most extensively used BI tool around. It’s also handy for data scientists working with smaller datasets.
js: This JavaScript library is used to make interactive visualizations in web browsers.
ggplot2: This advanced data visualization package for R enables data scientists to create visualizations from analyzed data.
Jupyter: This open source tool based on Python is used for writing live code, visualizations, and presentations.
