Data science is the field of applying advanced
analytics techniques and scientific principles to extract valuable information
from data for business decision-making, strategic planning, and other uses.
It's increasingly critical to businesses: The insights that data science
generates help organizations increase operational efficiency, identify new
business opportunities, and improve marketing and sales programs, among other
benefits. Ultimately, they can lead to competitive advantages over business
rivals.
Data science incorporates various disciplines
-- for example, data engineering, data preparation, data mining,
predictive analytics, machine learning and data visualization, as
well as statistics, mathematics, and software programming. It's primarily done
by skilled data scientists, although lower-level data analysts may also be
involved. In addition, many organizations now rely partly on citizen data
scientists, a group that can include business intelligence (BI) professionals,
business analysts, data-savvy business users, data engineers and other workers
who don't have a formal data science background.
This comprehensive guide to data science
further explains what it is, why it's important to organizations, how it works,
the business benefits it provides and the challenges it poses. You'll also find
an overview of data science applications, tools, and techniques, plus
information on what data scientists do and the skills they need. Throughout the
guide, there are hyperlinks to related TechTarget articles that delve more
deeply into the topics covered here and offer insight and expert advice on data
science initiatives.
Why is data science important?
Data science plays an important role in
virtually all aspects of business operations and strategies. For example, it
provides information about customers that helps companies create stronger
marketing campaigns and targeted advertising to increase product sales. It aids
in managing financial risks, detecting fraudulent transactions, and
preventing equipment breakdowns in manufacturing plants and other industrial
settings. It helps block cyber-attacks and other security threats in IT
systems.
From an operational standpoint, data science
initiatives can optimize management of supply chains, product inventories,
distribution networks and customer service. On a more fundamental level, they
point the way to increased efficiency and reduced costs. Data science also
enables companies to create business plans and strategies that are based on informed
analysis of customer behaviors, market trends and competition. Without it,
businesses may miss opportunities and make flawed decisions.
Data science is also vital in areas beyond
regular business operations. In healthcare, its uses include diagnosis of
medical conditions, image analysis, treatment planning and medical research.
Academic institutions use data science to monitor student performance and
improve their marketing to prospective students. Sports teams analyze
player performance and plan game strategies via data science. Government
agencies and public policy organizations are also big users.
Data science process and lifecycle
Data science projects involve a series of data collection and analysis steps. One description of the data science process outlines six primary steps:
- Identify a business-related hypothesis to test.
- Gather data and prepare it for analysis.
- Experiment with different analytical models.
- Pick the best model and run it against the data.
- Present the results to business executives.
- Deploy the model for ongoing use with fresh data.
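The six steps above can be sketched end to end in a few lines of Python. This is only an illustrative walk-through under stated assumptions: the hypothesis (ad spend predicts sales), the figures, and the names `fit_baseline`, `fit_linear` and `predict_sales` are all invented for the example, and a real project would use far more data and a proper modeling library.

```python
from statistics import mean

# Step 1: hypothesis to test (illustrative): weekly ad spend predicts weekly sales.

# Step 2: gather and prepare data. The figures are invented; rows with
# missing values are dropped before analysis.
raw_rows = [(10, 24), (20, 46), (30, None), (40, 84), (50, 106),
            (None, 70), (60, 124), (70, 146)]
rows = [(x, y) for x, y in raw_rows if x is not None and y is not None]
train, holdout = rows[:4], rows[4:]

# Step 3: experiment with different analytical models.
def fit_baseline(data):
    """Always predict the average sales value seen in training."""
    average = mean(y for _, y in data)
    return lambda x: average

def fit_linear(data):
    """Fit a least-squares line: sales = slope * spend + intercept."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in data)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

def mean_abs_error(model, data):
    return mean(abs(model(x) - y) for x, y in data)

candidates = {"baseline": fit_baseline(train), "linear": fit_linear(train)}

# Step 4: pick the best model by its error on held-out data.
scores = {name: mean_abs_error(model, holdout) for name, model in candidates.items()}
best_name = min(scores, key=scores.get)
best_model = candidates[best_name]

# Step 5: present the results.
for name, error in scores.items():
    print(f"{name}: mean absolute error on holdout = {error:.2f}")
print(f"selected model: {best_name}")

# Step 6: deploy the chosen model for ongoing use with fresh data.
def predict_sales(ad_spend):
    return best_model(ad_spend)

print(f"predicted sales at spend 80: {predict_sales(80):.1f}")
```

Holding back part of the data for step 4 is a design choice in this sketch: it lets the candidate models be compared on records they were not fitted to.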
Benefits of data science
One of data science's biggest benefits is to
empower and facilitate better decision-making. Organizations that invest in it
can factor quantifiable, data-based evidence into their business decisions.
Ideally, such data-driven decisions will lead to stronger business performance,
cost savings and smoother business processes and workflows.
The specific business benefits of data science
vary depending on the company and industry. In customer-facing organizations,
for example, data science helps identify and refine target audiences. Marketing
and sales departments can mine customer data to improve conversion rates and
create personalized marketing campaigns and promotional offers that produce
higher sales.
In other cases, the benefits include reduced
fraud, more effective risk management, more profitable financial trading,
increased manufacturing uptime, better supply chain performance, stronger
cybersecurity protections and improved patient outcomes. Data science also enables
real-time analysis of data as it's generated.
Challenges in data science
Data science is inherently challenging because
of the advanced nature of the analytics it involves. The vast amounts of data
typically being analyzed add to the complexity and increase the time it takes
to complete projects. In addition, data scientists frequently work with pools
of big data that may contain a variety of structured, unstructured,
and semi-structured data, further complicating the analytics process.
One of the biggest challenges is eliminating
bias in data sets and analytics applications. That includes issues with the
underlying data itself and ones that data scientists unconsciously build into
algorithms and predictive models. Such biases can skew analytics results if
they aren't identified and addressed, creating flawed findings that lead to
misguided business decisions. Finding the right data to analyze is another
challenge, and choosing the right tools, managing deployments of analytical
models, quantifying business value and maintaining models are also significant
hurdles.
Data science team
Many organizations have created a separate
team, or multiple teams, to handle data science activities. There's more to an
effective team than data scientists themselves. It may also include the
following positions:
- Data engineer. Responsibilities include setting up data pipelines and
aiding in data preparation and model deployment, working closely with data
scientists.
- Data analyst. This is a lower-level position for analytics
professionals who don't have the experience level or advanced skills that data
scientists do.
- Machine learning engineer. This programming-oriented job involves
developing the machine learning models needed for data science applications.
- Data visualization developer. This person works with data scientists
to create visualizations and dashboards used to present analytics results to business
users.
- Data translator. Also called an analytics translator, it's
an emerging role that serves as a liaison to business units and helps
plan projects and communicate results.
- Data architect. A data architect designs and oversees
the implementation of the underlying systems used to store and manage data for
analytics uses.
The team commonly is run by a director of data
science, data science manager or lead data scientist, who may report to the
chief data officer, chief analytics officer, or vice president of
analytics; chief data scientist is another management position that has emerged
in some organizations. Some data science teams are centralized at the
enterprise level, while others are decentralized in individual business units
or have a hybrid structure that combines those two approaches.
Business intelligence vs. data science
Like data science, basic business
intelligence and reporting aim to help guide operational decision-making
and strategic planning. But BI primarily focuses on descriptive analytics: What
happened or is happening now that an organization should respond to or address?
BI analysts and self-service BI users mostly work with structured transaction
data that's extracted from operational systems, cleansed and transformed to
make it consistent, and loaded into a data warehouse or data mart for
analysis. Monitoring business performance, processes and trends is a common BI
use case.
Data science involves analytics applications
that are more advanced. In addition to descriptive analytics, it encompasses
predictive analytics that forecasts future behavior and events, as well as
prescriptive analytics, which seeks to determine the best course of action to
take on the issue being analyzed.
Unstructured or semi-structured types of data
-- for example, log files, sensor data and text -- are common in data science
applications, along with structured data. Also, data scientists often want to
access raw data before it has been cleaned up and consolidated so they can
analyze the full data set or filter and prepare it for specific analytics uses.
As a result, the raw data may be stored in a data lake based on
Hadoop, a cloud object storage service, a NoSQL database, or another big data
platform.
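The difference between descriptive, predictive and prescriptive analytics can be illustrated with a toy example. The sketch below is not taken from any particular BI or data science tool; the monthly sales figures, the cost parameters and the names `fit_trend` and `stocking_cost` are assumptions made up for illustration.

```python
from statistics import mean

# Invented monthly unit sales for six consecutive months.
monthly_units = [120, 135, 150, 160, 178, 190]

# Descriptive analytics: what happened?
summary = {
    "total": sum(monthly_units),
    "average": mean(monthly_units),
    "best_month": monthly_units.index(max(monthly_units)) + 1,
}

# Predictive analytics: what is likely to happen next?
def fit_trend(values):
    """Least-squares trend line through (month number, units)."""
    xs = range(1, len(values) + 1)
    x_bar, y_bar = mean(xs), mean(values)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

slope, intercept = fit_trend(monthly_units)
forecast = slope * (len(monthly_units) + 1) + intercept

# Prescriptive analytics: which action is best given the forecast?
# Assumed costs: 1.0 per unsold unit held, 4.0 per unit of unmet demand.
def stocking_cost(stock, demand, holding=1.0, shortage=4.0):
    return holding * max(stock - demand, 0) + shortage * max(demand - stock, 0)

stock_options = [180, 200, 220]
best_stock = min(stock_options, key=lambda s: stocking_cost(s, forecast))

print(summary)
print(f"forecast for month 7: {forecast:.1f} units")
print(f"recommended stock level: {best_stock}")
```

The descriptive step only summarizes existing records, which is the territory of basic BI; the other two steps build on a model of future demand.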
Data science technologies, techniques, and methods
Data science relies heavily on machine
learning algorithms. Machine learning is a form of advanced analytics in which
algorithms learn about data sets and then look for patterns, anomalies, or
insights in them. It uses a combination of supervised, unsupervised,
semi-supervised and reinforcement learning methods, with algorithms getting
different levels of training and oversight from data scientists.
There's also deep learning, a more
advanced offshoot of machine learning that primarily uses artificial neural
networks to analyze large sets of unlabeled data.
Predictive models are another core data science
technology. Data scientists create them by running machine learning, data
mining or statistical algorithms against data sets to predict business
scenarios and likely outcomes or behavior. In predictive modeling and other
advanced analytics applications, data sampling is often done to
analyze a representative subset of data, a data mining technique that's
designed to make the analytics process more manageable and less time-consuming.
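As a minimal sketch of data sampling, the Python below draws a simple random sample and a stratified sample from a made-up customer list. The record layout, the 10% fraction and the function names are assumptions for illustration; stratifying by segment is one common way to keep a small subset representative, not the only one.

```python
import random
from collections import defaultdict

def simple_random_sample(records, fraction, seed=0):
    """Pick a fraction of the records uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = max(1, round(len(records) * fraction))
    return rng.sample(records, k)

def stratified_sample(records, key, fraction, seed=0):
    """Sample the same fraction from each group so group proportions are kept."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Invented data set: 900 retail customers and 100 wholesale customers.
customers = [
    {"id": i, "segment": "wholesale" if i % 10 == 0 else "retail"}
    for i in range(1000)
]

subset = stratified_sample(customers, key=lambda c: c["segment"], fraction=0.1)
wholesale_count = sum(1 for c in subset if c["segment"] == "wholesale")
print(len(subset), "records sampled,", wholesale_count, "of them wholesale")
```

A simple random sample of the same size would usually land close to the same proportions, but the stratified version guarantees that the smaller segment is not under-represented by chance.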
Common statistical and analytical
techniques that are used in data science projects include the following:
- classification, which separates the elements in a data set into different categories;
- regression, which plots the optimal values of related data variables in a line or plane; and
- clustering, which groups together data points with an affinity or shared attributes.
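The three techniques listed above can each be shown in a few lines of plain Python. These are deliberately simplified, one-dimensional sketches: the nearest-centroid classifier, the least-squares line and the basic k-means loop are stand-ins chosen for brevity, the data values are invented, and production work would normally rely on an established library instead.

```python
from statistics import mean

# Classification: assign a value to the category whose average it is closest to.
def train_centroids(samples):
    """samples is a list of (value, label); returns {label: mean value}."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def classify(value, centroids):
    return min(centroids, key=lambda label: abs(value - centroids[label]))

# Regression: fit the least-squares line y = slope * x + intercept.
def fit_line(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# Clustering: group unlabeled values around k centers (basic k-means in 1-D).
def kmeans_1d(values, centers, iterations=10):
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Invented example data.
spend = [(20, "low"), (25, "low"), (30, "low"), (80, "high"), (90, "high"), (100, "high")]
centroids = train_centroids(spend)
print(classify(40, centroids), classify(70, centroids))

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)

centers, clusters = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5])
print(centers, clusters)
```

Classification and regression here learn from labeled or paired examples, while the clustering function receives only raw values and starting centers, which mirrors the supervised and unsupervised distinction.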
How industries rely on data science
Before they became technology vendors
themselves, Google and Amazon were early users of data science and big
data analytics for internal applications, along with other internet and
e-commerce companies like Facebook, Yahoo, and eBay. Now, data science is
widespread in organizations of all kinds. Here are some examples of how it's
used in different industries:
- Entertainment. Data science enables streaming services to track and
analyze what users watch, which helps determine the new TV shows and films they
produce. Data-driven algorithms are also used to create personalized
recommendations based on a user's viewing history.
- Financial services. Banks and credit card companies mine and
analyze data to detect fraudulent transactions, manage financial risks on loans
and credit lines, and evaluate customer portfolios to identify upselling
opportunities.
- Healthcare. Hospitals and other healthcare providers use machine
learning models and additional data science components to automate X-ray
analysis and aid doctors in diagnosing illnesses and planning treatments based
on previous patient outcomes.
- Manufacturing. Data science uses at manufacturers include
optimization of supply chain management and distribution, plus predictive
maintenance to detect potential equipment failures in plants before they occur.
- Retail. Retailers
analyze customer behavior and buying patterns to drive personalized product
recommendations and targeted advertising, marketing, and promotions. Data
science also helps them manage product inventories and their supply chains to
keep items in stock.
- Transportation. Delivery companies, freight carriers and
logistics services providers use data science to optimize delivery routes and
schedules and to choose the best modes of transport for shipments.
- Travel. Data
science aids airlines with flight planning to optimize routes, crew scheduling
and passenger loads. Algorithms also drive variable pricing for flights and
hotel rooms.
Other data science uses, in areas such as
cybersecurity, customer service and business process management, are common
across different industries. An example of the latter is assisting in employee
recruitment and talent acquisition: Analytics can identify common
characteristics of top performers, measure how effective job postings are and
provide other information to help in the hiring process.