Data Science

 

Data science is the field of applying advanced analytics techniques and scientific principles to extract valuable information from data for business decision-making, strategic planning, and other uses. It's increasingly critical to businesses: The insights that data science generates help organizations increase operational efficiency, identify new business opportunities, and improve marketing and sales programs, among other benefits. Ultimately, they can lead to competitive advantages over business rivals.

Data science incorporates various disciplines -- for example, data engineering, data preparation, data mining, predictive analytics, machine learning and data visualization, as well as statistics, mathematics, and software programming. It's primarily done by skilled data scientists, although lower-level data analysts may also be involved. In addition, many organizations now rely partly on citizen data scientists, a group that can include business intelligence (BI) professionals, business analysts, data-savvy business users, data engineers and other workers who don't have a formal data science background.

This comprehensive guide to data science further explains what it is, why it's important to organizations, how it works, the business benefits it provides and the challenges it poses. You'll also find an overview of data science applications, tools, and techniques, plus information on what data scientists do and the skills they need. Throughout the guide, there are hyperlinks to related TechTarget articles that delve more deeply into the topics covered here and offer insight and expert advice on data science initiatives.

Why is data science important?

Data science plays an important role in virtually all aspects of business operations and strategies. For example, it provides information about customers that helps companies create stronger marketing campaigns and targeted advertising to increase product sales. It aids in managing financial risks, detecting fraudulent transactions, and preventing equipment breakdowns in manufacturing plants and other industrial settings. It also helps block cyberattacks and other security threats in IT systems.

From an operational standpoint, data science initiatives can optimize management of supply chains, product inventories, distribution networks and customer service. On a more fundamental level, they point the way to increased efficiency and reduced costs. Data science also enables companies to create business plans and strategies that are based on informed analysis of customer behavior, market trends and competition. Without it, businesses may miss opportunities and make flawed decisions.

Data science is also vital in areas beyond regular business operations. In healthcare, its uses include diagnosis of medical conditions, image analysis, treatment planning and medical research. Academic institutions use data science to monitor student performance and improve their marketing to prospective students. Sports teams analyze player performance and plan game strategies via data science. Government agencies and public policy organizations are also big users.

Data science process and lifecycle

Data science projects involve a series of data collection and analysis steps. The process can be described in six primary steps:

  1. Identify a business-related hypothesis to test.
  2. Gather data and prepare it for analysis.
  3. Experiment with different analytical models.
  4. Pick the best model and run it against the data.
  5. Present the results to business executives.
  6. Deploy the model for ongoing use with fresh data. 
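
Steps 2 through 4 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the synthetic data, the two candidate models (a mean baseline and a least-squares line), the 80/20 hold-out split and the use of mean squared error to compare models.

```python
import random


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Least-squares straight line: y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and actual values."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Step 2: gather and prepare data (synthetic here), then hold out 20% for evaluation.
rng = random.Random(0)
xs = [float(i) for i in range(100)]
ys = [3.0 * x + 5.0 + rng.gauss(0, 10) for x in xs]
indices = list(range(len(xs)))
rng.shuffle(indices)
train_idx, test_idx = indices[:80], indices[80:]
train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
test_x = [xs[i] for i in test_idx]
test_y = [ys[i] for i in test_idx]

# Step 3: experiment with different analytical models.
candidates = {"mean_baseline": fit_mean, "linear": fit_line}
scores = {}
for name, fit in candidates.items():
    model = fit(train_x, train_y)
    scores[name] = mean_squared_error(model, test_x, test_y)

# Step 4: pick the model with the lowest error on the held-out data.
best_name = min(scores, key=scores.get)
print(best_name, scores)
```

Scoring the candidates on held-out data rather than on the training data is one common way to avoid picking a model that only memorized what it was shown.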

Benefits of data science

One of data science's biggest benefits is that it empowers and facilitates better decision-making. Organizations that invest in it can factor quantifiable, data-based evidence into their business decisions. Ideally, such data-driven decisions will lead to stronger business performance, cost savings and smoother business processes and workflows.

The specific business benefits of data science vary depending on the company and industry. In customer-facing organizations, for example, data science helps identify and refine target audiences. Marketing and sales departments can mine customer data to improve conversion rates and create personalized marketing campaigns and promotional offers that produce higher sales.

In other cases, the benefits include reduced fraud, more effective risk management, more profitable financial trading, increased manufacturing uptime, better supply chain performance, stronger cybersecurity protections and improved patient outcomes. Data science also enables real-time analysis of data as it's generated.

Challenges in data science

Data science is inherently challenging because of the advanced nature of the analytics it involves. The vast amounts of data typically being analyzed add to the complexity and increase the time it takes to complete projects. In addition, data scientists frequently work with pools of big data that may contain a variety of structured, unstructured, and semi-structured data, further complicating the analytics process.

One of the biggest challenges is eliminating bias in data sets and analytics applications. That includes issues with the underlying data itself and ones that data scientists unconsciously build into algorithms and predictive models. Such biases can skew analytics results if they aren't identified and addressed, creating flawed findings that lead to misguided business decisions. Finding the right data to analyze is another challenge, and choosing the right tools, managing deployments of analytical models, quantifying business value and maintaining models are also significant hurdles.

Data science team

Many organizations have created a separate team, or multiple teams, to handle data science activities. There's more to an effective team than data scientists themselves. It may also include the following positions:

  • Data engineer. Responsibilities include setting up data pipelines and aiding in data preparation and model deployment, working closely with data scientists.
  • Data analyst. This is a lower-level position for analytics professionals who don't have the experience level or advanced skills that data scientists do.
  • Machine learning engineer. This programming-oriented job involves developing the machine learning models needed for data science applications.
  • Data visualization developer. This person works with data scientists to create visualizations and dashboards used to present analytics results to business users.
  • Data translator. Also called an analytics translator, this is an emerging role in which a person serves as a liaison to business units and helps plan projects and communicate results.
  • Data architect. A data architect designs and oversees the implementation of the underlying systems used to store and manage data for analytics uses.

The team is commonly run by a director of data science, data science manager or lead data scientist, who may report to the chief data officer, chief analytics officer, or vice president of analytics. Chief data scientist is another management position that has emerged in some organizations. Some data science teams are centralized at the enterprise level, while others are decentralized in individual business units or have a hybrid structure that combines those two approaches.

Business intelligence vs. data science

Like data science, basic business intelligence and reporting aim to help guide operational decision-making and strategic planning. But BI primarily focuses on descriptive analytics: What happened or is happening now that an organization should respond to or address? BI analysts and self-service BI users mostly work with structured transaction data that's extracted from operational systems, cleansed and transformed to make it consistent, and then loaded into a data warehouse or data mart for analysis. Monitoring business performance, processes and trends is a common BI use case.

Data science involves analytics applications that are more advanced. In addition to descriptive analytics, it encompasses predictive analytics that forecasts future behavior and events, as well as prescriptive analytics, which seeks to determine the best course of action to take on the issue being analyzed.
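
The difference between the three kinds of analytics can be sketched with a toy Python example. The monthly sales figures, the average-change forecast, the cost parameters and the candidate order quantities are all hypothetical; real predictive and prescriptive applications use far richer models.

```python
# Hypothetical units sold per month (illustrative data only).
monthly_sales = [100, 110, 120, 130, 140, 150]

# Descriptive analytics: what happened?
total_sales = sum(monthly_sales)
average_sales = total_sales / len(monthly_sales)

# Predictive analytics: what is likely to happen next?
# A deliberately naive forecast: extend the average month-over-month change.
average_change = (monthly_sales[-1] - monthly_sales[0]) / (len(monthly_sales) - 1)
forecast = monthly_sales[-1] + average_change


# Prescriptive analytics: what should be done about it?
# Choose the order quantity that minimizes an assumed cost of shortages and surplus.
def expected_cost(order_quantity, demand, shortage_cost=5.0, holding_cost=1.0):
    shortage = max(0.0, demand - order_quantity)
    surplus = max(0.0, order_quantity - demand)
    return shortage_cost * shortage + holding_cost * surplus


order_options = [140, 160, 180]
best_order = min(order_options, key=lambda q: expected_cost(q, forecast))

print(total_sales, average_sales, forecast, best_order)
```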

Unstructured or semi-structured types of data -- for example, log files, sensor data and text -- are common in data science applications, along with structured data. Also, data scientists often want to access raw data before it has been cleaned up and consolidated so they can analyze the full data set or filter and prepare it for specific analytics uses. As a result, the raw data may be stored in a data lake based on Hadoop, a cloud object storage service, a NoSQL database, or another big data platform.

Data science technologies, techniques, and methods

Data science relies heavily on machine learning algorithms. Machine learning is a form of advanced analytics in which algorithms learn about data sets and then look for patterns, anomalies, or insights in them. It uses a combination of supervised, unsupervised, semi-supervised and reinforcement learning methods, with algorithms getting different levels of training and oversight from data scientists.

There's also deep learning, a more advanced offshoot of machine learning that primarily uses artificial neural networks to analyze large sets of unlabeled data.

Predictive models are another core data science technology. Data scientists create them by running machine learning, data mining or statistical algorithms against data sets to predict business scenarios and likely outcomes or behavior. In predictive modeling and other advanced analytics applications, data sampling is often done to analyze a representative subset of data, a data mining technique that's designed to make the analytics process more manageable and less time-consuming.
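
As a rough illustration of data sampling, the Python sketch below draws a simple random sample from a synthetic data set and compares the sample mean with the mean of the full data. The population size, the distribution, the seed and the sample size are all assumptions chosen for the example; other sampling schemes, such as stratified sampling, may suit real data better.

```python
import random
import statistics

# Synthetic "full" data set: 100,000 values (illustrative only).
rng = random.Random(42)
population = [rng.gauss(50.0, 10.0) for _ in range(100_000)]

# Simple random sample without replacement: 1% of the data.
sample = rng.sample(population, 1_000)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

# The sample estimate should land close to the full-data value
# while requiring far less computation.
print(f"full data mean: {population_mean:.2f}")
print(f"sample mean:    {sample_mean:.2f}")
```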

Common statistical and analytical techniques that are used in data science projects include the following:

  • classification, which separates the elements in a data set into different categories;
  • regression, which plots the optimal values of related data variables in a line or plane; and
  • clustering, which groups together data points with an affinity or shared attributes.
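
The three techniques listed above can each be shown in a few lines of plain Python. This is a sketch under stated assumptions, not how production work is typically done: the nearest-neighbor classifier, the least-squares line, the one-dimensional k-means routine and all of the sample data are illustrative choices, and real projects would normally rely on an established machine learning library.

```python
def nearest_neighbor_label(point, labeled_points):
    """Classification: assign the label of the closest labeled example."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    closest = min(labeled_points, key=lambda item: squared_distance(point, item[0]))
    return closest[1]


def fit_line(xs, ys):
    """Regression: least-squares line, returned as (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kmeans_1d(values, k, iterations=20):
    """Clustering: group 1-D values around k centroids (Lloyd's algorithm)."""
    ordered = sorted(values)
    # Spread the initial centroids across the sorted values (an arbitrary choice).
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters


# Illustrative data only.
labeled = [((1.0, 1.0), "low"), ((1.5, 2.0), "low"),
           ((8.0, 8.0), "high"), ((9.0, 7.5), "high")]
label_a = nearest_neighbor_label((2.0, 1.5), labeled)
label_b = nearest_neighbor_label((7.0, 9.0), labeled)

slope, intercept = fit_line([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.0])

clusters = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.1], k=2)

print(label_a, label_b)
print(round(slope, 2), round(intercept, 2))
print(clusters)
```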

How industries rely on data science

Before they became technology vendors themselves, Google and Amazon were early users of data science and big data analytics for internal applications, along with other internet and e-commerce companies like Facebook, Yahoo, and eBay. Now, data science is widespread in organizations of all kinds. Here are some examples of how it's used in different industries:

  • Entertainment. Data science enables streaming services to track and analyze what users watch, which helps determine the new TV shows and films they produce. Data-driven algorithms are also used to create personalized recommendations based on a user's viewing history.
  • Financial services. Banks and credit card companies mine and analyze data to detect fraudulent transactions, manage financial risks on loans and credit lines, and evaluate customer portfolios to identify upselling opportunities.
  • Healthcare. Hospitals and other healthcare providers use machine learning models and additional data science components to automate X-ray analysis and aid doctors in diagnosing illnesses and planning treatments based on previous patient outcomes.
  • Manufacturing. Manufacturers use data science to optimize supply chain management and distribution, and to run predictive maintenance that detects potential equipment failures in plants before they occur.
  • Retail. Retailers analyze customer behavior and buying patterns to drive personalized product recommendations and targeted advertising, marketing, and promotions. Data science also helps them manage product inventories and their supply chains to keep items in stock.
  • Transportation. Delivery companies, freight carriers and logistics services providers use data science to optimize delivery routes and schedules, and to determine the best modes of transport for shipments.
  • Travel. Data science aids airlines with flight planning to optimize routes, crew scheduling and passenger loads. Algorithms also drive variable pricing for flights and hotel rooms.

Other data science uses, in areas such as cybersecurity, customer service and business process management, are common across different industries. An example in business process management is assisting in employee recruitment and talent acquisition: Analytics can identify common characteristics of top performers, measure how effective job postings are, and provide other information to help in the hiring process.
