Data quality is a perception or an assessment of
data's fitness to serve its purpose in a given context. The quality of data is
determined by factors such as accuracy, completeness, reliability, relevance
and how up to date it is. As data has become more intricately linked with the
operations of organizations, data quality has drawn greater
attention.
An oft-cited estimate originating
from IBM suggests the yearly cost of data quality issues in the U.S. during
2016 alone was about $3.1 trillion. Business managers' lack of trust in data
quality is commonly cited among the chief impediments to decision-making.
The problem of poor data quality
was particularly common in the early days of corporate computing, when most
data was entered manually. Even as more automation took hold, data quality
issues rose in prominence. For a number of years, the typical picture of poor data
quality was the meeting at which department heads sorted
through differing spreadsheet numbers that ostensibly described the same
activity.
Determining data quality
Aspects, or dimensions, important
to data quality include accuracy, conformity and consistency. As a first step
toward improving data quality, organizations typically perform data asset
inventories that establish baselines for the relative value, uniqueness and
validity of their data. The baseline ratings established for known good data sets
are then compared against the organization's data going forward.
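A minimal sketch of such a baseline comparison follows. It assumes records are plain dictionaries and that completeness, uniqueness and validity are each scored as a simple ratio for one field. The function names, the email pattern used as a validity rule and the 0.05 tolerance are illustrative assumptions, not part of any particular tool or framework.

```python
import re

# Assumed validity rule for this sketch: a deliberately loose email pattern.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_email(value):
    """Return True if the value looks like an email address."""
    return EMAIL_RE.match(value) is not None


def profile(records, field, is_valid):
    """Score one field for completeness, uniqueness and validity (0.0 to 1.0)."""
    total = len(records)
    values = [record.get(field) for record in records]
    present = [v for v in values if v not in (None, "")]
    if not present:
        return {"completeness": 0.0, "uniqueness": 0.0, "validity": 0.0}
    return {
        "completeness": len(present) / total,
        "uniqueness": len(set(present)) / len(present),
        "validity": sum(1 for v in present if is_valid(v)) / len(present),
    }


def compare_to_baseline(current, baseline, tolerance=0.05):
    """List the metrics that fall more than `tolerance` below the baseline."""
    return [name for name, score in current.items()
            if score < baseline[name] - tolerance]


# Baseline taken from a (hypothetical) known good data set.
baseline = {"completeness": 1.0, "uniqueness": 1.0, "validity": 1.0}

records = [
    {"email": "a@example.com"},
    {"email": "a@example.com"},   # duplicate value
    {"email": "not-an-email"},    # fails the validity rule
    {"email": ""},                # missing value
]

current = profile(records, "email", is_email)
degraded = compare_to_baseline(current, baseline)
print(current)
print(degraded)
```

In practice the baseline scores would be measured from a trusted data set rather than hard-coded, and each field would carry its own validity rule.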
Methodologies for such data
quality projects include the Data Quality Assessment Framework (DQAF), which
was created by the International Monetary Fund (IMF) to provide a common method
for assessing data quality. The DQAF provides guidelines for measuring data
dimensions that include timeliness, in which actual times of data delivery are
compared to anticipated data delivery schedules.
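The timeliness comparison described above can be illustrated with a short sketch. It is not the DQAF's own scoring method. It simply assumes each delivery is a pair of scheduled and actual timestamps and reports the share that arrived on time, with an optional grace period. The function name and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def timeliness(deliveries, grace=timedelta(0)):
    """Share of deliveries that arrived no later than scheduled plus grace."""
    if not deliveries:
        return 0.0
    on_time = sum(1 for expected, actual in deliveries
                  if actual <= expected + grace)
    return on_time / len(deliveries)


# (scheduled, actual) pairs; the values are made up for illustration.
deliveries = [
    (datetime(2024, 1, 1, 6, 0), datetime(2024, 1, 1, 5, 55)),  # 5 minutes early
    (datetime(2024, 1, 2, 6, 0), datetime(2024, 1, 2, 6, 10)),  # 10 minutes late
    (datetime(2024, 1, 3, 6, 0), datetime(2024, 1, 3, 9, 0)),   # 3 hours late
    (datetime(2024, 1, 4, 6, 0), datetime(2024, 1, 4, 6, 0)),   # exactly on time
]

strict_score = timeliness(deliveries)
lenient_score = timeliness(deliveries, grace=timedelta(minutes=15))
print(strict_score, lenient_score)
```

The grace period is a design choice: a strict comparison flags every late delivery, while a small allowance keeps minor scheduling jitter from dominating the score.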
Managing data quality
Software tools specialized for
data quality management match records, delete duplicates, establish remediation
policies and identify personally identifiable data. Management consoles for
data quality support the creation of data handling rules that maintain data
integrity, the discovery of data relationships and automated data transforms that
may be part of quality control efforts.
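Record matching and duplicate removal can be sketched in a few lines. Commercial tools typically use far more sophisticated matching, such as fuzzy or probabilistic comparison. This example assumes a much simpler approach: two records match when their name and email are identical after normalizing case and whitespace. The field names and helper functions are hypothetical.

```python
def match_key(record):
    """Build a normalized key; records sharing a key are treated as matches."""
    name = " ".join(record["name"].lower().split())  # collapse case and spacing
    email = record["email"].strip().lower()
    return (name, email)


def deduplicate(records):
    """Keep the first record seen for each match key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = match_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


customers = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "  ada   LOVELACE ", "email": "ADA@example.com "},  # same person
    {"name": "Alan Turing", "email": "alan@example.com"},
]

cleaned = deduplicate(customers)
print(len(cleaned))
```

Keeping the first record is one possible remediation policy. Others, such as keeping the most recently updated record or merging fields, would replace the body of the loop.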
Collaborative views and workflow
enablement tools have become more common, giving data stewards, who are charged
with maintaining data quality, views into corporate data repositories. These
tools and related processes are often closely linked with master data
management (MDM) systems that have become part of many data governance efforts.
Data quality management tools
include IBM InfoSphere Information Server for Data Quality, Informatica Data
Quality, Oracle Enterprise Data Quality, Pitney Bowes Spectrum Technology
Platform, SAP Data Quality Management and SAS DataFlux.