Data quality

Data quality is a perception or an assessment of data's fitness to serve its purpose in a given context. The quality of data is determined by factors such as accuracy, completeness, reliability, relevance and how up to date it is. As data has become more intricately linked with the operations of organizations, data quality has received greater attention.


An oft-cited IBM estimate put the yearly cost of data quality issues in the U.S. at about $3.1 trillion in 2016 alone. Business managers' lack of trust in data quality is also commonly cited as a chief impediment to decision-making.

The problem of poor data quality was particularly common in the early days of corporate computing, when most data was entered manually. Even as more automation took hold, data quality issues rose in prominence. For years, the stock image of deficient data quality was the meeting at which department heads sorted through conflicting spreadsheet numbers that ostensibly described the same activity.

Determining data quality

Aspects, or dimensions, important to data quality include accuracy, conformity and consistency. As a first step toward improving data quality, organizations typically perform data asset inventories in which the relative value, uniqueness and validity of data can undergo baseline studies. Established baseline ratings for known good data sets are then used for comparison against data in the organization going forward.
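
As a rough illustration of such baseline profiling, the sketch below computes completeness, uniqueness and validity rates over a small record set. It is a minimal example in plain Python; the field names (customer_id, email, country), the validation rules and the reference list of country codes are all assumptions for illustration, not part of any particular assessment methodology.

```python
import re

# Hypothetical customer records; field names and values are illustrative only.
records = [
    {"customer_id": "C001", "email": "ana@example.com", "country": "US"},
    {"customer_id": "C002", "email": "",                "country": "US"},
    {"customer_id": "C002", "email": "bo@example.com",  "country": "USA"},
]

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
VALID_COUNTRIES = {"US", "CA", "GB"}  # assumed reference list of valid codes

def completeness(rows, field):
    """Share of rows where the field is populated."""
    return sum(1 for r in rows if r.get(field)) / len(rows)

def uniqueness(rows, field):
    """Share of rows whose value for the field appears only once."""
    values = [r.get(field) for r in rows]
    return sum(1 for v in values if values.count(v) == 1) / len(rows)

def validity(rows, field, rule):
    """Share of rows whose value passes the supplied validation rule."""
    return sum(1 for r in rows if rule(r.get(field, ""))) / len(rows)

# Baseline ratings that could be tracked over time for comparison.
baseline = {
    "email_completeness": completeness(records, "email"),
    "customer_id_uniqueness": uniqueness(records, "customer_id"),
    "email_validity": validity(records, "email", lambda v: bool(EMAIL_RE.match(v))),
    "country_conformity": validity(records, "country", lambda v: v in VALID_COUNTRIES),
}
print(baseline)
```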

Methodologies for such data quality projects include the Data Quality Assessment Framework (DQAF), which was created by the International Monetary Fund (IMF) to provide a common method for assessing data quality. The DQAF provides guidelines for measuring data dimensions that include timeliness, in which actual times of data delivery are compared to anticipated data delivery schedules.
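To make the timeliness dimension concrete, the sketch below compares actual delivery times for two data feeds against an expected schedule and flags late deliveries. The feed names, timestamps and the 30-minute tolerance are assumptions made for this example; they are not drawn from the DQAF itself.

```python
from datetime import datetime

# Hypothetical delivery log: expected vs. actual timestamps for data feeds.
expected = {
    "daily_sales": datetime(2024, 3, 1, 6, 0),
    "inventory":   datetime(2024, 3, 1, 7, 0),
}
actual = {
    "daily_sales": datetime(2024, 3, 1, 6, 12),
    "inventory":   datetime(2024, 3, 1, 9, 45),
}

SLA_MINUTES = 30  # assumed tolerance for counting a feed as "on time"

for feed, due in expected.items():
    delay = (actual[feed] - due).total_seconds() / 60
    status = "on time" if delay <= SLA_MINUTES else "late"
    print(f"{feed}: delivered {delay:.0f} min after schedule ({status})")
```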

Managing data quality

Software tools specialized for data quality management match records, delete duplicates, establish remediation policies and identify personally identifiable data. Management consoles for data quality support the creation of data handling rules that maintain data integrity, the discovery of data relationships, and automated data transformations that may be part of quality control efforts.
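The sketch below illustrates, in simplified form, two of the operations named above: matching duplicate records on a normalized key and applying a simple remediation rule. The record layout, the normalization logic and the state-code rule are illustrative assumptions, not the behavior of any particular product.

```python
# Hypothetical, unstandardized business records with near-duplicate entries.
records = [
    {"name": "Acme Corp.",  "phone": "555-0100",   "state": "ny"},
    {"name": "ACME Corp",   "phone": "(555) 0100", "state": "NY"},
    {"name": "Globex Inc.", "phone": "",            "state": "CA"},
]

def match_key(record):
    """Build a match key by stripping punctuation and case differences."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    phone = "".join(ch for ch in record["phone"] if ch.isdigit())
    return (name, phone)

def remediate(record):
    """Example remediation rule: standardize state codes to upper case."""
    record["state"] = record["state"].upper()
    return record

# Keep the first record seen for each match key, remediated on the way in.
deduped = {}
for rec in records:
    deduped.setdefault(match_key(rec), remediate(rec))

print(list(deduped.values()))  # the two Acme rows collapse into one record
```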

Collaborative views and workflow enablement tools have become more common, giving data stewards, who are charged with maintaining data quality, views into corporate data repositories. These tools and related processes are often closely linked with master data management (MDM) systems that have become part of many data governance efforts.

Data quality management tools include IBM InfoSphere Information Server for Data Quality, Informatica Data Quality, Oracle Enterprise Data Quality, Pitney Bowes Spectrum Technology Platform, SAP Data Quality Management and SAS DataFlux.

