
Data quality

Data quality is a perception or an assessment of data's fitness to serve its purpose in a given context. The quality of data is determined by factors such as accuracy, completeness, reliability, relevance and how up to date it is. As data has become more closely intertwined with the operations of organizations, data quality has received greater attention.
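
To make one of these dimensions concrete, here is a minimal sketch of measuring completeness, assuming a pandas DataFrame with hypothetical columns such as customer_email; it simply reports the share of non-missing values per column.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
orders = pd.DataFrame({
    "order_id": [1001, 1002, 1003, 1004],
    "customer_email": ["a@example.com", None, "c@example.com", ""],
    "order_date": pd.to_datetime(["2024-01-05", "2024-01-06", None, "2024-01-08"]),
})

# Treat empty strings as missing so they count against completeness.
normalized = orders.replace("", pd.NA)

# Completeness per column: share of non-missing values.
completeness = normalized.notna().mean()
print(completeness)
```

A similar calculation could be run for other dimensions, such as validity (values matching an expected format) or currency (age of the most recent record).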


An oft-cited estimate originating from IBM puts the yearly cost of data quality issues in the U.S. in 2016 alone at about $3.1 trillion. Business managers' lack of trust in data quality is also commonly cited as one of the chief impediments to decision-making.

The problem of poor data quality was particularly common in the early days of corporate computing, when most data was entered manually. Yet even as more automation took hold, data quality issues rose in prominence. For years, the stock image of deficient data quality was the meeting at which department heads sorted through conflicting spreadsheet numbers that ostensibly described the same activity.

Determining data quality

Aspects, or dimensions, important to data quality include accuracy, conformity and consistency. As a first step toward improving data quality, organizations typically perform data asset inventories in which the relative value, uniqueness and validity of data sets are assessed to establish baselines. Baseline ratings for known good data sets can then be compared against data across the organization going forward.
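
As a rough sketch of how such baseline ratings might be captured and compared, the example below profiles a known-good reference data set and checks incoming data against it. The metrics (completeness, uniqueness and a crude validity rule on a hypothetical email column) and the comparison logic are illustrative assumptions, not a prescribed methodology.

```python
import pandas as pd

def profile(df: pd.DataFrame) -> dict:
    """Compute simple baseline ratings for a data set."""
    return {
        "completeness": float(df.notna().mean().mean()),        # share of non-null cells
        "uniqueness": float((~df.duplicated()).mean()),         # share of non-duplicate rows
        "validity": float(df["email"].str.contains("@", na=False).mean()),  # crude format rule
    }

# Baseline established from a known good reference data set (hypothetical).
reference = pd.DataFrame({"email": ["a@x.com", "b@x.com", "c@x.com"]})
baseline = profile(reference)

# New data is then compared against the established baseline going forward.
incoming = pd.DataFrame({"email": ["a@x.com", "a@x.com", "not-an-email", None]})
current = profile(incoming)

for metric, target in baseline.items():
    status = "OK" if current[metric] >= target else "BELOW BASELINE"
    print(f"{metric}: baseline={target:.2f} current={current[metric]:.2f} -> {status}")
```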

Methodologies for such data quality projects include the Data Quality Assessment Framework (DQAF), which was created by the International Monetary Fund (IMF) to provide a common method for assessing data quality. The DQAF provides guidelines for measuring data dimensions that include timeliness, in which actual times of data delivery are compared to anticipated data delivery schedules.
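
A timeliness check of that kind can be as simple as comparing delivery timestamps against the expected schedule. The sketch below assumes a daily feed with hypothetical scheduled and actual delivery times; the DQAF itself does not mandate any particular calculation.

```python
from datetime import datetime

# Hypothetical delivery schedule vs. actual delivery times for a daily feed.
expected = {
    "2024-03-01": datetime(2024, 3, 1, 6, 0),
    "2024-03-02": datetime(2024, 3, 2, 6, 0),
    "2024-03-03": datetime(2024, 3, 3, 6, 0),
}
actual = {
    "2024-03-01": datetime(2024, 3, 1, 5, 55),
    "2024-03-02": datetime(2024, 3, 2, 7, 30),
    "2024-03-03": datetime(2024, 3, 3, 6, 10),
}

# Timeliness: share of deliveries arriving on or before schedule, plus the lag of late ones.
late = {day: actual[day] - expected[day] for day in expected if actual[day] > expected[day]}
on_time_rate = 1 - len(late) / len(expected)
print(f"on-time rate: {on_time_rate:.0%}")
for day, delay in late.items():
    print(f"{day} delivered {delay} late")
```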

Managing data quality

Software tools specialized for data quality management match records, delete duplicates, establish remediation policies and identify personally identifiable data. Data quality management consoles support the creation of data-handling rules that maintain data integrity, the discovery of data relationships, and automated data transforms that may be part of quality control efforts.
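
To illustrate the record-matching and deduplication step such tools automate, here is a minimal sketch. The normalization rule (a lower-cased, trimmed email as the match key) is a simplified assumption; commercial tools apply far richer fuzzy-matching and survivorship logic.

```python
import pandas as pd

# Hypothetical customer records containing near-duplicate entries.
customers = pd.DataFrame({
    "name":  ["Jane Doe", "jane doe ", "John Smith"],
    "email": ["JANE@EXAMPLE.COM", "jane@example.com", "john@example.com"],
})

# Match records on a normalized key (lower-cased, trimmed email) ...
customers["match_key"] = customers["email"].str.strip().str.lower()

# ... and delete duplicates, keeping the first record in each match group.
deduplicated = customers.drop_duplicates(subset="match_key").drop(columns="match_key")
print(deduplicated)
```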

Collaborative views and workflow enablement tools have become more common, giving data stewards, who are charged with maintaining data quality, views into corporate data repositories. These tools and related processes are often closely linked with master data management (MDM) systems that have become part of many data governance efforts.

Data quality management tools include IBM InfoSphere Information Server for Data Quality, Informatica Data Quality, Oracle Enterprise Data Quality, Pitney Bowes Spectrum Technology Platform, SAP Data Quality Management and SAS DataFlux.

