In this guide, I will explain both data quality (DQ) and the six data quality dimensions. Additionally, you will learn advanced data quality concepts, data quality measurement, and examples of different data quality dimensions. This guide draws on my 25+ years of real-world data engineering experience. Let’s dive right in.
What is Data Quality?
Data quality (DQ) is defined as the data’s suitability for a user’s defined purpose. It is subjective, as the concept of quality is relative to the standards defined by the end users’ expectations.
Data Quality Expectations: It is possible that for the exact same data, depending on its usage, different users can have totally different data quality expectations. For example, the accounting department needs data accurate to the penny, whereas the marketing team does not, because approximate sales numbers are enough to determine sales trends.
Instead of just providing definitions of different data quality dimensions, this guide provides a comprehensive and very nuanced list of use cases and examples from our professional experience.
What are the Six Data Quality Dimensions?
The six data quality dimensions are Accuracy, Completeness, Consistency, Uniqueness, Timeliness, and Validity. However, this classification is not universally agreed upon.
In this guide we have added four more – Currency, Conformity, Integrity, and Precision – to create a total of 10 DQ dimensions.
We, by nature, like to classify things. For example, we classify animals into various categories such as reptiles, mammals, and birds. Similarly, data quality dimensions are a superficial concept introduced to bucket data quality issues with similar patterns. You can, of course, choose to restrict or expand the list, or come up with your own taxonomy.
What is Accuracy Data Quality Dimension?
Data accuracy is the degree to which data represent real-world things, events, or an agreed-upon source.
For example, if a prospective employee has an inaccurate interview address, he won’t be able to attend the interview until he obtains the correct address.
We will take two examples explaining the data accuracy dimension and how it can be measured:
-
a. Data Accuracy Measurement With Physical World – Example
Data accuracy can be judged by comparing the data values with a physical measurement or physical observations.
Example: We do this data accuracy check at the grocery store every time we make a purchase by checking the items on the bill and then physically checking for those items in the grocery cart. However, this manual testing is not feasible at scale. Imagine checking the accuracy of inventory data for thousands of items: someone must go to the warehouse and count each of the items.
-
b. Data Accuracy Measurement With Reference Source – Example
Another way to measure accuracy is by simply comparing actual values to standard or reference values provided by a reliable source.
Example: The Consumer Price Index (CPI) is published by the US Bureau of Labor Statistics.
If you have CPI values in your database, then for the accuracy measurement you can compare them with the reference values obtained from the Bureau of Labor Statistics website.
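To make this concrete, here is a minimal sketch of such a reference comparison in Python. The monthly values are hypothetical, hard-coded for illustration; in practice the reference values would be pulled from the Bureau of Labor Statistics website.

```python
# Minimal sketch: compare stored CPI values against reference values
# from a trusted source (illustrative, hard-coded data).

our_cpi = {"2020-01": 257.971, "2020-02": 258.678, "2020-03": 258.115}
reference_cpi = {"2020-01": 257.971, "2020-02": 258.678, "2020-03": 258.150}

TOLERANCE = 0.001  # allow tiny rounding differences

inaccurate = []
for month, ref_value in reference_cpi.items():
    our_value = our_cpi.get(month)
    if our_value is None or abs(our_value - ref_value) > TOLERANCE:
        inaccurate.append((month, our_value, ref_value))

print(f"Inaccurate records: {len(inaccurate)} of {len(reference_cpi)}")
for month, ours, ref in inaccurate:
    print(f"  {month}: stored={ours} reference={ref}")
```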
What is Completeness Data Quality Dimension?
The completeness data quality dimension is defined as the percentage of data actually populated compared to the possibility of it being 100% populated.
You have probably heard multiple times that the data is too incomplete to make a decision. For example, a salesperson wants to send an email to a customer, but the data entry operator did not fill in the email address. Here it is not that the data was inaccurate; the email attribute was simply left empty. When data is missing, it directly impedes the operations of any organization. We will use four examples explaining different types of data completeness quality issues:
-
a. Completeness Check – Missing Records Example
You are an eligible voter, but at the voting booth, the record with your name is missing from the voter’s list. This is an example of a missing record for the completeness data quality dimension.
-
b. Completeness Check – Null Attribute Example
Even though you have all the customer records, some of the attributes in the customer records might be missing values. For example, each customer record must have a name, email address, and phone number. However, the phone number or the email address might be missing in some of the customer records.
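A minimal sketch of such a null-attribute completeness check, assuming a hypothetical in-memory list of customer records; a real implementation would typically run the same logic against a database table or a DataFrame.

```python
# Minimal sketch: measure attribute-level completeness for customer records.

customers = [
    {"name": "Tom", "email": "tom@example.com", "phone": None},
    {"name": "Ken", "email": None,              "phone": "555-0100"},
    {"name": "Joe", "email": "joe@example.com", "phone": "555-0101"},
]

required_attributes = ["name", "email", "phone"]

for attr in required_attributes:
    missing = sum(1 for c in customers if not c.get(attr))
    completeness = 100.0 * (len(customers) - missing) / len(customers)
    print(f"{attr}: {completeness:.1f}% complete ({missing} missing)")
```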
-
c. Completeness Check – Missing Reference Data Example
A system might not have all the reference values required for the domain. For example, a banker is trying to update a customer account to a “Suspended” state. The banker expects three reference values.
1. Open
2. Closed
3. Suspended
But the reference table has only two domain values, “Open” and “Closed”. The banker cannot find the “Suspended” reference value in the data. This is a case of reference data completeness, a specific case of the earlier example where entire records are missing.
-
d. Completeness Check – Data Truncations Example
Even if an attribute is populated with a data value, it is possible that the values got truncated while loading. This often happens if the ETL process variables are not correctly defined, or the target attribute is not large enough to capture the full length of the data values.
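One hedged heuristic (not the only approach) is to flag values whose length exactly equals the target column’s declared maximum, since truncated loads tend to pile up at that limit. A sketch, assuming a hypothetical VARCHAR(10) target column:

```python
# Heuristic sketch: values that exactly hit the declared column width
# are suspects for truncation during the ETL load.

MAX_LEN = 10  # hypothetical VARCHAR(10) target column

loaded_values = ["Jonathan S", "Amy", "Christophe", "Bob"]

suspects = [v for v in loaded_values if len(v) == MAX_LEN]
print(f"Possible truncations: {suspects}")  # ['Jonathan S', 'Christophe']
```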
What is Consistency Data Quality Dimension?
Data consistency describes how closely your data aligns, or is uniform, with another dataset or a reference dataset.
Here are a few examples of the Data Consistency DQ dimension:
-
a. Record Level Data Consistency Across Source and Target
When data is loaded from one system to another, we need to ensure that the data reconciles with the source system. Source vs. target reconciliation usually surfaces the inconsistencies in the records. Below is an example of inconsistency at the record level: the record for Tom exists in the source but not in the target system.
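A minimal reconciliation sketch using primary keys, with hypothetical source and target key sets standing in for the two systems:

```python
# Minimal sketch: record-level reconciliation between source and target
# using the primary key (customer name here, purely for illustration).

source_keys = {"Tom", "Ken", "Joe"}
target_keys = {"Ken", "Joe"}

missing_in_target = source_keys - target_keys      # records the load dropped
unexpected_in_target = target_keys - source_keys   # records that should not exist

print("Missing in target:", missing_in_target)        # {'Tom'}
print("Unexpected in target:", unexpected_in_target)  # set()
```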
-
b. Attribute Consistency Across Source And Target
This is another, more specific example of inconsistency between source and target. The records exist on both sides, but their attributes do not match. In the case below, records for Tom and Ken exist on both sides, but on the target side Tom’s record is missing the email and Ken’s record is missing the phone number.
-
c. Data Consistency Between Two Subject Areas
In a clothing store, a customer’s order shows one gown and three dress pants. The shipping dataset for the same order shows that the store must ship three gowns and one pair of dress pants. Here the order and shipment quantities are inconsistent between the two datasets.
-
d. Transaction Data Consistency
A transaction is a collection of read/write operations succeeding only if all contained operations succeed. If the transaction is not executed properly then it can create consistency issues in the data.
The opening balance for account A500 was $9,000, and $1,000 was taken out. So, at the end of the day, the A500 account should have an end-of-day balance of $8,000, but it is showing as $4,000. This happened because the transaction was not executed properly, which created an inconsistency in the data.
-
e. Data Consistency Over Time
Data values and volumes are expected to be consistent over time with minor variations unless there is a big business change.
You receive IBM stock prices every day, and suddenly you notice that the value is 10 times higher. A tenfold jump in a stock price in a single day is nearly impossible. This could be a simple mistake of a misplaced decimal point.
Most companies acquire customers at a steady and consistent pace. If the business acquires about 500 new customers every day, and suddenly one day the number zooms to thousands, then there is a high possibility that the data was loaded twice because of some error. If the customer count suddenly drops to zero, then it is possible that the data processor has failed to run for that day.
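A simple volume check of this kind can be sketched as follows, assuming hypothetical daily new-customer counts; the thresholds are arbitrary and would be tuned per business.

```python
# Minimal sketch: flag a day's record volume that deviates sharply
# from the trailing average (possible double load or missed run).

daily_new_customers = [510, 495, 502, 488, 505, 1020]  # last value is today

history, today = daily_new_customers[:-1], daily_new_customers[-1]
baseline = sum(history) / len(history)

if today == 0:
    print("Volume check failed: no records loaded today (missed run?)")
elif today > 1.5 * baseline or today < 0.5 * baseline:
    print(f"Volume check failed: {today} vs baseline {baseline:.0f} (double load?)")
else:
    print("Volume check passed")
```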
-
f. Consistency In Data Representation Across Systems
The reference data is expected to be stored consistently not only in a dataset but also across multiple data stores. In a customer dataset, the reference table for sex has Male, Female, and Unknown.
Now this reference data might be used in multiple systems, and, for example, the Return Material Authorization system can have reference data consistency issues if:
- Same meaning but different representation: the business definitions are the same, but different data values represent the same business concept.
- One or more of the reference data values is missing.
- One or more additional reference values are added.
- The reference values are subdivided further into finer granularity.
- Same representation but different meaning. This is difficult to catch as the data values are the same but used differently.
What is Uniqueness Data Quality Dimension?
Uniqueness is violated when the occurrence of an object or an event gets recorded multiple times in a dataset.
An event or entity should be recorded only once. No one wants duplicate data because it can cause double counting or misreporting.
Below are examples of duplicate data:
-
a. Same Entity Is Represented With Different Identities
There is a general expectation that a single physical entity should be represented only once. In this example, the customer is recorded twice, first as Thomas and a second time under the nickname Tom. Someone accessing the data will be confused about what to call the customer. Also, the information about the customer might be partially stored across the two records, and the company will count two customers whereas there is only one.
If you simply check the data, you cannot determine whether Thomas and Tom are the same person because the names are different. You will need secondary but universally unique information, such as email addresses, to deduplicate such records.
-
b. Same Entity Is Represented Multiple Times With Same Identity
In this case, the record identifier is exactly the same. This case is easy to detect because the keys in the dataset can be compared against themselves to find the duplicates.
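A sketch of that duplicate-key detection, assuming a hypothetical list of customer IDs:

```python
# Minimal sketch: find keys that appear more than once in a dataset.
from collections import Counter

customer_ids = ["C001", "C002", "C003", "C002", "C004", "C002"]

duplicates = {key: count for key, count in Counter(customer_ids).items() if count > 1}
print("Duplicate keys:", duplicates)  # {'C002': 3}
```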
What is Validity Data Quality Dimension?
Data validity describes the closeness of a data value to predetermined values or a calculation.
Here are three examples of the Validity DQ dimension:
-
a. Data Validity Based On Business Rules Or Calculation
The data captured in the datastore can come through a graphical user interface or some background ETL process. But is the data valid according to the business rules? The business rule for Net Amount is Gross Amt – (Tax Amt + Fee Amt + Commission Amt).
The net amount can be validated by calculating the expected value based on the business rule.
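A minimal sketch of that rule as a validity check, using hypothetical order records:

```python
# Minimal sketch: validate Net Amt = Gross Amt - (Tax + Fee + Commission).

orders = [
    {"id": 1, "gross": 100.0, "tax": 8.0, "fee": 2.0, "commission": 5.0, "net": 85.0},
    {"id": 2, "gross": 200.0, "tax": 16.0, "fee": 4.0, "commission": 10.0, "net": 175.0},
]

TOLERANCE = 0.01  # allow penny-level rounding

for order in orders:
    expected_net = order["gross"] - (order["tax"] + order["fee"] + order["commission"])
    if abs(order["net"] - expected_net) > TOLERANCE:
        print(f"Order {order['id']}: invalid net {order['net']} (expected {expected_net})")
```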
-
b. Data Validity For Range Of Values
The data values can also be based on ranges provided in advance. For example, the value (numeric or date) in an attribute must fall in the specified range.
Numeric Range: Weight range for a USPS parcel. If the weight data doesn’t match the parcel type, then we know the data is invalid.
| Parcel Type | Weight Limit |
| --- | --- |
| Parcel | Contents must weigh less than 70 lbs. |
| Large Parcel | Contents must weigh less than 70 lbs. |
| Irregular Parcel | Contents must weigh less than 16 oz. |
Date Range: A liquor shop cannot have a customer who is less than 21 years old, and it is rarely possible that a customer is older than 100 years.
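A sketch of both range checks with hypothetical parcel and customer data; the weight limits mirror the table above and the age bounds mirror the liquor-store example.

```python
# Minimal sketch: range-based validity checks for parcel weight and customer age.

WEIGHT_LIMITS_LBS = {"Parcel": 70, "Large Parcel": 70, "Irregular Parcel": 1}  # 16 oz = 1 lb

parcels = [("Parcel", 45.0), ("Irregular Parcel", 2.5)]
for parcel_type, weight in parcels:
    if weight >= WEIGHT_LIMITS_LBS[parcel_type]:
        print(f"Invalid weight: {weight} lbs for {parcel_type}")

customer_ages = [34, 19, 104]
for age in customer_ages:
    if not (21 <= age <= 100):
        print(f"Invalid customer age: {age}")
```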
-
c. Invalid Sequence
Normally you cannot ship without having the order in place; that is the business rule. So, if you find a shipping record with a shipping date earlier than the order date, there is obviously a data validity problem.
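A sketch of the sequence check, assuming hypothetical order and ship dates:

```python
# Minimal sketch: a shipping date must not be earlier than the order date.
from datetime import date

shipments = [
    {"order_id": 1, "order_date": date(2020, 5, 1), "ship_date": date(2020, 5, 3)},
    {"order_id": 2, "order_date": date(2020, 5, 4), "ship_date": date(2020, 5, 2)},
]

for s in shipments:
    if s["ship_date"] < s["order_date"]:
        print(f"Order {s['order_id']}: ship date {s['ship_date']} precedes order date {s['order_date']}")
```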
What is Timeliness Data Quality Dimension?
Timeliness is the time lag between when an event actually occurs and when it is captured in a system and made available for use.
When an actual event occurs, the system needs to capture the event information, process it, and store it for downstream usage. This is, however, never instantaneous.
The delay between the actual event occurrence and the data availability expected by the business or the downstream process defines the timeliness quality dimension. It is important to understand that the data is still valid, just late.
Here we consider two timeliness data quality examples:
-
a. Late For Business Process
A pizza restaurant promises to deliver a pizza within 50 minutes. But the order booking clerk, for some reason, enters the order two hours late. In this case, the data is correct by itself, but for the business it arrives too late. The pizza is delivered late, which will result in negative reviews and the probable loss of future business, because the restaurant did not keep its timeliness promise.
Even though the data is accurate, in the context of the business process and its expectations the data is of poor quality.
-
b. Time Lag In Real-Time Systems
In automated trading, decisions to buy or sell stocks are made in microseconds. The user expects the immediate availability of data for algorithmic trading.
If there is a lag in the availability of data, competitors will have an advantage. Again, even if the data is accurate, it still has poor timeliness quality. A similar situation can occur in self-driving cars, where any lag in the arrival of data can cause accidents because the car cannot course-correct in time.
What is Currency Data Quality Dimension?
Data currency is about how well the state captured in the dataset reflects the current real-world state.
Often the captured data contains the current state of an entity. The state of the object can change over time, and if the state transition is not captured correctly, the data becomes useless.
Here are two examples of the data currency DQ dimension:
-
a. Changed Address
A mailing list has customers’ addresses. But if a customer has already moved to a new address, the data loses its currency.
-
b. Expired Coupon
Suppose you are trying to sell a wedding gown to a customer and send a discount coupon as an incentive to purchase. The coupon was sent because the data showed the customer is unmarried and in the market for a wedding dress. But the customer is already married.
Since the data was not updated in time, it still shows an old state of the customer, and the data currency is bad.
-
What is the difference between Data Timeliness and Currency?
Timeliness is about the late arrival of data: the information is delayed but still accurate. Currency is about data that reflects a state that has since changed or expired, so the data has become irrelevant and lost its value.
What is Conformity Data Quality Dimension?
Conformity means that data values of the same attribute must be represented in a uniform format and data type.
Humans have a unique ability to discern subtle differences and understand commonality; computers cannot. Even if the values are correct, if the data does not follow the same standard format or data type, then it has conformity data quality issues.
Below are two examples of data conformity:
-
a. Format Conformity
The order date below is expected to follow the ‘MM/DD/YYYY’ format. For humans, the data in the table looks correct, but for a computer the format variations will cause chaos.
- Don’s record has the date in ‘YYYY/M/DD’
- Joe’s record has the date in the correct ‘MM/DD/YYYY’ format
- The third record, for Tim, is in ‘YYYY/M/DD HH:MM:SS’
Data format conformity issues can usually be identified with the help of regular expressions.
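For example, here is a hedged sketch of such a regular-expression check for the ‘MM/DD/YYYY’ format; the pattern below only validates the shape of the value, not whether the date itself is real.

```python
# Minimal sketch: flag order dates that do not conform to MM/DD/YYYY.
import re

MM_DD_YYYY = re.compile(r"^(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/\d{4}$")

order_dates = {
    "Don": "2020/5/12",
    "Joe": "05/12/2020",
    "Tim": "2020/5/12 10:31:22",
}

for name, value in order_dates.items():
    if not MM_DD_YYYY.match(value):
        print(f"{name}: non-conforming date format -> {value!r}")
```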
-
b. Data Type Conformity
Data type mismatch is another case of a conformity quality issue. The order amount in the table below is expected to be numeric, but the record for Joe is written in an alphanumeric format, which is a data type conformity issue.
What is Integrity Data Quality Dimension?
The data integrity quality dimension is the degree to which defined relational constraints are enforced between two datasets.
Data integrity issues can exist within a single system or across two systems. The main characteristic of the integrity data quality dimension is the relationship between two datasets.
Here are two examples of the data integrity dimension:
-
a. Referential Integrity Or Foreign Keys:
Every parent record referenced by a child dataset must actually exist. If an order carries a customer number as a foreign key, then that customer number must also exist in the customer table. The master dataset could be in the same database or in a different system.
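A sketch of an orphan-record check, assuming hypothetical order and customer tables held as simple Python collections; the same idea is often expressed in SQL as a LEFT JOIN where the parent key comes back NULL.

```python
# Minimal sketch: every customer number on an order must exist
# in the customer (parent) table.

customers = {"C001", "C002", "C003"}
orders = [("O-10", "C001"), ("O-11", "C004"), ("O-12", "C002")]

orphans = [(order_id, cust) for order_id, cust in orders if cust not in customers]
print("Orders with missing customers:", orphans)  # [('O-11', 'C004')]
```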
-
b. Cardinality Integrity
Another example of the integrity data quality dimension is cardinality: 1:1, 1:many, and so on. Cardinality defines the ratio between two datasets. For example, an employee can have only one badge (1:1). If the cardinality of the relationship is known in advance, it can be checked under the integrity DQ dimension.
What is Precision Data Quality Dimension?
Precision is the degree to which the data has been rounded or aggregated.
In industrial measurements, precision and accuracy are different concepts. Accuracy is the deviation from the target value, while precision is about the closeness among values. Precision in data quality measurement is more of a derived concept, devised to check for rounding and aggregation errors in data.
Below are some examples of precision errors:
-
a. Precision Errors Due To Rounding Of Number
Depending on the degree of precision provided by the GPS coordinates, the location might differ by kilometers. The table below shows values from two-digit to five-digit precision. The location error can range from about 1 meter to about 1 kilometer.
| GPS Decimal Places | Decimal Degrees | N/S or E/W at Equator |
| --- | --- | --- |
| 2 | 0.01 | 1.1132 km |
| 3 | 0.001 | 111.32 m |
| 4 | 0.0001 | 11.132 m |
| 5 | 0.00001 | 1.1132 m |
Imagine the consequences of military bombing 1 km away from the intended location.
For stock trading, the SEC under Rule 612 has mandated a minimum price increment of $0.01 for stocks quoted above $1.00, while stock quotes under $1.00 need a precision of $0.0001.
| Stock | Date | End of Day Price |
| --- | --- | --- |
| IBM | 05/05/2020 | $122.58 |
| JPM | 05/05/2020 | $92.00 |
| MTNB (Penny Stock) | 05/05/2020 | $0.7064 |
-
b. Time Precision
Store accounting is done at the day level and might not care about the exact second of a purchase. But for credit card fraud detection, the time must be precise to the second.
-
c. Granularity Precision
Every time data is aggregated it loses details or precision. You cannot derive granular data from summarized data.
At first glance, granularity does not seem like an obvious case of precision. For some operations aggregated or summarized data is not useful.
If you want to pay each salesperson’s commission based on his personal sales, you will need his individual sales numbers.
Commission Calculator

| Product | $ Sales by Each Employee | Commission | $ Sales by Emp x Commission % = Commission Amount |
| --- | --- | --- | --- |
| John Dove | — | 3% | ? |
| Evan Gardner | — | 3% | ? |
| Accessories | — | 3% | ? |
Data Quality Measurement
There are two fundamental ways of measuring and reporting data quality issues:
-
a. DQ Measurement With Success To Fail Ratio Approach
This is simply the ratio of defective records found for a given data quality dimension to the total number of records available.
| Data Quality Dimension | Measurement |
| --- | --- |
| Accuracy | # of records with inaccurate data / total # of records |
| Completeness | # of records with incomplete data / total # of records |
| Timeliness | # of records with late data / total # of records |
| Currency | # of records with stale data / total # of records |
| Consistency | # of records with inconsistent data / total # of records |
| Uniqueness | # of non-unique records / total # of records |
| Validity | # of records with invalid data / total # of records |
| Conformity | # of records with non-conforming data / total # of records |
| Integrity | # of records with integrity issues / total # of records |
| Precision | # of records with imprecise data / total # of records |
The above is easily represented by a gauge in a dashboard. It is also easy to aggregate or drill down across different dimensions.
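A hedged sketch of the ratio calculation, using hypothetical defect counts per dimension:

```python
# Minimal sketch: defect ratio (and pass rate) per data quality dimension.

total_records = 10_000
defects_by_dimension = {"Accuracy": 120, "Completeness": 340, "Uniqueness": 45}  # hypothetical

for dimension, defects in defects_by_dimension.items():
    defect_ratio = defects / total_records
    print(f"{dimension}: {defect_ratio:.2%} defective, {1 - defect_ratio:.2%} passing")
```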
-
b. DQ Measurement With Six Sigma Approach
To measure data quality, we can borrow the concept of Six Sigma quality from manufacturing. In simple terms, it compares the actual mistakes made against the opportunities to make mistakes.
- Think of your data infrastructure as a data factory
- Each table is a product type
- Every record in a table is a product you are delivering.
- Every column is an opportunity to make a mistake.
Thus, evaluating how many products (records) you delivered defect-free, compared to how many opportunities for defects you had, will give your sigma rating for data quality.
With data, the product we deliver is an instance of a record. We can refer to a data record as a unit product, and a table is a type of unit. All the records delivered during a period are the total units. Assume that of these 10,000 records (units), 200 are defective. We also need to determine the possible opportunities for failure in a unit, in our case a data record; there are many ways to do this, but here each record has 20 attributes/columns that can hold defective values.

| Metric | Formula | Value |
| --- | --- | --- |
| Total Units (Total Records) | - | 10,000 |
| Total Units with Defects (Failed Records) | - | 200 |
| Defects Per Unit (DPU) | Failed Records / Total Records | 200 / 10,000 = 0.02 |
| Opportunities for Failure per Unit (Record) | - | 20 |
| Defects Per Opportunity (DPO) | Defects / (Units x Opportunities) | 200 / (10,000 x 20) = 0.001 = 0.1% |
| Yield (capability of data engineering to produce defect-free records) | 1 - DPO | 1 - 0.001 = 0.999 = 99.9% |
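The same walkthrough can be sketched as a small calculation, using the figures from the table above; DPMO (defects per million opportunities) is added here only to connect the result to the sigma level table that follows.

```python
# Minimal sketch: DPU, DPO, yield, and DPMO for the example above.

total_records = 10_000         # total units
defective_records = 200        # units with at least one defect
opportunities_per_record = 20  # attributes/columns per record

dpu = defective_records / total_records                                # 0.02
dpo = defective_records / (total_records * opportunities_per_record)  # 0.001
yield_rate = 1 - dpo                                                   # 0.999
dpmo = dpo * 1_000_000                                                 # defects per million opportunities

print(f"DPU={dpu}, DPO={dpo}, Yield={yield_rate:.1%}, DPMO={dpmo:.0f}")
```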
What is Sigma Level for Data Quality?
| Sigma Level | Defects per Million (Records) | Yield (Records with No Defects) |
| --- | --- | --- |
| 6 | 3.4 | 99.99966% |
| 5 | 230 | 99.977% |
| 4 | 6,210 | 99.38% |
| 3 | 66,800 | 93.32% |
| 2 | 308,000 | 69.15% |
| 1 | 690,000 | 30.85% |
Conclusion
I hope you liked the data quality examples and now understand that there is much more to data quality than the six DQ dimensions. Do not fret too much about these classifications; choose the ones you like or define your own.
Do you agree with our thought process? Leave a comment below!