Reports and dashboards are extensively used to make business decisions. These decisions later form the basis for the company’s growth and success. If BI reports are wrong, then decisions taken based on these reports will also be wrong. Inaccurate reports affect the organization’s credibility and also exposes them to compliance and legal issues. It can also lead to hefty fines that organizations can not afford to ignore BI Testing.
How do I Know if the reports are correct? |
Report / Dashboard |
Step by Step Guide to BI Testing
We must ensure the reports are delivered correctly. But BI report is just a final output of a very long data pipeline. So, if the report is wrong, it is not necessary that the defect is in the report; the root cause of the defect might be somewhere along with the breadth of the data pipeline.
Where is the defect in the data pipeline? |
BI Data Pipeline |
The data pipeline can be divided into two distinct stages:
- Part A: Data Processing & Storage
-
- Source data
- ETL
- Database/Data warehouse
- Part B: BI Testing
-
- Data Layer
- Reports
- Dashboards
Part A: Data Processing and Storage
- Source Data: The data for business intelligence system comes from diverse sources and applications such as CRM, ERP, OMS The data in the source system can have data issues because of the way data was entered or system issues. For example. Sales number could be populated as 1000 instead of 100. BI teams do not have any control over the source data. However, the defect in data can propagate downstream to reports. Therefore validating the integrity of Source Data Testing becomes out most important.
- ETL: The data are extracted from the source system, transformed and loaded to the data warehouse. The transformation from OLTP to OLAP is a crucial process involving business rules with many related transformations like aggregating, summarizing, deriving, deduplicating, restructuring and integrating. This transformation have a high chance of errors, mistakes, missteps, and miscalculations. For example, a file might be accidentally loaded twice. Customer data is integrated incorrectly because the names are not an exact match. Such issues can be only captured by thoroughly testing the ETL process. Check more on ETL Testing
- Ddatabase/Data Warehouse: Regardless of the testing on the source and the ETL the data in the Data warehouse could still be wrong. Here’s an example. The ETL process loaded Orders from the source. Later on, it ran other ETL processes to load the shipment. The order comes from Othe MS system and the Shipment from the Way Bill system. It possible that some orders were missed in the data warehouse. So it is possible to have shipment data without the orders. It is also possible that the orders data was accidentally deleted from the data warehouse. Look more into Data Warehouse Testing.
Part B: BI Testing
- Data Layer: The data layer or metadata layer provides a high-level object for easy use and access to business users. The metadata or the business objects mainly consists of the dimensions, measures, lookups, calculated metrics, joins, hierarchy, etc. These data classes are derived from the databases and are kind of soft data transformations. As in any development, mistakes can happen; Hence the need to test the data layer.
- BI Report: A BI report consists of SQL queries, the prompts/filters. There could be issues in any of these items because of the technical specifications or development mistakes. Regardless, building a report is a development activity and must be tested to ensure quality.
- Dashboards: A dashboard is a combination of multiple reports with different visualization and data. They might be or might not be interlinked. Most of the time these dashboards are the final artifacts that are used by the business, hence testing them also is of paramount importance.
The scope of this article is limited to BI Testing/Report Testing. Part A is about data processing. Usually taking care of Part A will ensure 90% of the issues are taken care of. iceDQ always had capabilities to automate that part of the data pipeline.
Part B is about actual Business Intelligence Testing. In this article, we focus on Part B. The focus will be on generalized methodology rather than BI tool specific. In Future articles, we will be more specific about BI and reporting tools like SAP BO, IBM Cognos or Tableau.
How to Automate BI Testing?
The principle for BI testing is very simple. What you do vs what you expect. The Delta defines the magnitude of the failure.
The report can be tested if access is available to that data set in a report or prompts are available in a programmatic fashion. Assuming that is true, let’s look at some of the techniques. The actual results are in reports, here we assume that the specific result is available to us programmatically. Now the question arises on expected data. if the expected data is not known or is only in the business users mind, we cannot automate results. But if he/she (developers) writes down the expected result or points to a location where we can find it then we can automate the testing with iceDQ.
Let’s discuss the testing of three core component of BI testing.
- Data Layer/ Metadata
- BI Reports
- BI Dashboards
1. Data Layer Testing Technique
All Business Intelligence System offers the business users with a business view or metadata layer which is referred to by different terminology by every BI tool. Like in SAP BI it’s called Business Objects Universes or in IBM Cognos it’s called Framework Manager etc.
This layer hides the complexity of the physical database structure by providing objects and classes that map source data structure in the database to the business terms used by the analysts. It helps them run queries against a database to create reports and perform data analysis without bothering about the underlying data structures in the database.
The role of this metadata layer is to provide ease of use and access for business users. For example, data layers for a financial analyst contain objects such as profit margin or return for the investments.
It provides calculated fields and creates a virtual layer between two databases for example sales data dream sales databases and customer data from CRM database. It let’s user to set up filters on data and allows to create lookup cycles in referenced data example m- male, f = female, etc.
Example:
-
- The data in the data layer often needs to be compared with the equivalent database. Here sales revenue column is a measure as it has numerical data, whereas Dimensions are always qualitative, for example, year or quarter. Say now all these dimensions and measures are to be reconciled with the source data.
- The data in the data layer often needs to be compared with the equivalent database. Here sales revenue column is a measure as it has numerical data, whereas Dimensions are always qualitative, for example, year or quarter. Say now all these dimensions and measures are to be reconciled with the source data.
-
- To compare data from one data layer(production)with another in a non-production environment. Now the business user needs assurance that both the metadata have exactly the same metrics. For example, all the dimensions related to sales revenue should be the same at both locations. This type of data layer testing can be automated by iceDQ.
- To compare data from one data layer(production)with another in a non-production environment. Now the business user needs assurance that both the metadata have exactly the same metrics. For example, all the dimensions related to sales revenue should be the same at both locations. This type of data layer testing can be automated by iceDQ.
- There is a need to be sure that a data layer is showing accurate metrics corresponding to what was manually created and maintained by a user in some file. The data layer is based on the data located in the flat files. for example, the user stores sales revenue for the first two quarters of the year in an excel file that can be compared with the equivalent dimensions in the data layer.
2. Report Testing/ BI Report Testing Techniques
Reports are the basic entity of BI tools wherein data is presented in different graphical ways for ease of understanding and to empower the users to analyze and take efficient business decisions. Reports are built on the Data Layer. Data is visualized as tables, graphs, charts, etc. These reports are further export to .pdf, .csv or excel or can be mailed. Reports can also be scheduled to run.
There are Ad hoc reports that are created from scratch or can be build using canned reports. There can be report bursting, paginated reports for ease of printing, pixel-perfect reports wherein the elements are customized to the pixel level in both size and location, multi-section as well as master-detail(s) reports. Data manipulation, such as drilling down to different semantic layers of data, slicing and dicing of information through sorting and filtering (prompts) is encouraged for data analysts and non-technical end users. Monthly sales, daily sales can be designed using Pixel perfect reports.
- Tabular reports: A tabular representation of related data. This representation is best for quantitative comparisons. For example, this table displays quarterly sales revenue of the e-fashion store.
- Master-Detail BI Report: A Master detail/Drill down report allows nesting of two or more related reports, Say the chart visualizing shipping details and list or crosstab showing its detailed view. This report types are best used for complex multidimensional data queries. It encourages users to view aggregated data and explore deeper levels of the data for finer analysis.
- Multi-Section BI Report: Multi-section encourages visualization of related tables and charts filtered to the unique value of the same column which gets repeated in each section. The example depicts the table in a multi-section applied to the year column and accordingly dividing it into Year wise sections.
- Summary/Group by BI Report: Data can be summarized in a report using a different type of aggregate. Calculated Column can be used for more complex expressions.
- BI Report Filter /Prompts: Prompt is creating the filter criteria for a report. It specifies the criteria for hierarchical attributes, individual attribute or metrics that must be satisfied by the data. The picture depicts the prompt is defined on Year column and criteria is the year 2014, 2015 and 2016.
Examples:
-
- Create a rule to compare a report data with data warehouse, if necessary, provide a filter through APIs. iceDQ provides an easy interface to reconcile and validate the query of the report and certify data quality. For example, the yearly sales revenue stored in the data warehouse needs to be reconciled with the BI report generated for it.
- Create a rule to compare a report data with data warehouse, if necessary, provide a filter through APIs. iceDQ provides an easy interface to reconcile and validate the query of the report and certify data quality. For example, the yearly sales revenue stored in the data warehouse needs to be reconciled with the BI report generated for it.
- There is a need to be sure that a BI report is showing accurate evaluation that was manually created and maintained by a user in some file. The report is based on the data located in the flat files. for example, the user stores sales revenue for the first two quarters of the year in an excel file that can be compared with the report.
- Say you have a BI report present in two different environments and the business user needs to be sure that both the reports must have exactly the same metrics. For example, sales revenue for the first two quarters in the year 2014 & 2015 should be the same at both locations. This type of BI report testing can be automated by the iceDQ engine.
3. BI Dashboard Testing Techniques
A BI dashboard is essentially a collection of multiple reports that provide at-a-glance views of KPIs (key performance indicators) and progress report or reports relevant to a business process.
A dashboard is very similar to a report because essentially a dashboard is made of multiple reports. Dashboard testing can be done in very similar to report testing.
|
A dashboard is very similar to a report because essentially a dashboard is made of multiple reports. Dashboard testing can be done in very similar to report testing.
|
Conclusion
There are many BI reporting tools in the market. Business relies on them to make critical decisions. iceDQ helps business by automating BI testing thus ensuring the quality of the reports and in turn quality of decisions. All in all, BI testing will not only empower your business intelligence system but also your business.
Also, don’t forget the BI testing is good but the report can be still bad in production because the incoming data is not monitored. All your effort will come to naught if the business users see bad data in production. Read more on production data monitoring here.
Another important aspect is DevOps for Data (DataOps). How can one implement the DevOps in an organization without automated testing? BI is the last hurdle because it is both visual and data. iceDQ, on the other hand, enables you to implement the final CI/CD by enabling report automation
The next section of this article will be to take specific vendor tools and figure out how to test each of them.
Also See: ETL testing, Data Monitoring, BI testing, Tableau testing, Cognos testing