Getting Data Quality Right

Sean Michael

Updated · Oct 21, 2015

By Jon Green, BackOffice Associates

Data quality initiatives are important to enterprises because poor data quality can lead to a multitude of costly and reputation-damaging business problems. Critical marketing campaigns can become ineffective due to poor address and contact information. Customer retention may take a hit due to duplicated customer records and lack of visibility into buying history. And significant vendor cost savings are likely being left on the table if duplicate vendor records are prevalent across geographic regions.

When commencing a data quality initiative, it is important to consider four simple principles. We refer to this as “getting data quality right,” and it can be summed up in this statement: Provide the right data, to the right people, in the right format, at the right time.

The Right Data

When technology solutions are used to manage data quality within an organization it sounds obvious that the right data must be presented for assessment and resolution. However, this is often less straightforward than it seems.

When establishing a data quality standard, the standard itself may be very generic (e.g., a field or set of fields has been deemed required and therefore must be populated). This rule can be assigned (or bound) to one or more fields within one or more data sets in your enterprise application architecture.

The ability to assign specific individuals who are responsible for the data sets is important to improving data quality, but building those assignments into the rules and bound data sets themselves leads to a fragmented approach to rule creation and assessment. This, in turn, results in unnecessarily large IT overhead and maintenance headaches. By allowing user filters to be assigned before failed data is distributed, each individual receives only the failed records they are responsible for.

For example, one employee works in the U.S. and is responsible for U.S. customers, while another employee works in Europe and looks after the same data set but for the European market. A single rule can be defined to deliver failures to either individual based on their user level filters. This maintains a simple rule and single bound data set but allows for the realities of the business responsibilities to be modeled.
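The pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's actual API: a single generic "required field" rule is bound once to one data set, and hypothetical per-user filters are applied only at distribution time, so the U.S. and European stewards each receive just their own failures.

```python
# Minimal sketch: one rule, one bound data set, user-level filters applied
# at distribution time. All names here are illustrative assumptions.

def required(value):
    """Generic rule: a bound field must be populated."""
    return value is not None and str(value).strip() != ""

# One bound data set (customer records); two fail the rule.
customers = [
    {"id": 1, "region": "US", "email": ""},         # fails
    {"id": 2, "region": "EU", "email": None},       # fails
    {"id": 3, "region": "US", "email": "a@b.com"},  # passes
]

# User-level filters decide who receives each failure -- the rule and
# binding stay singular and simple.
user_filters = {
    "us_steward": lambda rec: rec["region"] == "US",
    "eu_steward": lambda rec: rec["region"] == "EU",
}

def distribute_failures(records, bound_field):
    failures = [r for r in records if not required(r[bound_field])]
    return {
        user: [r for r in failures if accepts(r)]
        for user, accepts in user_filters.items()
    }

queues = distribute_failures(customers, "email")
# The U.S. steward's queue holds record 1; the European steward's holds record 2.
```

The key design point is that responsibility lives in the filters, not in the rule: adding a steward for a new region means adding one filter, not cloning the rule and its binding.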

The Right People

When implementing a data quality solution for enterprise applications, it can seem easier for the IT team to develop a set of central scorecards and ask people to drill down into their specific area of responsibility to see whether any data assigned to them requires cleansing or resolution.

Because this strategy puts the onus on the wider business team to review the reports and scorecards frequently, it can be difficult to enforce within large organizations. Changing to a distribution-style solution means individuals are sent their own data, filtered and formatted the way they want it, whenever rules fail, allowing the erroneous data to be investigated and remediated efficiently.

The Right Format

Ensuring that data stewards or business users receive their data in the format of their choosing vastly improves their ability to review and resolve any issues efficiently and leads to greater user adoption. Delivering all data in a single huge spreadsheet or report and including all fields associated with the specific business object could potentially swamp an end user, meaning they can no longer efficiently process and resolve the issues present. Conversely, just providing the failed data fields can result in too little information.

Allowing business users to choose which fields they need to resolve an issue and the order of those fields in the report, and ensuring that field headings and descriptions are delivered in their language rather than as technical field names, all aid adoption and ease of use. It is also important to present data to users in the tools they are familiar with, reducing the training required for the deployed solution. If a user is only familiar with email and spreadsheets, then delivering the data in their preferred technology will improve the overall process of resolution.
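As a rough sketch of this idea, the snippet below renders a failure report from per-user preferences: which fields to include, in what order, and with headings in the steward's own language instead of technical field names. The field names and labels are illustrative (SAP-style codes chosen only as an example), not part of any specific product.

```python
# Hypothetical sketch: per-user report formatting. A steward picks the
# fields, their order, and localized headings; the report is built from
# those preferences rather than from the raw technical layout.

failed_records = [
    {"KUNNR": "1001", "NAME1": "Acme GmbH", "ORT01": ""},
]

# Illustrative preferences for a German-speaking steward.
preferences = {
    "fields": ["KUNNR", "ORT01", "NAME1"],  # chosen subset and order
    "labels": {"KUNNR": "Kundennummer", "ORT01": "Ort", "NAME1": "Name"},
}

def render_report(records, prefs):
    """Return rows: a localized header, then one row per failed record."""
    header = [prefs["labels"][f] for f in prefs["fields"]]
    rows = [[r.get(f, "") for f in prefs["fields"]] for r in records]
    return [header] + rows

report = render_report(failed_records, preferences)
# report[0] is the localized header; report[1] is the failed record,
# ready to be written to a spreadsheet or email body.
```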

The Right Time

As mentioned above, in many data quality initiatives the IT team will build out a rules execution process and deliver the results into a scorecard or reporting application that the end users can then interact with. This places responsibility on the wider business team to review these scorecards on a regular basis.

There is a different approach to this problem called Passive Data Quality. By creating a distribution framework of the specific rule(s), bindings, users and applicable filters, a user can be notified when data fails and have the data pushed out to them, on their desired medium, for notification and resolution via a workflow process.

This process allows users to avoid having to check and drill through scorecards to find out if there is something failing that they are responsible for, instead just checking their email for a notification. The “passive” aspect of the solution is that the user is notified when there is something specific for them to address, rather than having to actively check the solution to determine the current status. This workflow capability will improve the adoption of any solution by allowing users to work on normal day-to-day activities until they are notified of a problem.
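The passive approach described above can be sketched as a small publish-subscribe loop. This is an assumed, simplified model, not a real product's framework: users subscribe with a rule name, a responsibility filter, and a delivery callback (standing in for email), and are notified only when a matching failure occurs.

```python
# Hypothetical sketch of Passive Data Quality: the framework pushes
# failures to responsible users instead of users polling a scorecard.
# The callback stands in for the user's preferred medium (e.g., email).

subscriptions = []  # (rule_name, responsibility_filter, notify_fn)

def subscribe(rule_name, responsibility_filter, notify_fn):
    subscriptions.append((rule_name, responsibility_filter, notify_fn))

def on_rule_failure(rule_name, record):
    """Route one failed record to every responsible subscriber."""
    for name, accepts, notify in subscriptions:
        if name == rule_name and accepts(record):
            notify(record)  # push to the user's chosen channel

# A U.S. steward subscribes to failures of one rule for U.S. records.
inbox = []
subscribe("email_required", lambda r: r["region"] == "US", inbox.append)

on_rule_failure("email_required", {"id": 7, "region": "US"})
on_rule_failure("email_required", {"id": 8, "region": "EU"})
# Only the U.S. failure lands in this steward's inbox; the steward did
# nothing until notified, which is the "passive" aspect.
```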

The ability to tie all of the above into one solution and to provide the capabilities to automate the remediation of the failed data provides a complete “closed loop” solution for data quality.

Applying these four simple principles can improve the adoption and usability of data quality solutions within an organization and the impact they have on the overall efficiency of the business processes that the data ultimately affects.

Jon Green is director of Product Management for BackOffice Associates, a worldwide leader in information governance and data modernization solutions, focusing on helping Fortune 1000 customers manage one of their most critical assets – data.
