A ‘Data First’ Approach, in 3 Phases


Updated · Sep 24, 2014

by Pedro Cardoso, BackOffice Associates

At last year’s Gartner Business Intelligence and Analytics Summit, the failure rate of BI and analytics projects worldwide was estimated at more than 70 percent. While disconcerting, this is nothing new: the failure rate has held relatively steady for the last 20 years.

While tools continue to increase in sophistication, and storage and computing capacities keep pace with Moore’s Law, more than two out of three information-related projects fail to generate the expected business ROI and outcomes. So what’s going on here?

There are many factors in any project, but a “data first” strategy mitigates many of the issues at play, chief among them the failure to account for and incorporate the activities required to assess, improve and sustain data quality in order to deliver the expected business benefits.

In 5 Reasons a ‘Data First’ Strategy Works, I wrote about this topic in the context of system landscape consolidation (e.g., ERP consolidation) and how a Data First approach to enabling a consolidated enterprise reporting view could help attain ROI. Waiting to address data inside a time-sensitive system migration introduces more risk and cost, and produces a poorer outcome, than treating the two streams of activity as separate but staged, interdependent efforts.

Let’s take a closer look at how a Data First approach is different and discuss the individual phases.

What Does a ‘Data First’ Approach Look Like?

There are structured, well-defined frameworks for assessing data quality, many of which originated in academia and controlled research studies and have since been applied in the field and evolved into proven methodologies. We’ll review some of those frameworks in a future article, but for simplicity’s sake there are three broad phases of Data Governance Quality Management.

Those with a Six Sigma background, in particular Transactional or Lean Six Sigma, might recognize the similarity to the DMAIC (Define, Measure, Analyze, Improve and Control) execution framework. In a Data First approach, the goal is to reduce defects that impact the quality of the desired end state.

The first step is measuring the current state (processes and systems) to identify the root causes that are responsible for poor data quality, followed by building process capabilities that address those problem areas, thereby improving data quality and process efficiency. The final step involves ensuring that the improvements in systems, processes and data quality are sustained and controlled in a way that maintains the integrity of information throughout its entire lifecycle.

3 Phases of a ‘Data First’ Approach

Assess Phase: As mentioned above, this phase establishes a solid and detailed understanding of the supporting business processes, technical systems, subsystems and desired outcomes of the project through the lenses of both the project’s customers and the end-user customers. During this phase, it is important to harvest as much data as possible from the relevant source, target and satellite systems in the solution domain in question. Data analysis and profiling are then performed to build an understanding of the data, both from an information architecture perspective and down to the process, sub-process, business entity and individual attribute level.

This quantitative approach allows for validation of stated assumptions as well as the discovery of underlying process or data issues perhaps not yet known or fully understood. This type of insight is invaluable and influential in finalizing project requirements and accurately informing design and blueprinting activity, as well as de-risking the entire solution delivery process.

You can now answer important questions such as: Will the data adequately support the requirements and proposed solution? Do fundamental business process issues need to be addressed? Do data harmonization, standardization or cleansing efforts need to be resourced and scheduled into the project plan?
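To make the profiling step more concrete, the sketch below shows one possible way to do attribute-level profiling in Python with pandas. It is an illustrative example only; the file name, the reported measures and the 95 percent completeness target are assumptions for this sketch rather than prescriptions of the methodology.

```python
import pandas as pd

# Illustrative attribute-level profiling of a (hypothetical) customer master extract.
df = pd.read_csv("customer_master_extract.csv", dtype=str)

profile = []
for column in df.columns:
    values = df[column]
    profile.append({
        "attribute": column,
        # Share of rows where the attribute is populated
        "completeness_pct": round(100 * values.notna().mean(), 1),
        # Number of distinct populated values
        "distinct_values": values.nunique(dropna=True),
        # Most frequent value, useful for spotting defaults and dummy entries
        "most_common": values.mode(dropna=True).iloc[0] if values.notna().any() else None,
    })

profile_df = pd.DataFrame(profile).sort_values("completeness_pct")
print(profile_df.to_string(index=False))

# Flag attributes below a hypothetical 95 percent completeness target, so the gaps
# can be raised during requirements and blueprinting discussions.
print(profile_df[profile_df["completeness_pct"] < 95.0])
```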

Improve Phase: Yes, this is where enterprises build, test and ultimately roll out their project and solution to the customer. From a data quality perspective, data issues are addressed and data quality inevitably improves; however, the approach must be specific to the project and situation. While the development of data governance policies, procedures and quality targets is a common deliverable on project plans, building true process governance capabilities as part of solution delivery is rarely considered.

A passive approach, in which users are beaten over the head with quality dashboards and policy infractions post-delivery, is more common, but it is not a recipe for success. In this stage, efforts are focused on fixing data as part of the current project and on building and embedding an integrated business process data governance capability. Techniques such as the 5-WHY tool can help identify root causes, which frequently lead back to business process problems that must be addressed to avoid ongoing data quality issues post go-live.

This is where inefficiencies beyond the data itself are identified and a decision is made about how best to handle each issue. In some cases the solution design or blueprint may have to change. In others, a business process might be adjusted to close the data or information gap, or new processes created to address the deficiencies. This integrated approach is key to not just stopping the bleeding, but curing the underlying ailment.
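As a rough illustration of what an embedded governance capability might look like in code, here is a minimal Python sketch of a single standardization rule. The country mapping, function name and handling of unresolved values are hypothetical; a real project would derive its rules from the root-cause analysis described above.

```python
import re
from typing import Optional

# Hypothetical mapping used to standardize free-text country entries to ISO-style codes.
ISO_COUNTRY = {"UNITED STATES": "US", "U.S.A.": "US", "GERMANY": "DE", "DEUTSCHLAND": "DE"}

def standardize_country(raw: Optional[str]) -> Optional[str]:
    """Map a free-text country entry to a code, or return None if it cannot be resolved."""
    if raw is None:
        return None
    # Normalize case and strip characters that are not letters, periods or spaces
    cleaned = re.sub(r"[^A-Z.\s]", "", raw.strip().upper())
    if cleaned in ISO_COUNTRY:
        return ISO_COUNTRY[cleaned]
    # Keep values that already look like a two-letter code; everything else is unresolved
    return cleaned if len(cleaned) == 2 else None

# Unresolved values would be routed back to the business for a root-cause review
# (for example, a 5-WHY exercise) rather than silently corrected.
print(standardize_country(" united states "))  # -> "US"
print(standardize_country("Narnia"))           # -> None
```

In an embedded capability, a rule like this runs both during the project cleanse and at the point of entry after go-live, rather than only being reported on a dashboard after the fact.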

Control Phase: Once data quality and process gaps are closed, it is important to make sure they stay closed. A combination of proactive controls (preventing bad data from entering in the first place) and compensatory controls (flagging bad data after it occurs) should be implemented.

This goes beyond addressing the issues mitigated in the Improve phase. Individual data domains or data elements may already meet the required data quality standards and need no remediation, but the nature of the information quality lifecycle means that data that is accurate, complete and timely today may not stay that way tomorrow.

This phase needs to address all data that is relevant to the integrity of the solution being deployed. At a more macro level, this requires appropriate data governance structures, with people and processes put in place alongside the technical controls needed to maintain a steady-state environment. This is the stage where embedded process controls and automated dashboards transform the project’s event and activity stream into an ongoing operational cadence, embedded in everyday processes and the organizational DNA of how work gets done.
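To illustrate the distinction between the two control types, the hypothetical Python sketch below applies the same rule both proactively, at the point of entry, and compensatorily, in a periodic scan that could feed a dashboard. The record fields, allowed values and the rule itself are invented for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class MaterialRecord:
    material_id: str
    unit_of_measure: Optional[str]
    valid_from: Optional[date]

# Hypothetical set of allowed units of measure
ALLOWED_UOM = {"EA", "KG", "L"}

def proactive_check(record: MaterialRecord) -> List[str]:
    """Proactive control: reject bad data at the point of entry, before it is saved."""
    errors = []
    if record.unit_of_measure not in ALLOWED_UOM:
        errors.append(f"{record.material_id}: unit of measure '{record.unit_of_measure}' is not allowed")
    if record.valid_from is None:
        errors.append(f"{record.material_id}: missing valid-from date")
    return errors

def compensatory_scan(records: List[MaterialRecord]) -> List[str]:
    """Compensatory control: flag records that slipped through, e.g. in a daily scan feeding a dashboard."""
    findings = []
    for record in records:
        findings.extend(proactive_check(record))
    return findings

# A record failing the proactive check would be blocked on entry; the compensatory
# scan surfaces existing records that violate the same rule.
sample = [MaterialRecord("M-1001", "EA", date(2014, 9, 1)), MaterialRecord("M-1002", "BOX", None)]
print(compensatory_scan(sample))
```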

Incorporating a Data First approach and its associated methodologies allows enterprises to mitigate project risks, significantly improve their ability to deliver projects on time and on budget, and ultimately deliver the expected immediate and sustained value to their customers. This is accomplished by accounting for the full lifecycle of data in solution delivery and by giving users access to the right information, at the right time and in the right way, to drive the expected ROI and benefits.

In my next article we’ll look at some real projects that failed, and how a Data First approach would have not only saved them but also delivered concrete operational benefits.

Pedro Cardoso is a senior Information and Data Governance consultant at BackOffice Associates, a provider of information governance and data migration solutions, focusing on helping Fortune 1000 customers manage one of their most critical assets – data.
