Business Intelligence, Data Warehousing and Data Virtualization
Updated · Aug 29, 2011
One of the most fundamental decisions that business intelligence implementers in IT make, at the beginning of every new BI initiative, is whether the new data involved should be copied into a central data mart or data warehouse or accessed where it is.
The advent of software as a service (SaaS) and the public cloud has added a new dimension to this decision: Now the BI implementer must also decide whether to move the data physically into the cloud and “de-link” the cloud and internal data stores. In fact, this decision is no longer the purview solely of the CTO – the security concerns when you move data to a public service provider mean that corporate needs to have input into the decision. However, fundamentally, the decision is the same: Move the one copy of the data; keep the one copy where it is; or copy the data, move one copy, and synchronize between copies.
Almost two decades of experience with data warehousing has shown that these decisions have serious long-term consequences. On the one hand, for customer buying-pattern insights that demand a petabyte of related data, failure to copy to a central location can mean serious performance degradation – as in, it takes hours instead of minutes to detect a major cross-geography shift in buying behavior. On the other hand, attempting to stuff Big Data like the graphics and video involved in social networking into a data warehouse means an exceptionally long lag time before the data yields insights. The firm’s existing data warehouse wasn’t designed for this data; it is not fine-tuned for good performance on this data; and despite the best efforts of vendors, periodic movement or replication of such massive amounts of data to the data warehouse has a large impact on the data warehouse’s ability to perform its usual tasks. Above all, these consequences are long-term – applications are written that depend for their performance on the existing location of the data, and redoing all of these applications if you want to move to a different database engine or a different location is beyond the powers of most IT shops.
The reason that it is time to revisit the “move or stay” decision now is that business intelligence users, and therefore the BI IT that supports them, are faced with an unprecedented opportunity and an unprecedented problem. The opportunity, which now as never before is available not only to large but also to medium-sized firms, is to gather mammoth amounts of new customer data on the Web and use rapid-fire BI on that data to drive faster, more agile “customer-of-one” product development and sales. The problem is that much of this new data is large (by some estimates, unstructured data is approaching 90% of organizational data by size, even inside the organization), is generated outside the organization, and changes very quickly in response to rapid and dangerous shifts in the company’s environment: fads, sudden brand-impacting environmental concerns, and/or competitors who are playing the same BI game as you.
How do you get performance without moving the data into the organization’s data warehouse? How can the data warehouse support both new and old types of data and deliver performance on both? Most importantly, how do you keep from making the same mistakes in “move or stay” decisions that make present-day data warehousing so expensive and sub-optimal for your new needs?
Business Intelligence Political Wars
Today’s business intelligence users, wowed by case studies of great analytics insights leading to major cost-cutting and add-on sales, are likely to view these “move or stay” decisions as the property of IT, to be decided after the CEO or CMO decides how best to use the latest dashboard, Facebook data miner, or performance management tool. In turn, BI IT has tended to view these decisions as more short-term and ad-hoc, meant to meet immediate urgent needs. Alas, past experience has shown that not only is such an approach to business intelligence implementation unwise, it is also futile.
An old (slightly “altared”) story is of the new pastor, some of whose congregation ask him to change the location of the altar. He asks an older pastor what the tradition is, and the pastor, instead of answering, says to try it that way. The result is a massive argument among the congregation. He goes back, and the older pastor says to put it back. Instead of dying down, the argument gets even hotter. He goes back again, and says, what’s the tradition? My congregation is fighting like mad about this. Ah, says the older pastor, that’s the tradition.
In the same way, when data warehousing was first introduced, CEOs deferred the “move or stay” decision to IT, which attempted to shoehorn all data into the central data warehouse. Lines of business, of course, resisted the idea that corporate IT should be gatekeepers over their data, making them wait for weeks for reports on their local sales that used to take a day. The result was that CEOs were frequently appealed to by IT or by lines of business over the matter – that became the tradition. Eventually, the advent of data marts and the fact on the ground that data warehouses could not handle new data from acquisitions ended the arguments. The cost was BI that was poorly equipped to handle new data from outside the organization, and executives and lines of business that under-used corporate BI.
To avoid these political wars, the BI user needs to set out a long-term approach to “move or stay” that should inform implementation decisions at both the corporate and IT level. Instead of “I have a hammer, everything looks like a nail”, this approach stresses maximum flexibility and agility of any BI solution – which, in turn, translates to asking BI IT to deliver, not the highest performance, but reasonable performance and maximum architectural flexibility to accommodate new types of data.
Wayne Kernochan of Infostructure Associates has been an IT industry analyst focused on infrastructure software for more than 20 years.
Wayne Kernochan has been an IT industry analyst and author for over 15 years. Over that period, he has focused on the most important information-related technologies and on ways to measure their effectiveness. He has also done extensive research on SMBs, Big Data, BI, databases, development tools and data virtualization solutions. Wayne is a regular speaker at webinars and writes for many publications.