Business Intelligence, Data Warehousing and Data Virtualization: Page 2
The Business Intelligence Technology of the Decade
To make such an approach effective, the BI-using enterprise needs to understand just what IT can and cannot do to make the architecture flexible by "move or stay" decisions. Here, the good news is that software technology developed over the last decade has given IT a lot more ability to make their architectures flexible – if they will use it.
In my view, the greatest business intelligence technology advance of the last decade is not cloud, analytics, BI for the masses, or "near-real-time" BI, but a technology originally called Enterprise Information Integration (EII) and now called data virtualization. This technology makes many disparate data stores appear as one to the user, developer, and administrator. In order to do so, it also demands that the data virtualization user develop a global metadata directory that catalogs not only data copies, but also variant storage formats. Thus, in master data management, data virtualization allows a wide variety of ways of representing a customer, many of them from acquired companies with their own approaches to customer data, to have a common "master record." In business intelligence, data virtualization allows Big Data from outside the organization to be combined with new types of operational data that are not yet stored in the data warehouse and with data-warehouse data in carrying out near-real-time data mining and analytics.
The practical effect of data virtualization is that in the real world it delivers the "move or stay" flexibility that data warehousing alone never could. It does this in two ways:
1. It gives the BI end user a third option. The first two are to copy new data to the data warehouse and wait until the data warehouse allows you access to it, or not to copy it and have no access to it at all. The third is not to copy it and to accept slower-performance access to it where it already sits.
2. It makes IT and the organization aware of its data assets, allowing IT to provide high-level BI interfaces below which the data's location can be changed as needed, and allowing the organization to understand better the BI opportunities afforded by new data they hadn't known about.
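The two points above can be illustrated with a minimal Python sketch. It is not modeled on any particular data virtualization product: `VirtualCatalog`, `Source`, the `orders` dataset, and the location names are all hypothetical, chosen only to show a metadata catalog that lets callers name a dataset without knowing where it lives.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Source:
    """One physical location for a logical dataset (hypothetical model)."""
    location: str                      # e.g. "warehouse" or "operational-db"
    fetch: Callable[[], List[dict]]    # how to read rows from that location


class VirtualCatalog:
    """Toy metadata directory: maps logical names to physical sources."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, name: str, source: Source) -> None:
        # Re-registering a name models moving the data; callers are unaffected.
        self._sources[name] = source

    def location_of(self, name: str) -> str:
        return self._sources[name].location

    def query(self, name: str, predicate: Callable[[dict], bool]) -> List[dict]:
        # The caller names the dataset, never its location.
        return [row for row in self._sources[name].fetch() if predicate(row)]


# Illustrative data: new operational data not yet copied to the warehouse.
operational_rows = [{"customer": "A", "amount": 120}, {"customer": "B", "amount": 40}]

catalog = VirtualCatalog()
catalog.register("orders", Source("operational-db", lambda: operational_rows))

# "Stay": access the data in place (the slower third option).
big_orders_before = catalog.query("orders", lambda r: r["amount"] > 100)

# "Move": IT later copies the data into the warehouse and re-points the catalog.
warehouse_rows = list(operational_rows)
catalog.register("orders", Source("warehouse", lambda: warehouse_rows))
big_orders_after = catalog.query("orders", lambda r: r["amount"] > 100)
```

The same `query` call works before and after the move, which is the sense in which the data's location can change beneath a stable BI interface. A real product would also have to handle format translation, security, and query optimization, none of which is attempted here.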
Data virtualization products are available today not only from the likes of Composite Software but also within the master data management solutions of vendors like IBM, and in embeddable form in the solutions of business intelligence vendors like MicroStrategy. At the same time, BI buyers should bear in mind that they will need a Metadata Officer to oversee the storage of metadata in the repository and to enforce corporate standards for master data management – which is a good idea in the long run anyway.
The second advance that BI users should know about is a work in progress, and is also a bit trickier to describe. The essence of the problem with scattering data copies across geographies is that either you have to make it so that every time one copy of the data is updated, it appears to the user as if all other copies are updated simultaneously, or you have to deal with the difficulties that result when one user thinks the data has one value, and the other, another. For example, suppose your bank receives a deposit to your checking account in New York to cover a withdrawal, and the withdrawal itself in Shanghai immediately after. If the Shanghai branch doesn't receive the notification of the first update in time, you will face overdraft fees and will be rightfully annoyed at the bank.
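The bank example can be made concrete with a small Python sketch. The `Replica` class, the amounts, and the replication queue are hypothetical and stand in for whatever mechanism a real bank would use; the point is only to show how a delayed update produces two different views of the same account.

```python
class Replica:
    """One copy of an account balance at one site (hypothetical model)."""

    def __init__(self, name: str, balance: int) -> None:
        self.name = name
        self.balance = balance

    def withdraw(self, amount: int) -> bool:
        # Each site decides using only the balance it currently sees.
        if amount > self.balance:
            return False  # looks like an overdraft from this site's view
        self.balance -= amount
        return True


new_york = Replica("New York", balance=0)
shanghai = Replica("Shanghai", balance=0)

# Updates waiting to be shipped to the other site.
pending_for_shanghai = []

# 1. A deposit of 500 arrives in New York; replication is queued, not instant.
new_york.balance += 500
pending_for_shanghai.append(500)

# 2. A withdrawal of 300 hits Shanghai before the deposit has propagated.
rejected_early = not shanghai.withdraw(300)

# 3. The queued deposit finally arrives; the same withdrawal now succeeds.
for amount in pending_for_shanghai:
    shanghai.balance += amount
pending_for_shanghai.clear()
accepted_late = shanghai.withdraw(300)
# Note: the withdrawal would in turn have to be replicated back to New York,
# which this sketch does not do.
```

The only difference between the rejected and the accepted withdrawal is timing, which is exactly the difficulty described above.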
For at least the last thirty years, software folks have been wrestling with this problem. The solution that covers all cases is something called the two-phase commit, but it requires two back-and-forth communications between NY and Shanghai, and is therefore so slow in real life that it can only handle a small percentage of today's data. In the late 1990s, Microsoft and others found a way to delay some of the updates and still make it look like all the data is in synchronization, in the local area networks that support today's global organizations. Above all, over the last few years, the need to support distributed cloud data stores has led to the identification of many use cases (often associated with the "noSQL" movement) in which two-phase commit isn't needed and so-called "eventual consistency" is OK. The result is that, on the cloud, you are much more able to keep multiple copies of data in different locations without slowing down business intelligence performance unacceptably. Of course, this is still a long way from the cloud hype of "your data is somewhere in the cloud, and you don't need to know the location" – and it is likely that in the real world we will never get to the point where data location doesn't matter.
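For readers who want to see why two-phase commit costs two round trips, here is a simplified Python sketch. The `Participant` class and the `two_phase_commit` function are hypothetical names, and the sketch leaves out what makes real implementations hard (timeouts, durable logs, coordinator failure); it only shows the vote-then-decide shape of the protocol.

```python
class Participant:
    """A site holding one copy of the data (hypothetical model)."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.value = None
        self._staged = None

    def prepare(self, value) -> bool:
        # Phase 1: stage the update and vote yes or no.
        if self.can_commit:
            self._staged = value
        return self.can_commit

    def commit(self) -> None:
        # Phase 2 (success): make the staged value visible.
        self.value = self._staged
        self._staged = None

    def abort(self) -> None:
        # Phase 2 (failure): discard the staged value.
        self._staged = None


def two_phase_commit(participants, value):
    """Return (committed, round_trips) for one update."""
    round_trips = 0

    # Phase 1: one round trip to every participant to collect votes.
    votes = [p.prepare(value) for p in participants]
    round_trips += 1

    # Phase 2: a second round trip to announce the decision.
    committed = all(votes)
    for p in participants:
        if committed:
            p.commit()
        else:
            p.abort()
    round_trips += 1
    return committed, round_trips


# Both sites agree: the update becomes visible everywhere or nowhere.
ny, sh = Participant("New York"), Participant("Shanghai")
ok, trips = two_phase_commit([ny, sh], 500)

# One site cannot commit: neither site applies the update.
ny2, sh2 = Participant("New York"), Participant("Shanghai", can_commit=False)
ok2, trips2 = two_phase_commit([ny2, sh2], 500)
```

Each round trip between distant sites adds network latency to every update, which is the cost that eventual consistency avoids by letting copies disagree for a short time.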
Thus, the business intelligence-using organization should expect IT to be able to use data virtualization to deliver much greater BI flexibility, in the cloud or outside it, and should demand that IT consider the flexibility benefits of noSQL-type data copying in certain use cases – but should not expect cloud data-location nirvana.
The Business Intelligence User Bottom Line: Think Long Term
So "move or stay" is an important decision for the business intelligence buyer or upgrader; it should be made up front by corporate as well as IT; there are much better solutions today that allow smart BI implementers to avoid many past mistakes; and the key to "move or stay" decision success is to emphasize flexibility over raw performance. Suppose you do all that; now what?
The first thing you find is that decisions like "cloud BI or not cloud BI" become a lot easier. With less dependence on data-location-dependent apps, moving these apps and their data from location to location becomes, if not easy, at least doable in a lot more cases. So where today it makes sense to move only a small subset of mission-critical Big Data to a small cloud BI provider, because otherwise the coordination between that provider and your data warehouse becomes unwieldy, you can then make the decision based more on potential cost savings from the cloud vs. the overall performance advantages (smaller than before) of a single massive data warehouse.
The second thing you discover is that you have created some new problems – but also some new opportunities. The old, dying "data warehouse plus operational database" model of handling enterprise data had its drawbacks; but compared to the new architectures, complexity was not one of them. However, well-designed new architectures that include moved, copied, and accessed-in-place data also allow the BI user to change the proportions of each rapidly and continually to adapt to new data. In this case, the agility benefits far outweigh the complexity costs.
The third thing you see is that corporate is being forced to think about BI as a long-term investment, not an endless series of "fad" tools – and that's an excellent thing. All too often, today, the market for BI in general and analytics in particular is driven by one-time cost-cutting or immediate needs for insights and rapid decisions. The key difference between organizational flexibility and organizational agility is that the latter makes the organization constantly change, because it is always focused on the change after this one. "Move or stay" decisions by IT can make your business intelligence and enterprise architecture more flexible; a long-term mindset by corporate that drives the "move or stay" and other BI decisions makes BI and the whole organization more agile. And data on agile software development suggests that agility delivers greater cost, revenue, and customer satisfaction benefits than flexibility, in both the short and the long term.
A final thought: a popular term a few years ago was "glocal," in which the most effective enterprise was the one that best combined global and local insights. "Move or stay" is the equivalent in business intelligence, a way of tuning BI to combine the best of internal/local and Web/cloud/global data for better insights. In essence, "move or stay" success is a way to achieve information glocality. Given the importance today of long-term, customer-information-driven competitive advantage, a key action item for every business intelligence user, in corporate and IT, should be to redesign the BI architecture in accordance with more agile "move or stay" norms. It's not administrivia; it's about the long term.