Data Virtualization and Big Data Business Intelligence: Page 2
DV, EAI and ETL
There's also a lot of confusion out there about data integration, Enterprise Application Integration (EAI), and ETL (Extract, Transform, Load) tools. Loosely speaking, DV, ETL, and EAI all perform data integration, that is, the combination of data from different data sources. But data virtualization delivers the combined data in real time, and gives you the choice of whether or not to put the combined data in a physical data store. EAI allows users to exchange data between their enterprise apps' data stores, without trying to keep these apps' data in sync by real-time exchange. ETL takes relational data from multiple data sources, puts it in the format of the data warehouse, cleanses it, and bulk loads it into the warehouse's data store. There is a significant delay, often a day or more, between the time the data arrives at the data source and the time it arrives in the data warehouse. So the key differences are:
- Data virtualization is real-time;
- DV handles a wide variety of data types; and
- DV gives you the choice of storing the combined data or not.
Above all, combining data virtualization with ETL and EAI gives you a full spectrum of choices, from consolidating your data in one or a few data stores to keeping it in a broad array of data sources behind a DV "veneer" that makes it all look like one gigantic, real-time-accessible database and data store to end users, programmers, and administrators.
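To make the contrast concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two "sources" (an in-memory SQLite table standing in for a relational app database, and a list of dictionaries standing in for a non-relational store), the function names `virtual_view` and `etl_load`, and the sample data. It is not how any particular DV or ETL product works; it only shows the difference between combining data at query time and copying a snapshot into a physical store.

```python
import sqlite3

# Hypothetical source 1: a relational app database (in-memory SQLite here).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Hypothetical source 2: non-relational order records, e.g. from a document store.
orders = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 75.5},
]


def virtual_view():
    """DV-style: combine both sources at query time; nothing is persisted."""
    totals = {}
    for order in orders:
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + order["amount"]
    rows = crm.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    return [(name, totals.get(cid, 0.0)) for cid, name in rows]


def etl_load(warehouse):
    """ETL-style: copy a snapshot of the combined data into a physical store."""
    warehouse.execute("DROP TABLE IF EXISTS customer_totals")
    warehouse.execute("CREATE TABLE customer_totals (name TEXT, total REAL)")
    warehouse.executemany("INSERT INTO customer_totals VALUES (?, ?)", virtual_view())
    warehouse.commit()


warehouse = sqlite3.connect(":memory:")
etl_load(warehouse)  # the nightly batch load

# A new order arrives after the batch load has finished.
orders.append({"customer_id": 1, "amount": 30.0})

live = virtual_view()  # reflects the new order immediately
snapshot = warehouse.execute(
    "SELECT name, total FROM customer_totals ORDER BY name"
).fetchall()  # still shows the totals as of the last load
```

In this sketch `live` includes the late-arriving order while `snapshot` does not, which is the delay described above. The same `virtual_view` function feeds the warehouse load, which mirrors the point that DV leaves you free to store the combined data or not.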
The IT buyer bottom line
The implications for IT buyers trying to implement better business intelligence, and particularly those trying to figure out how to combine access to Big Data with their existing BI, are really straightforward. As in the past, data virtualization stands ready to let you query across both sets of data in near real time and to make them look like one gigantic database and one gigantic data store. That is useful for MDM, for data discovery, and for isolating the data warehouse from the very different characteristics of Hadoop-accessed Big Data. It is also practically unavoidable, because you have about as much chance of downloading Big Data in bulk into your data warehouse for near-real-time access as a snowball has of surviving hell.
Data virtualization applied to Big Data is an interesting case in point. As I have noted, Big Data accessed by interfaces such as Hadoop offers unprecedented scalability via "delayed consistency," especially when you try to access Facebook-type social media data. However, to achieve this scalability it risks inconsistency between the various data copies, so that what you see may be inaccurate or out of date, and it risks downtime while the system catches up. Therefore, you need to isolate the rich but risky querying of Big Data from your existing mission-critical data warehouse while still combining the business intelligence directed at each. Data virtualization is exactly the way to do this. Its long experience in supplementing data warehouse business intelligence and its flexibility in incorporating new data types allow you both to isolate the two data sources from each other when necessary and to combine their output in real time when appropriate.
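The isolate-but-combine idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any vendor's product: `query_warehouse`, `query_big_data`, and `combined_report` are hypothetical names, the data is made up, and a raised `TimeoutError` stands in for a Big Data source that is down or still catching up.

```python
def query_warehouse():
    """Stand-in for the mission-critical warehouse (assumed consistent and available)."""
    return {"Acme": 150.0, "Globex": 75.5}


def query_big_data(available=True):
    """Stand-in for a Hadoop-accessed source that may be down or catching up."""
    if not available:
        raise TimeoutError("big data source is catching up")
    return {"Acme": 42, "Globex": 7}  # e.g. social media mentions, possibly stale


def combined_report(big_data_available=True):
    """Combine both outputs when possible; fall back to warehouse-only otherwise."""
    revenue = query_warehouse()
    try:
        mentions = query_big_data(big_data_available)
    except TimeoutError:
        # Isolation: a failure on the Big Data side does not block the warehouse result.
        mentions = {}
    return [
        {"customer": name, "revenue": total, "mentions": mentions.get(name)}
        for name, total in sorted(revenue.items())
    ]


full = combined_report(big_data_available=True)
degraded = combined_report(big_data_available=False)
```

Here `full` carries both revenue and mention counts, while `degraded` still returns the warehouse figures with `mentions` set to `None`. The design choice is that the warehouse query never depends on the riskier source, and the two results are merged only at the virtual layer.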
What data virtualization solutions are out there? Well, as you can see from the above discussion, there are plenty of large vendors to pick from, including IBM, SAP/Sybase, Red Hat, and Oracle/BEA. However, I have a special fondness for the smaller ones like Composite and Denodo that are still plugging away independently, because, more than others, they have focused their efforts for many years on delivering top performance for this particular use case. Their long alliances with folks like MicroStrategy mean that your integration into more general solutions, and your service and support, should also be top-notch. Still, your mileage may vary, and your existing vendor of other solutions may fit your needs best.
One more point: I stress again that data virtualization also makes your business intelligence more agile, no matter where you use it – if you are open to each new data type as it arrives. Don't just put data virtualization in there and let it hum along automatically. Constantly ask, are there new data types out there, new forms of Big Data, new types of "sensor data" from smartphone videos, that I can apply this to? Because if you don't, some of your competitors will. That's what agile business intelligence is all about.