Business Intelligence Ain’t Over Until Exploratory Data Analysis Sings
EDA is about analyzing smaller initial amounts of data to generate as many plausible hypotheses (or “patterns in the data”) as possible, before winnowing them down with further data. For example, the technique creates abstract unlabeled visualizations (“data visualization”) of possible patterns, such as the strangely named box-and-whisker plot, and the analyst then picks the ones with the most promise.
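For readers who have not met it, the box-and-whisker plot mentioned above boils a column of numbers down to a handful of values (the quartiles and the two whisker ends) plus any outliers. Here is a minimal Python sketch of that summary, assuming the common Tukey convention of placing the fences at 1.5 times the interquartile range. The function name `box_plot_summary` and the sample figures are made up for illustration and do not come from any particular EDA product.

```python
import statistics


def box_plot_summary(values):
    """Return the numbers a box-and-whisker plot draws, plus the outliers."""
    data = sorted(values)
    # Quartiles: Q1, median, Q3 (the "box" and the line inside it).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "whisker_low": inside[0],    # smallest value that is not an outlier
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside[-1],  # largest value that is not an outlier
        "outliers": outliers,
    }


# Hypothetical sample: order values from an initial, partial data pull.
sample = [12, 15, 14, 13, 16, 15, 14, 90, 13, 15]
summary = box_plot_summary(sample)
print(summary)
```

In this made-up sample the single very large order lands outside the upper fence, which is exactly the kind of "pattern in the data" an analyst might turn into a hypothesis to test once more data arrives.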
In practice, EDA identifies far more hypotheses than pre-defined models alone. Therefore, applying EDA right after partial data collection typically results in quite a few more key insights.
The automation of EDA over the last couple of decades means the average analyst can insert EDA into his/her typical analysis routine with minimal training and minor added analysis time. Preliminary use cases in academia suggest that effective use of EDA should yield a major improvement in analytics effectiveness “at the margin” (in resulting in-depth analyses) for a small “time overhead” cost.
Where Is It?
You would think, given its potential advantages, that EDA would be used more often in corporate business intelligence. That it isn’t is not necessarily the software’s fault. There is, for example, an open-source solution, Orange, which has merged full EDA capabilities with a fully capable data-mining tool.
And yet, in both academia and business, business intelligence is a hot topic while EDA is rarely mentioned. Even vendor EDA products don’t appear to be flourishing as they ought. For a while, SAS’ JMP product stood bravely and alone as a tool that could at least potentially be used by businesses. However, according to Wikipedia, SAS has recently discontinued support for its use on Linux. Still, Wikipedia notes 14 such EDA suites, and there may be more out there.
So let’s summarize: EDA is out there. It’s easy to use. Now that statistical analysis in general is creeping into greater use in analytics, users are ready for it. I fully anticipate that it would have major positive effects on in-depth analytics for enterprises from the largest down at least to many medium-sized ones.
IT shops will have to do some customization and integration themselves, because most if not all vendors have not yet fully integrated it as part of the analytics process in their business intelligence suites, but with open-source and other “standard” EDA tools that’s not inordinately difficult. The only thing lacking is for somebody, anybody, to wake up and pay attention.
IT would be a great driver of EDA use. The most effective initial use of EDA is in support of the longer-term efforts of today’s business analysts, and not in IT-driven agile business intelligence. However, IT should find these business analysts to be surprisingly receptive to an IT EDA pitch – or, at the least, amazed that IT isn’t being a “boat anchor” yet again.
You see, EDA has a sheen of “innovation” about it, and so folks who are in some way associated with the business’ “innovation” efforts should like it a lot. The rest is simply a matter of its becoming part of these business analysts' growing toolkit of rapid-query-generation and statistical in-depth-insight-at-the-margin tools. EDA may not in the normal course of usage get the glory of notice as the source of a new competition-killer, but with a little assiduous use-case monitoring by IT the business case can be made.
It is equally important for IT to note that EDA is twice as effective if it is joined at the front end by a data-gathering process that is to a much greater extent open-ended, customer-driven and flexible in the type of data gathered. Remember, there are ways of doing this – such as parallel in-depth customer interviews or Internet surveys that don’t just parrot SurveyMonkey – that add little “overhead” to data-gathering. IT should seriously consider doing this as well, and preferably design the data-gathering process so as to feed the gathered data to EDA tools, where in-depth statistical analysis of that data will probably be an appropriate next step.
The overall effect will be like replacing a steadily narrowing view of the data with one that expands the potential analyses until the right balance between “data blindness” and “paralysis by analysis” risks is reached.
Making Data Sing
To view EDA as comparable to other business intelligence technologies/solutions is to miss the point. EDA is much more like agile development; its main value lies in changing our analytics methodology, not in improving traditional analyses. It helps the organization itself to think not “outside the box” but “outside the organization” – to be able to combine the viewpoint of the vendor with the viewpoint and reality of the customer, rather than trying to force customer interactions into corporate fantasies of the way customers should think and act for maximum vendor profit.
We all witnessed the public-relations disaster that ensued when Bank of America announced new charges for debit cards – a situation that, if we were honest, we would admit most other enterprises find it all too easy to stumble into. If EDA (or, better still, EDA plus open-ended, customer-driven data-gathering) prevents only one such misstep, it will pay for itself 10 times over, no matter what the numbers say.
EDA seems like it’s about competitive advantage. That’s true as far as it goes, but EDA is actually much more about business risk.
The reference in my title is to an old, bad joke in which a Mob boss, to gain culture, attends an opera, accompanied by his lieutenants. As the long second act draws to a close, the henchmen start getting restless. “Wait!” shouts the boss with cultural authority, silencing their complaints. “It ain’t over ’til the fat lady sings!” And, of course, that is what EDA should do for you: make your hypotheses broader and meatier so that, ultimately, the data really sings.
Companies might find out through EDA of their Facebook pages that several key customers-of-one are in opera-lovers’ groups and will respond to an Internet commercial with an aria in it. Sing PROFIT! That's a happy ending.
Wayne Kernochan is the president of Infostructure Associates, an affiliate of Valley View Ventures that aims to identify ways for businesses to leverage information for innovation and competitive advantage. Wayne has been an IT industry analyst for 22 years. During that time, he has focused on analytics, databases, development tools and middleware, and ways to measure their effectiveness, such as TCO, ROI, and agility measures. He has worked for respected firms such as Yankee Group, Aberdeen Group and Illuminata, and has helped craft marketing strategies based on competitive intelligence for vendors ranging from Progress Software to IBM.