PMML Makes Predictive Analytics and Data Mining Easier
Building Analytics Models That Can Be Shared
Sharing models between applications is key to the success of predictive analytics. But to be able to share a model, you first need to build it. Model building is composed of several phases, including an exhaustive data analysis phase.
"In this phase, you slice and dice raw data and select the most important pieces of information for model building," said Guazzelli.
Raw and derived fields are then used for model training. Typically, only a fraction of the data fields examined during the analysis phase end up in the final model.
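To make the distinction between raw and derived fields concrete, here is a minimal sketch of how a PMML document can declare both. The article does not show any PMML, so the field names (debt, income, debt_to_income) and the ratio itself are hypothetical; the element names follow the PMML 4.x schema, where raw inputs are listed in a DataDictionary and derived fields in a TransformationDictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML fragment: two raw fields and one derived field.
FRAGMENT = """
<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <Header/>
  <DataDictionary numberOfFields="2">
    <DataField name="debt" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="debt_to_income" optype="continuous" dataType="double">
      <Apply function="/">
        <FieldRef field="debt"/>
        <FieldRef field="income"/>
      </Apply>
    </DerivedField>
  </TransformationDictionary>
</PMML>
"""

# PMML elements live in a versioned namespace, so lookups must be qualified.
NS = {"pmml": "http://www.dmg.org/PMML-4_4"}

root = ET.fromstring(FRAGMENT.strip())

# Raw fields come from the data dictionary.
raw_fields = [
    field.get("name")
    for field in root.findall("pmml:DataDictionary/pmml:DataField", NS)
]

# Derived fields are computed from raw fields by a declared expression.
derived_fields = [
    field.get("name")
    for field in root.findall("pmml:TransformationDictionary/pmml:DerivedField", NS)
]

print("raw:", raw_fields)
print("derived:", derived_fields)
```

Because the derivation is written into the document itself, a consuming application does not need separate code to recreate the field before scoring.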
"When you put a predictive analytic model to work, you usually expect it to do its job for months or years until it needs to be refreshed, most probably because of performance deterioration," said Guazzelli. At that point, a new model is built and deployed in place of the old one.
Without a language such as PMML, deploying predictive solutions would be difficult and cumbersome, as different systems represent their computations in different ways.
"Every time you move a model from one system to another, you go through a lengthy translation process which is prone to errors and misrepresentations," said Guazzelli.
With PMML, the process is straightforward. Because each application reads and writes the same standard format, a predictive solution can move from application A to B to C and be put to work as soon as the model building phase is complete.
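As a rough illustration of why a shared format removes the translation step, the sketch below embeds a small PMML regression model and scores one record using only the Python standard library. The model, its coefficients, and the score helper are hypothetical and written for this example; they do not represent how any product named in this article works, and a production scoring engine handles far more of the standard than this.

```python
import xml.etree.ElementTree as ET

# Hypothetical linear model expressed in PMML 4.x.
PMML_DOC = """
<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <Header description="Hypothetical linear model for illustration"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="risk_model" functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="risk" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="age" coefficient="0.25"/>
      <NumericPredictor name="income" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def score(pmml_text, record):
    """Score one record against the first regression table in the document.

    Only handles intercept plus numeric predictors; this is a sketch,
    not a complete PMML consumer.
    """
    root = ET.fromstring(pmml_text.strip())
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    result = float(table.get("intercept"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        # The exponent attribute is optional and defaults to 1 in the schema.
        exponent = int(predictor.get("exponent", "1"))
        value = record[predictor.get("name")]
        result += float(predictor.get("coefficient")) * value ** exponent
    return result


prediction = score(PMML_DOC, {"age": 40.0, "income": 10.0})
print(prediction)  # 1.5 + 0.25 * 40 - 0.5 * 10 = 6.5
```

Any tool that understands the same elements can compute the same prediction from this text, which is the property that lets a model travel between systems without being rewritten.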
"For example, you might build a model in IBM SPSS Statistics and instantly benefit from cloud computing where you can deploy it in ADAPA, the Zementis predictive decisioning platform," said Guazzelli.
Alternatively, the model can move to IBM InfoSphere, where it resides close to the data warehouse, or to KNIME, an open-source tool for building and visualizing data flows developed at the University of Konstanz in Germany, said Guazzelli.
This is the power of PMML: enabling true interoperability of models and solutions between applications. PMML also allows IT folks to shield end-users from the complexity associated with statistical tools and models.