Applications of data mining

We have come full circle in our exploration of data mining. From Figure 1, it is simple to trace the steps that we have taken on this journey. We began with an introduction to what data mining and knowledge discovery are, and what they are not. We discussed at length the processes that a data-mining project follows. We then dived into detailed discussions of data-mining techniques, both visual and algorithm-based.

In this article, we shall discuss some of the ways in which data-mining techniques can be utilized, how they can be applied and what benefits we are likely to derive from them.

Classification
The most common use of data mining is in the area of classification. Simply stated, classification enables, in an automated fashion, the creation of distinct segments of data sets, each exhibiting unique and distinctive behaviors.

Classification is sought after in many divisions of an enterprise, from marketing to R&D, from product development to operations. One could say that most people associate data mining with the seemingly magical classification capabilities that it brings, rather than with its many other aspects, which are just as important.

Consider the classic justification for association-based algorithms: the operator of a supermarket chain would like to know the common basket of goods that describes the typical shopper. In other words, he would like to classify the thousands of customers who patronize the supermarkets into realistic segments, each of which he can serve in its own way.

Millions of transactions that describe the buying behavior of these customers are collected, scrubbed and processed in order to establish baskets of goods similar to the ones shown below.

{eggs, orange juice, butter, diapers, tissue papers..}

{leafy vegetables, orange juice, butter, soft drinks, soya sauce}

{eggs, pepper, orange juice, bread, diapers, cereals, seafood}

{potatoes, leafy vegetables, Milo, soya sauce, fruits}

……

all having strong confidence and support levels.

One might conclude that these are the sets of items that are usually bought on a typical shopping trip to the supermarket under analysis. The marketing or operations division might want to consider placing these collections of goods near each other in order to encourage associated purchases, for example, putting eggs near orange juice or leafy vegetables near soya sauce. In addition, they may be able to use this information to plan the layout of the entire supermarket so that different customer segments take different routes through the store, avoiding a bottleneck in any particular aisle. I am no expert in grocery shopping dynamics, but I would expect the results of association analysis to be useful in creating bundle promotions, where high-margin (but less popular) items are bundled with low-margin (very popular) items in order to up-sell the customer, increasing revenue and profits.
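The support and confidence levels mentioned above can be illustrated with a minimal sketch. The baskets below are hypothetical, loosely modeled on the examples in this article, and the function names are our own rather than part of any particular data-mining product. Support is taken here as the fraction of baskets that contain a set of items, and confidence as the fraction of baskets containing one set of items that also contain another.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, loosely based on the examples in the article.
baskets = [
    {"eggs", "orange juice", "butter", "diapers"},
    {"leafy vegetables", "orange juice", "butter", "soya sauce"},
    {"eggs", "orange juice", "bread", "diapers"},
    {"potatoes", "leafy vegetables", "soya sauce", "fruits"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), baskets) / base


def frequent_pairs(baskets, min_support):
    """Map each pair of items to its support, keeping pairs at or above min_support."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}


if __name__ == "__main__":
    print(support({"eggs", "orange juice"}, baskets))
    print(confidence({"eggs"}, {"orange juice"}, baskets))
    print(frequent_pairs(baskets, min_support=0.5))
```

A real analysis would run over millions of transactions and would normally rely on an algorithm such as Apriori to avoid counting every combination, but the two measures are computed in the same way.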

This is, of course, just the tip of the iceberg for applications of dataset classification. Classification can be used to identify loyal customers, those who contribute 80% of the revenue of an enterprise, those who will quit on you the moment your competitor offers a 2% discount, and those who are likely to buy the new product that you plan to roll out next quarter.

Prediction
The thinking behind predictive applications of data mining is this: if we have a large enough sample of past transactions, and assuming that environmental conditions remain stable, we should be able to use the results of past transactions to predict future transactions.

Remember decision trees? They are very often used for predictive applications in data mining. An application typically starts by examining a large number of previous transactions, known as the training set, and building a decision tree from them. The tree is then ready to be used to predict future outcomes with some predetermined level of confidence. Bankers use them all the time to assist in approving new loans, by forecasting whether a loan is likely to default. Insurance companies use them to estimate the risk that they would have to underwrite for a new case under evaluation.
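The loan example can be sketched in a few lines. What follows is a minimal illustration only: the training set is invented, the two input variables (annual income in thousands and debt-to-income ratio) are assumptions of ours, and the tree is grown with a simple Gini-impurity rule rather than with the method of any specific tool.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means perfectly pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces impurity, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree; a leaf is a label, a node is (feature, threshold, left, right)."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return the label stored there."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical training set: (annual income in thousands, debt-to-income ratio).
rows = [(25, 0.60), (30, 0.55), (45, 0.50), (60, 0.20),
        (80, 0.30), (95, 0.10), (40, 0.15), (35, 0.65)]
labels = ["default", "default", "default", "repaid",
          "repaid", "repaid", "repaid", "default"]

tree = build_tree(rows, labels)

if __name__ == "__main__":
    print(tree)
    print(predict(tree, (50, 0.45)))
    print(predict(tree, (50, 0.12)))
```

With so few records the tree needs only a single split. A production model would be trained on thousands of past loans and checked against a separate test set before anyone relied on its forecasts.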

Statistical methods, such as regression analysis, are also frequently used for predictive applications. Even the simplest of these involve generating an equation that maps the values of a collection of interrelated variables. The known values of one set of variables are used to predict those of another set. Regression analysis is often used to predict marketing ROI, where the performance of past promotion programs, variations due to seasonality, market growth, channel spread…etc. are input variables to regression models that predict the ROI of future marketing programs.
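As a rough illustration of the equation generation described above, the sketch below fits a straight line by ordinary least squares. It uses a single input variable for brevity, whereas a real marketing model would use several. The spend and revenue figures are invented, and the function names are ours.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_revenue(spend, slope, intercept):
    """Apply the fitted equation to a new value of the input variable."""
    return slope * spend + intercept


# Hypothetical past promotion programs: spend and incremental revenue (thousands).
past_spend = [10, 20, 30, 40, 50]
past_revenue = [24, 46, 63, 87, 105]

slope, intercept = fit_line(past_spend, past_revenue)

if __name__ == "__main__":
    print(slope, intercept)
    print(predict_revenue(60, slope, intercept))
```

The fitted slope estimates how much incremental revenue each additional unit of spend has returned in the past, which is the quantity an ROI forecast is built on.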

Sometimes it is difficult to distinguish between classification and prediction. Can we say that the examples given above are not some form of classifying the members of a dataset into defined sets before deciding whether a new member belongs to one of them? Or are we, as we have assumed, predicting the outcome for the new member using models built from past transactions? As you can see, the distinction is often a matter of perspective. One school of thought differentiates classification from prediction by the type of outcome that we are trying to estimate: if the outcome is categorical and discrete (the new customer seems to belong to group 1), then it is classification; if the outcome is continuous (the forecasted incremental revenue of this new marketing campaign is $837.38), then it is a predictive application. As for us, we'll leave this distinction to the academics.

So, we have once again reached the end of another article. We have looked at two common applications of data mining: classification and prediction. The lines that separate the two are sometimes blurred, but their applications to real life are obvious. In fact, many divisions, from marketing to operations, from sales to support, have taken advantage of them and reaped substantial benefits. And contrary to popular belief, it is not difficult to get started.

© Copyright 2005 SurfGold. All rights reserved.
