Applications of data mining
We have come full circle in our exploration of data mining.
From Figure 1, it is simple to trace the steps that we have taken
on this journey. We began with an introduction to what is and what
is not data mining and knowledge discovery. We discussed, at length,
the processes that a data-mining project would follow. We then dived
into detailed discussions of data-mining techniques, both visual
and algorithm-based.
In this article, we shall discuss some of the ways in which data-mining
techniques can be utilized, how they can be applied and what
benefits we are likely to derive from them.
Classification
The most common use of data mining is in the area of classification.
Simply stated, classification automates the division of a data set
into distinct segments, each exhibiting its own characteristic
behavior.
Classification is sought after in multiple divisions of an enterprise,
from marketing to R&D, from product development to operations.
One could say that most people associate data mining with the seemingly
magical classification capabilities that it can bring, rather than with
the many other aspects, which are just as important.
Consider the classic justification for association-based algorithms:
the operator of a supermarket chain would like to know the common
basket of goods that describes the typical John Doe shopper.
In other words, he would like to classify the thousands of customers
who patronize the supermarkets into realistic segments that he
can serve individually.
Millions and millions of transactions that describe the buying
behavior of these customers are collected, scrubbed and processed
in order to establish baskets of goods similar to the ones shown
below.
{eggs, orange juice, butter, diapers, tissue papers..}
{leafy vegetables, orange juice, butter, soft drinks, soya sauce}
{eggs, pepper, orange juice, bread, diapers, cereals, seafood}
{potatoes, leafy vegetables, Milo, soya sauce, fruits}
……
all having strong confidence and support levels.
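To make "support" and "confidence" concrete, here is a minimal sketch
in Python. It assumes the usual definitions from association analysis:
support is the fraction of transactions that contain an item set, and
confidence is the fraction of transactions containing one item set
that also contain another. The four transactions are hypothetical,
loosely modeled on the baskets above, and the function names are
illustrative only.

```python
# Hypothetical transactions, loosely modeled on the baskets in the article.
transactions = [
    {"eggs", "orange juice", "butter", "diapers"},
    {"leafy vegetables", "orange juice", "butter", "soya sauce"},
    {"eggs", "orange juice", "bread", "diapers"},
    {"potatoes", "leafy vegetables", "soya sauce", "fruits"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of baskets containing antecedent that also contain consequent."""
    base = support(antecedent, transactions)
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base if base else 0.0


# How often do eggs and orange juice appear together, and how often
# does a basket with eggs also hold orange juice?
eggs_oj_support = support({"eggs", "orange juice"}, transactions)
eggs_to_oj_confidence = confidence({"eggs"}, {"orange juice"}, transactions)
print(eggs_oj_support, eggs_to_oj_confidence)
```

A real analysis would run over millions of transactions and would
normally use an algorithm that prunes rare item sets rather than
testing each combination by hand, but the two measures are computed
the same way.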
One might conclude that these are the sets of items that are usually
bought together on a typical shopping trip to the supermarket under
analysis. The marketing or operations division might want to consider
placing these goods near each other in order to encourage associated
purchases, for example, putting eggs near orange juice or leafy
vegetables near soya sauce. In addition, they may be able to use this
information to plan the layout of the entire supermarket so that
different customer segments take different routes through the store,
avoiding a bottleneck in any particular aisle. I am no expert in
grocery shopping dynamics, but I would expect that the results of
association analysis could be useful in creating bundle promotions
where high-margin (but less popular) items are bundled with low-margin
(very popular) items in order to upsell the customer, increasing
revenue and profits.
This is, of course, just the tip of the iceberg for applications
of classification. Classification can be used to identify loyal
customers, those who contribute 80% of the revenue of an
enterprise, those who quit on you the moment your competitor gives
a 2% discount, and those who are likely to buy the new product that
you plan to roll out next quarter.
Prediction
The thinking behind predictive applications of data mining is this:
if we have a large enough sample of past transactions, and assuming
that environmental conditions remain stable, we should be able to
use the results of past transactions to predict future transactions.
Remember decision trees? They are very often used for predictive
applications in data mining. Applications typically start with an
examination of a large number of previous transactions, known as
the training set. We build a decision tree from it and are then
ready to use it to predict future outcomes with some predetermined
level of confidence. Bankers use them all the time to assist in
approving new loans by forecasting whether a loan is likely to
default. Insurance companies use them to estimate the risk that
they would have to underwrite for a new case that they are
evaluating.
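The loan example can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the training set, the
two features (annual income in thousands and debt-to-income ratio) and
all function names are hypothetical, and the tree is grown with a
simple Gini-impurity split rather than any particular commercial
algorithm. The "confidence" returned is just the share of training
rows in the leaf that carry the majority label.

```python
from collections import Counter

# Hypothetical training set:
# (annual income in thousands, debt-to-income ratio) -> observed outcome.
training_set = [
    ((25, 0.60), "default"),
    ((30, 0.55), "default"),
    ((45, 0.50), "default"),
    ((40, 0.20), "repaid"),
    ((60, 0.30), "repaid"),
    ((80, 0.45), "repaid"),
    ((90, 0.10), "repaid"),
    ((35, 0.15), "repaid"),
]


def gini(rows):
    """Gini impurity of the labels in rows (0.0 means a pure node)."""
    counts = Counter(label for _, label in rows)
    return 1.0 - sum((n / len(rows)) ** 2 for n in counts.values())


def best_split(rows):
    """Return (weighted impurity, feature index, threshold), or None."""
    best = None
    for f in range(len(rows[0][0])):
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [r for r in rows if r[0][f] <= threshold]
            right = [r for r in rows if r[0][f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


def build_tree(rows, depth=0, max_depth=3):
    """Grow the tree recursively; leaves hold the majority label and its share."""
    split = best_split(rows) if depth < max_depth and gini(rows) > 0 else None
    if split is None:
        label, n = Counter(lbl for _, lbl in rows).most_common(1)[0]
        return {"label": label, "confidence": n / len(rows)}
    _, f, threshold = split
    left = [r for r in rows if r[0][f] <= threshold]
    right = [r for r in rows if r[0][f] > threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree(left, depth + 1, max_depth),
        "right": build_tree(right, depth + 1, max_depth),
    }


def predict(tree, x):
    """Walk from the root to a leaf and return (label, confidence)."""
    while "label" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["label"], tree["confidence"]


tree = build_tree(training_set)
new_applicant = (50, 0.58)  # hypothetical new loan application
outcome = predict(tree, new_applicant)
print(outcome)
```

On a toy data set this small the leaves come out perfectly pure, so
the reported confidence is unrealistically high; with real loan
histories one would hold back part of the data to check how well the
tree predicts cases it has not seen.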
Statistical methods, such as regression analysis, are also frequently
used for predictive applications. Even the simplest involve generating
some form of equation that maps the values of a collection of
interrelated variables: the known values of one set of variables
are used to predict those of another set. Regression analysis is
often used in predicting marketing ROI, where the performance of past
promotion programs, variations due to seasonality, market growth,
channel spread, etc. are input variables into regression models
that predict ROI on future marketing programs.
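As a rough illustration of the "equation generation" idea, the
following Python sketch fits a straight line by ordinary least
squares. It is deliberately simplified to a single input variable
(promotion spend) and a single output (incremental revenue); the
campaign figures and function names are hypothetical. A real marketing
ROI model would include further inputs such as seasonality and market
growth.

```python
# Hypothetical past campaigns:
# (promotion spend, incremental revenue), both in thousands of dollars.
campaigns = [(10, 24), (20, 47), (30, 63), (40, 88), (50, 103)]


def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_revenue(spend, intercept, slope):
    """Apply the fitted equation to a planned level of spend."""
    return intercept + slope * spend


intercept, slope = fit_line(campaigns)
forecast = predict_revenue(60, intercept, slope)  # planned spend of 60
print(round(intercept, 2), round(slope, 2), round(forecast, 2))
```

The fitted slope can be read as the estimated incremental revenue per
unit of additional spend, which is only meaningful within the range
of spend seen in past campaigns.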
Sometimes it is difficult to distinguish between classification
and prediction. Can we say that the examples given above are not some
form of classifying the members of a dataset before deciding
whether a new member belongs to one of these defined sets? Or are we,
as we have assumed, predicting the outcome for the new member
based on models that we have built from past transactions?
So, as you can see, the distinction is often a matter of perspective.
One school of thought differentiates classification and prediction
by the type of outcome that we are trying to estimate:
if the outcome is categorical and discrete (new customer seems to
belong to group 1) then it is classification; if the outcome is
continuous (the forecasted incremental revenue of this new marketing
campaign is $837.38) then it is a predictive application. For our
purposes, we'll leave this distinction to the academics.
So, we have once again reached the end of another article. We have
looked at two common applications of data mining: classification and
prediction. The lines that separate the two are sometimes blurred,
but their applications to real life are obvious. In fact, many
divisions, from marketing to operations, from sales to support, have
taken advantage of them and reaped substantial benefits. And contrary
to popular belief, it is not difficult to get started.