Data Mining

Data mining is part of a larger discipline known as Knowledge Discovery in Databases (KDD), which draws on mathematics, algorithms, machine learning, statistics and artificial intelligence in order to make more intelligent sense of data while at the same time refining our understanding of the data collected.

Why is Data Mining Important?
Look around you. The world is obviously a data-centric one. Humans and corporations alike have been collecting data with a fervor that rivals that of the gold rush. The innate need to tag things, to label objects and to track transactions has led to a parallel industry that records, classifies, categorizes, segments and otherwise organizes the data thus collected.

We, hence, live in a sea of data, not unlike deep-sea ecosystems where unimaginable life forms wobble through their existence surrounded by water. We are, unfortunately, not much smarter than these creatures whose world is in perpetual darkness. We are visitors to the Library of Babel, as described by the Argentine writer Jorge Luis Borges, whose infinite shelves of books contain an infinite amount of knowledge that is unattainable without the hinted-at catalogue of catalogues and inaccessible because the contents are incomprehensible to humans. We know who eats in which restaurant at which time, yet have no idea when he will next be hungry for pancakes. We know where and when he pumped his last gallon of gas, but have no clue as to why he might decide to switch his allegiance to another petrol brand.

The reason is simple: data, even massive amounts of it, does not equate to intelligence. We have massive collections of data, but we are no more intelligent than we were before. This explains why there is currently so much interest in data mining, the art and science of extracting intelligence from massive collections of data.

Like any technology under the spotlight, data mining is a frequently misunderstood term. I have personally encountered companies promoting statistical packages as having data mining capabilities. Others regurgitate data from backend systems into nicely formatted web front ends and also claim to have data mining competencies. The classic cases are those that tout POS systems supplying transactional information as data mining systems.

What exactly is data mining, this mysterious silver bullet that would convert our heaps and heaps of data into intelligence? Simply put, data mining refers to processes and algorithms that enable the discovery of hidden information in collections of data. This discovered information can take the form of trends, segmentations, clusters, associations, rules of thumb and so on; fundamentally, it is a new understanding of the data that was previously unknown.

In fact, it is widely accepted that data mining is but the discovery stage of a wider discipline known as Knowledge Discovery in Databases (KDD). Pieter Adriaans et al. define KDD as the “non-trivial extraction of implicit, previously unknown and potentially useful information from data”. KDD is hence an amalgamation of machine learning, statistical methods, visualization, expert systems and database technologies.

This series of articles attempts to offer a comprehensive introduction to the entire KDD process rather than limiting itself to data mining algorithms and methods. Future issues will discuss Data Preparation, Cleansing and Transformation, Data Visualization techniques and Data Modeling. The intention is to round off the series with a discussion of real-life applications of data mining techniques and how companies have benefited from these explorations. Oh, yes, there will also be an elaboration of data mining algorithms, with an emphasis on the more popular methods for intelligence discovery.
To round off this issue, it might be useful to dispel some of the greatest myths surrounding data mining.

1 I don’t think there are any trends or clusters in my data set.

Nothing could be further from the truth. Data mining engagements always suffer from insufficient data, both in quality (number of attributes available) and in quantity (number of instances available), but seldom from a lack of character. There are rare occasions where data sets of significant size did not reveal any interesting patterns, but these are few and far between. More often than not, data mining uncovers interesting patterns that were at best suspected.

2 My data is mine. I don’t want you sticking your nose into it in case the competition gets wind of what it reveals.

This is partially true. Data in its raw form reveals many things, such as how sloppy the data collection process can be, or how redundancy might be exploited to build in data verification or solicit more details from the user. It is, therefore, conceivable that an analysis of the data would reveal the deepest and darkest secrets within your enterprise. But there are many things that can be done to protect the semantic content of your data. Encryption is one. Normalization is another. Metadata generation is yet another. We could probably fill this entire article with transformations that can be applied to data before data mining algorithms are run, achieving the objective of privacy with the added benefit of data modeling.
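As a rough illustration of the kind of transformation meant here, the sketch below masks customer identifiers and rescales amounts before data is handed to an analyst. It rests on several assumptions that are not from the article: keyed hashing (HMAC) is swapped in as a simple stand-in for encryption, "normalization" is read as min-max rescaling of numeric values, and the names `SECRET_KEY`, `pseudonymize`, `min_max_scale` and the sample records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data owner; never shared with the analyst.
SECRET_KEY = b"replace-with-a-private-key"


def pseudonymize(customer_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed hash (a stand-in for encryption)."""
    digest = hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def min_max_scale(values):
    """Rescale numbers to the range 0..1, hiding their absolute magnitudes."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no spread to preserve.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw records: (customer id, purchase amount).
records = [("C-1001", 120.0), ("C-1002", 80.0), ("C-1001", 200.0)]

masked_ids = [pseudonymize(cid) for cid, _ in records]
scaled = min_max_scale([amount for _, amount in records])
```

The same customer always maps to the same masked identifier, so patterns per customer remain discoverable, while the original identifiers and the absolute amounts are no longer readable from the transformed data.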

3 I am already doing data mining. We have this application that displays my data in rows and columns, allowing me to drill down into its rows and columns to see greater and greater detail of my business.

You could be deriving a great deal of insight through drilling down rows and columns, but that is NOT data mining. What you have just described is probably an OLAP application. OLAP, which stands for Online Analytical Processing, allows the user to gather data from multiple databases into highly complex tables. OLAP basically deals with aggregates, which is quite different from intelligence discovery such as identifying patterns, trends, segmentations, clusters and associations.

However, it is unwise to overlook what OLAP tools can reveal, especially when used in conjunction with data mining algorithms. I have seen numerous instances where OLAP and data mining offerings augment each other, providing great insight into business operations. OLAP will be discussed extensively in a later issue, in conjunction with analytical methods.
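The difference between aggregation and discovery can be sketched in a few lines. The example below is only an illustration under stated assumptions: the baskets are made-up data, `item_totals` stands in for an OLAP-style aggregate, and `pair_rules` is a deliberately minimal pairwise association search (support and confidence), not any particular product's algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one basket of purchased items.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
    {"milk", "bread"},
]


def item_totals(baskets):
    """OLAP-style aggregate: the number of baskets containing each item."""
    return Counter(item for basket in baskets for item in basket)


def pair_rules(baskets, min_support=0.5):
    """Discovery-style output: item pairs that co-occur in many baskets."""
    n = len(baskets)
    singles = item_totals(baskets)
    pairs = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pairs.items():
        support = count / n  # share of all baskets containing both items
        if support >= min_support:
            # Confidence of "a implies b": share of baskets with a that also hold b.
            rules.append((a, b, support, count / singles[a]))
    return sorted(rules, key=lambda rule: -rule[2])


totals = item_totals(baskets)  # answers "how much of each item was sold?"
rules = pair_rules(baskets)    # answers "which items tend to sell together?"
```

The aggregate tells you that bread appears in four baskets; the pair search surfaces something the totals alone do not show, namely that bread and butter appear together in three of the five baskets.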

4 I am a marketing guy; I don’t understand predictive regression analysis or cross-correlation matrices. Heck, I don’t even know how to set up a DBMS properly, nor do I have the money to pay for the hardware or software needed for data mining.

Again, this is partially true. Data mining has, however, evolved into offerings available in an ASP (Application Service Provider) model. You pay the data-mining provider a small monthly fee and you get reports on the analysis performed on your data set. You don’t need to know how predictive regression analysis is done, nor how cross-correlation matrices are created. Heck, you don’t even need to know how to set up a DBMS; your friendly data-mining service provider will give you reports that describe the interesting patterns discovered in your data set in terms that you understand.

5 Data mining is not suitable for me. I don’t have a computer system; the only data I have on my customers is in the form of paper invoices or warranty cards.

Nothing could be further from the truth. If you are keen on uncovering hidden trends in your data, the analog form in which your data is trapped is not going to stop you. In fact, many of the data mining engagements that we have undertaken began with data trapped in invoices, warranty cards, reports and the like. There are well-established automated and semi-automated ways to encode and otherwise digitize data into a form to which data mining algorithms can be applied.

Data mining is a field that has generated a lot of interest lately. It tends to straddle the realms of art and science, requiring the data-mining practitioner to be well versed in both the science of database technologies and the art of data manipulation. Too many people, in the recent past, have claimed to offer data mining solutions. Too many of these claims border on outright misrepresentation and fraud. It is hoped that this series of articles will dispel many of the fallacies surrounding data mining. In the next article we shall describe the processes necessary to set up a knowledge discovery environment.
Term    Description
DM      Data Mining, Direct Marketing or Direct Mailer
DBMS    Database Management System
RDBMS   Relational DBMS
KDD     Knowledge Discovery in Databases
OLAP    Online Analytical Processing
OLTP    Online Transactional Processing
BI      Business Intelligence
ETL     Extract-Transform-Load
k-NN    k-nearest neighbour algorithm


© Copyright 2005 SurfGold. All rights reserved.
