What is data mining?
Data mining is a complex process, often described as a form of artificial intelligence, that generally outputs results as patterns or predictions. There are many “families” of data mining, depending on the type of analysis algorithm used.
Here are a few definitions.
Oracle defines data mining as follows:
The practice of automatically searching large stores of data to discover patterns and trends that go beyond simple analysis. Data mining uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events. Data mining is also known as Knowledge Discovery in Data (KDD).
The key properties of data mining are:
- Automatic discovery of patterns
- Prediction of likely outcomes
- Creation of actionable information
- Focus on large datasets and databases
Data mining can answer questions that cannot be addressed through simple query and reporting techniques.
Microsoft SQL Server 2016 includes “Analysis Services”, which categorizes data mining algorithms as follows:
- Classification algorithms predict one or more discrete variables, based on the other attributes in the dataset.
- Regression algorithms predict one or more continuous variables, such as profit or loss, based on other attributes in the dataset.
- Segmentation algorithms divide data into groups, or clusters, of items that have similar properties.
- Association algorithms find correlations between different attributes in a dataset. The most common application of this kind of algorithm is for creating association rules, which can be used in a market basket analysis.
- Sequence analysis algorithms summarize frequent sequences or episodes in data, such as a Web path flow.
The most fascinating aspect of data mining is that it may return unexpected or unpredictable results.
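To make the association category in the list above more concrete, here is a minimal sketch in Python of how support and confidence, the two measures usually quoted for an association rule, could be computed. The baskets, item names and function names are all invented for illustration; a real engine such as Analysis Services works on far larger data and uses more efficient algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"beer", "nappies", "crisps"},
    {"beer", "nappies"},
    {"milk", "bread"},
    {"nappies", "milk"},
    {"beer", "crisps"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair a single canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def rule_stats(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    total = len(baskets)
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    with_antecedent = sum(1 for b in baskets if antecedent in b)
    support = both / total
    confidence = both / with_antecedent if with_antecedent else 0.0
    return support, confidence


support, confidence = rule_stats(baskets, "nappies", "beer")
print(f"nappies -> beer: support={support:.2f}, confidence={confidence:.2f}")
```

Support is the share of all baskets that contain both items, and confidence is the share of baskets with the first item that also contain the second. A rule is usually only considered interesting when both values are reasonably high.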
The Easy.Data.Mining site contains an interesting list of data mining experiments. Here is a summary of the three I found most interesting:
- Beer & nappies: A supermarket set everything aside and reassessed its sales strategy with respect to the positioning of goods in the store. Even the chain's usual product categories were ignored, i.e. foods were not just compared to foods, but also to everything else. The supermarket also added other data to the analysis, e.g. the gender of the buyers, the day of the week, and more. The analysis showed that men who have children and who (have to) do the shopping on Saturdays often tend to buy nappies for their little ones plus beer for the weekend evenings in front of the television. Subsequently, the supermarket decided to position the pallets of beer beside those of nappies on Saturdays, and sales rose strongly as a result.
- Car insurance: A car insurance company wants to predict the probability of a car accident happening within a certain period of time on the basis of customer data available at the time of signing the insurance policy (e.g. personal data, attributes of the car to be insured, history of accidents). A data table is available in which each record represents the data of a past customer at the beginning of a year and the customer’s claim class in that year. A prediction model is created using this data table. The prediction model reveals interesting customer segments with a high risk of belonging to a bad claim class.
- Medical testing: In a medical test phase, a new treatment is performed on test patients. Personal attributes (e.g. weight, gender, medical history) are obtained and stored for each test patient. At the end of the test phase, the patients are split into different classes depending on whether they reacted positively, neutrally or negatively to the treatment. Pattern recognition may reveal the combinations of attributes responsible for a patient reacting positively or negatively to the treatment.
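The car insurance example above can be sketched in miniature. The Python snippet below is much simpler than a real prediction model: it only groups past customers into segments and computes the share of bad claims in each one. The records, the attribute names (age band, car type) and the function name are hypothetical and invented for illustration, but the output has the same shape as the result described, namely segments ranked by risk.

```python
from collections import defaultdict

# Hypothetical past-customer records: (age_band, car_type, had_bad_claim).
# The values are invented for illustration, not taken from any real insurer.
records = [
    ("18-25", "sports", True),
    ("18-25", "sports", True),
    ("18-25", "sports", False),
    ("18-25", "family", False),
    ("26-60", "sports", True),
    ("26-60", "sports", False),
    ("26-60", "family", False),
    ("26-60", "family", False),
    ("26-60", "family", True),
    ("26-60", "family", False),
]


def bad_claim_rates(records):
    """Return the share of bad claims for each (age_band, car_type) segment."""
    totals = defaultdict(int)
    bad = defaultdict(int)
    for age_band, car_type, had_bad_claim in records:
        segment = (age_band, car_type)
        totals[segment] += 1
        if had_bad_claim:
            bad[segment] += 1
    return {segment: bad[segment] / totals[segment] for segment in totals}


rates = bad_claim_rates(records)

# List segments from highest to lowest observed risk.
for segment, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(segment, f"{rate:.2f}")
```

A real model would also have to cope with segments that contain very few customers, where the observed rate is unreliable, and would typically be validated against data it was not built from.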
Why not try it out?
While building a data mining solution from scratch is extremely costly and can require years of research, ready-made software solutions exist, often included in CRM, data intelligence or database software suites.
Here is a list of software products containing data mining functionality:
- SAS (Enterprise Miner)
- IBM (SPSS Modeler)
- Microsoft SQL Server (Analysis Services)
- Oracle Data Mining
There are also many free solutions out there, such as Apache Mahout.
So why not check whether the software licensed for your company includes data mining features? Spend some time running algorithms against your company’s data and training (fine-tuning) the data mining engine. Who knows, you may find a rare gem or nugget that will revolutionize the way your company does business!
- Data Mining Concepts – What Is Data Mining, Oracle, http://docs.oracle.com/cd/B28359_01/datamine.111/b28129/process.htm (accessed: 19 September 2015)
- Data Mining Algorithms (Analysis Services – Data Mining), MSDN, Microsoft, https://msdn.microsoft.com/en-us/library/ms175595.aspx (accessed: 19 September 2015)
- Data Mining in Practice, Easy.Data.Mining, http://www.easydatamining.com/en/data-mining/data-mining-in-practice/ (accessed: 19 September 2015)
- 40 Top Free Data Mining Software, Predictive Analytics Today, http://www.predictiveanalyticstoday.com/top-free-data-mining-software (accessed: 19 September 2015)
- Top 26 Data Mining Software, Predictive Analytics Today, http://www.predictiveanalyticstoday.com/top-data-mining-software (accessed: 19 September 2015)