Term

Definition 
Accuracy 
A measure of a predictive model that reflects the proportion of cases in which the model is correct when applied to data
Bias 
The difference between the expected value of an estimate and the actual (true) value of the quantity being estimated
Cardinality 
A data mining term indicating the number of different values a categorical predictor or OLAP dimension can have. High-cardinality predictors and dimensions have large numbers of different values (e.g. zip codes); low-cardinality fields have few different values (e.g. eye color).
CART 
Classification and Regression Trees. A type of decision tree algorithm that automates the pruning process through cross validation and other techniques. 
CHAID 
Chi-Square Automatic Interaction Detector. A decision tree that uses contingency tables and the chi-square test to create the tree.
Classification
The process of learning to distinguish and discriminate between different input patterns using a supervised training algorithm. Classification is the process of determining that a record belongs to a group.
Cluster Centroid 
most typical case in a cluster. The centroid is a prototype. It does not necessarily describe any given case assigned to the cluster. 
Clustering 
The technique of grouping records together based on their locality and connectivity within the n-dimensional space. This is an unsupervised learning technique.
Collinearity 
The property of two predictors showing significant correlation without a causal relationship between them 
concentration of measure 
Any set of positive probability can be expanded very slightly to contain most of the probability. For example, the average of bounded independent random variables is tightly concentrated around its expectation.
Conditional Probability 
The probability of an event happening given that some other event has already occurred. For example, the chance of a person committing fraud is much greater given that the person has previously committed fraud.
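The fraud example can be sketched with simple counting. The records below are invented purely for illustration, and the helper name `conditional_probability` is hypothetical; the estimate is the share of records satisfying the event among those satisfying the condition.

```python
def conditional_probability(records, event, given):
    """Estimate P(event | given) by counting over a list of records."""
    matching_given = [r for r in records if given(r)]
    if not matching_given:
        raise ValueError("the conditioning event never occurs in the data")
    both = sum(1 for r in matching_given if event(r))
    return both / len(matching_given)


# Made-up records, not real data.
records = [
    {"prior_fraud": True, "fraud": True},
    {"prior_fraud": True, "fraud": False},
    {"prior_fraud": False, "fraud": False},
    {"prior_fraud": False, "fraud": False},
    {"prior_fraud": False, "fraud": True},
    {"prior_fraud": False, "fraud": False},
]

# P(fraud | prior fraud): 1 of the 2 records with prior fraud.
p_given_prior = conditional_probability(
    records, lambda r: r["fraud"], lambda r: r["prior_fraud"]
)
# Unconditional P(fraud): 2 of the 6 records.
p_overall = sum(1 for r in records if r["fraud"]) / len(records)
print(p_given_prior, p_overall)
```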
Confidence 
The likelihood of the predicted outcome, given that the rule has been satisfied. 
convergence of random variables 
a sequence of essentially random or unpredictable events can sometimes be expected to settle down into a behaviour that is essentially unchanging when items far enough into the sequence are studied 
correlation 
number that describes the degree of relationship between two variables 
Coverage 
A number that represents either the number of times that a rule can be applied or the percentage of times that it can be applied 
Cross-validation
The process of holding aside some data that is not used to build a predictive model and later using that data to estimate the accuracy of the model on unseen data, simulating the real-world deployment of the model.
Data Mining Process 
Define the problem. Select the data. Prepare the data. Mine the data. Deploy the model. Take business action. 
Discrete Fourier Transform 
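A transform that represents a sequence by its frequency components. For smooth signals it concentrates most of the energy in the first few coefficients.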
Entropy 
A measure often used in data mining algorithms that measures the disorder of a set of data 
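As a minimal sketch, assuming the disorder measure meant here is Shannon entropy in bits (the usual choice in decision tree algorithms), it can be computed from class label counts. The function name `entropy` is illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    proportions = [n / total for n in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in proportions)


print(entropy(["a", "a", "b", "b"]))  # evenly mixed set: 1.0
print(entropy(["a", "a", "a", "a"]))  # pure set: 0.0
```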
Error Rate 
A number that reflects the rate of errors made by a predictive model. It is one minus the accuracy 
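A minimal sketch of accuracy and error rate for a classifier, assuming predictions and actual values are given as equal-length sequences. The function names and sample labels are illustrative.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the actual values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def error_rate(predicted, actual):
    """One minus the accuracy."""
    return 1.0 - accuracy(predicted, actual)


predicted = ["yes", "no", "yes", "yes"]
actual = ["yes", "no", "no", "yes"]
print(accuracy(predicted, actual))    # 3 of 4 correct: 0.75
print(error_rate(predicted, actual))  # 0.25
```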
Expectation–maximization 
algorithm for estimating parameters where there exist significant missing or inferred values 
Expectation-Maximization (EM)
Solves estimation problems with incomplete data. Iteratively uses estimates for the missing data and continues until convergence.
Expert System 
A data processing system comprising a knowledge base (rules), an inference (rules) engine, and a working memory 
Exploratory Data Analysis 
The processes and techniques for general exploration of data for patterns in preparation for more directed analysis of the data 
Factor Analysis 
A statistical technique which seeks to reduce the number of total predictors from a large number to only a few “factors” that have the majority of the impact on the predicted outcome. 
Fuzzy Logic 
A system of logic based on fuzzy set theory
Fuzzy Set 
A set of items whose degree of membership in the set may range from 0 to 1 
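One common way to assign a degree of membership is a triangular membership function. The sketch below is an illustration only: the shape, the function name and the "comfortable temperature" set are assumptions, not part of the definition.

```python
def triangular_membership(x, low, peak, high):
    """Degree of membership in [0, 1]; assumes low < peak < high."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


# Hypothetical fuzzy set "comfortable temperature" (degrees Celsius).
print(triangular_membership(20.0, 15.0, 20.0, 25.0))  # full member: 1.0
print(triangular_membership(17.5, 15.0, 20.0, 25.0))  # partial member: 0.5
print(triangular_membership(30.0, 15.0, 20.0, 25.0))  # not a member: 0.0
```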
Fuzzy System 
A set of rules using fuzzy linguistic variables described by fuzzy sets and processed using fuzzy logic operations 
Genetic Algorithm 
Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution
Genetic Operator 
An operation on the population member strings in a genetic algorithm which is used to produce new strings
Gini Index 
A measure of the disorder reduction caused by the splitting of data in a decision tree algorithm. Gini and the entropy metric are the most popular ways of selecting predictors in the CART decision tree algorithm.
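A minimal sketch, assuming the usual Gini impurity (one minus the sum of squared class proportions) and measuring the reduction as parent impurity minus the size-weighted impurity of the two child nodes. Function names are illustrative.

```python
from collections import Counter


def gini_impurity(labels):
    """1 - sum of squared class proportions; 0.0 for a pure node."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def gini_reduction(parent, left, right):
    """Impurity of the parent minus the weighted impurity of its children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


parent = ["yes", "yes", "no", "no"]
print(gini_impurity(parent))                                  # 0.5
print(gini_reduction(parent, ["yes", "yes"], ["no", "no"]))   # perfect split: 0.5
```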
Hebbian Learning 
One of the simplest and oldest forms of training a neural network. It is loosely based on observations of the human brain. The neural net link weights are strengthened between any nodes that are active at the same time. 
Hill Climbing 
A simple optimization technique that modifies a proposed solution by a small amount and then accepts it if it is better than the previous solution. The technique can be slow and suffers from being caught in local optima 
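The accept-if-better loop can be sketched for a one-dimensional problem. The step size, iteration count, seed and the toy scoring function are all assumptions chosen for illustration.

```python
import random


def hill_climb(score, start, step=0.1, iterations=1000, seed=0):
    """Repeatedly perturb the current solution; keep a change only if it scores higher."""
    rng = random.Random(seed)
    best = start
    best_score = score(best)
    for _ in range(iterations):
        candidate = best + rng.uniform(-step, step)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best


# Toy function with a single peak at x = 2, so there is no local optimum to get stuck in.
result = hill_climb(lambda x: -(x - 2.0) ** 2, start=0.0)
print(result)
```

On a function with several peaks, the same loop would stop at whichever peak is nearest the starting point, which is the local-optimum weakness noted in the definition.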
Hypothesis Testing 
The statistical process of proposing a hypothesis to explain the existing data and then testing to see the likelihood of that hypothesis being the explanation 
ID3 
A decision tree algorithm that selects the predictor for each split by information gain (reduction in entropy)
Intelligent Agent 
A software application which assists a system or a user by automating a task. Intelligent agents must recognize events and use domain knowledge to take appropriate actions based on those events. 
Itemset 
An itemset is any combination of two or more items in a transaction 
Jackknife Estimate 
An estimate of a parameter obtained by omitting one value from the set of observed values. Repeating this for each value allows you to examine the impact of outliers.
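A minimal leave-one-out sketch: the statistic is recomputed once for each omitted value, so an outlier shows up as the one estimate that differs sharply from the rest. The sample values are invented.

```python
def jackknife_estimates(values, statistic):
    """Recompute the statistic with each observed value omitted in turn."""
    return [statistic(values[:i] + values[i + 1:]) for i in range(len(values))]


def mean(xs):
    return sum(xs) / len(xs)


values = [2.0, 3.0, 4.0, 3.0, 48.0]  # made-up data; 48.0 is an outlier
estimates = jackknife_estimates(values, mean)
print(estimates)  # [14.5, 14.25, 14.0, 14.25, 3.0]
```

Only the last estimate, computed without 48.0, drops to 3.0, which points to that value as the one driving the mean.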
Kernel 
a function that transforms the input data to a high-dimensional space where the problem is solved
k-Nearest Neighbor
A data mining technique that performs prediction by finding the prediction value of records (near neighbors) similar to the record to be predicted 
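A minimal sketch for classification, assuming Euclidean distance and a majority vote among the k nearest training records. Both choices, and the toy data, are assumptions; other distances and voting schemes are common.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """training is a list of (features, label) pairs; returns the majority label."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up training records with two numeric predictors each.
training = [
    ((1.0, 1.0), "a"),
    ((1.0, 2.0), "a"),
    ((2.0, 1.0), "a"),
    ((8.0, 8.0), "b"),
    ((9.0, 8.0), "b"),
]
print(knn_predict(training, (1.5, 1.5)))  # a
print(knn_predict(training, (8.5, 8.0)))  # b
```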
Kohonen Network 
A type of neural network in which the nodes learn as local neighborhoods, so the locality of the nodes is important in the training process. They are often used for clustering.
Latent variable 
variables inferred from a model rather than observed 
Lift 
A number representing the increase in responses from a targeted marketing application using a predictive model over the response rate achieved when no model is used 
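Lift is often expressed as a ratio of response rates. The sketch below assumes that convention; the function name and the campaign figures are invented.

```python
def lift(targeted_responses, targeted_size, baseline_responses, baseline_size):
    """Response rate with the model divided by the response rate without it."""
    targeted_rate = targeted_responses / targeted_size
    baseline_rate = baseline_responses / baseline_size
    return targeted_rate / baseline_rate


# Made-up campaign: 50% response among model-selected customers, 25% with no model.
print(lift(50, 100, 250, 1000))  # 2.0
```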
Machine Learning 
A field of science and technology concerned with building machines that learn. In general it differs from Artificial Intelligence in that learning is considered to be just one of a number of ways of creating an artificial intelligence 
maximum likelihood 
method for estimating the parameters of a model 
Maximum Likelihood Estimate (MLE) 
Obtain the parameter estimates that maximize the probability that the sample data occurs under the specific model. The joint probability of observing the sample data is obtained by multiplying the individual probabilities.
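As an illustration, consider estimating the success probability of a yes/no outcome. The sketch sums log probabilities, which has the same maximizer as multiplying the individual probabilities, and searches a coarse grid of candidate values. The Bernoulli model, the grid and the sample are assumptions for this example.

```python
import math


def bernoulli_log_likelihood(p, sample):
    """Log of the joint probability of a 0/1 sample given success probability p."""
    return sum(math.log(p if x == 1 else 1.0 - p) for x in sample)


sample = [1, 0, 1, 1, 0, 1, 1, 1]               # made-up data: 6 successes in 8 trials
candidates = [i / 100 for i in range(1, 100)]   # 0.01, 0.02, ..., 0.99
p_hat = max(candidates, key=lambda p: bernoulli_log_likelihood(p, sample))
print(p_hat)  # 0.75, the sample proportion of successes
```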
Mean Absolute Error 
AVG(ABS(predicted_value - actual_value))
Mean Squared Error (MSE) 
expected value of the squared difference between the estimate and the actual value 
Memory-Based Reasoning (MBR)
A technique for classifying records in a database by comparing them with similar records that are already classified. A form of nearest neighbor classification. 
Minimum Description Length (MDL) Principle 
The idea that the least complex predictive model (with acceptable accuracy) will be the one that best reflects the true underlying model and performs most accurately on new data. 
Model 
A description that adequately explains and predicts relevant data but is generally much smaller than the data itself 
Neural Network 
A computing model based on the architecture of the brain. A neural network consists of multiple simple processing units connected by adaptive weights 
Nominal Categorical Predictor 
A predictor that is categorical (finite cardinality) but where the values of the predictor have no particular order. For example, red, green, blue as values for the predictor “eye color”. 
Ordinal Categorical Predictor 
A categorical predictor (i.e. has finite number of values) where the values have order but do not convey meaningful intervals or distances between them. For example the values high, middle and low for the income predictor 
Outlier Analysis 
A type of data analysis that seeks to determine and report on records in the database that are significantly different from expectations. The technique is used for data cleansing, spotting emerging trends and recognizing unusually good or bad performers 
overfitting 
The effect in data analysis, data mining and biological learning of training too closely on limited available data and building models that do not generalize well to new unseen data. At the limit, overfitting is synonymous with rote memorization where no generalized model of future situations is built 
Point Estimation 
The estimation of a population parameter by a single value. May be made by calculating the parameter for a sample. May be used to predict a value for missing data.
Predictive model 
model created or used to perform prediction. In contrast to models created solely for pattern detection, exploration or general organization of the data 
Predictor 
The column or field in a database that could be used to build a predictive model to predict the values in another field or column. Also called variable, independent variable, dimension, or feature. 
Principal Component Analysis
A data analysis technique that combines the original predictors into a small number of uncorrelated weighted combinations (principal components) that capture most of the variance in the data
Prior Probability 
The probability of an event occurring without dependence on (conditional on) some other event. In contrast to conditional probability.
Purity/Homogeneity 
the degree to which the resulting child nodes are made up of cases with the same target value 
Radial Basis Function Networks 
Neural networks that combine some of the advantages of neural networks with those of nearest neighbor techniques. In radial basis functions the hidden layer is made up of nodes that represent prototypes or clusters of records 
Receiver Operating Characteristic (ROC) 
The area under the ROC curve (AUC) measures the discriminating ability of a binary classification model. The larger the AUC, the higher the likelihood that an actual positive case will be assigned a higher probability of being positive than an actual negative case. The AUC measure is especially useful for data sets with unbalanced target distribution (one target class dominates the other). 
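The probabilistic reading above can be computed directly by comparing every positive case with every negative case. This pairwise sketch counts ties as half a win and is quadratic in the number of cases, so it is meant as an illustration only; the scores and labels are invented.

```python
def auc(scores, labels):
    """Share of (positive, negative) pairs where the positive case scores higher."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half
    return wins / (len(positives) * len(negatives))


scores = [0.9, 0.8, 0.4, 0.3]  # made-up predicted probabilities
labels = [1, 0, 1, 0]          # 1 = actual positive, 0 = actual negative
print(auc(scores, labels))     # 3 of 4 pairs ranked correctly: 0.75
```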
Regression 
A data analysis technique classically used in statistics for building predictive models for continuous prediction fields. The technique automatically determines a mathematical equation that minimizes some measure of the error between the prediction from the regression model and the actual data 
Reinforcement Learning 
A training model where an intelligence engine (e.g. neural network) is presented with a sequence of input data followed by a reinforcement signal 
Root Mean Squared Error 
SQRT(AVG((predicted_value - actual_value) * (predicted_value - actual_value)))
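The three error measures in this glossary (Mean Absolute Error, Mean Squared Error and Root Mean Squared Error) can be sketched together. Here the sample average stands in for the expected value in the MSE definition; the function names and data are illustrative.

```python
import math


def mean_absolute_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mean_squared_error(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def root_mean_squared_error(predicted, actual):
    return math.sqrt(mean_squared_error(predicted, actual))


predicted = [3.0, 5.0, 2.0, 7.0]
actual = [1.0, 5.0, 4.0, 7.0]   # errors: 2, 0, -2, 0
print(mean_absolute_error(predicted, actual))      # 1.0
print(mean_squared_error(predicted, actual))       # 2.0
print(root_mean_squared_error(predicted, actual))  # square root of 2.0
```

Because the errors are squared before averaging, MSE and RMSE penalize large errors more heavily than MAE does.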
Sampling 
The process by which only a fraction of all available data is used to build a model or perform exploratory analysis. Sampling can provide relatively good models at much less computational expense than using the entire database 
Segmentation 
The process or result of the process that creates mutually exclusive collections of records that share similar attributes either in unsupervised learning (such as clustering) or in supervised learning for a particular prediction field 
Sensitivity Analysis 
The process which determines the sensitivity of a predictive model to small fluctuations in predictor value. Through this technique end users can gauge the effects of noise and environmental change on the accuracy of the model 
Simulated Annealing 
An optimization algorithm loosely based on the physical process of annealing metals through controlled heating and cooling 
Sparsity 
This means that a high proportion of the nested rows are not populated. 
Statistical Independence 
The property of two events displaying no causality or relationship of any kind. This can be quantitatively defined as occurring when the product of the probabilities of each event is equal to the probability of both events occurring.
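The product rule can be checked exactly on a small sample space. The sketch assumes two fair dice and uses exact fractions to avoid rounding; the events and helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def probability(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def independent(a, b):
    """True when P(a and b) equals P(a) * P(b)."""
    return probability(lambda o: a(o) and b(o)) == probability(a) * probability(b)


def first_is_even(o):
    return o[0] % 2 == 0


def second_is_six(o):
    return o[1] == 6


def total_is_twelve(o):
    return o[0] + o[1] == 12


print(independent(first_is_even, second_is_six))    # True
print(independent(second_is_six, total_is_twelve))  # False
```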
Stepwise Regression 
Automated regressions that identify the most predictive variables. The first regression finds the most predictive variable; the second finds the most predictive variable given the result of the first.
Supervised Algorithm 
A class of data mining and machine learning applications and techniques where the system builds a model based on the prediction of a well defined prediction field. This is in contrast to unsupervised learning where there is no particular goal aside from pattern detection. 
Support 
The relative frequency or number of times a rule produced by a rule induction system occurs within the database. The higher the support the better the chance of the rule capturing a statistically significant pattern. 
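A minimal sketch of support as a relative frequency, together with the confidence of a rule (see the Confidence entry), computed over a list of transactions. The rule form "antecedent implies consequent", the function names and the transactions are assumptions for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    together = set(antecedent) | set(consequent)
    return support(transactions, together) / support(transactions, antecedent)


# Made-up market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(transactions, {"bread", "milk"}))       # 2 of 4 transactions: 0.5
print(confidence(transactions, {"bread"}, {"milk"}))  # 2 of the 3 bread transactions
```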
Time-Series Prediction
The process of using a data mining tool (e.g., neural networks) to learn to predict temporal sequences of patterns, so that, given a set of patterns, it can predict a future value 
Unsupervised Algorithm 
A data analysis technique whereby a model is built without a well defined goal or prediction field. The systems are used for exploration and general data organization. Clustering is an example of an unsupervised learning system 
Visualization 
Graphical display of data and models which helps the user in understanding the structure and meaning of the information contained in them 