Online Library

This page contains numerous papers and white papers about Data Mining and Knowledge Discovery.

A Bayesian Approach to Filtering Junk Email (Sahami et al, 1998)

Cached Sufficient Statistics (Moore et al., 2000): white papers and further Artificial Intelligence papers

Clustering and Visualization of Navigation Patterns on a Web Site (Cadez et al, 2000)

Data Mining and Statistics: What is the Connection (Jerome Friedman)

Data Mining at the Interface of Computer Science and Statistics (Padhraic Smyth)

Data Mining Techniques and Algorithms (Dunham, 2000)

Data Mining Tools - A Classification (Decision Framework, DF-07-7020, K. Strange, A. Linden, Research Note, 17 March 1999)

Data Mining with Graphical Methods - Dissertation (Doktoringenieur, 2000)

Designing and Mining Multi-Terabyte Astronomy Archives (Szalay et al, 1999)

How Much Information is There in the World (Michael Lesk, 1997):

As the title suggests, this paper estimates how much information exists in the world and compares the results with estimates of disk and tape sales and of the size of all human memory.

Introduction to Data Mining and Knowledge Discovery - Two Crows Paper

Jerome H. Friedman: a few papers on statistics and data mining, including:

"Data Mining and Statistics: What's the Connection?" (1997)

"Bump Hunting in High Dimensional Data" (1997)

Tips for Successful Data Mining: An Oracle Technical White Paper (June, 1999)

The 1999 Magic Quadrant on Data Mining Workbenches (Markets, M-08-603, A. Linden, Research Note, 16 August 1999)

Web Document Clustering (Zamir and Etzioni, 1998)

Giga-Mining (Cortes and Pregibon, 1997)

Detecting Group Differences: Mining Contrast Sets (Bay and Pazzani, 2001)

Mining the Network Value of Customers (Richardson and Domingos, 2001)

Some more Data Mining papers from Domingos

Web Mining: Information and Pattern Discovery on the World Wide Web (Cooley et al., 1997)

The PageRank Citation Ranking: Bringing Order to the Web (Page et al., 1998): this paper describes the PageRank algorithm

Split Procedure

Growing Trees for Stratified Modeling (Padraic G. Neville)

Decision Trees for Predictive Modeling (Padraic G. Neville, 1999)

Alternative Neural Architectures (Will Dwinnell)

On Bagging and Non-Linear Estimation (Jerome H. Friedman et al, 2000)

Gene Expression Informatics (Bassett et al, 1999)

Technologies for whole-genome RNA expression studies are becoming increasingly reliable and accessible. However, universal standards to make the data more suitable for comparative analysis and for inter-operability with other information resources have yet to emerge. Improved access to large electronic data sets, reliable and consistent annotation and effective tools for ‘data mining’ are critical. Analysis methods that exploit large data warehouses of gene expression experiments will be necessary to realize the full potential of this technology.

A Tutorial on Learning With Bayesian Networks (David Heckerman, 1996)

A Tutorial on Bayesian Model Averaging (Hoeting et al, 1999)

Learning Bayes Nets from Data (David Heckerman)

A Complete Tutorial on Bayesian Statistics (Bernardo)

Clustering Large Datasets in Arbitrary Metric Spaces (Ganti, Ramakrishnan, Gehrke)

Additive Logistic Regression: A Statistical View of Boosting (Friedman et al, 1999)

Neural Network Models for Breast Cancer Prognosis (Ruth M. Ripley, 1998)

Predicting Multivariate Responses in Multiple Linear Regression (Breiman and Friedman, 1997) - Journal of the Royal Statistical Society

Salford Systems - Critical Features of High Performance Decision Trees

The Customer Relationship Management Primer - What You Need To Get Started (From CRMguru.com)

On Bias, Variance, 0/1-Loss, and the Curse of Dimensionality (Friedman, 1996)

Darwin: A Scalable Integrated System for Data Mining (Tamayo et al, 1997)

Data Dictionary (Dr. Hardin)

A work of reference in which the words of a language or of any system or province of knowledge are entered alphabetically and defined.

Basic Concept of Databases

Decision Theory in Expert Systems and Artificial Intelligence (Horvitz et al, 1988)

Learning with Mixtures of Trees (Meila et al, 2000)

The Computational Support of Scientific Discovery (Pat Langley)

Oracle Darwin Data Mining Software for Credit Card Banking (Oracle, 1999)

Oracle Darwin Data Mining Software for Database Marketing (Oracle, 1999)

Oracle Darwin Data Mining Software for Government (Oracle, 1999)

Oracle Darwin Data Mining Software for Network Data Management (Oracle, 1999)

Oracle Darwin Data Mining Software for Telecommunications (Oracle, 1999)

Data Mining in the Insurance Industry - A SAS White Paper - Solving Business Problems using SAS Enterprise Miner Software

Internet Relationship Management and Personalization Powered by Data Mining Insights (Oracle, 1999)

Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey (Alexander S. Szalay, 2000)

Data Mining and Visualization (Ron Kohavi, 2000)

Data Mining to Measure and Improve the Success of Websites (Pohle, 2000)

Applications of Data Mining to Electronic Commerce (Kohavi et al, 2001)

Which Visits Lead to Purchases? Dynamic Conversion Behavior at e-Commerce Sites (Wendy W. Moe, 2001)

Finding the Solution to Data Mining - Exploring the features and components of Enterprise Miner (SAS)

Tutorial on e-Commerce and Clickstream Mining Vendor Sites (Becher et al, 2001) - First SIAM International Conference on Data Mining

The EMMIX Software for the Fitting of Mixtures of Normal and t-Components (McLachlan)

User Guide to EMMIX - Version 1.3, 1999

Adaptive Fraud Detection (Foster Provost et al, 1997)

Generalization of Boosting Algorithms and Applications of Bayesian Inference for Massive Datasets (Ridgeway, 1999) - A PhD dissertation from the Department of Statistics, University of Washington.

Decision Support in the Booming e-World (James Goodnight, CEO)

Model-Based Clustering and Visualization of Navigation Patterns on a Web Site (Cadez et al, 2001)

How Much Can Data Analysis Be Automated? (Will Dwinnell)

Hugin Expert - A White Paper

Hugin Expert is a leading company developing software for artificial intelligence and advanced decision support based on complex statistical models (Bayesian networks, BN).

IBM DB2 Intelligent Miner for Data V6R1, Greater Competitiveness Through Informed Decision Making (IBM, 1999)

IMS Health Prescribes Intelligent Miner for the Health care industry (IBM)

Information Self Organization for Knowledge Discovery (Feng et al)

SAS helps Scientists Decipher Human Genetic Code (Brad Shoemake, 2000) - Info World Reprint

Adaptive Intrusion Detection - A Data Mining Approach (Lee et al, 2000)

A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms (Lim et al, 1999)

MCLUST: Software for Model-Based Cluster and Discriminant Analysis (Fraley et al, 1999)

Mining New Markets in Telecommunications

An Algorithm for Fitting Mixtures of Gompertz Distributions to Censored Survival Data (McLachlan)

Learning with Mixtures of Trees (Meila, 2000)

Methodological Review: Modeling Medical Prognosis: Survival Analysis Techniques (Machado, 2002)

Bayesian Networks Without Tears (Charniak, 1991) - published by the American Association for Artificial Intelligence

Statistical Process Analysis of Medical Incidents (Suzuki et al)

A Robust Outlier Detection Scheme for Large Datasets (Tang et al, 2001)

Some Rigorous Approaches to Data Mining (Papadimitriou)

Usage Pattern Analysis - WhiteCross/NARUS IBI Solutions - a White Paper

Personalization of Supermarket Product Recommendations - IBM Research Report (Lawrence et al, 2000)

Bump Hunting in High Dimensional Data (Friedman et al, 1998)

Public Workshop on Online Profiling - United States of America Department of Commerce and Federal Trade Commission - 1999

Another Approach to Polychotomous Classification (Friedman, 1996)

Direct Policy Search Using Paired Statistical Tests (Strens)

Model-Based Clustering, Discriminant Analysis, and Density Estimation (Chris Fraley and Adrian E. Raftery, October 2000)

Greedy Function Approximation: A Gradient Boosting Machine (Jerome H. Friedman, 2000)

Visual Techniques for Exploring Databases (Daniel A. Keim)

Dependency Networks for Inference, Collaborative Filtering, and Data Visualization (David Heckerman et al)

Visualization and Analysis of Click Stream Data of Online Stores for Understanding Web Merchandizing (Lee et al)

Web Document Clustering: A Feasibility Demonstration

Why Mine Data - An Executive Guide - Using Business Intelligence to Attract and Retain Customers (An Oracle Business White Paper, 1999)

Likelihood-based Data Squashing: A Modeling Approach to Instance Construction (AT&T Labs Research, 1999)

Classes of Kernels for Machine Learning: A Statistics Perspective

The State of Boosting (Ridgeway)

Stochastic Gradient Boosting (Jerome H. Friedman, 1999)

Polytomous Logistic Regression Trees

Understanding the Crucial Differences Between Classification and Discovery of Association Rules - A Position Paper (Freitas)

Algorithms for Model-Based Gaussian Hierarchical Clustering

How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis

RainForest - A Framework for Fast Decision Tree Construction of Large Datasets (Venkatesh Ganti et al)

Reducing Customer Churn - Customer Application Story - An Oracle White Paper

Prediction in the Era of Massive Datasets (Ridgeway)

Finding Needles in Haystacks (Tools for Finding Structure in Large Datasets) - Ripley (2000)

Mining E-Commerce Data - The Good, the Bad and the Ugly - Kohavi (2000)

Tutorial on E-Commerce and Clickstream Data Mining - First SIAM International Conference on Data Mining (2001)

Data Mining in the Insurance Industry - A SAS Institute White Paper - Solving Business Solutions using Enterprise Miner Software

Finding the Solution to Data Mining - A SAS Institute White Paper - A map of features of Enterprise Miner software

IBM DB2 Intelligent Miner for Data - A Tutorial

The Scent of a Site: A System for Analyzing and Predicting Information Scent, Usage, and Usability of a Web Site