A Bayesian Approach to Filtering Junk Email (Sahami et al., 1998)
Cached Sufficient Statistics (Moore et al., 2000): white papers and other artificial intelligence papers
Clustering and Visualization of Navigation Patterns on a Web Site (Cadez et al., 2000)
Data Mining and Statistics: What Is the Connection? (Jerome Friedman)
Data Mining at the Interface of Computers and Statistics (Padhraic Smyth)
Data Mining Techniques and Algorithms (Dunham, 2000)
Data Mining Tools - A Classification (Decision Framework, DF-07-7020, K. Strange, A. Linden, Research Note, 17 March 1999)
Data Mining with Graphical Methods - Dissertation (Doktoringenieur, 2000)
Designing and Mining Multi-Terabyte Astronomy Archives (Szalay et al., 1999)
How Much Information Is There in the World? (Michael Lesk, 1997):
As the title suggests, this paper estimates how much information exists in the world and compares those estimates with disk and tape sales and with the size of all human memory.
Introduction to Data Mining and Knowledge Discovery - Two Crows Paper
Jerome H. Friedman: a few papers on statistics and data mining, including:
"Data Mining and Statistics: What's the Connection?" (1997)
"Bump Hunting in High Dimensional Data" (1997)
Tips for Successful Data Mining: An Oracle Technical White Paper (June 1999)
The 1999 Magic Quadrant on Data Mining Workbenches (Markets, M-08-603, A. Linden, Research Note, 16 August 1999)
Web Document Clustering (Zamir and Etzioni, 1998)
Giga-Mining (Cortes and Pregibon, 1997)
Detecting Group Differences: Mining Contrast Sets (Bay and Pazzani, 2001)
Mining the Network Value of Customers (Richardson and Domingos, 2001)
More data mining papers by Domingos
Web Mining: Information and Pattern Discovery on the World Wide Web (Cooley et al., 1997)
The PageRank Citation Ranking: Bringing Order to the Web (Page et al., 1998): describes the PageRank algorithm
Growing Trees for Stratified Modeling (Padraic G. Neville)
Decision Trees for Predictive Modeling (Padraic G. Neville, 1999)
Alternative Neural Architectures (Will Dwinnell)
On Bagging and Non-Linear Estimation (Jerome H. Friedman et al., 2000)
Gene Expression Informatics (Bassett et al., 1999)
Technologies for whole-genome RNA expression studies are becoming increasingly reliable and accessible. However, universal standards to make the data more suitable for comparative analysis and for inter-operability with other information resources have yet to emerge. Improved access to large electronic data sets, reliable and consistent annotation and effective tools for ‘data mining’ are critical. Analysis methods that exploit large data warehouses of gene expression experiments will be necessary to realize the full potential of this technology.
A Tutorial on Learning With Bayesian Networks (David Heckerman, 1996)
A Tutorial on Bayesian Model Averaging (Hoeting et al., 1999)
Learning Bayes Nets from Data (David Heckerman)
A Complete Tutorial on Bayesian Statistics (Bernardo)
Clustering Large Datasets in Arbitrary Metric Spaces (Ganti, Ramakrishnan, Gehrke)
Additive Logistic Regression: A Statistical View of Boosting (Friedman et al., 1999)
Neural Network Models for Breast Cancer Prognosis (Ruth M. Ripley, 1998)
Predicting Multivariate Responses in Multiple Linear Regression (Breiman and Friedman, 1997) - Journal of the Royal Statistical Society
Salford Systems - Critical Features of High Performance Decision Trees
The Customer Relationship Management Primer - What You Need To Get Started (From CRMguru.com)
On Bias, Variance, 0/1-Loss, and the Curse of Dimensionality (Friedman, 1996)
Darwin: A Scalable Integrated System for Data Mining (Tamayo et al., 1997)
Data Dictionary (Dr. Hardin)
A work of reference in which the words of a language or of any system or province of knowledge are entered alphabetically and defined.
Decision Theory in Expert Systems and Artificial Intelligence (Horvitz et al., 1988)
Learning with Mixtures of Trees (Meila et al., 2000)
The Computational Support of Scientific Discovery (Pat Langley)
Oracle Darwin Data Mining Software for Credit Card Banking (Oracle, 1999)
Oracle Darwin Data Mining Software for Database Marketing (Oracle, 1999)
Oracle Darwin Data Mining Software for Government (Oracle, 1999)
Oracle Darwin Data Mining Software for Network Data Management (Oracle, 1999)
Oracle Darwin Data Mining Software for Telecommunications (Oracle, 1999)
Data Mining in the Insurance Industry - A SAS White Paper - Solving Business Problems Using SAS Enterprise Miner Software
Internet Relationship Management and Personalization Powered by Data Mining Insights (Oracle, 1999)
Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey (Alexander S. Szalay, 2000)
Data Mining and Visualization (Ron Kohavi, 2000)
Data Mining to Measure and Improve the Success of Websites (Pohle, 2000)
Applications of Data Mining to Electronic Commerce (Kohavi et al., 2001)
Which Visits Lead to Purchases? Dynamic Conversion Behavior at e-Commerce Sites (Wendy W. Moe, 2001)
Finding the Solution to Data Mining - Exploring the features and components of Enterprise Miner (SAS)
Tutorial on e-Commerce and Clickstream Mining Vendor Sites (Becher et al., 2001) - First SIAM International Conference on Data Mining
The EMMIX Software for the Fitting of Mixtures of Normal and t-Components (McLachlan)
User Guide to EMMIX - Version 1.3, 1999
Adaptive Fraud Detection (Foster Provost et al., 1997)
Generalization of Boosting Algorithms and Applications of Bayesian Inference for Massive Datasets (Ridgeway, 1999) - A PhD dissertation from the Department of Statistics, University of Wisconsin.
Decision Support in the Booming e-World (James Goodnight, CEO)
Model-Based Clustering and Visualization of Navigation Patterns on a Web Site (Cadez et al., 2001)
How Much Can Data Analysis Be Automated? (Will Dwinnell)
Hugin Expert - A White Paper
Hugin Expert is a leading company in developing software for artificial intelligence and advanced decision support based on complex statistical models (Bayesian networks).
IBM DB2 Intelligent Miner for Data V6R1, Greater Competitiveness Through Informed Decision Making (IBM, 1999)
IMS Health Prescribes Intelligent Miner for the Health Care Industry (IBM)
Information Self Organization for Knowledge Discovery (Feng et al)
SAS Helps Scientists Decipher Human Genetic Code (Brad Shoemake, 2000) - InfoWorld Reprint
Adaptive Intrusion Detection - A Data Mining Approach (Lee et al., 2000)
A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms (Shih et al., 1999)
MCLUST: Software for Model-Based Cluster and Discriminant Analysis (Fraley et al., 1999)
Mining New Markets in Telecommunications
An Algorithm for Fitting Mixtures of Gompertz Distributions to Censored Survival Data (McLachlan)
Methodological Review: Modeling Medical Prognosis: Survival Analysis Techniques (Machado, 2002)
Bayesian Networks without Tears (Charniak, 1991) - This is a publication of the American Association for Artificial Intelligence.
Statistical Process Analysis of Medical Incidents (Suzuki et al)
A Robust Outlier Detection Scheme for Large Datasets (Tang et al., 2001)
Some Rigorous Approaches to Data Mining (Papadimitriou)
Usage Pattern Analysis - WhiteCross/NARUS IBI Solutions - a White Paper
Personalization of Supermarket Product Recommendations - IBM Research Report (Lawrence et al., 2000)
Bump Hunting in High Dimensional Data (Friedman et al., 1998)
Public Workshop on Online Profiling - United States Department of Commerce and Federal Trade Commission - 1999
Another Approach to Polychotomous Classification (Friedman, 1996)
Direct Policy Search Using Paired Statistical Tests (Strens)
Model Based Clustering, Discriminant Analysis, and Density Estimation (Chris Fraley and Adrian E. Raftery, October 2000)
Greedy Function Approximation: A Gradient Boosting Machine (Jerome H. Friedman, 2000)
Visual Techniques for Exploring Databases (Daniel A. Keim)
Dependency Networks for Inference, Collaborative Filtering, and Data Visualization (David Heckerman et al)
Visualization and Analysis of Click Stream Data of Online Stores for Understanding Web Merchandising (Lee et al.)
Web Document Clustering: A Feasibility Demonstration
Why Mine Data - An Executive Guide - Using Business Intelligence to Attract and Retain Customers (An Oracle Business White Paper, 1999)
Likelihood-based Data Squashing: A Modeling Approach to Instance Construction (AT&T Labs Research, 1999)
Classes of Kernels for Machine Learning: A Statistics Perspective
The State of Boosting (Ridgeway)
Stochastic Gradient Boosting (Jerome H. Friedman, 1999)
Polytomous Logistic Regression Trees
Algorithms for Model-Based Gaussian Hierarchical Clustering
How Many Clusters? Which Clustering Method? Answers via Model-Based Cluster Analysis
RainForest - A Framework for Fast Decision Tree Construction of Large Datasets (Venkatesh Ganti et al.)
Reducing Customer Churn - Customer Application Story - An Oracle White Paper
Prediction in the Era of Massive Datasets (Ridgeway)
Finding Needles in Haystacks (Tools for Finding Structure in Large Datasets) - Ripley, 2000
Mining E-Commerce Data: The Good, the Bad, and the Ugly (Kohavi, 2000)
Tutorial on E-Commerce and Clickstream Data Mining - First SIAM International Conference on Data Mining (2001)
IBM DB2 Intelligent Miner for data - A Tutorial