Handbook of Statistical Analysis and Data Mining Applications

Book description

The Handbook of Statistical Analysis and Data Mining Applications is a comprehensive professional reference book that guides business analysts, scientists, engineers and researchers (both academic and industrial) through all stages of data analysis, model building and implementation. The Handbook helps readers discern the technical and business problem, understand the strengths and weaknesses of modern data mining algorithms, and employ the right statistical methods for practical application. Use this book to address massive and complex datasets with novel statistical approaches and to evaluate analyses and solutions objectively. It offers clear, intuitive explanations of the principles and tools for solving problems with modern analytic techniques, and discusses their application to real problems in ways accessible and beneficial to practitioners across industries, from science and engineering to medicine, academia and commerce. This handbook brings together, in a single resource, all the information a beginner will need to understand the tools and issues in data mining and to build successful data mining solutions.



  • Written "By Practitioners for Practitioners"
  • Non-technical explanations build understanding without jargon and equations
  • Tutorials in numerous fields of study provide step-by-step instruction on how to use supplied tools to build models
  • Practical advice from successful real-world implementations
  • Includes extensive case studies, examples, MS PowerPoint slides and datasets
  • CD-DVD with valuable, fully working 90-day software included: "Complete Data Miner - QC-Miner - Text Miner" bound with book

Table of contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Foreword 1
  6. Foreword 2
  7. Preface
    1. Overall Organization of This Book
    2. References
    3. SAS
    4. STATSOFT
    5. SPSS
  8. Introduction
    1. Patterns of Action
    2. Human Intuition
    3. Putting it all Together
    4. References
  9. List of Tutorials by Guest Authors
  10. Part I. History of Phases of Data Analysis, Basic Theory, and the Data Mining Process
    1. Chapter 1. The Background for Data Mining Practice
      1. Preamble
      2. A Short History of Statistics and Data Mining
      3. Modern Statistics: A Duality?
      4. Two Views of Reality
      5. The Rise of Modern Statistical Analysis: The Second Generation
      6. Machine Learning Methods: The Third Generation
      7. Statistical Learning Theory: The Fourth Generation
      8. Postscript
      9. References
    2. Chapter 2. Theoretical Considerations for Data Mining
      1. Preamble
      2. The Scientific Method
      3. What Is Data Mining?
      4. A Theoretical Framework for the Data Mining Process
      5. Strengths of the Data Mining Process
      6. Customer-Centric Versus Account-Centric: A New Way to Look at Your Data
      7. The Data Paradigm Shift
      8. Creation of the CAR
      9. Major Activities of Data Mining
      10. Major Challenges of Data Mining
      11. Examples of Data Mining Applications
      12. Major Issues in Data Mining
      13. General Requirements for Success in a Data Mining Project
      14. Example of a Data Mining Project: Classify a Bat’s Species by Its Sound
      15. The Importance of Domain Knowledge
      16. Postscript
      17. References
    3. Chapter 3. The Data Mining Process
      1. Preamble
      2. The Science of Data Mining
      3. The Approach to Understanding and Problem Solving
      4. Business Understanding (Mostly Art)
      5. Data Understanding (Mostly Science)
      6. Data Preparation (A Mixture of Art and Science)
      7. Modeling (A Mixture of Art and Science)
      8. Deployment (Mostly Art)
      9. Closing the Information Loop (Art)
      10. The Art of Data Mining
      11. Postscript
      12. References
    4. Chapter 4. Data Understanding and Preparation
      1. Preamble
      2. Activities of Data Understanding and Preparation
      3. Issues That Should Be Resolved
      4. Data Understanding
      5. Postscript
      6. References
    5. Chapter 5. Feature Selection
      1. Preamble
      2. Variables as Features
      3. Types of Feature Selections
      4. Feature Ranking Methods
      5. Subset Selection Methods
      6. Postscript
      7. References
    6. Chapter 6. Accessory Tools for Doing Data Mining
      1. Preamble
      2. Data Access Tools
      3. Data Exploration Tools
      4. Modeling Management Tools
      5. Modeling Analysis Tools
      6. In-Place Data Processing (IDP)
      7. Rapid Deployment of Predictive Models
      8. Model Monitors
      9. Postscript
      10. Bibliography
  11. Part II. The Algorithms in Data Mining and Text Mining, the Organization of the Three Most Common Data Mining Tools, and Selected Specialized Areas Using Data Mining
    1. Chapter 7. Basic Algorithms for Data Mining: A Brief Overview
      1. Preamble
      2. Basic Data Mining Algorithms
      3. Generalized Additive Models (GAMs)
      4. Classification and Regression Trees (CART)
      5. General CHAID Models
      6. Generalized EM and k-Means Cluster Analysis—An Overview
      7. Postscript
      8. References
      9. Bibliography
    2. Chapter 8. Advanced Algorithms for Data Mining
      1. Preamble
      2. Advanced Data Mining Algorithms
      3. Image and Object Data Mining: Visualization and 3D-Medical and Other Scanning Imaging
      4. Postscript
      5. References
    3. Chapter 9. Text Mining and Natural Language Processing
      1. Preamble
      2. The Development of Text Mining
      3. A Practical Example: NTSB
      4. Text Mining Concepts Used in Conducting Text Mining Studies
      5. Postscript
      6. References
    4. Chapter 10. The Three Most Common Data Mining Software Tools
      1. Preamble
      2. SPSS Clementine Overview
      3. SAS-Enterprise Miner (SAS-EM) Overview
      4. STATISTICA Data Miner, QC-Miner, and Text Miner Overview
      5. Postscript
      6. References
    5. Chapter 11. Classification
      1. Preamble
      2. What Is Classification?
      3. Initial Operations in Classification
      4. Major Issues with Classification
      5. Assumptions of Classification Procedures
      6. Methods for Classification
      7. What Is the Best Algorithm for Classification?
      8. Postscript
      9. References
    6. Chapter 12. Numerical Prediction
      1. Preamble
      2. Linear Response Analysis and the Assumptions of the Parametric Model
      3. Parametric Statistical Analysis
      4. Assumptions of the Parametric Model
      5. Linear Regression
      6. Generalized Linear Models (GLMs)
      7. Methods for Analyzing Nonlinear Relationships
      8. Nonlinear Regression and Estimation
      9. Data Mining and Machine Learning Algorithms Used in Numerical Prediction
      10. Advantages of Classification and Regression Trees (C&RT) Methods
      11. Application to Mixed Models
      12. Neural Nets for Prediction
      13. Support Vector Machines (SVMs) and Other Kernel Learning Algorithms
      14. Postscript
      15. References
    7. Chapter 13. Model Evaluation and Enhancement
      1. Preamble
      2. Introduction
      3. Model Evaluation
      4. Re-Cap of the Most Popular Algorithms
      5. Enhancement Action Checklist
      6. Ensembles of Models: The Single Greatest Enhancement Technique
      7. How to Thrive as a Data Miner
      8. Postscript
      9. References
    8. Chapter 14. Medical Informatics
      1. Preamble
      2. What Is Medical Informatics?
      3. How Data Mining and Text Mining Relate to Medical Informatics
      4. 3D Medical Informatics
      5. Postscript
      6. References
      7. Bibliography
    9. Chapter 15. Bioinformatics
      1. Preamble
      2. What Is Bioinformatics?
      3. Data Analysis Methods in Bioinformatics
      4. Web Services in Bioinformatics
      5. How Do We Apply Data Mining Methods to Bioinformatics?
      6. Postscript
      7. References
      8. Bibliography
    10. Chapter 16. Customer Response Modeling
      1. Preamble
      2. Early CRM Issues in Business
      3. Knowing How Customers Behaved Before They Acted
      4. CRM in Business Ecosystems
      5. Conclusions
      6. Postscript
      7. References
    11. Chapter 17. Fraud Detection
      1. Preamble
      2. Issues with Fraud Detection
      3. How Do You Detect Fraud?
      4. Supervised Classification of Fraud
      5. How Do You Model Fraud?
      6. How Are Fraud Detection Systems Built?
      7. Intrusion Detection Modeling
      8. Comparison of Models with and Without Time-Based Features
      9. Building Profiles
      10. Deployment of Fraud Profiles
      11. Postscript and Prolegomenon
      12. References
  12. Part III. Tutorials—Step-by-step Case Studies as a Starting Point to Learn How to Do Data Mining Analyses
    1. Guest Authors of the Tutorials
    2. Tutorial A. How to Use Data Miner Recipe: STATISTICA Data Miner Only
      1. What Is STATISTICA Data Miner Recipe (DMR)?
      2. Core Analytic Ingredients
    3. Tutorial B. Data Mining for Aviation Safety: Using Data Mining Recipe “Automatized Data Mining” from STATISTICA
      1. Airline Safety
      2. SDR Database
      3. Preparing the Data for Our Tutorial
      4. Data Mining Approach
      5. Data Mining Algorithm Error Rate
      6. Conclusion
      7. References
    4. Tutorial C. Predicting Movie Box-Office Receipts: Using SPSS Clementine Data Mining Software
      1. Introduction
      2. Data and Variable Definitions
      3. Getting to Know the Workspace of the Clementine Data Mining Toolkit
      4. Results
      5. Publishing and Reuse of Models and Other Outputs
      6. References
    5. Tutorial D. Detecting Unsatisfied Customers: A Case Study Using SAS Enterprise Miner Version 5.3 for the Analysis
      1. Introduction
      2. A Primer of SAS-EM Predictive Modeling
      3. Scoring Process and the Total Profit
      4. Oversampling and Rare Event Detection
      5. Decision Matrix and the Profit Charts
      6. Micro-Target the Profitable Customers
      7. Appendix
      8. Reference
    6. Tutorial E. Credit Scoring Using STATISTICA Data Miner
      1. Introduction: What Is Credit Scoring?
      2. Credit Scoring: Business Objectives
      3. Case Study: Consumer Credit Scoring
      4. Analysis and Results
      5. Comparative Assessment of the Models (Evaluation)
      6. Deploying the Model for Prediction
      7. Conclusion
    7. Tutorial F. Churn Analysis with SPSS-Clementine
      1. Objectives
      2. Steps
    8. Tutorial G. Text Mining: Automobile Brand Review Using STATISTICA Data Miner and Text Miner
      1. Introduction
      2. Text Mining
      3. Car Review Example
      4. Interactive Trees (C&RT, CHAID)
      5. Other Applications of Text Mining
      6. Conclusion
    9. Tutorial H. Predictive Process Control: QC-Data Mining Using STATISTICA Data Miner and QC-Miner
      1. Predictive Process Control Using STATISTICA and STATISTICA QC-Miner
      2. Case Study: Predictive Process Control
      3. Data Analyses with STATISTICA
      4. Conclusion
    10. Tutorials I, J, and K. Three Short Tutorials Showing the Use of Data Mining and Particularly C&RT to Predict and Display Possible Structural Relationships among Data
    11. Tutorial I. Business Administration in a Medical Industry: Determining Possible Predictors for Days with Hospice Service for Patients with Dementia
    12. Tutorial J. Clinical Psychology: Making Decisions about Best Therapy for a Client: Using Data Mining to Explore the Structure of a Depression Instrument
    13. Tutorial K. Education–Leadership Training for Business and Education Using C&RT to Predict and Display Possible Structural Relationships
      1. References
    14. Tutorial L. Dentistry: Facial Pain Study Based on 84 Predictor Variables (Both Categorical and Continuous)
    15. Tutorial M. Profit Analysis of the German Credit Data Using SAS-EM Version 5.3
      1. Introduction
      2. Modeling Strategy
      3. SAS-EM 5.3 Interface
      4. A Primer of SAS-EM Predictive Modeling
      5. Advanced Techniques of Predictive Modeling
      6. Micro-Target the Profitable Customers
      7. Appendix
      8. References
    16. Tutorial N. Predicting Self-Reported Health Status Using Artificial Neural Networks
      1. Background
      2. Data
      3. References
  13. Part IV. Measuring True Complexity, the “Right Model for the Right Use,” Top Mistakes, and the Future of Analytics
    1. Chapter 18. Model Complexity (and How Ensembles Help)
      1. Preamble
      2. Model Ensembles
      3. Complexity
      4. Generalized Degrees of Freedom
      5. Examples: Decision Tree Surface with Noise
      6. Summary and Discussion
      7. Postscript
      8. References
    2. Chapter 19. The Right Model for the Right Purpose: When Less Is Good Enough
      1. Preamble
      2. More Is Not Necessarily Better: Lessons from Nature and Engineering
      3. Embrace Change Rather Than Flee from It
      4. Decision Making Breeds True in the Business Organism
      5. The 80:20 Rule in Action
      6. Agile Modeling: An Example of How to Craft Sufficient Solutions
      7. Postscript
      8. References
    3. Chapter 20. Top 10 Data Mining Mistakes
      1. Preamble
      2. Introduction
      3. 0 Lack Data
      4. 1 Focus on Training
      5. 2 Rely on One Technique
      6. 3 Ask the Wrong Question
      7. 4 Listen (Only) to the Data
      8. 5 Accept Leaks from the Future
      9. 6 Discount Pesky Cases
      10. 7 Extrapolate
      11. 8 Answer Every Inquiry
      12. 9 Sample Casually
      13. 10 Believe the Best Model
      14. How Shall We Then Succeed?
      15. Postscript
      16. References
    4. Chapter 21. Prospects for the Future of Data Mining and Text Mining as Part of Our Everyday Lives
      1. Preamble
      2. RFID
      3. Social Networking and Data Mining
      4. Image and Object Data Mining
      5. Cloud Computing
      6. Postscript
      7. References
    5. Chapter 22. Summary: Our Design
      1. Preamble
      2. Beware of Overtrained Models
      3. A Diversity of Models and Techniques Is Best
      4. The Process Is More Important Than the Tool
      5. Text Mining of Unstructured Data Is Becoming Very Important
      6. Practice Thinking about Your Organization as Organism Rather Than as Machine
      7. Good Solutions Evolve Rather Than Just Appear after Initial Efforts
      8. What You Don’t Do Is Just as Important as What You Do
      9. Very Intuitive Graphical Interfaces Are Replacing Procedural Programming
      10. Data Mining Is No Longer a Boutique Operation; It Is Firmly Established in the Mainstream of Our Society
      11. “Smart” Systems Are the Direction in Which Data Mining Technology Is Going
      12. Postscript
      13. References
  14. Glossary
  15. Index

Product information

  • Title: Handbook of Statistical Analysis and Data Mining Applications
  • Author(s): Robert Nisbet, John Elder, Gary Miner
  • Release date: May 2009
  • Publisher(s): Elsevier Science
  • ISBN: 9780080912035