5.4 COMPARATIVE STATISTICS

5.4.1 Overview

Correlation analysis looks at associations between variables. For example, is there a relationship between interest rates and inflation, or between education level and income? The existence of an association between variables does not imply that one variable causes the other. Nevertheless, understanding these relationships is useful. For example, when building a predictive model, comparative statistics can help identify which variables are important to include.

Figure 5.18. Looking up critical F-statistic

Figure 5.19. Relationships between two variables

The relationship between variables can be complex; however, a number of characteristics of the relationship can be measured:

  • Direction: In comparing two variables, a positive relationship exists when higher values of the first variable coincide with higher values of the second, and lower values of the first coincide with lower values of the second. A negative relationship exists when higher values of the first variable coincide with lower values of the second, and lower values of the first coincide with higher values of the second. There are also situations where the relationship between the variables is more complex, having a combination ...
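
The idea of direction can be illustrated with a minimal Python sketch. It uses the sign of the sample covariance as one possible indicator of direction; this is an illustrative choice, not necessarily the measure the text goes on to use. The function names (covariance, relationship_direction), the tolerance parameter, and the data values are all hypothetical.

```python
from statistics import mean


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (illustrative helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = mean(xs), mean(ys)
    # Sum the products of paired deviations from each mean, then divide by n - 1.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def relationship_direction(xs, ys, tol=1e-12):
    """Classify direction from the sign of the covariance (an assumed criterion)."""
    cov = covariance(xs, ys)
    if cov > tol:
        return "positive"   # high values tend to pair with high, low with low
    if cov < -tol:
        return "negative"   # high values tend to pair with low, and vice versa
    return "no linear direction"


# Made-up data, for illustration only.
years_of_education = [10, 12, 14, 16, 18]
income_thousands = [25, 32, 41, 50, 62]
price = [1, 2, 3, 4, 5]
units_sold = [50, 42, 33, 20, 11]

print(relationship_direction(years_of_education, income_thousands))
print(relationship_direction(price, units_sold))
```

With these invented values, the first pair moves together and the second pair moves in opposite directions. A covariance near zero only indicates the absence of a linear direction; it does not rule out some other form of relationship.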
