Chapter 7. Measuring Uncertainty with the Bootstrap

With ideal data, you can now draw robust conclusions from behavioral data and measure the causal impact of a business or environmental change on human behavior. But how should you proceed when your data is less than ideal? In academic research, you can always fall back on the null hypothesis when the data is inconclusive and decline to pass judgment. In applied research, however, there is no null hypothesis, only alternative courses of action to choose from.

Small sample sizes, weirdly shaped variables, or situations that require advanced analytical tools (e.g., hierarchical modeling, which we’ll see later in the book) can all result in shaky conclusions. Certainly, a linear regression algorithm will spit out a coefficient in all but the most extreme cases, but should you trust it? Can you confidently advise your boss to stake millions of dollars on it?

In this chapter, I’ll introduce you to an extremely powerful and general simulation tool, the Bootstrap, which will allow you to draw robust conclusions from any data, however small or weird. It works by using random numbers to create slightly different versions of your data and then analyzing each of them. A great feature of the Bootstrap is that you can never go wrong by applying it: in situations that are best-case scenarios for traditional statistical methods (e.g., running a basic linear regression on a large and well-behaved data set), the Bootstrap is slower and less accurate, but ...
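To make the idea of "slightly different versions of your data" concrete, here is a minimal sketch of one common form of the Bootstrap: resampling rows with replacement and building a percentile interval. It is an illustration under stated assumptions, not necessarily the procedure developed later in the chapter. The function name `bootstrap_ci`, its parameters, and the small skewed sample are all hypothetical and invented for this example.

```python
import numpy as np

def bootstrap_ci(data, stat_fn=np.mean, n_boot=2000, alpha=0.10, seed=0):
    """Percentile Bootstrap interval for a statistic (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # One "slightly different version" of the data:
        # n observations drawn at random, with replacement.
        resample = rng.choice(data, size=n, replace=True)
        stats[b] = stat_fn(resample)
    # The middle (1 - alpha) share of the simulated statistics
    # serves as the confidence interval.
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Hypothetical small, skewed sample (made up for illustration only).
sample = np.array([2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 7.5, 9.0, 30.0])
lower, upper = bootstrap_ci(sample)
print(f"Sample mean: {sample.mean():.2f}")
print(f"90% Bootstrap interval for the mean: [{lower:.2f}, {upper:.2f}]")
```

Passing a different `stat_fn` (for instance `np.median`) reuses the same resampling loop for another statistic, which is a large part of what makes the approach so general.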
