3 Distributions and Probabilities

3.1 Probability Distributions

We hear the term 'distribution' a lot in statistics, so what exactly is it? A distribution, or more specifically a probability distribution, is a mathematical function that links each value of a random variable with its probability of occurrence within the range of possible values (the so-called sample space). A probability expresses how likely an event or observation is to occur, and ranges from 0 (impossibility) to 1 (certainty). Distributions and probabilities are therefore inextricably linked: if we can work out the underlying distribution of a variable, we can calculate the probabilities of certain events and make predictions about their occurrence. What might be a bit confusing in the beginning is that it is not only the data we collect in our experiments that follow certain distributions (at least roughly), but also other variables such as test statistics. We will elaborate on this later.
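To make the link between a distribution and its probabilities concrete, here is a minimal Python sketch. The fair six-sided die, the `distribution` dictionary and the `probability` helper are illustrative assumptions, not part of the original text; they simply show one discrete sample space in which each value is paired with its probability.

```python
from fractions import Fraction

# Sample space of a fair six-sided die (an illustrative assumption).
sample_space = [1, 2, 3, 4, 5, 6]

# The probability distribution links each value to its probability.
distribution = {value: Fraction(1, len(sample_space)) for value in sample_space}

# Across the whole sample space the probabilities sum to 1 (certainty).
total = sum(distribution.values())


def probability(event):
    """Probability that the outcome is one of the values in `event`."""
    return sum(distribution[value] for value in event if value in distribution)


p_even = probability({2, 4, 6})    # an event made up of three outcomes
p_seven = probability({7})         # an impossible outcome for this die
p_any = probability(sample_space)  # some outcome is certain to occur

print(total, p_even, p_seven, p_any)
```

Once the distribution is known, the probability of any event follows by adding up the probabilities of the outcomes it contains, which is the sense in which knowing the distribution lets us make predictions.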

When talking about distributions, we need to distinguish between continuous and discrete random variables (see Section 1.5.2). The distribution of a continuous random variable is characterised by a probability density curve, which is derived from an equation called the probability density function (PDF) (Figure 3.1). Such a probability density curve indicates the relative likelihood of occurrence of certain values of the random variable. To be precise, the PDF gives the probability of the random variable falling in a particular ...
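As a rough illustration of a probability density curve, the Python sketch below uses a standard normal distribution. This choice is an assumption made for the example, as are the helper names `normal_pdf` and `area_under_pdf`. It approximates the area under the curve between two limits with the trapezoidal rule, since for a continuous variable probabilities correspond to areas under the density curve rather than to the height of the curve at a single value.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x (the height of the curve)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def area_under_pdf(lower, upper, steps=10_000):
    """Approximate the area under the standard normal density curve
    between `lower` and `upper` using the trapezoidal rule."""
    width = (upper - lower) / steps
    total = 0.5 * (normal_pdf(lower) + normal_pdf(upper))
    for i in range(1, steps):
        total += normal_pdf(lower + i * width)
    return total * width


# Probability of falling within one standard deviation of the mean.
p_within_one_sd = area_under_pdf(-1.0, 1.0)

# The area under (practically) the whole curve is 1.
p_total = area_under_pdf(-10.0, 10.0)

# A density is not itself a probability: for a narrow distribution
# the curve can rise above 1.
narrow_peak = normal_pdf(0.0, mean=0.0, sd=0.1)

print(round(p_within_one_sd, 4), round(p_total, 4), round(narrow_peak, 2))
```

The last value is included to show why the height of the curve should be read as a relative likelihood: only areas under the curve are constrained to lie between 0 and 1.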
