10 Estimating Distribution Functions

The harder you fight to hold on to specific assumptions, the more likely there's gold in letting go of them.

John Seely Brown (1997), former Chief Scientist at Xerox Corporation

10.1 Introduction

Let $X_1, X_2, \ldots, X_n$ be a sample from a population with continuous cumulative distribution function (CDF) $F$. In Chapter 3, we defined the empirical (cumulative) distribution function (EDF) based on a random sample as

$$F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(X_i \le x).$$

Because $F_n(x)$, for a fixed $x$, has a sampling distribution directly related to the binomial distribution, its properties are readily apparent, and it is easy to work with as an estimating function.
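To make the binomial connection concrete: for an independent, identically distributed sample and a fixed x, each indicator 1(X_i <= x) is a Bernoulli trial with success probability F(x), so n F_n(x) is Binomial(n, F(x)). It follows that E[F_n(x)] = F(x) and Var[F_n(x)] = F(x)(1 - F(x))/n. The following is a minimal sketch in Python (not the book's R code) that computes the EDF and checks these two moments by simulation. The helper names make_edf and simulate_edf_at are hypothetical, and the Uniform(0, 1) population, for which F(x) = x, is an assumption chosen only for illustration.

```python
import bisect
import random


def make_edf(sample):
    """Return F_n, the empirical distribution function of `sample`."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def F_n(x):
        # bisect_right returns the number of observations X_i <= x
        return bisect.bisect_right(xs, x) / n

    return F_n


def simulate_edf_at(x, n, reps, seed=0):
    """Mean and variance of F_n(x) over `reps` Uniform(0, 1) samples of size n."""
    rng = random.Random(seed)  # illustrative population: Uniform(0, 1), so F(x) = x
    values = []
    for _ in range(reps):
        edf = make_edf([rng.random() for _ in range(n)])
        values.append(edf(x))
    mean = sum(values) / reps
    var = sum((v - mean) ** 2 for v in values) / (reps - 1)
    return mean, var


# EDF of a small sample with a tie at 0.4
F_n = make_edf([3.1, 0.4, 2.2, 0.4, 5.0])
print(F_n(0.4), F_n(2.2), F_n(10.0))  # 0.4 0.6 1.0

# Compare with F(x) = 0.3 and F(x)(1 - F(x))/n = 0.3 * 0.7 / 50 = 0.0042
mean, var = simulate_edf_at(x=0.3, n=50, reps=2000)
print(mean, var)
```

The simulated mean and variance should land close to 0.3 and 0.0042, up to Monte Carlo error. Sorting once and using a binary search keeps each evaluation of F_n at O(log n), which is a convenience of this sketch rather than anything the definition requires.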

The EDF provides a sound estimator for the CDF, but not through any methodology that can be extended to general estimation problems in nonparametric statistics. For example, what if the sample is right truncated? Or censored? What if the sample observations are not independent or identically distributed? In standard statistical analysis, the method of maximum likelihood ...

