3.4 THE χ²-TEST OF A HYPOTHESIS

Suppose a large number n of independent trials of a chance experiment are performed. Each trial has r possible outcomes O_0, O_1, …, O_{r−1} that occur with probabilities q(0), q(1), …, q(r − 1). The number of times N_i that the outcome O_i occurs is recorded.

How likely is it that the observed outcome counts {N_i} are consistent with the hypothesis that q(i) is the probability of occurrence of O_i (0 ≤ i < r)? In the context of cribbing:

  • The experiment is the generation of plaintext by an iid language model with 1-gram probabilities π followed by monoalphabetic substitution θ;
  • The r outcomes correspond to the occurrence of the letters of a ciphertext r-gram u;
  • u = (u_0, u_1, …, u_{r−1}) is a ciphertext isomorph of the plaintext crib v = (v_0, v_1, …, v_{r−1}); and
  • The probabilities q(i) = π(v_i) are those that would hold if the ciphertext u were the encipherment of the plaintext crib v, that is, if θ : v → u.
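
The cribbing setup above can be sketched in a few lines of Python. This is a minimal illustration, not the book's procedure: the function names `is_isomorph` and `hypothesis_probabilities` are hypothetical, and the 1-gram table `pi` is assumed to be supplied by the caller as a dictionary from letters to probabilities.

```python
def is_isomorph(u: str, v: str) -> bool:
    """Return True if a one-to-one letter map theta sends v[i] to u[i] for all i."""
    if len(u) != len(v):
        return False
    forward = {}   # plaintext letter  -> ciphertext letter
    backward = {}  # ciphertext letter -> plaintext letter
    for p, c in zip(v, u):
        # setdefault records the pairing the first time a letter is seen
        # and returns the earlier pairing on every later occurrence.
        if forward.setdefault(p, c) != c or backward.setdefault(c, p) != p:
            return False
    return True


def hypothesis_probabilities(v: str, pi: dict) -> list:
    """q(i) = pi(v_i): the 1-gram probability of the i-th crib letter."""
    return [pi[letter] for letter in v]
```

For example, `is_isomorph("XYYX", "NOON")` holds because the repeated-letter pattern of the two strings matches, whereas `is_isomorph("XYZX", "NOON")` does not.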

If the hypothesis is true, then for each possible outcome O_i, the law of large numbers asserts that

$$\lim_{n \to \infty} \frac{N_i}{n} = q(i)$$
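
A small simulation may make the law of large numbers concrete: when outcomes are drawn with probabilities q(i), the relative frequency N_i/n should settle near q(i) as n grows. The sketch below is illustrative only; the function name `relative_frequencies`, the seed, and the example probabilities are assumptions, not taken from the text.

```python
import random


def relative_frequencies(q, n, seed=0):
    """Simulate n independent trials with outcome probabilities q; return N_i / n."""
    rng = random.Random(seed)          # fixed seed so repeated runs agree
    counts = [0] * len(q)
    for outcome in rng.choices(range(len(q)), weights=q, k=n):
        counts[outcome] += 1
    return [count / n for count in counts]


if __name__ == "__main__":
    q = [0.5, 0.3, 0.2]                # illustrative probabilities only
    for n in (100, 10_000, 1_000_000):
        print(n, relative_frequencies(q, n))
```

With larger n the printed frequencies should lie closer to the entries of `q`, although any single run can still deviate.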

The χ²-statistic is the quantity defined by

$$\chi^2 = \sum_{i=0}^{r-1} \frac{\left(N_i - n\,q(i)\right)^2}{n\,q(i)}$$
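
As a hedged sketch, the statistic can be computed directly from the observed counts, assuming the usual Pearson form in which the ith term is (N_i − n q(i))² / (n q(i)) and n is the total number of trials. The function name `chi2_statistic` is hypothetical.

```python
def chi2_statistic(counts, q):
    """Pearson chi-square statistic for observed counts N_i and hypothesized q(i).

    Assumes every q(i) > 0 and that the counts cover all r outcomes.
    """
    if len(counts) != len(q):
        raise ValueError("counts and q must have the same length")
    n = sum(counts)                    # total number of trials
    return sum((N - n * p) ** 2 / (n * p) for N, p in zip(counts, q))
```

For instance, counts of 60 and 40 under the hypothesis q = (0.5, 0.5) give expected counts of 50 each, so the statistic is 100/50 + 100/50 = 4. Counts that match the expected values exactly give a statistic of 0.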

The ith term in the sum above is the product of two factors. The first,

$$\frac{n}{q(i)}$$

increases without bound with n, and the second has ...
