3.4 THE χ²-TEST OF A HYPOTHESIS
Suppose a large number n of independent trials of a chance experiment ℰ are performed. A trial has r possible outcomes $O_0, O_1, \ldots, O_{r-1}$ that occur with probabilities q(0), q(1), …, q(r − 1). The number of times the outcome $O_i$ occurs, $N_i$, is recorded.
How likely is it that the observed outcome counts $\{N_i\}$ are consistent with the hypothesis that q(i) is the probability of occurrence of $O_i$ (0 ≤ i < r)? In the context of cribbing:
- The experiment ℰ is the generation of plaintext by an iid language model with 1-gram probabilities π followed by monoalphabetic substitution θ;
- The r outcomes correspond to the occurrence of the letters of a ciphertext r-gram u;
- $u = (u_0, u_1, \ldots, u_{r-1})$ is a ciphertext isomorph of the plaintext crib $v = (v_0, v_1, \ldots, v_{r-1})$; and
- The probabilities $q(i) = \pi(v_i)$ are those that would hold if the ciphertext u were the encipherment of the plaintext crib v, that is, if θ : v → u.
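The cribbing setup listed above can be sketched in Python. This is only an illustration under stated assumptions, not the book's procedure: the names `is_isomorph` and `hypothesis_probs` and the three-letter 1-gram table `PI` are hypothetical, chosen to show how a crib v fixes the probabilities q(i) = π(v_i) once u and v share the same pattern of repeated letters.

```python
# Toy 1-gram probabilities over a three-letter alphabet (hypothetical values,
# not taken from any real language model).
PI = {"A": 0.5, "B": 0.3, "C": 0.2}


def is_isomorph(u: str, v: str) -> bool:
    """Return True if a one-to-one letter substitution maps v onto u."""
    if len(u) != len(v):
        return False
    forward = {}   # plaintext letter -> ciphertext letter
    backward = {}  # ciphertext letter -> plaintext letter
    for cu, cv in zip(u, v):
        # setdefault records the pairing on first sight and returns the
        # stored value afterwards, so any conflict shows up as a mismatch.
        if forward.setdefault(cv, cu) != cu:
            return False
        if backward.setdefault(cu, cv) != cv:
            return False
    return True


def hypothesis_probs(v: str, pi: dict) -> list:
    """q(i) = pi(v_i): the probability the hypothesis assigns to position i."""
    return [pi[letter] for letter in v]
```

For instance, `is_isomorph("XYX", "ABA")` holds because X and Y can stand for A and B, while `is_isomorph("XYZ", "ABA")` does not; for the crib "ABA" the toy table gives q = [0.5, 0.3, 0.5].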
If the hypothesis is true, then for each possible outcome $O_i$, the law of large numbers asserts
$$\lim_{n \to \infty} \frac{N_i}{n} = q(i).$$
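The χ²-statistic is the quantity defined by
$$\chi^2 = \sum_{i=0}^{r-1} \frac{\bigl(N_i - n\,q(i)\bigr)^2}{n\,q(i)}.$$
The ith term in the sum above is the product of two factors. The first,
$$\frac{n}{q(i)},$$
increases without bound with n, and the second, $\bigl(N_i/n - q(i)\bigr)^2$, has ...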
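As a rough numerical illustration, the statistic can be computed directly from the counts. The sketch below assumes the statistic takes the standard Pearson form, the sum over i of (N_i − n q(i))² / (n q(i)), and that every q(i) is positive; the function names `chi_squared` and `chi_squared_factored` are hypothetical. The second function evaluates each term as the product of the two factors n/q(i) and (N_i/n − q(i))², which is algebraically the same quantity.

```python
def chi_squared(counts, probs):
    """Pearson chi-squared statistic for observed counts N_i and probabilities q(i)."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)  # total number of trials
    return sum((N - n * q) ** 2 / (n * q) for N, q in zip(counts, probs))


def chi_squared_factored(counts, probs):
    """Same statistic, with each term written as (n / q(i)) * (N_i / n - q(i))**2."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    return sum((n / q) * (N / n - q) ** 2 for N, q in zip(counts, probs))
```

With n = 100 trials, the counts [60, 20, 20] against q = (0.5, 0.3, 0.2) give 2 + 10/3 + 0 = 16/3, about 5.33, whereas counts that match the expected values exactly, [50, 30, 20], give 0 up to floating-point rounding.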