2.7 DERIVING THE PARAMETERS OF A MARKOV MODEL FROM SLIDING WINDOW COUNTS

The Markov model parameters are defined from the sliding window counts of 2-grams {N(i, j)} derived from a large sample x = (x_0, x_1, …, x_{n−1}) of text as follows:

[Equation (2.13)]

[Equation (2.14)]

[Equation (2.15): definition of P(j/i) from the 2-gram counts]

We assume the sample size n is large enough so that [inline formula] for 0 ≤ i < m and that π satisfies

[Equation (2.16)]

To prove Equation (2.16), we start with Equations (2.13) to (2.15), writing

[Derivation of Equation (2.16)]
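As an illustration of how parameters can be obtained from sliding window counts, the following Python sketch counts overlapping 2-grams in a sample and normalizes them into relative frequencies. It is a minimal sketch under stated assumptions, not the book's procedure: the function names and the alphabet constant are hypothetical, and the normalizations used here (row total divided by n − 1 for π, and N(i, j) divided by the row total for P(j/i)) are the usual relative-frequency estimates, which may differ in detail from the numbered equations of this section.

```python
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed alphabet, so m = 26


def sliding_window_counts(text):
    """Count overlapping 2-grams (x[t], x[t+1]) for t = 0, ..., n-2.

    Returns the counts N(i, j) and the sample size n after dropping
    every character outside ALPHABET.
    """
    letters = [c for c in text.upper() if c in ALPHABET]
    return Counter(zip(letters, letters[1:])), len(letters)


def estimate_parameters(text):
    """Return (pi, P) as relative frequencies of the sliding window counts.

    Assumption: pi(i) is the fraction of 2-grams that start with i, and
    P(j/i) is N(i, j) divided by the number of 2-grams that start with i.
    """
    counts, n = sliding_window_counts(text)
    # Row totals: number of 2-grams whose first letter is i.
    row = Counter()
    for (i, _j), c in counts.items():
        row[i] += c
    total = n - 1  # number of width-2 windows in a sample of n letters
    pi = {i: row[i] / total for i in row}
    P = {(i, j): c / row[i] for (i, j), c in counts.items()}
    return pi, P
```

Calling `estimate_parameters(sample)` on a long text returns two dictionaries keyed by letter and by letter pair. Letters that never begin a 2-gram in the sample are simply absent from the result, which is one reason a large sample is needed.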

This book provides three sets of Markov source parameters:

  • Smarkov1 and Smarkov2: These Markov source parameters were derived from a nonsliding window count of 67,320 2-grams in the alphabet {A, B, …, Z} appearing in Abraham Sinkov's book [Sinkov, 1968]. P(j/i) was derived using Equation (2.15) from Sinkov's 2-gram counts and written to Smarkov2; thereafter, π(i) was calculated to satisfy Equation (2.3) and written to Smarkov1.
  • Gmarkov1 and Gmarkov2: These Markov source parameters were derived ...
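
The two-step procedure described for Smarkov1 and Smarkov2 (first obtain P(j/i), then compute π from it) can be sketched as follows. This is only an illustration under an assumption: Equation (2.3) is not reproduced in this section, and the sketch takes it to be the stationarity condition π(j) = Σ_i π(i) P(j/i). The function name and the use of power iteration are hypothetical choices; the source does not say how π was actually computed.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Approximate pi with pi = pi P by repeated multiplication.

    P is a list of m rows, each a list of m probabilities summing to 1.
    Convergence is assumed (for example, an irreducible and aperiodic
    chain); it is not verified here.
    """
    m = len(P)
    pi = [1.0 / m] * m  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of pi <- pi P.
        nxt = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi
```

For a 26-letter alphabet, `P` would be the 26 × 26 matrix of transition probabilities, and the returned list would play the role of π(i).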
