A Natural Language Approach to Automated Cryptanalysis of Two-time Pads
Josh Mason
Department of Computer Science
Johns Hopkins University
Abstract
While keystream reuse in stream ciphers and one-time pads has been a well
known problem for several decades, the risk to real systems has been
underappreciated. Previous techniques have relied on being able to
accurately guess words and phrases that appear in one of the plaintext
messages, making it far easier to claim that "an attacker would never be
able to do that." In this paper, we show how an adversary can
automatically recover messages encrypted under the same keystream if only
the type of each message is known (e.g. an HTML page in English).
Our method, which is related to HMMs, recovers the most probable plaintext
of this type by using a statistical language model and a dynamic
programming algorithm. It produces up to 99% accuracy on realistic data
and can process ciphertexts at 200ms per byte on a $2,000 PC. To further
demonstrate the practical effectiveness of the method, we show that our
tool can recover documents encrypted by Microsoft Word 2002.