You are given a ciphertext encrypted with a Vigenère cipher using an English-language key of length 6. Build the full cryptanalytic pipeline in Python: (1) estimate the key length using the index of coincidence by trying candidate lengths 1..20 and identifying the length whose split-stream IC most closely matches English (~0.067), (2) for each of the 6 streams use chi-squared frequency analysis to recover the Caesar shift that aligns the stream with English letter frequencies, (3) reconstruct the key from those shifts and decrypt the full plaintext. Your script must work without human intervention on any sufficiently long ciphertext (>= 500 characters) of the same form.
IC = sum(n_i * (n_i - 1)) / (N * (N - 1)) where n_i is the count of letter i and N is the total. For English it is ~0.067; for uniformly random letters it is ~0.0385 (= 1/26).L split the ciphertext into L streams (positions i, i+L, i+2L, ... for each i in 0..L-1), compute the IC of each stream, and average. The true L produces an average IC near 0.067; wrong lengths produce ICs near 0.0385.chi_squared function from the frequency-analysis task. The shift s that minimises chi-squared on the stream is the s-th key letter.i of a ciphertext of length N with key length L has roughly N // L characters. Strip non-alphabetic characters and uppercase before any analysis.$ python vigenere_break.py
Trying key lengths 1..20:
L= 1 avg IC = 0.0421
L= 2 avg IC = 0.0428
L= 3 avg IC = 0.0419
L= 4 avg IC = 0.0431
L= 5 avg IC = 0.0425
L= 6 avg IC = 0.0668 <-- best
L= 7 avg IC = 0.0419
...
Best key length: 6
Recovering key by chi-squared on each stream:
stream 0: shift 2 -> 'C'
stream 1: shift 17 -> 'R'
stream 2: shift 24 -> 'Y'
stream 3: shift 15 -> 'P'
stream 4: shift 19 -> 'T'
stream 5: shift 14 -> 'O'
Recovered key: CRYPTO
Decrypted plaintext (first 200 chars):
THEINDEXOFCOINCIDENCEISASTATISTICALMEASUREDEVELOPED
BYWILLIAMFRIEDMANINTHE1920SFORATTACKINGPOLYALPHABET
ICCIPHERSITEXPLOITSTHEFACTTHATNATURALLANGUAGEHASA...