One of my favorite books as a kid was Alvin’s Secret Code, a short mystery story I found in my grandma’s bookcase. The story was cool, but even better it had an awesome appendix full of different secret codes and stuff that proved invaluable for my childhood top secret Black Manta club. But what red blooded adolescent male *doesn’t* love secret codes? Of course my fascination with secret codes didn’t end with my innocence, I’ve always thought codes were pretty cool.
(and yes, super special prize to the first person to figure out the secret code above!)
There are all kinds of secret codes out there. Most Cryptography books will start with one of the most basic, the “Caesar Cipher”. So called because it was supposedly the code Caesar used to communicate with his troops, you simply take the letters of your message and shift them all down the alphabet a set number of places (in the Caesar’s case, it was 3 places). So, “hello” would become “khoor” and “zoo” would become “crr” using the Caeser Cipher with the number 3 (when you reach the end of the alphabet you start over back at “A” for purposes of this code).
This is actually modular arithmetic. This will be used later in this post when we discuss the method behind public key cryptography, but I’ll go ahead and go over it now. You can think of it as addition on a “clock”. Say you get to work at 9:00 and work for 8 hours. Adding it up on a clock you know you will get off at 5:00. What you are really doing is adding up all your numbers, but every time you get to 12 you technically start back over at 0.The above calculation is expressed as 9+8 (mod 12). What is going on mathematically is you basically do your addition first, and then divide by whatever number is after the “mod”. The *remainder* after you divide is your actual answer. So 9+8=17 17/12=1 5/12 so the remainder is 5, therefore the answer is 5. And since we were dealing with time it would be 5:00.This makes it easy to solve more complex problems like “If it is 7:00 what time will it be in 368 hours?” You would write this as: 7+368 (mod 12). 375/12=31 3/375 so our remainder and answer is 3, or 3 o’clock.In the case of the Caesar Cipher, we are actually converting the alphabet to numbers 1-26 first, then taking whatever letter we choose (let’s say Y for instance) and doing the following: Y = 25+3 (mod 26), so Y=2 or Y=B.
Now, there are two basic problems with the Caesar Cipher. First, there are only 26 possible ways to shift your letters, so all a code breaker would have to do is try all 26 ways to shift the letters until they figured out the code. Second, even without doing this, our language creates patterns, and it is fairly easy to analyze any message (the longer the better) and quickly sort out these patterns from the seemingly random letters if the code is a simple “letter substitution” code like the Caesar Cipher.
These two problems can be easily surmounted. For the first issue, if there are only 26 possible ways to encode your message with the Caesar Cipher, it is simply a matter of making a more complex code. There are all sorts of ways to encode a message, and as long as the mathematical processes you use to change your message into a code (the “encryption algorithm”) are sophisticated enough the code will be basically unbreakable. In our example above, instead of having “A” correspond to “D”, have it correspond to some random number (chosen by the code maker). Even this simple change makes the code wildly more difficult to solve than the Caesar Cipher.
Yet this new code is still susceptible to the second (and more serious) basic problem with the Caesar Cipher, that while the message looks random, it is still undermined by the basic structure of the English language and can be easily solved by even a simple computer program looking for repeated patterns. Once we have decided upon a sufficiently sophisticated encryption algorithm, we need some way to obfuscate the structure of our coded message from would be code breakers. There are many ways to do this, a simple one could be as follows:
First let’s use the simple random number letter substitution code. We could say H=165, E=837, L=3, O=92 then HELLO = 1658373392. Again, if we leave a full coded message in this form it would be very easy to see the repeated patterns and realize that “837” comes up more than any other sequence of numbers and might be “E” (since it is the most common letter of the alphabet) and go from there. On the other hand, we could apply some arbitrarily complex mathematical process to every “x” digits to produce a new long series of digits that would add a degree of randomness to your original coded message.
This probably doesn’t need an italicized section, but if you were to use this technique, here is how it would work:Let’s say we decide we will repeatedly square and then subtract one from every five digits of our message.First, we’ll need the end result of doing this to every block to have the same number of digits and since a 5 digit number squared will yield a 9-10 digit number we’ll add a zero to the beginning of any nine digit number we end up with so every 5 digit block will end up as a 10 digit block.So first we break our original coding of HELLO = 1658373392 into two 5 digit numbers: 16583 and 73392.Perform the following calculations:165832² -1 and 733922² -1 to get 0274995888 and 5866385663 for our final encryption of HELLO = 027499588855866385663.Now it is a simple matter for the recipient of the coded message to break 02749958885866385663 into the ten digit segments 0274995888 and 5866385663, add one to each and then take the square root to get the original two five digit segments 16583 and 73392 which we could combine to get 1658373392 and then apply our code key that tells what letter each grouping of numbers represents to find our message “HELLO”.
The simple techniques I have outlined above are by no means unbreakable, but there are codes that operate on similar principles and with more sophisticated math that are quite effective.
The big problem with codes like this is that the code sender and code receiver both need to have full knowledge of how the code works in order to send and receive the code. In World War 2, any German wanting to use their “unbreakable” Enigma code would have to make sure each side of the coded message had the complex Enigma machine that performed the encryption algorithms.* Still, while a code like this might be fine for WW2 commanders and members of the top secret Black Manta club, this simply won’t work for many other applications.
Just one example of this would be the need for data security at a bank. Banks need a way for people they have never met in person to send them encoded information that only the bank can decode. The main problem here is that if someone knows how to encode a message, they can just as easily decode the message too (just do the opposite operations…in the case of the Caesar code, if he added three to each letter to encode the message, simply subtract three from each letter of the coded message to decode the message). We need a way to send someone an encoded message where knowing how to encode it is not enough to figure out to decode it. This seemingly impossible dilemma can actually be solved with the power of prime numbers (three pages in and we are finally to the point of the post!)
First a bit about prime numbers (i.e., those numbers that can only be divided by one and themselves). Prime numbers are kind of like the atoms of math. According to the “FUNdamental theorem of arithmetic” (which, if the theorem is “fundamental”, it must be important) every natural number greater than 1 can be represented as unique combination of prime numbers multiplied by each other. 210? 2*3*5*7. 32? 2*2*2*2*2. No two numbers share the same “prime factorization”, and each number only has one unique prime factorization (this is why 1 isn’t a prime number).
First of all, as I stated, for the secret codes we have talked about so far, if you know how to encode it, you just do the reverse mathematical process to decode it. Multiply by 3? Divide by three to decode it, easy as that. So what we need is something where the reverse operation is impossible to do without some additional information. This is where the prime numbers come into play. It is almost impossible to determine how to factor numbers when they get very large, but relatively easy to find very large prime numbers.
One interesting fact I found was that (for example) about one in every 460 200 digit numbers is a prime number. There are also randomized methods that check for primality that will at least work most of the time, so it is not too hard to check a thousand or so two hundred digit numbers until you found two primes which you could then multiply together to get a 400 or so digit composite number that no one would be able to factor in our lifetime even using the fastest computer in the world.
Using these two bits of information, first I find two very large primes. Then I multiply them together to get a new huge number. Now I would find a way for people to encode messages to me by using this product of two very large primes (and only I would know what the two very large primes are). To decode the message I would need to know the prime factors of the very large number that was used to encode the message (which luckily I do since those two huge primes were used to get the huge number that the message was encoded with in the first place). This is not exactly how it works, but the concept is there.
Ok, so the way it actually works, as far as I can figure out is as follows. First let’s get some variables:
M = our message.
C = our encoded message.
p, q = two very large prime numbers chosen by the person who will be receiving the coded messages.
n = our large number that is a product of p and q.
e, d = two inverses mod (p-1)(q-1). Basically, e and d are two numbers derived from p and q. e will be publicly known (along with n), and d will be known only to the person who will be receiving the coded messages.
To encrypt a message, we will take M^e (mod n).To decrypt a message, we will take C^d (mod n) (again, d will be secret, so only someone who knows d, which can only be found out if you know p and q, which if you pick a large enough n will be impossible to find out since n will be too large to factor).
Looking at the Encryption algorithm we can see that this code is basically modular exponentiation as opposed to the modular arithmetic of the Caesar Cipher.
So, let’s see how this works:
As the person who wants to get the coded messages, I will choose my “very large” primes as 3 and 7 (this is my p and q)(of course I’m using very small primes so I can do the math fairly easily). Therefore my n will be 21.
Now I need to find e and d which are inverses in modular relation to p and q. Inverses in modular arithmetic are any two numbers that when multiplied by each other leave a remainder of 1 (mod x). In this case, e*d ≡ 1 (mod (p-1)(q-1)) or in this case, e*d ≡ 1 (mod (3-1)(7-1)) so, e*d ≡ 1 (mod 12). 5 and 17 will work here for e and d since 5*17 = 85 which is congruent to 1(mod 12).
So, I have p, q, n, e, d…now I am ready to set up my public key cryptography system. I make publicly known n=21 and e=5 and my encryption algorithm is M^e (mod n). Now say someone wants to send me the message “4” (again, keeping it simple so we can see how this works).
They take 4^5 (mod 21) which comes out to 16. As the message recipient I get the coded message (C) as “16”. For me to decode the message, I take C^d (mod n), or in this case, 16^17 (mod 21) which gives me the original message, 4! **
Bottom line, to encrypt something with this system you use the product of two predetermined prime numbers. But to decrypt it you need to factor the product you used to encrypt it (since you are doing the reverse operation) thus making decryption basically impossible if you don’t have the prime factors already. So anyone can encode the message using that very large product of two primes (which is publicly made available so people can encode messages to the recipient). But only someone with the two primes that were used to get that product (the intended recipient) can decode the message.
This type of public key cryptography (so called because the encoding key is publicly made available for anyone to send coded messages to its owner) is called RSA encryption and is a very elegant use of the properties of prime numbers and the great difficulty that exists in factoring very large numbers.
To show just how hard it is to factor a very large number (if we use 200 digit prime numbers, like most RSA encryption uses today, our product will be almost 400 digits long!) think of it this way: the number of primes less than x is approximately x/ln x. That means that even for a 20 digit number there are a billion billion primes that could be possible factors…for a 400 digit number there simply isn’t a computer program fast enough or a factorization algorithm sophisticated enough to factor it anywhere near our lifetime.
The final thing to remember about RSA and Public Key Cryptography is that its brilliance is not that it is the best code system out there. There are other systems that might be just as secure and more efficient to use computing power wise (DES for example). The real brilliance of RSA is the ability to have a public encryption key that anyone in the world can use to send secure messages to its owner. Public Key Cryptography is the reason you can send your credit card number and personal information to www.realdoll.com in a code that only they can read. Who says math can’t be useful!
* The Enigma code was actually broken by allied mathematicians during WW2 eventually allowing real time interception of coded messages at the highest level. A very cute Nazi story was how the Gestapo’s Enigma transmissions were essentially unbreakable since they didn’t make any mistakes in their encoding of the messages, and the “user error” of the encoding process from other groups was basically what led the allies to break the “supercode”. I’m guessing the Italians were to blame.
** This works (and this part is pushing the limits of my understanding thus it is down here in the footnotes) because we needed to find an e and d such that (M^e)^d ≡ M (mod n) (in other words, take our message to the e power and then to the d power and that should be congruent to the original message mod n). The tricky part is how this works with p and q in the mix. It depends on the Chinese Remainder Theorem and Fermat’s Little Theorem neither of which I understand well enough to pretend I can get it right, so I’ll leave it here for now…maybe I’ll keep working on it another time!