Computer
Networking
and
Practical
Security

Fall 2006
course
navigation

Cryptography: Tools and Basic Theory

This is part one of the cryptography section. Crypto has become an essential part of day to day security in all parts of our life, from commerce to national security. The advent of the microprocessor has both facilitated and necessitated the rate at which the field now progresses, and there's no guarantee that "secure" today will be secure five or ten years from now. This section provides an introduction to common tools and technology for modern computer encryption, as well as an overview of the three major encryption algorithm classes and their uses.

Contents

Crypto Algorithm Subcategories

Cryptographic algorithms in general are designed to provide secrecy and/or authentication. Some are tailored specifically for one of these purposes, while others can be used for either. Security protocols generally use a combination of more than one category to better complement each other.

Symmetric Key Cipher

In a symmetric key cipher, both the sender and the receiver have access to the same secret key. In classical cryptography, an example would be the substitution cipher: A becomes C, B becomes K, C becomes X, etc. In this case, a table of the substitutions would be the key, and obviously only those with the key would be able to decrypt messages. Today most symmetric key ciphers are substitution-permutation ciphers, sometimes called block ciphers. A block cipher breaks a message into set-length blocks and applies a series of substitutions, permutations, and bitwise XOR's to jumble the data. Unlike classical ciphers, these are usually done strictly as bitwise operations, rather than character-wise operations. The resulting ciphertext should be such that if the communication is intercepted, no information could be discovered from the encrypted message. For example, if all messages with the fifth bit set encrypted to a message with the last bit set, an interceptor could gather some information ("hmm... the first letter can't be an A"). The algorithm designers often spend many years constructing the permutation and substitution tables to combat this.
The current most common symmetric cipher is the AES -- Advanced Encryption Standard -- a block cipher which replaced DES, the Data Encryption Standard, in 2000 after several years of deliberation [1]. This class of algorithm is used for secrecy. Since it relies on sharing keys, it's not possible to guarantee who the key originated from, making it useless for authentication.

Asymmetric Key Cipher

Asymmetric key ciphers, or public-private key ciphers, rely on two keys for encryption and decryption: one "private" key known only to the owner, and another "public" known to everybody (including potential eavesdroppers). Whereas symmetric key ciphers are encrypted and decrypted with the same key, asymmetric ciphers encrypt with one and decrypt with the other. Often the algorithm is such that both encryption/decryption works in either order, i.e. private key then public or vice versa. These algorithms can be used for both secrecy and authentication (in fact they can be used for both at the same time). If Alice wants to send a message to Bob that only Bob can read, she encrypts her message with Bob's public key. Then nobody but Bob, not even Alice herself, can decrypt the message. If instead she wants to prove her identity, she can encrypt her message with her own private key. Then anyone can decrypt it with her public key, which only (hopefully) corresponds to her private key.
Of course if Alice just encrypted a message that said "hi, I'm Alice!" and sent it along, anyone could intercept and copy it, and use it to impersonate Alice. This is where using both secrecy and authentication comes into play. It should be clear that Alice's very act of encrypting with her private key makes the need for any "I'm Alice" message irrelevant. All she needs to do is encrypt her original message with Bob's public key, then encrypt the ciphertext with her private key. The result is a message that can only have been created by Alice, and can only be decrypted by Bob.
The most popular asymmetric key cipher today is the RSA algorithm, which has become almost synonymous with cryptography on the web. It will be discussed in detail in the CryptoMath section.

Hash Functions

Oftentimes we don't want authentication for *people* as much as we want it for *files*. Popular and sizable files, such as web browsers, games, or even operating systems, are often hosted on more than one site: a main site operated by the designer or owner of the file, and a collection of "mirror" sites who have agreed/offered to take some of the load. Now, generally if we trust a designer enough to install their product on our machine we trust them enough to download it from them as well. But what about the mirror sites? Are their security measures as thorough as the developer's (i.e. can we be sure their server hasn't been hacked and modified)? Are their intentions really honest, or are they mirroring the file with the sole purpose of convincing you to download their virus or rootkit? In most cases, the answer to these questions is a hearty "I dunno". With the previous two algorithm categories, we were trying to communicate with a trusted party (and just wanted to make sure we were actually talking to them). Now we're talking with strangers, and we don't know who wrapped that lollipop.
What we need is proof that the file we download on the mirror is the same as the one we download from the main site. It would be silly to just download it again from the main site and compare (especially if we're downloading a 1GB operating system!), so what we want is a small "fingerprint" to compare it with. A hash function is a mathematical function that generates a fixed-length output "hash" string based on an input string (such as a file), which can be of varying length. If a single character or bit in the input string is modified, the generated hash string will be drastically modified, so that it is hard (ideally impossible) to find a relationship between two hash strings. The function is also not reversible --given a hash string, you can't find the input string via any means other than trial and error.
Since we have fewer hash strings than input strings, we get "collisions" --different input strings generating the same hash string, so to be secure, it must also be difficult to find two strings that collide. This prevents people from being able to create malicious files that correspond to the same hash as a benign one.
So when we want to download our favorite Linux distro, we download the file from the mirror and its hash from the distro's main website. Then we compute the hash of the file we downloaded and compare. If they're the same, we're in business. If not, the file is either corrupted or not what it claims to be, and either way we should discard it.
Popular hashing algorithms today are MD5 and SHA-1. The SHA-1 is much newer, and many documented collisions suggest that MD5 is potentially unsafe (or will be soon), but MD5 is still fairly common.

SSL/TLS

Transport Layer Security, and its predecessor, Secure Sockets Layer, is a security protocol that works in between the application and transport layer. It sets up an encrypted session by negotiating between client and server which algorithms to use, using an asymmetric key algorithm to share symmetric keys and finally encrypting/decrypting all data using those keys[2]. The most common occurrence of TLS is in HTTPS --secure web traffic, which you're probably already familiar with.
There are several reasons to use these protocols. The most obvious one is that it's vastly easier to use a pre-existing crypto implementation than to write your own. It is also much easier to guarantee compatibility with other applications. For example, a web designer has to make sure their site is functional when viewed with many different browsers. If each browser relies on a different proprietary encryption suite, the website or server must be configured to recognize each --a tedious task at best. If the server itself uses a non-standard encryption suite, then every existing browser must be updated in order to view it (which would certainly not be an effective method of attracting business to the site). A protocol standard makes implementation easier for everyone. OpenSSL is a common library for using TLS/SSL, which is found on most Linux distros as well as Mac OSX.

ssh

The secure shell should be very familiar to us by now. It's mode of operation is quite similar to TLS, in that it negotiates algorithms for a secure connection and uses both symmetric and asymmetric key ciphers in the process, but also includes several methods for authentication. Most often this is done via password, but this is actually the last method of authentication that ssh attempts. Before resorting to passwords, ssh attempts two RSA-based methods, one relying on public keys and a file of reliable hosts, the other relying solely on public keys. There is also a method that just uses a reliable hosts file, but this is insecure due to the risk of spoofing and is left off by default.
Also like TLS/SSL, ssh can run "underneath" the application layer, which it does through port forwarding. In these situations, ssh sets up a forwarding connection with the remote ssh daemon, and then all data on a specified port is passed through this connection to its destination. The advantage of this is that it can be even easier to use than TLS/SSL, since you don't actually have to write any new code. The downside is that it relies on being able to log in to the destination machine, which makes it useless for most web traffic.

Using ssh Without Passwords

As we just stated, password authentication is the last method attempted by ssh. The following instructions demonstrate how to use the RSA method instead.
First we generate a key (leave file and passphrase empty by default):
mdhcp125:~ gabe$ ssh-keygen -t rsa Generating public/private rsa key pair. Enter file in which to save the key (/Users/gabe/.ssh/id_rsa): /Users/gabe/.ssh/id_rsa already exists. Overwrite (y/n)? y Enter passphrase (empty for no passphrase): Enter same passphrase again: Your identification has been saved in /Users/gabe/.ssh/id_rsa. Your public key has been saved in /Users/gabe/.ssh/id_rsa.pub. The key fingerprint is: b7:b3:f4:5d:11:9a:f8:80:a3:d5:8b:3c:ad:90:08:75 gabe@mdhcp125.marlboro.edu
Next, copy your public key over to host you wish to connect to (in this case, cs):
mdhcp125:~ gabe$ scp .ssh/id_rsa.pub glein@cs: glein@cs's password: id_rsa.pub 100% 408 0.4KB/s 00:00 mdhcp125:~ gabe$ ssh glein@cs glein@cs's password: Linux cs 2.6.12-10-686-smp #1 SMP Fri Sep 15 16:47:57 UTC 2006 i686 GNU/Linux Welcome to cs.marlboro.edu As of Aug 30 2006 we're running Ubuntu on an Athlon 64 X2. Go figure. Last login: Fri Oct 13 11:47:22 2006 from mdhcp11.marlboro.edu
If you don't have any other public keys stored on the host yet, you can rename the public key "authorized_keys" and move it to the .ssh directory. Otherwise you can use ">>" to append the key to the file.
glein@cs:~$ mv id_rsa.pub .ssh/authorized_keys

The Weakest Link: Human Error

Modern cryptographic algorithms are extremely powerful, and so it should seem very easy to make your system impenetrable to anything short of the NSA. Unfortunately your system is useless if it can't interact with users, and sadly people are very bad at adhering to rigid systems. Incorrectly implemented or monitored cryptographic systems can leave the thickest armor with a rather gaping hole. For example, returning to our discussion of hash functions, many mirrors include a hash alongside the file in question. Say an unsuspecting user comes along and, being security conscious, downloads both. He proceeds to generate the hash of the file and compares it and lo and behold, it's the same! Unfortunately for our user, the hash he downloaded is not the same as the one on the main website, but in fact corresponds to the virus he just downloaded from this malicious mirror. Whoops!
This should seem horribly obvious --you should never use the hash on a mirror unless you're comparing it to the original --but it's all too common. Eventually everyone has to trust *somebody* to get things done, and knowing who to trust is half the battle. But if you're not AWARE that you're trusting someone, you can't make active and accurate decisions. If an intruder can convince you to hold the door into a locked building for them, or that they're the sysadmin and need your password, a dangerous trust relationship has been made, and no security measure, cryptographic or otherwise, will make a difference. Developing a trust relationship or in some way manipulating a human reaction is called social engineering, and is a common method used by attackers. We won't go into more depth in the study of social engineering, but suffice it to say, it is really just the attack of a different system --one with less stringent rules.

Sources and Further Reading

1) Stinson, D. R., Cryptography: Theory and Practice, Boca Raton, Florida, Chapman & Hall/CRC, 2006
2) http://en.wikipedia.org/wiki/Transport_Layer_Security