Network and Physical
This section assumes you have read the previous two sections and understand the network stack and what a packet is. We will now cover the remaining three layers, focusing primarily on the network layer. Understanding the basics of the link layer is important, as we will use it later in the course, and an awareness of the underlying hardware of a network is very useful (especially as a sysadmin), but the technicalities of these are less important. After this section, students will know all the fundamental technologies that make the internet work.
Contents
- Network Layer
- Link and Physical Layers
- Networking Tools
  - Ping
  - Traceroute and Whois
  - Nmap
Network Layer
Let's quickly recap what we've covered so far. If client A on Alice's computer wishes to communicate with host B on Bob's computer, the application layer protocol dictates what they say to each other. The transport layer specifies how those messages get from one process to the other. If our applications were on the same computer, we'd be done already. But of course a network consisting of only one computer hardly qualifies as a network. We now need to find a route between Alice's and Bob's computers.
IP
It's hard to find a modern computer user who hasn't used an IP address before, so we'll assume you're already familiar with the four numbers separated by periods. IP stands for Internet Protocol, and while other layers use multiple protocols, the network layer uses IP almost exclusively. (We will see shortly that another protocol, ICMP, is used for special control messages, but all normal traffic uses IP.) Using unique addresses and routing tables, each device, or "hop", between client and host passes data on to where it thinks it should go next. A routing table is exactly this: a list of rules determining which hop to send data to next. Routing tables aren't found solely on routers; every computer needs some routing information, or it won't be able to communicate beyond its immediately reachable network. In fact, the line between a router and a normal computer is blurry. A router is really just a simple computer with many network interfaces, and a computer with multiple network cards installed can serve the same purpose (although it is generally much bulkier and more expensive).
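When several routing-table entries match a destination, routers conventionally pick the most specific one (longest-prefix matching). Below is a minimal Python sketch of that lookup using the standard ipaddress module. The three-entry table and its next-hop names are hypothetical, and the destinations are written in the address/prefix notation explained in the following paragraphs.

```python
import ipaddress

# Hypothetical routing table: (destination network, next hop).
# The next-hop names are illustrative placeholders, not real devices.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "default-gateway"),   # matches every address
    (ipaddress.ip_network("10.0.0.0/8"), "corp-router"),
    (ipaddress.ip_network("10.1.0.0/16"), "local-interface"),
]

def next_hop(destination: str) -> str:
    """Return the next hop of the most specific (longest-prefix) matching route."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    # The default route contains every IPv4 address, so matches is never empty.
    _net, hop = max(matches, key=lambda entry: entry[0].prefixlen)
    return hop

print(next_hop("10.1.4.3"))      # local-interface
print(next_hop("10.200.0.1"))    # corp-router
print(next_hop("72.14.205.99"))  # default-gateway
```

The default route (0.0.0.0/0) is what lets an ordinary computer reach "everything else" with a single rule.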
IP also uses the notion of a subnet mask to distinguish local networks. A subnet mask is a bitmask that, when combined with an IP address, gives the range of addresses in which all devices on the local network fall. Looking at the bits of a netmask (both IP addresses and netmasks are really just strings of 4 bytes; we write them in the familiar dotted-quad notation to make them more readable), every set bit marks a position that stays fixed as it appears in the address, while the unset bits may vary. For example, say our IP address is 10.1.4.3 and our netmask is 255.255.0.0, a rather common one. Then:
10.1.4.3 = 00001010.00000001.00000100.00000011 (in binary)
255.255.0.0 = 11111111.11111111.00000000.00000000
-------------------------------------
00001010.00000001.xxxxxxxx.xxxxxxxx
So all addresses of the form "10.1.x.x" will be on our local network. Netmasks can also be written as an IP address followed by a forward slash and a number from 0 to 32, giving the number of bits, counting from the left, that are fixed in the local address. For example, our address and netmask from above could be written together as "10.1.4.3/16". Originally, netmasks were divided on byte boundaries and labeled as different classes: a class C network had the first three bytes of the address fixed, a class B network the first two (our example uses a class B-sized netmask), and a class A network only the first byte. Today it's common to use CIDR (Classless Inter-Domain Routing) addresses, whose netmasks can be any number of bits and are written in the second, slash form[1].
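The masking arithmetic above is easy to check with Python's standard ipaddress module. The two helper functions are written out by hand to show the bit operations; their names are illustrative, not part of any standard API.

```python
import ipaddress

def network_address(ip: str, netmask: str) -> str:
    """Bitwise-AND an address with its netmask, as in the binary example above."""
    ip_bits = int(ipaddress.IPv4Address(ip))
    mask_bits = int(ipaddress.IPv4Address(netmask))
    return str(ipaddress.IPv4Address(ip_bits & mask_bits))

def prefix_to_netmask(prefix_len: int) -> str:
    """Turn a /N prefix length into dotted-quad form: N leading one bits."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(mask))

print(network_address("10.1.4.3", "255.255.0.0"))  # 10.1.0.0
print(prefix_to_netmask(16))                        # 255.255.0.0

# The standard library can also work directly from slash (CIDR) notation.
iface = ipaddress.ip_interface("10.1.4.3/16")
print(iface.network)                                         # 10.1.0.0/16
print(iface.netmask)                                         # 255.255.0.0
print(ipaddress.ip_address("10.1.250.7") in iface.network)   # True
print(ipaddress.ip_address("10.2.0.1") in iface.network)     # False
```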
The IP header helps regulate both the routing process and traffic control. Aside from the all-important source and destination addresses, one of the most important fields in the header is the time-to-live (TTL). When a packet is first crafted, it is assigned a TTL value stored in a single byte (so at most 255). Each time a hop receives and processes the packet, it decrements the TTL by one. If this brings the TTL to zero, the packet is dropped. This is important, since we have no guarantee a packet will ever arrive at its destination. Without a maximum discrete "distance", a packet could end up traveling eternally around the internet like a wayward malevolent spirit.
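To make the TTL mechanics concrete, here is a sketch that packs a bare-bones IPv4 header with Python's struct module and simulates hops decrementing its TTL. The field layout (TTL at byte offset 8) follows the IPv4 header format; the function names are hypothetical, and the header checksum is simply left at zero, so this is not a packet a real network would accept. A real router also recomputes that checksum after changing the TTL, and usually sends an ICMP message back to the sender when it drops a packet.

```python
import struct
from typing import Optional

def build_ipv4_header(ttl: int, src: bytes, dst: bytes) -> bytes:
    """Pack a minimal 20-byte IPv4 header (no options; checksum left as 0)."""
    version_ihl = (4 << 4) | 5          # version 4, header length = 5 x 32-bit words
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl, 0, 20,             # version/IHL, type of service, total length
        0, 0,                           # identification, flags + fragment offset
        ttl, 1, 0,                      # TTL (one byte), protocol 1 = ICMP, checksum
        src, dst,                       # 4-byte source and destination addresses
    )

def forward(header: bytes) -> Optional[bytes]:
    """Act like one hop: decrement the TTL, or return None if the packet is dropped."""
    ttl = header[8]                     # TTL lives at byte offset 8 of the header
    if ttl <= 1:
        return None                     # decrementing would reach zero: drop the packet
    return header[:8] + bytes([ttl - 1]) + header[9:]

packet = build_ipv4_header(3, bytes([10, 1, 4, 3]), bytes([72, 14, 205, 99]))
hops = 0
while packet is not None:
    packet = forward(packet)
    hops += 1
print(hops)  # 3: the third hop brings the TTL to zero and drops the packet
```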
The version of IP in widespread use today is version 4. Version 6 was designed in the 1990s to correct many of IPv4's inherent problems. For example, with only a 32-bit address space, there are about 4.3 billion IPv4 addresses. With a world population of over 6 billion already, and many people using more than one IP address, there aren't enough addresses to go around. IPv6 uses a 128-bit address space, meaning we could allocate over a billion-billion-billion addresses to each person in the world, which should keep us covered for a while. Despite its ever-growing popularity (many networks now provide both IPv4 and IPv6 addresses), IPv4 remains the standard, and probably will be for several more years.
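The address-space figures above are quick to verify. This Python arithmetic uses the text's figure of 6 billion people.

```python
IPV4_ADDRESSES = 2 ** 32
IPV6_ADDRESSES = 2 ** 128
WORLD_POPULATION = 6_000_000_000   # the "over 6 billion" figure used in the text

print(f"{IPV4_ADDRESSES:,}")                # 4,294,967,296
print(IPV4_ADDRESSES < WORLD_POPULATION)    # True: not even one address per person
per_person = IPV6_ADDRESSES // WORLD_POPULATION
print(f"{per_person:.2e}")                  # about 5.67e+28 addresses each
print(per_person > 10 ** 27)                # True: over a billion-billion-billion
```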
DNS
When browsing websites or performing most other daily internet-related tasks, we rarely enter *numbers* to connect to; we enter names. Imagine how Google's popularity would have suffered if users had to remember "72.14.205.99" every time they wanted to search for something. To solve this, we rely on DNS (Domain Name System) servers to translate the names we know and love into the IP addresses we need. DNS servers keep a cache of recently queried names, which they can check whenever they receive a request. If they don't have an answer, they pass the request on to a more authoritative server.
As long as the servers queried don't have the address, the query keeps passing up until it reaches a root nameserver. There are currently 13 root nameservers in the world[2] (13 named servers, each replicated at many physical sites), most of them operated by US-based organizations, including corporations and the US government. When someone pays to register a domain name, they're paying both for the rights to the name and for having it entered into the DNS so that queries for it can be answered.
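The cache-then-ask-upward behavior described above can be sketched as a toy model in Python. Everything here is hypothetical: the server names, the record, and the idea that the root holds the final answer. In real DNS, the root servers only point to the servers for each top-level domain (such as .org), and resolvers follow those referrals downward using a binary message format; the toy collapses all of that into a single lookup to show the caching.

```python
from typing import Dict, List, Optional

class NameServer:
    """Toy name server: answers from its cache, otherwise asks its parent."""

    def __init__(self, name: str, parent: Optional["NameServer"] = None,
                 records: Optional[Dict[str, str]] = None):
        self.name = name
        self.parent = parent
        self.cache: Dict[str, str] = dict(records or {})

    def resolve(self, domain: str, path: List[str]) -> Optional[str]:
        path.append(self.name)              # note which servers were consulted
        if domain in self.cache:
            return self.cache[domain]
        if self.parent is None:
            return None                     # top of the chain and still no answer
        answer = self.parent.resolve(domain, path)
        if answer is not None:
            self.cache[domain] = answer     # remember it for the next request
        return answer

# Hypothetical servers; example.org and 192.0.2.0/24 are reserved for documentation.
root = NameServer("root", records={"example.org": "192.0.2.10"})
isp = NameServer("isp-resolver", parent=root)

first: List[str] = []
second: List[str] = []
print(isp.resolve("example.org", first), first)    # 192.0.2.10 ['isp-resolver', 'root']
print(isp.resolve("example.org", second), second)  # 192.0.2.10 ['isp-resolver']
```

The second lookup never leaves the ISP's resolver, which is exactly why caching keeps the root servers from being overwhelmed.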
ICMP
The Internet Control Message Protocol (ICMP) provides a set of important control messages at the network layer. Some of the more interesting ones are echo request and reply, source quench, and TTL expired (formally "time exceeded"). Echo request and reply go hand in hand and are generally used to check whether a host is up. Source quench sends a notification asking a sender to slow down its rate of transfer (for example, if the host can't keep up with the amount of data it's receiving); it has since been deprecated. TTL expired notifies the sender when one of its packets has its TTL field reduced to zero, which lets a client know its packets aren't arriving at their destination.
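For a feel of what an echo request actually contains, here is a sketch that builds one in Python. The 8-byte header layout (type 8, code 0, checksum, identifier, sequence number) follows RFC 792, and the one's-complement checksum follows RFC 1071; the helper names are our own. Actually sending the packet needs a raw socket (and usually administrator privileges), so the sketch only constructs the bytes and checks them.

```python
import struct

ICMP_ECHO_REQUEST = 8   # type 8 = echo request; an echo reply is type 0

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP (and the IPv4 header)."""
    if len(data) % 2:
        data += b"\x00"                            # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # fold any carry back in
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Build an ICMP echo request: an 8-byte header followed by the payload."""
    # The checksum is computed with its own field set to zero, then filled in.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload

packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"hello!")
print(packet.hex())
# A receiver verifies the packet by checksumming it whole: a valid packet gives 0.
print(internet_checksum(packet))  # 0
```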
While ICMP is intended to provide a valuable service (and it does), it can also be used with malicious intent. By sending IP packets with a deliberately chosen TTL, or by sending echo requests, we can gain important information about the layout of a network or the systems running on it. For this reason many devices, especially routers, are configured not to send (or to filter out) ICMP messages.
Link and Physical Layers
IP provides a method for travel from source to destination, but doesn't actually handle the transfer from one hop to the next. For this, we need the link and physical layers. The link layer specifies how two devices communicate when they're connected via some medium. The physical layer provides this medium.
MAC Addresses and ARP
A network interface needs two addresses to function: the IP address we've already seen, and a MAC (Media Access Control) address. MAC addresses are 6-byte (48-bit) addresses written in hex, with bytes separated by colons, such as "00:0d:93:59:e9:98". They are burned into devices when they're manufactured; changing that built-in address takes serious hardware know-how, although most operating systems let you override the address an interface uses in software. They're also meant to be unique: the IEEE (Institute of Electrical and Electronics Engineers) allocates ranges of addresses, identified by the first three bytes, to different manufacturers[1], and each manufacturer assigns the remaining bytes. An interesting side effect of this is that we can use a MAC address to find out information about the device it belongs to. The IEEE Standards Association maintains a large list of the owners of allocated ranges, available at [3]. The link layer uses these MAC addresses to deliver its packets (usually called "frames" at this layer), much as IP uses IP addresses. However, this is generally more of a broadcast model than IP. An Ethernet frame is passed onto the wire (or into the air on a wireless interface) addressed to a specific MAC address, but free for everyone to read. (Incidentally, this is why packet sniffers work.)
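Because the first three bytes of a MAC address identify the block the IEEE assigned (the OUI, or Organizationally Unique Identifier), a few bit operations already reveal something about an address. A small Python sketch; the function names are illustrative, and finding out *who* owns an OUI still requires the IEEE list mentioned above. The two flag bits checked here are the standard group (multicast) and locally-administered bits of the first byte.

```python
def parse_mac(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' notation into 6 raw bytes."""
    octets = mac.split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 colon-separated bytes, got {mac!r}")
    return bytes(int(octet, 16) for octet in octets)

def describe_mac(mac: str) -> dict:
    raw = parse_mac(mac)
    return {
        # The first three bytes are the OUI, the prefix the IEEE assigns to a vendor.
        "oui": ":".join(f"{b:02x}" for b in raw[:3]),
        # Lowest bit of the first byte: 1 = group (multicast/broadcast) address.
        "multicast": bool(raw[0] & 0x01),
        # Second-lowest bit: 1 = locally administered (e.g. assigned in software).
        "locally_administered": bool(raw[0] & 0x02),
    }

print(describe_mac("00:0d:93:59:e9:98"))
# {'oui': '00:0d:93', 'multicast': False, 'locally_administered': False}
print(describe_mac("ff:ff:ff:ff:ff:ff")["multicast"])  # True: the broadcast address
```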
The Address Resolution Protocol, or ARP, serves a function similar to DNS. Since we need to find the MAC address that corresponds to the IP address we want to reach, we broadcast an "ARP query" requesting this information. The device in question (or another that already knows) sends a reply with the MAC address. We then store this entry in our ARP cache so we don't have to look it up again. These entries are temporary and are removed after some set period (typically some number of minutes).
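The ARP cache behavior described above (store the answer, forget it after a while) can be sketched in Python. The ArpCache class is hypothetical, and its 60-second default lifetime is an arbitrary choice; real operating systems use their own, often tunable, timeouts. The clock is injectable so the expiry can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple

class ArpCache:
    """IP-to-MAC cache whose entries expire after ttl_seconds (hypothetical class)."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock   # injectable so expiry can be tested without waiting
        self.entries: Dict[str, Tuple[str, float]] = {}

    def store(self, ip: str, mac: str) -> None:
        """Remember a mapping learned from an ARP reply."""
        self.entries[ip] = (mac, self.clock())

    def lookup(self, ip: str) -> Optional[str]:
        """Return the cached MAC, or None if we would have to send an ARP query."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, learned_at = entry
        if self.clock() - learned_at > self.ttl_seconds:
            del self.entries[ip]   # stale: forget it and ask again next time
            return None
        return mac

now = [0.0]                                  # fake clock for the demonstration
cache = ArpCache(ttl_seconds=300, clock=lambda: now[0])
cache.store("10.1.0.1", "00:0d:93:59:e9:98")
print(cache.lookup("10.1.0.1"))  # 00:0d:93:59:e9:98
now[0] = 301.0
print(cache.lookup("10.1.0.1"))  # None: the entry has expired
```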
Networking Tools
We now introduce some standard networking utilities that rely on the topics we've just discussed. As always, their man pages contain more information and detailed usage instructions. With the exception of Nmap, these are standard utilities on Unix-like systems (usually installed by default, or easily added from the system's package manager), and we will use them often.
Ping
The concept of "ping" is familiar to most people who have played an online game: the higher your ping, the more "lag" you feel in the game, which obviously detracts from the experience. The ping tool is a utility that sends ICMP echo request packets to a target and measures the time it takes to receive each echo reply (the round-trip time). This can be used to time various parts of your network, or just to check whether a host is up. When troubleshooting a network connection, pinging hosts we know are up can often help narrow down where the problem lies.
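Ping reports a round-trip time for each reply and a summary when it finishes. A Python sketch of that summary arithmetic, using made-up sample times; it mirrors the kind of packet-loss and min/avg/max figures ping prints, not any particular implementation's exact output.

```python
from typing import List, Optional

def summarize(rtts_ms: List[Optional[float]]) -> dict:
    """Summarize ping-style round-trip times; None marks a request with no reply."""
    if not rtts_ms:
        raise ValueError("no echo requests were sent")
    received = [rtt for rtt in rtts_ms if rtt is not None]
    loss = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    if not received:
        return {"loss_pct": loss}          # every request went unanswered
    return {
        "loss_pct": loss,
        "min": min(received),
        "avg": sum(received) / len(received),
        "max": max(received),
    }

# Hypothetical measurements for five echo requests (the fourth got no reply).
print(summarize([21.0, 19.5, 24.0, None, 20.5]))
# {'loss_pct': 20.0, 'min': 19.5, 'avg': 21.25, 'max': 24.0}
```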
Traceroute and Whois
Traceroute attempts to discover all the hops en route to a specified host by sending IP packets with a deliberately small TTL. Starting at one and incrementing the TTL one step at a time, it tries to elicit an ICMP TTL expired response from every hop along the path. This can provide lots of information. First, we can use it to map out the structure of a network. Second, we can often pinpoint the approximate geographic location of a host by combining this hop information with the whois utility. Whois queries a collection of databases for information on an IP address or domain name. This information includes the owner and often a postal address as well.
There is, of course, a certain unreliability to traceroute, since not every hop will respond and we have no guarantee the packets take the same route every time. Generally, though, we can expect it to be consistent, and we can run it multiple times to increase our odds.
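Traceroute's probing strategy can be simulated without touching the network. In this Python sketch, the path, its addresses (drawn from private and documentation ranges), and the silent third router are all hypothetical. Real traceroute implementations also send several probes per TTL and time each reply.

```python
from typing import List, Optional

# Hypothetical path to a target; None marks a router that silently drops
# expired packets instead of answering with an ICMP message.
PATH = ["10.1.0.1", "192.0.2.1", None, "198.51.100.7", "203.0.113.50"]

def send_probe(ttl: int) -> Optional[str]:
    """Simulate one probe and return the address that answers (None = no answer)."""
    *routers, target = PATH
    for router in routers:
        ttl -= 1                  # every router decrements the TTL...
        if ttl == 0:
            return router         # ...and the one that reaches zero reports "TTL expired"
    return target                 # survived every router: the target itself replies

def traceroute(max_hops: int = 30) -> List[str]:
    lines = []
    for ttl in range(1, max_hops + 1):
        responder = send_probe(ttl)
        lines.append(f"{ttl:2d}  {responder or '*'}")   # '*' for a silent hop
        if responder == PATH[-1]:
            break                 # the target answered, so the path is complete
    return lines

print("\n".join(traceroute()))
```

The silent router shows up as a `*` line, which is the unreliability described above: the path continues past it, but we learn nothing about that hop.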
Nmap
Nmap is a much larger tool than the preceding ones, and part of an important class of tools called port scanners. This returns us to our discussion of TCP, but knowledge of netmask syntax is also very helpful in using nmap to its full capabilities, since it accepts ranges of targets written in CIDR notation. The basic purpose of a port scanner is to determine which, if any, ports on a host are listening for traffic. Nmap does this by sending custom-tailored TCP packets to all ports, or some range of ports, on a host. These can be valid-looking packets (a SYN packet that opens a connection, for example) or some unlikely combination of set flags. Since different TCP implementations behave differently, a carefully designed packet can provoke a telltale response from the host that reveals whether the port is open.
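Nmap's hand-crafted probes require raw sockets and elevated privileges. The simplest technique, which nmap also offers as its "connect scan" (`-sT`), just attempts a full TCP handshake on each port. Here is a Python sketch of that simpler technique, not of nmap's SYN or unusual-flag probes; the helper names are hypothetical. The demonstration scans only a listener it creates on the local machine, and `expand_targets` shows how CIDR notation turns into a list of hosts to scan.

```python
import ipaddress
import socket
from typing import Iterable, List

def expand_targets(cidr: str) -> List[str]:
    """List the usable host addresses in a CIDR range, e.g. '192.168.1.0/30'."""
    return [str(host) for host in ipaddress.ip_network(cidr).hosts()]

def connect_scan(host: str, ports: Iterable[int], timeout: float = 0.5) -> List[int]:
    """Return the ports that accept a full TCP connection."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            try:
                # connect_ex returns 0 on success instead of raising an exception
                if sock.connect_ex((host, port)) == 0:
                    open_ports.append(port)
            except OSError:
                pass   # timeouts or unreachable hosts: treat the port as not open
    return open_ports

# Demonstration against a listener we open ourselves on the loopback interface.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
listener.listen()
port = listener.getsockname()[1]
found = connect_scan("127.0.0.1", [port])
listener.close()
print(found == [port])                   # True
print(expand_targets("192.168.1.0/30"))  # ['192.168.1.1', '192.168.1.2']
```

A connect scan completes the handshake, so it is easy for the target to log; avoiding that is one reason nmap's crafted-packet scans exist.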
Knowing which ports are open allows us to check for security holes. Again, this can be used for good or evil. Sysadmins often use nmap to audit their servers or to make sure no users on the network are providing a tempting target. Attackers use it during reconnaissance, either to select a tempting target or to decide where to launch their attack. For in-depth documentation, as well as guides to many more security tools and concepts, the main nmap website at [4] is an incredibly valuable resource.
Sources and Further Reading