CS 268 - Class Notes (Network Protocols)

Network Protocols

We'll be looking at the network protocols used in TCP/IP. These will include the Internet Protocol (IP), the Internet Control Message Protocol (ICMP), and how IP routing works.

Internet Protocol (IP)

The IP header is given below.

 0             7               15                              31
|  ver  |  hlen |     TOS       |       Total Length            |
|          Identification       |Flags|     Fragment Offset     |
|      TTL      |   Protocol    |          IP Checksum          |
|                      Source IP Address                        |
|                   Destination IP Address                      |
|                        IP Options (if any)                 ...|
The ver field is set to 4 to indicate IPv4. The hlen field specifies the length of the IP header in 32-bit words. The TOS (Type-Of-Service) field is used to indicate the type of service the packet should receive by routers. This field is broken up into several pieces.

|Prec |D|T|R|M|0|
The Prec field is unused today, but once meant precendence. The last field is unused and is set to 0. The other fields are: RFC 1349 documents the use of these fields and how applications should set them. However, very few routers do anything with these fields. But they should! Only 1 bit may be turned on at once. Telnet should set D. FTPs control channel should set D, while FTPs data channel should set T. DNS using UDP should set D and DNS using TCp should set none. A DNS zone transfer should set T. NNTP should set M. Some new routing protocols can route based on TOS, but seldom are they used in this capacity.

The total length field contains the total length of the IP header and the data it contains. IP performs fragmentation and reassembly and uses the flags and Identification field to do so. The identification field is a unique value for each datagram. Each fragment of the same datagram uses the same identification value. One of the flags is a "more fragments" flag. It indicates that more fragments for this datagram are forthcoming. The last fragment of a datagram does not have this flag set. Routers will fragment a datagram if the MTU size of the media requires it. Also, a sender will fragment a datagram it sends if the media it must send over has an MTU size that requires it. Reassembly is only done at the final destination of a datagram. The "don't fragment" flag is used to specify that a datagram can nto be fragmented by routers along its path. The offset field specifies how far from the beginning a particular fragment is. Reassmebly uses a small timer that is set when an initial fragment is received. If it expires and not all the fragments have been received, the whole datagram is discarded.

The TTL (Time-To-Live) field is used to keep routing loops from allowing datagrams to stay on the network indefinitely. This value is decrmented by 1 for each router the datagram passes through. When the value reaches 0, the datagram is discarded and not forwarded anymore.

The IP checksum is only calculated over the IP header. It is not calculated over the data.

IP provides numerous options. Including, security, timestamps, recording routes, specifying routes, Router Alerts, etc.

Internet Control Message Protocol (ICMP)

ICMP runs on top of IP. Its primary job is to communicate error messages from back to a sender. The general layout of an ICMP message is given below.

 0             7               15                              31
|    type       |    code       |       ICMP checksum           |
|           contents and format depend on type and code         |
ICMP message are either queries or error messages. Most ICMP error messages contain the IP header and first 8 bytes of data from the IP datagram that generated the error. Notice that this will include the UDP or TCP ports if those protocols were used.

ICMP error messages are never generated for:

ICMP is used for some applciations. Most notably, ping and traceroute.

The Ping Applciation

The ping application uses ICMP echo request and echo reply messages. The format of these messages is given below.

 0             7               15                              31
|    type       |    code       |       ICMP checksum           |
|        Identification         |      sequence number          |
|                          Optional Data                        |
The code field is set to 0. The type field is set to 8 for an echo request and 0 for an echo reply. The Identification field is used to identify which application is sending the ping if multiple pings are used. This is usually set to the Process ID of the application. The sequence number field is used to indicate what request an reply is for. The sequence is that a ping application sends an echo request and hopefully gets an echo reply from the destination. The ping application can compute RTT by recording time of echo request transmission and time of echo reply reception.

The Traceroute Application

Traceroute attempts to discover the route to a given host from a sender. This is accomplished by using increasing TTLs and waiting for either ICMP errors when the TTL reaches 0 and is discarded or for ICMP unreachable port errors. Traceroute sends UDP messages to port 30000 which is usually unused by an application. As the TTL is sent in an increasing fashion, the router where it is dropped sends an ICMP error (type is 11 and code is 0 or 1). The IP header and first 8 bytes of the data is also sent in the ICMP message. Thus correct traceroute application can be identified. The source IP address of the ICMP error indicates the router that dropped the UDP message. If the destination is reached, an ICMP port unreachable message is generated instead. Trcaeroute assumes that routing doesn't change much between "probes".

IP Routing

The basic mechanism of routing is the routing table. Each host (including routers) have a routing table. Each entry contains the following information: This table is searched in a specific order. That order is host entries, network entries, and default entries. This means that if a host entry matches before a network entry, then the host entry is chosen. Loopback addresses are also placed in the routing table. An example routing table for naur is given below.
Destination Gateway Flags Interface
localhost localhost host lo0 naur none hme0
default gateway hme0
The route command is used on most hosts to update the routing table at system initialization time.

Once a message is received by a router, it consults its routing table and sees if it can get the message closer to its eventual destination by sending it to other routers.

Address Resolution Protocol

Once a next-hop router is chosen for a destination, the Data Link layer address (such as Ethernet address) can be filled in with the Data Link address of that next-hop router. The process of getting an Ethernet (or other broadcast capable media) address for an IP address is called ARP (Address Resolution Protocol). An ARP request is broadcast out. The ARP reply is unicast back to the requester. The owner of the IP address is the entity that responds. ARP responses are also cached for a short period of time (20 minutes or so).

IP Routers and Dynamic Routing

IP Routers have large routing tables and are update dynamically during operation. ICMP provides mechanisms to dynamically update routing tables. ICMP redirects allow routes to be built up by routers. Normally a host is configured with a default route and ICMP redirects are used to update the hosts routing table as it sends messages to other hosts. ICMP router discovery (RFC 1256) update the default router used by a host. A host broadcasts/multicasts router solicitations, a router responds with an advertisement and periodically a router sends advertisements.

Routing protocols are protocols designed to be used by routers to dynamically update their routing tables. Routers communicate by exchanging information and update their routing tables. Routing daemons, such as routed and gated, are the mechanisms that control the routing protocols and update the routing table.

Routing policy is the process of determining which routes to place in a table based on social, contractual, and technical agreements. To make routing scaleable, a hierarchical approach is required. Each entity that can administrate a set of routers is called an Autonomous System (AS). Routing within an AS is controlled by Interior Gateway Protocols (IGPs). Routing between ASs is controlled by Exterior Gateway Protocols (EGPs). IGPs are Intradomain Routing Protocols and include such protocols as RIP and OSPF. EGPs are Interdomain Routing Protocols and include such protocols as EGP and BGP. Common routing daemons are routed and gated. Routed supports RIPv1. Gated v2 supports RIPv1, EGP, and BGPv1. Gated v3 supports RIPv1, RIPv2, OSPFv2, EGP, BGPv2, and BGPv3.

Routing Algorithms

There are two basic routing algorithms used for dynamically computing routes. The first is distance vector routing and the second is link-state routing.

Distance vector routing algorithms maintain distances from itself to each possible destination. Distances are computed using information in neighbors distance vectors. So for example, I am a router and one neighbor says that home.net is 10 hops away, another says it is 5 hops away, another says it is 4 hops away, and another says it is 3 hops away. If I need to send something to home.net, I would like to send it to the one who says it is 3 hops away.

Distance vector routing has one big problem. It is called "counting to infinity" and increases how long the algorithm takes to converge after a change. Imagine we have a simple routing setup that is a chain. A is directly connected to B and B is directly connected to C. Initially, A believes C is 2 hops away and B believes that C is 1 hop away. Imagine if the link connecting B and C breaks. B consults its information and sees that A is 2 hops away from C (B does not know that A calculated its distance based on B). So, B calculates its distance to C as 2+1=3. This new information causes A to recalculate its distance to C to be 3+1=4. This continues until both B and C reach the predefined number of hops called infinity (or not connected). The two ways of fixing this are to include hop information in the distance vectors or use "split horizon". Split horizon doesn't fix it in the general case, but it does help in most cases. In split horizon, a simpel rule is followed. That is if R forwards traffic for destination D through neighbor N, then R reports to N that R's distance to D is infinity.

Link-state routing is more complicated. Each router must actively test its link to its neighbors and advertise that status to other routers. This dissemination can be tricky. After getting this information, each router can then calculate the distance and path to each other router. Lets look at a graphical example below.

    6       2      5
A ----- B ----- C ---\
|2      |1      |2    G
D ----- E ----- F ---/
    2       4      1

Link State Database:
A: B/6, D/2
B: A/6, C/2, E/1
C: B/2, F/2, G/5
D: A/2, E/2
E: B/1, D/2, F/4
F: C/2, E/4, G/1
G: C/5, F/1
The database contains the link state for each node. Each node contains to list of neighbors and there distances. Each node can compute the path to each other node by using a modified version of Dijkstra's all-pairs shortest path. Take for example the node C, it would compute its routes as the following tree. The numbers in paranthesis are hop counts to that destination from C.

  F --- G
 /(2)   (3)
 \(2)   (3)   (5)   (7) 
  B --- E --- D --- A

Routing Information Protocol (RIP)

RIP uses UDP and port 520 and is specified in RFC 1058. That RFC was written after RIP was widely deployed. RIP is a distance vector routing protocol. RIP packets are limited to 512 bytes in length and follow the format below.

 0             7               15                              31
|   command     |     ver       |              0                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ---
|       address family          |              0                |
|                           IP address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+    route
|                               0                               |  (20 bytes)
|                               0                               |
|                         metric (1-16)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ---
A RIP message is composed of a maximum of 25 (20 bytes each) routes. The address family field is set to 2 for IP. The command field can take on the value 1 for request, 2 for reply, 5 for poll, or 6 for pollentry. Values of 3 and 4 are obsolete and 5 and 6 are undocumented. Ver is set to 1 in this case.

RIPs operation begins with a broadcast request out all interfaces. If a request is received, the router checks the address family of the request if it is 0 and the metric is 16, it then responds with its entire router table. If the address family is not 0, then it responds with the value for it has in its table for the IP address. If it has the address, it sends the metric it has. If it doesn't have the address, it sends a response with metric set to 16 (infinity). When a response is received by a router, it updates its routing table after validating the entry. This validation is usually very informal. A router regularly (every 30 seconds or so) sends its entire routing table to its neighbors using a broadcast. When a metric for a route changes, a router sends the changed routes to its neighbors.

In RIP, each route has a lifetime of about 3 minutes. If no update is sent in 3 minutes, the metric for the route is set to 16 and the route is marked for deletion. In no update is received for an additional 60 seconds, the route is then deleted.

Notice that RIPv1 does not have a subnet mask. This is because RIPv1 assumes that the subnet mask used is the same as the interfaces subnet mask. This is flawed, but works in some cases. RIPv2 adds subnet masks, a list of next-hop routers, and route tags (for ASs) as well as simple authentication, and supports multicast so that broadcasts can be avoided.

Open Shortest Path First (OSPF)

OSPF is a link-state routing protocol. It is specified in RFC 1247. OSPF can also support TOS (Type-Of-Service) routing. This is accomplished by using multiple metrics per route. OSPF uses IP directly instead of using UDP.

Border Gateway Protocol (BGP)

BGP is an EGP that connects ASs. BGPv3 is specified in RFC 1267 and its use is discussed in RFC 1268. BGP uses TCP connections to its neighbors. BGP uses distance vector routing, but enumerates routes to each destination. Each AS in an EGP is given a 16-bit number, its AS number, and routing information is based on AS number. So, this enumerated route to a destination is a set of AS numbers. ASs are classified into three categories. Stub ASs only carry local traffic, traffic which originates or terminates in that AS. Stub ASs also have one connection to another AS. Multi-homed ASs refuse to carry transit traffic (traffic that does not originate or terminate in that AS), but has more than 1 other AS connection. A transit AS is an AS that has multiple connections and may carry transit traffic under some policy restrictions. The goal of EGPs is to try to reduce the number of transit ASs to the least as possible amount.

BGP uses four message types. The Open message is sent when a link comes up. An update message is sent to exchange routing information. A notification message is sent as the final message before a link is disconnected. And keepalive messages are sent to reassure a neighbor that everything is OK (in the absence of routing updates).

BGPv4 supports CIDR (Classless InterDomain Routing). CIDR allows subnet masks to lapse into the network ID portion of an address. This reduces the size of the routing table EGPs must support, by allowing many of the Class C addresses to be collapsed into a few addresses. This is only as good as the policy for allocating Class C addresses is enforced, though.

Todd L. Montgomery (revised 04.26.1998)