The ver field is set to 4 to indicate IPv4. The hlen field specifies the length of the IP header in 32-bit words. The TOS (Type-Of-Service) field indicates the type of service the packet should receive from routers. This field is broken up into several pieces.
 0             7              15                              31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ver  | hlen  |      TOS      |         Total Length          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Identification         |Flags|     Fragment Offset     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      TTL      |   Protocol    |          IP Checksum          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source IP Address                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination IP Address                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    IP Options (if any) ...                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The Prec field is unused today, but once meant precedence. The last field is unused and is set to 0. The other fields are:
+-+-+-+-+-+-+-+-+
|Prec |D|T|R|M|0|
+-+-+-+-+-+-+-+-+
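The header layout and the TOS bits above can be unpacked with a few shifts and masks. The following is a minimal sketch, not a full parser: the function name `parse_ipv4_header` and the sample bytes are illustrative assumptions, the D, T, R, and M bit meanings (delay, throughput, reliability, monetary cost) are taken from RFC 1349 rather than from these notes, and IP options are ignored.

```python
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (ver_hlen, tos, total_length, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    hlen_words = ver_hlen & 0x0F          # header length in 32-bit words
    return {
        "ver": ver_hlen >> 4,
        "hlen_words": hlen_words,
        "hlen_bytes": hlen_words * 4,
        "tos": {
            "prec": tos >> 5,             # 3-bit precedence
            "D": (tos >> 4) & 1,          # minimize delay (RFC 1349)
            "T": (tos >> 3) & 1,          # maximize throughput
            "R": (tos >> 2) & 1,          # maximize reliability
            "M": (tos >> 1) & 1,          # minimize monetary cost
        },
        "total_length": total_length,
        "identification": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        # The 13-bit offset counts 8-byte units; convert to bytes.
        "fragment_offset_bytes": (flags_frag & 0x1FFF) * 8,
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "src": ".".join(str(b) for b in src),
        "dst": ".".join(str(b) for b in dst),
    }

# Hypothetical sample header: a UDP datagram from 192.168.0.1 to 192.168.0.199.
SAMPLE = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
header = parse_ipv4_header(SAMPLE)
```

Here `header["hlen_bytes"]` is 20 because hlen is 5 words, and `header["dont_fragment"]` is true because the sample sets the "don't fragment" flag.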
The total length field contains the total length of the IP header and the data it carries. IP performs fragmentation and reassembly, and uses the flags, identification, and fragment offset fields to do so. The identification field is a unique value for each datagram, and each fragment of the same datagram uses the same identification value. One of the flags is a "more fragments" flag, which indicates that more fragments for this datagram are forthcoming. The last fragment of a datagram does not have this flag set. Routers will fragment a datagram if the MTU size of the media requires it. Likewise, a sender will fragment a datagram if the media it must send over has an MTU size that requires it. Reassembly is only done at the final destination of a datagram. The "don't fragment" flag specifies that a datagram cannot be fragmented by routers along its path. The offset field specifies how far from the beginning of the original datagram a particular fragment's data starts. Reassembly uses a timer that is set when the first fragment of a datagram is received. If it expires before all the fragments have been received, the whole datagram is discarded.
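To make the offset and "more fragments" bookkeeping concrete, here is a small sketch that splits a payload for a given MTU. The function name `fragment` and the 20-byte header default are assumptions for illustration. The sketch also relies on a detail not stated above: the offset field counts 8-byte units, so every fragment except the last carries a multiple of 8 data bytes.

```python
def fragment(payload_len: int, mtu: int, header_len: int = 20):
    """Return a list of (offset_in_8_byte_units, data_len, more_fragments)."""
    # Largest data size per fragment, rounded down to a multiple of 8 bytes.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = offset + size < payload_len   # cleared only on the last fragment
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments

# A 4000-byte payload sent over a link with a 1500-byte MTU.
pieces = fragment(4000, 1500)
```

For this input the payload is split into three fragments of 1480, 1480, and 1040 data bytes, and only the last one has the "more fragments" flag cleared.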
The TTL (Time-To-Live) field keeps routing loops from allowing datagrams to stay on the network indefinitely. This value is decremented by 1 for each router the datagram passes through. When the value reaches 0, the datagram is discarded and not forwarded any further.
The IP checksum is only calculated over the IP header. It is not calculated over the data.
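As a sketch of the header-only checksum, the function below uses the standard Internet checksum (the 16-bit ones' complement of the ones' complement sum of the header's 16-bit words, with the checksum field treated as zero while computing). The function name `ip_checksum` and the sample header bytes are illustrative assumptions.

```python
def ip_checksum(header: bytes) -> int:
    """Ones' complement checksum over the IP header bytes only."""
    if len(header) % 2:
        header += b"\x00"                 # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                    # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Hypothetical 20-byte header with its checksum field (bytes 10-11) zeroed.
HEADER_ZEROED = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
computed = ip_checksum(HEADER_ZEROED)

# A receiver sums the header with the checksum in place and expects 0.
HEADER_FILLED = HEADER_ZEROED[:10] + computed.to_bytes(2, "big") + HEADER_ZEROED[12:]
verified = ip_checksum(HEADER_FILLED) == 0
```

Because the data is not covered, a router that decrements the TTL only has to recompute the checksum over these 20 bytes (plus any options), not over the whole datagram.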
IP provides numerous options, including security, timestamps, recording routes, specifying routes, and Router Alerts.
ICMP messages are either queries or error messages. Most ICMP error messages contain the IP header and first 8 bytes of data from the IP datagram that generated the error. Notice that this will include the UDP or TCP ports if those protocols were used.
 0             7              15                              31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     type      |     code      |         ICMP checksum         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          contents and format depend on type and code          |
ICMP error messages are never generated for:
The code field is set to 0. The type field is set to 8 for an echo request and 0 for an echo reply. The Identification field identifies which application is sending the ping when multiple pings are running. It is usually set to the process ID of the application. The sequence number field indicates which request a reply is for. The sequence is that a ping application sends an echo request and hopefully gets an echo reply from the destination. The ping application can compute the RTT by recording the time the echo request was sent and the time the echo reply was received.
 0             7              15                              31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     type      |     code      |         ICMP checksum         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        Identification         |        sequence number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Optional Data                         |
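The echo message layout can be sketched as a pair of pack and unpack helpers. This only builds and decodes the bytes; it does not open a raw socket or send anything. The names `build_echo_request`, `parse_echo`, and `inet_checksum` are assumptions for illustration, and the sketch assumes the ICMP checksum is the standard Internet checksum computed over the whole ICMP message (header plus optional data).

```python
import struct

ECHO_REQUEST = 8
ECHO_REPLY = 0

def inet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over the given bytes."""
    if len(data) % 2:
        data += b"\x00"
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Pack type=8, code=0, checksum, identification, sequence number, data."""
    unchecked = struct.pack("!BBHHH", ECHO_REQUEST, 0, 0, ident, seq) + payload
    csum = inet_checksum(unchecked)       # computed with the checksum field zeroed
    return struct.pack("!BBHHH", ECHO_REQUEST, 0, csum, ident, seq) + payload

def parse_echo(message: bytes) -> dict:
    """Decode an echo request or reply into its fields."""
    type_, code, csum, ident, seq = struct.unpack("!BBHHH", message[:8])
    return {"type": type_, "code": code, "checksum": csum,
            "identification": ident, "sequence": seq, "data": message[8:]}

# Hypothetical values: identification 0x1234 stands in for a process ID.
request = build_echo_request(0x1234, 1, b"ping")
fields = parse_echo(request)
```

A ping program would match a reply to its request by comparing the identification and sequence fields, then subtract the recorded send time from the receive time to get the RTT.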
The route command is used on most hosts to update the routing table at system initialization time.
Once a message is received by a router, it consults its routing table and sees if it can get the message closer to its eventual destination by sending it to other routers.
Routing protocols are protocols designed to be used by routers to dynamically update their routing tables. Routers communicate by exchanging information and update their routing tables. Routing daemons, such as routed and gated, are the mechanisms that control the routing protocols and update the routing table.
Routing policy is the process of determining which routes to place in a table based on social, contractual, and technical agreements. To make routing scalable, a hierarchical approach is required. Each entity that administers a set of routers is called an Autonomous System (AS). Routing within an AS is controlled by Interior Gateway Protocols (IGPs). Routing between ASes is controlled by Exterior Gateway Protocols (EGPs). IGPs are intradomain routing protocols and include such protocols as RIP and OSPF. EGPs are interdomain routing protocols and include such protocols as EGP and BGP. Common routing daemons are routed and gated. Routed supports RIPv1. Gated v2 supports RIPv1, EGP, and BGPv1. Gated v3 supports RIPv1, RIPv2, OSPFv2, EGP, BGPv2, and BGPv3.
In a distance vector routing algorithm, each router maintains its distance to each possible destination. Distances are computed using the information in its neighbors' distance vectors. For example, suppose I am a router and one neighbor says that home.net is 10 hops away, another says it is 5 hops away, another says it is 4 hops away, and another says it is 3 hops away. If I need to send something to home.net, I would like to send it to the neighbor that says it is 3 hops away.
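The home.net example can be written as a one-line choice over the neighbors' advertisements. This is a minimal sketch: the neighbor names and the function `best_next_hop` are hypothetical, and it assumes every neighbor is exactly one hop away.

```python
def best_next_hop(advertised: dict, link_cost: int = 1):
    """Pick the neighbor advertising the shortest distance to a destination.

    Returns (neighbor, my_distance), where my_distance adds the cost of the
    link to that neighbor (assumed to be one hop here).
    """
    neighbor = min(advertised, key=advertised.get)
    return neighbor, advertised[neighbor] + link_cost

# Distances to home.net as advertised by four hypothetical neighbors.
home_net = {"n1": 10, "n2": 5, "n3": 4, "n4": 3}
next_hop, distance = best_next_hop(home_net)
```

The router forwards through `n4` and, in turn, advertises its own distance to home.net as 4 hops.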
Distance vector routing has one big problem. It is called "counting to infinity", and it increases how long the algorithm takes to converge after a change. Imagine a simple routing setup that is a chain: A is directly connected to B, and B is directly connected to C. Initially, A believes C is 2 hops away and B believes that C is 1 hop away. Now imagine that the link connecting B and C breaks. B consults its information and sees that A is 2 hops away from C (B does not know that A calculated its distance based on B). So B calculates its distance to C as 2+1=3. This new information causes A to recalculate its distance to C as 3+1=4. This continues until both A and B reach the predefined number of hops called infinity (meaning not connected). The two ways of fixing this are to include hop information in the distance vectors or to use "split horizon". Split horizon doesn't fix it in the general case, but it does help in most cases. Split horizon follows a simple rule: if R forwards traffic for destination D through neighbor N, then R reports to N that R's distance to D is infinity.
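The A-B-C chain can be simulated in a few lines to see how slowly the distances climb. This is a toy sketch under stated assumptions: infinity is taken to be 16 (the value RIP uses), A and B exchange updates in strict alternation starting with A, and the function name `count_to_infinity` is made up for illustration.

```python
INFINITY = 16

def count_to_infinity(split_horizon: bool = False):
    """Simulate A and B after the B-C link fails.

    Returns the (A's distance, B's distance) to C after each exchange round.
    """
    dist_a = 2              # A's stale route to C, learned through B
    a_next_hop = "B"
    history = []
    while True:
        # A advertises to B. With split horizon, A reports infinity for
        # destinations it reaches through B.
        if split_horizon and a_next_hop == "B":
            advertised = INFINITY
        else:
            advertised = dist_a
        dist_b = min(advertised + 1, INFINITY)
        # B advertises to A, whose only path to C is through B.
        dist_a = min(dist_b + 1, INFINITY)
        history.append((dist_a, dist_b))
        if dist_a == INFINITY and dist_b == INFINITY:
            return history

slow = count_to_infinity()
fast = count_to_infinity(split_horizon=True)
```

Without split horizon the first round reproduces the 3 and 4 from the text and the two routers need several more rounds to reach 16. With split horizon B never hears A's stale route, so both settle on infinity in the first round.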
Link-state routing is more complicated. Each router must actively test its links to its neighbors and advertise their status to other routers. This dissemination can be tricky. After getting this information, each router can then calculate the distance and path to every other router. Let's look at a graphical example below.
The database contains the link state for each node. Each node's entry contains a list of its neighbors and their distances. Each node can compute the path to every other node by using a modified version of Dijkstra's shortest-path algorithm. Take node C, for example: it would compute its routes as the following tree. The numbers in parentheses are the total path costs to each destination from C.
      6       2       5
  A ----- B ----- C ---\
  |2      |1      |2    G
  D ----- E ----- F ---/
      2       4       1

Link State Database:
  A: B/6, D/2
  B: A/6, C/2, E/1
  C: B/2, F/2, G/5
  D: A/2, E/2
  E: B/1, D/2, F/4
  F: C/2, E/4, G/1
  G: C/5, F/1
        F --- G
      /(2)   (3)
    C
      \(2)   (3)   (5)   (7)
        B --- E --- D --- A
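The tree for node C can be reproduced by running a shortest-path computation over the link state database. The sketch below uses plain Dijkstra with a binary heap, not whatever modified version a particular routing daemon implements. The dictionary `LINK_STATE` is a transcription of the database above, and the function name `shortest_paths` is an assumption.

```python
import heapq

# Link state database from the example: node -> {neighbor: link cost}.
LINK_STATE = {
    "A": {"B": 6, "D": 2},
    "B": {"A": 6, "C": 2, "E": 1},
    "C": {"B": 2, "F": 2, "G": 5},
    "D": {"A": 2, "E": 2},
    "E": {"B": 1, "D": 2, "F": 4},
    "F": {"C": 2, "E": 4, "G": 1},
    "G": {"C": 5, "F": 1},
}

def shortest_paths(database: dict, source: str):
    """Dijkstra's algorithm: return (cost to each node, parent in the tree)."""
    cost = {source: 0}
    parent = {}
    heap = [(0, source)]
    finished = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in finished:
            continue                      # stale heap entry
        finished.add(node)
        for neighbor, link in database[node].items():
            candidate = d + link
            if neighbor not in cost or candidate < cost[neighbor]:
                cost[neighbor] = candidate
                parent[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return cost, parent

cost_from_c, tree_from_c = shortest_paths(LINK_STATE, "C")
```

The resulting costs match the numbers in the tree (for example, A is reached through D at a total cost of 7, which beats the cost of 8 through B), and `tree_from_c` gives each node's parent on its path back to C.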
A RIP message carries a maximum of 25 routes of 20 bytes each. The address family field is set to 2 for IP. The command field can take on the value 1 for request, 2 for reply, 5 for poll, or 6 for pollentry. Values 3 and 4 are obsolete, and 5 and 6 are undocumented. Ver is set to 1 in this case.
 0             7              15                              31
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    command    |      ver      |               0               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ---
|        address family         |               0               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          IP address                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+  route
|                               0                               | (20 bytes)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               0                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         metric (1-16)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ---
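The RIPv1 layout above maps directly onto a 4-byte header followed by 20-byte route entries. The following is a sketch of packing and unpacking such a message in memory. The names `pack_rip` and `unpack_rip` and the sample routes are assumptions for illustration, and nothing here sends packets on the network.

```python
import ipaddress
import struct

HEADER = "!BBH"        # command, ver, must-be-zero
ROUTE = "!HH4sIII"     # address family, zero, IP address, zero, zero, metric
MAX_ROUTES = 25
AF_IP = 2              # address family value for IP

def pack_rip(command: int, routes: list, version: int = 1) -> bytes:
    """Build a RIP message from a list of (dotted-quad address, metric) pairs."""
    if len(routes) > MAX_ROUTES:
        raise ValueError("a RIP message carries at most 25 routes")
    message = struct.pack(HEADER, command, version, 0)
    for address, metric in routes:
        packed = ipaddress.IPv4Address(address).packed
        message += struct.pack(ROUTE, AF_IP, 0, packed, 0, 0, metric)
    return message

def unpack_rip(message: bytes):
    """Return (command, version, [(address, metric), ...])."""
    command, version, _ = struct.unpack(HEADER, message[:4])
    routes = []
    for offset in range(4, len(message), 20):
        _, _, packed, _, _, metric = struct.unpack(ROUTE, message[offset:offset + 20])
        routes.append((str(ipaddress.IPv4Address(packed)), metric))
    return command, version, routes

# A reply (command 2) carrying two hypothetical routes.
reply = pack_rip(2, [("10.0.0.0", 1), ("192.168.1.0", 16)])
decoded = unpack_rip(reply)
```

With the full 25 routes the message is 4 + 25 * 20 = 504 bytes long.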
RIP's operation begins with a broadcast request out all interfaces. When a request is received, the router checks the address family of the request. If it is 0 and the metric is 16, the router responds with its entire routing table. If the address family is not 0, the router responds with the value it has in its table for the requested IP address: if it has the address, it sends the metric it has; if it doesn't, it sends a response with the metric set to 16 (infinity). When a response is received by a router, it validates the entry and then updates its routing table. This validation is usually very informal. A router regularly (every 30 seconds or so) broadcasts its entire routing table to its neighbors. When the metric for a route changes, a router sends the changed routes to its neighbors.
In RIP, each route has a lifetime of about 3 minutes. If no update is received in 3 minutes, the metric for the route is set to 16 and the route is marked for deletion. If no update is received for an additional 60 seconds, the route is then deleted.
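The two timers can be summarized as a small state function. This is a sketch only: it takes the 3-minute lifetime as exactly 180 seconds, and the names `age_route`, `ROUTE_TIMEOUT`, and `DELETE_DELAY` are made up for illustration.

```python
ROUTE_TIMEOUT = 180    # seconds without an update before the route expires
DELETE_DELAY = 60      # extra seconds before an expired route is removed
INFINITY = 16

def age_route(metric: int, seconds_since_update: int):
    """Return (metric, state) for a route given the time since its last update."""
    if seconds_since_update < ROUTE_TIMEOUT:
        return metric, "valid"
    if seconds_since_update < ROUTE_TIMEOUT + DELETE_DELAY:
        return INFINITY, "marked for deletion"
    return None, "deleted"
```

For a route with metric 3, this gives a valid route at 30 seconds, metric 16 at 200 seconds, and removal from the table at 240 seconds.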
Notice that RIPv1 does not carry a subnet mask. This is because RIPv1 assumes that the subnet mask in use is the same as the receiving interface's subnet mask. This is flawed, but works in some cases. RIPv2 adds subnet masks, a list of next-hop routers, and route tags (for ASes) as well as simple authentication, and supports multicast so that broadcasts can be avoided.
BGP uses four message types. An open message is sent when a link comes up. An update message is sent to exchange routing information. A notification message is sent as the final message before a link is disconnected. Keepalive messages are sent to reassure a neighbor that everything is OK (in the absence of routing updates).
BGPv4 supports CIDR (Classless InterDomain Routing). CIDR allows the mask to be shorter than the classful network ID portion of an address, so that a single route can cover several classful networks. This reduces the size of the routing table EGPs must support, by allowing many of the Class C addresses to be collapsed into a few routes. This only works as well as the policy for allocating Class C addresses is enforced, though, since only contiguous, properly aligned blocks can be collapsed.
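The collapsing of Class C networks can be illustrated with Python's standard `ipaddress` module. This is a sketch of the aggregation idea only, not of how a BGP speaker builds its table. The helper name `aggregate` and the addresses are hypothetical.

```python
import ipaddress

def aggregate(prefixes: list) -> list:
    """Collapse a list of CIDR prefixes into the fewest covering prefixes."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]

# Four contiguous, aligned Class C sized networks collapse into one route.
contiguous = aggregate([
    "192.168.0.0/24", "192.168.1.0/24", "192.168.2.0/24", "192.168.3.0/24",
])

# Networks allocated without regard to contiguity cannot be collapsed.
scattered = aggregate(["192.168.0.0/24", "192.168.2.0/24"])
```

The first call yields the single prefix `192.168.0.0/22`, while the second returns both /24 prefixes unchanged, which is why the allocation policy matters.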