Internetwork Layer Protocols
Internet Protocol (IP)
The Internet Protocol (IP) is the primary network layer protocol in the TCP/IP protocol suite, and handles the movement of datagrams across a network. The main purpose of IP is to provide a connectionless, best-effort delivery service for datagrams through an internetwork, and to provide fragmentation and reassembly of datagrams to support data links with different maximum transmission unit (MTU) sizes. The service provided by IP is unreliable in the sense that delivery is not guaranteed. Error and flow control are left to higher level protocols such as the Transmission Control Protocol (TCP). Although it is an Internet protocol, IP can be used on any kind of network.
IP has a maximum packet size of just under 64 kilobytes (65,535 bytes). Because this is larger than most networks can handle, IP can break a datagram down into smaller datagrams (fragments) when necessary. When the first fragment of a datagram that has been fragmented in this manner arrives at its destination, a reassembly timer is started. Unless all of the datagram fragments are received before the timer expires, all of the received fragments are discarded. The identification and fragment offset fields in each fragment's IP header enable the fragments to be matched to the original datagram and reassembled in the correct order. The IP datagram header is defined in terms of 32-bit words (four bytes). It comprises a minimum of five words (20 bytes) and, when options are present, a maximum of fifteen words (60 bytes). The layout of the IPv4 datagram header is illustrated below.
The IPv4 datagram header
IP datagram header fields:
- Version number - the IP version number.
- Header length - the total length of the IP header in 32-bit words.
- Type of service - usually ignored (most implementations treat all datagrams as having the same priority).
- Packet length - the total length of the datagram in bytes.
- Identification - identifies a datagram as part of a particular message.
- Flags - if the DF (Don't Fragment) flag is set, the datagram cannot be fragmented. If the MF (More Fragments) flag is set, further datagrams that are part of the current message are still to come.
- Fragment offset - the offset of the current datagram from the beginning of the message.
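- Time to live (TTL) - originally defined as the time in seconds that a datagram can persist on the network before it is discarded. In practice, each router decrements the value by one, so it acts as a hop limit.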
- Transport protocol - identifies the transport protocol used (for example, TCP = 6).
- Header checksum - a checksum for the IP datagram header itself.
- Source address - the 32-bit source IP address.
- Destination address - the 32-bit destination IP address.
- Options - an optional field comprising several variable length codes.
- Padding - used to ensure the header is a whole number of 32-bit words.
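As a rough illustration of how the fields listed above sit in the fixed 20-byte part of the header, the following Python sketch unpacks them with the standard struct module. The function name parse_ipv4_header and the dictionary keys are illustrative choices, not part of any standard API, and any options that follow the fixed part are ignored.

```python
import socket
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header into named fields."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        # The header length field counts 32-bit words, so multiply by 4.
        "header_length_bytes": (ver_ihl & 0x0F) * 4,
        "type_of_service": tos,
        "total_length": total_len,
        "identification": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        # The fragment offset field counts 8-byte units.
        "fragment_offset_bytes": (flags_frag & 0x1FFF) * 8,
        "ttl": ttl,
        "protocol": proto,
        "header_checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }
```

Passing the first 20 bytes of a captured datagram to this function yields a dictionary in which, for a TCP segment, the protocol entry is 6.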
Once the datagram, complete with its IP header, has been constructed, the first "hop" en route to the destination is determined. This could be the destination computer itself, if it is on the same local network as the source computer, or the default gateway router if the destination computer is on a different network. If a specific route is to be used, routing information is added to the header using the appropriate option, and the datagram is handed down to the link layer.
As a datagram traverses an internetwork, the IP implementation in each gateway router recalculates the header checksum to verify the header's integrity. If the recalculated checksum does not match the checksum contained within the datagram, the datagram is silently discarded (ICMP defines no error message for a failed checksum). Next, the TTL field is decremented and checked. If the TTL value has reached 0, the datagram is discarded and an error message is sent to the sending device. If the datagram is still viable, the gateway router determines the next hop, either using the information available in its routing table, or by examining the specific routing information held in the Options field of the IP header (if it exists). The datagram is then reconstructed with the new TTL value and a new checksum. If fragmentation is necessary, two or more new datagrams, with the appropriate header information, are constructed from the original datagram. Routing and timestamp information is added if required.
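The per-hop checks described above can be sketched in a few lines of Python. This is a minimal illustration, not a router implementation: the function names internet_checksum and forward are hypothetical, only a 20-byte header without options is assumed, and route lookup, fragmentation and ICMP reporting are left out.

```python
import struct
from typing import Optional


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward(header: bytes) -> Optional[bytes]:
    """Return the header to send on, or None if the datagram must be dropped."""
    # A header whose checksum field is correct sums to zero.
    if internet_checksum(header) != 0:
        return None
    ttl = header[8]
    if ttl <= 1:
        return None  # TTL would reach zero after the decrement
    updated = bytearray(header)
    updated[8] = ttl - 1
    updated[10:12] = b"\x00\x00"  # zero the checksum field before recomputing
    struct.pack_into("!H", updated, 10, internet_checksum(bytes(updated)))
    return bytes(updated)
```

Because the TTL changes at every hop, the checksum has to be rewritten at every hop as well, which is why the sketch recomputes it after the decrement.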
When the datagram reaches the destination device, the checksum is recalculated. If the recalculated checksum matches the checksum contained in the datagram, the receiving device checks whether or not further fragments are expected. If so, it will wait for a specified time. If a time-out occurs before the message can be reassembled, the datagram is discarded and an error message sent to the sender. Otherwise, the data is extracted from the constituent datagrams, and the original message is reconstructed. The message is then passed to a higher layer protocol. If a reply is required, it is generated and sent to the sending device.
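Fragmentation and reassembly can be sketched as follows. The example is a simplification under stated assumptions: a fixed 20-byte header, fragments represented as plain tuples instead of real datagrams, and no reassembly timer. It relies on the fact that the IPv4 fragment offset field counts in units of 8 bytes, so every fragment except the last must carry a multiple of 8 data bytes. The names fragment and reassemble are illustrative.

```python
def fragment(payload: bytes, mtu: int, header_len: int = 20) -> list:
    """Split payload into (offset_in_8_byte_units, more_fragments, data) tuples."""
    # Largest data size per fragment that is a multiple of 8 and fits the MTU.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    for start in range(0, len(payload), max_data):
        chunk = payload[start:start + max_data]
        more = start + max_data < len(payload)  # MF flag: set on all but the last
        fragments.append((start // 8, more, chunk))
    return fragments


def reassemble(fragments: list) -> bytes:
    """Rebuild the payload from fragments received in any order."""
    ordered = sorted(fragments, key=lambda f: f[0])
    return b"".join(data for _, _, data in ordered)
```

With an MTU of 1,500 bytes, each fragment except the last carries 1,480 data bytes, so successive fragment offsets increase by 185.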
Many problems can beset a datagram en route across an internetwork. The TTL may reach zero, or the datagram may be damaged or lost. The Internet Control Message Protocol (ICMP) provides a mechanism by which the sending device may be notified of any problems that occur.
When the current version of IP (IPv4) was developed, a 32-bit IP address, which allows a theoretical maximum of 4.3 billion unique addresses, was thought to be more than adequate to handle the projected use of the Internet. The growth rate of the Internet in recent years, however, has exceeded all expectations, and it is now generally accepted that a much larger address space is needed. As a result, the next generation of IP (IPv6) has been developed. The number of unique addresses available with IPv6 is 2^128 (approximately 3.4 x 10^38), an astronomically large number. It is thought that IPv4 will continue to be supported for the foreseeable future, although IPv4-only hosts will only be able to communicate with IPv6 hosts via intermediate protocol-translation servers.
Most transport-layer and application-layer protocols will not need to be changed significantly, if at all, to work over IPv6, although some distributed applications may need minor changes. The larger address space largely removes the need for workarounds like Network Address Translation (NAT), and simplifies the task of subnetting in corporate networks. The main drawback is that the fourfold increase in address size increases the overhead per packet, and therefore consumes more bandwidth than IPv4. IPv6 hosts can be configured automatically when connected to a routed IPv6 network. When a host first connects to a network, it multicasts a request for configuration parameters to the routers on the link (IPv6 does not use broadcast). Suitably configured routers can respond to such a request with a router advertisement packet that contains network-layer configuration parameters. If this method of auto-configuration is not available, a host can retrieve network configuration parameters from a DHCPv6 server, or be configured manually.
IPv6 supports packet sizes in excess of the 64 kilobyte limit imposed by IPv4, which are currently referred to as "jumbograms". The use of jumbograms is intended to improve performance over high-throughput networks. A simplified header structure has also been implemented in an attempt to improve routing performance, although advances in router technology may have rendered this improvement somewhat redundant.
A study carried out in August 2008 described the rate at which IPv6 is being adopted as "glacial". Certainly, adoption has been slow due to the introduction of Classless Inter-Domain Routing (CIDR) and Network Address Translation (NAT), which have between them greatly reduced pressure on the global IP address space. Another reason for the lack of momentum is the enormous amount of investment required by major network operators, ISPs and content providers in migrating to IPv6. Opinions as to when the current supply of global IPv4 addresses will be exhausted vary widely, but until an acute shortage starts to manifest itself, it is unlikely that change will be rapid. Because IPv6 is a relatively conservative extension of IPv4, it is not difficult to write a network stack that supports both IP protocol versions (a dual-stack). Most current implementations of IPv6 employ a dual-stack.
The IPv6 packet header consists of 40 octets and is illustrated below.
The IPv6 datagram header
Internet Control Message Protocol (ICMP)
The Internet Control Message Protocol (ICMP) is a member of the TCP/IP suite of Internet protocols. It is an internetwork layer protocol, and is responsible for generating and processing messages about the status of network devices. ICMP can inform network devices about a failure in a particular node, and works together with the Internet Protocol (IP). ICMP messages are encapsulated within IP datagrams. Error messages arising as a result of a datagram having encountered a problem are sent to the device where the datagram originated. The sender can then determine the type of error that has occurred, and take the appropriate action.
The messages generated by ICMP are described below.
|Type (8 bits)||Description|
|15||Information request (now obsolete)|
|16||Information reply (now obsolete)|
|17||Address mask request|
|18||Address mask reply|
All ICMP messages include a 32-bit word at the beginning of the header consisting of a type field (8 bits), a code field (8 bits) and a checksum (16 bits). An ICMP message reporting a problem with the delivery of a datagram usually includes the header and the first 64 bits of the data field from the datagram which encountered the problem, enabling the originating device to identify the datagram and perform some diagnostics. The type field takes one of the values shown in the table above. The code field provides more detailed information. The checksum field is calculated in the same way as a normal IP datagram checksum. The layout of an ICMP message will vary, depending on the message type. The various ICMP message headers are illustrated below.
ICMP message headers
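As an example of one such layout, the sketch below builds an echo request message: the common type, code and checksum word, followed by a 16-bit identifier and a 16-bit sequence number. The type value 8 for an echo request comes from the ICMP specification (RFC 792). The function names are illustrative, and actually sending the message would need a raw socket, which is not shown.

```python
import struct

ECHO_REQUEST = 8  # ICMP type for an echo request


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Return an ICMP echo request with a valid checksum."""
    # The checksum is computed over the whole message with the field set to zero.
    unchecked = struct.pack("!BBHHH", ECHO_REQUEST, 0, 0, identifier, sequence) + payload
    checksum = internet_checksum(unchecked)
    return struct.pack("!BBHHH", ECHO_REQUEST, 0, checksum, identifier, sequence) + payload
```

A receiver can validate the message by running the same checksum over all of its bytes; a result of zero indicates that no error was detected.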
Echo request ("ping") and echo reply messages are often used to test the reachability of network hosts. The ping command generates an echo request message that is sent to a remote host. Each router or other device between the host generating the echo request message and the target host must correctly decode the message headers and relay the datagram on to the next node. Receipt of an echo reply message indicates that the node was reached successfully. These request / reply pairs are often used to identify routing problems, router failures, or network cabling problems. If an echo request message cannot be delivered, no further messages are generated. If a router is unable to send a packet to its intended destination, a destination unreachable message is sent by the router to the originating host. The router subsequently discards the packet. Either the source host has specified a non-existent address, or the router does not have a route to the destination. The destination unreachable message may also be sent to the originating host in other circumstances, such as when a datagram must be fragmented but the Don't Fragment flag in the IP datagram header is set.
Destination unreachable messages include four basic types:
- Network unreachable - indicates that a failure has occurred in the routing or addressing of a packet
- Host unreachable - indicates a delivery failure, such as the use of an incorrect subnet mask
- Protocol unreachable - indicates that the destination does not support the upper-layer protocol specified in the packet
- Port unreachable - indicates that the destination transport-layer port (typically a UDP port) is not available
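The four basic types listed above are distinguished by the code field of the message. The sketch below maps code values to descriptions, using the numbering from the ICMP specification (RFC 792), in which destination unreachable is message type 3 and code 4 covers the "fragmentation needed but Don't Fragment set" case. The function name and the wording of the descriptions are illustrative.

```python
import struct

DEST_UNREACHABLE = 3  # ICMP type for destination unreachable

UNREACHABLE_CODES = {
    0: "network unreachable",
    1: "host unreachable",
    2: "protocol unreachable",
    3: "port unreachable",
    4: "fragmentation needed but DF set",
}


def describe_unreachable(message: bytes) -> str:
    """Return a description of a destination unreachable message."""
    msg_type, code = struct.unpack("!BB", message[:2])
    if msg_type != DEST_UNREACHABLE:
        raise ValueError("not a destination unreachable message")
    return UNREACHABLE_CODES.get(code, f"other (code {code})")
```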
The source quench message is used to control the rate at which datagrams are transmitted. When a device starts to receive source quench messages, it should reduce its transmission rate until the source quench messages stop. These messages are typically generated by a router or a host that has a full receive buffer, or that has reduced the rate at which it processes incoming datagrams for some other reason. The receiving device should issue a source quench message for each datagram it discards, and may also issue source quench messages when the available buffer space falls below a predetermined threshold.
Redirect messages are sent to a router when a better route becomes available. If a router receives a datagram from another router, but has routing information about a better route, it sends a redirect message to the originating router that gives the IP address of the preferred router. The code field in the header of the redirect message contains an integer value from 0 to 3. A value of 0 indicates that all datagrams for the destination network should be redirected. A value of 1 indicates that only datagrams for the specified destination host should be redirected. A value of 2 indicates that only datagrams for the destination network that have requested the same type of service should be redirected, and a value of 3 indicates that only datagrams for the specified destination host and that have requested the same type of service should be redirected. The router will still forward the original packet to its destination. ICMP redirects keep the size of routing tables down, because for any given destination network, only the address of a single router is stored, even if the router referred to does not currently represent the best path to the destination network.
The time exceeded message is sent by a router if an IP packet's time to live (TTL) field reaches zero. The router subsequently discards the packet.
The parameter problem message is used when a semantic or syntactic error is encountered in the IP datagram header. This may happen when options are used with incorrect parameters. The ptr field in the message contains a pointer to the byte in the IP datagram header that has caused the problem.
The timestamp request and timestamp reply messages are similar to echo request and echo reply messages, but they contain a timestamp to enable the time taken for the messages to traverse the network to be monitored. In combination with routing information, the timing information thus yielded can be used to identify bottlenecks.
The address mask request message is normally sent by a host to a router in order to obtain an appropriate subnet mask. The router replies with an address mask reply message.
The ICMP Router Discovery Protocol (IRDP) is used by hosts to discover the addresses of routers on directly attached subnets. Hosts listen for IRDP router advertisement messages that are periodically broadcast by a router on each of its interfaces. A host can also generate a router solicitation message to request immediate router advertisement messages (rather than waiting for unsolicited messages). Although these messages enable hosts to find neighbouring routers, they do not provide any indication as to which router is best for a particular destination network. If a host forwards datagrams to a router that does not currently represent the best first-hop for a particular destination, the router may return a redirect message that identifies a better first-hop router.
Internet Group Management Protocol (IGMP)
IP multicasting is a way of simultaneously delivering the same IP datagram to multiple recipients, and is based on the concept of having a group of Internet hosts that can receive multicast IP datagrams addressed to the group's multicast IP address. Multicast IP datagrams, like other IP datagrams, are delivered on a best effort basis, and are not guaranteed to arrive at their destination in order with respect to other multicast IP datagrams, or to arrive at all members of the destination group. Hosts that are members of a group can be located anywhere on the Internet. A group can have any number of members, including none, and a host can be a member of more than one group. Hosts interested in receiving multicasts for a particular group must register themselves as a member of the group using the Internet Group Management Protocol (IGMP).
The Internet Assigned Numbers Authority (IANA) has set aside the Class D IP address range (224.0.0.0 to 239.255.255.255) for IP multicast groups. The multicast groups themselves may be permanent or transient. A permanent group has a well-known, administratively assigned IP address. It is this address that is permanent, not the membership, and it is perfectly possible for a permanent group to have zero members. A transient group is assigned an address dynamically when the group is created at the request of a host. When the group ceases to exist (i.e. when its membership falls to zero), its address becomes available once more, and may be re-assigned to another group. The address 224.0.0.0 is not assigned to any group, while 224.0.0.1 is assigned to the permanent group of all IP hosts (including gateways). This is used to address all multicast hosts on the directly connected network.
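Class D addresses are those whose four high-order bits are 1110, which corresponds to the block 224.0.0.0/4. A quick way to test whether an address falls in the multicast range is sketched below using Python's standard ipaddress module. The helper name is_multicast is an illustrative choice; the module's IPv4Address objects also expose an is_multicast property that gives the same answer.

```python
import ipaddress

# The Class D (multicast) block: addresses whose top four bits are 1110.
CLASS_D = ipaddress.IPv4Network("224.0.0.0/4")


def is_multicast(address: str) -> bool:
    """Return True if the dotted-quad address lies in the Class D range."""
    return ipaddress.IPv4Address(address) in CLASS_D
```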
The creation of a transient group can only be carried out by multicast agents. A multicast agent is an entity that resides in an Internet gateway router (or in some other kind of special-purpose host). There will be at least one multicast agent attached directly to every IP network or subnetwork that supports IP multicasting. A host may request the creation of a new (transient) multicast group by exchanging appropriate IGMP messages with a local multicast agent. Multicast agents are also responsible for the delivery of multicast IP datagrams to group members on other networks, if they exist. A host sends a multicast IP datagram to a multicast address that identifies all local members of the multicast group. If the group has members on other networks, the local multicast agent will also receive the multicast IP datagram, and relays it to the multicast agent on each of the other networks via the appropriate Internet gateway routers. These agents then retransmit the multicast IP datagram to hosts that are members of the group on their own local networks.
Membership of a multicast group is dynamic in the sense that hosts may join or leave a group at any time. The only restriction that may be imposed on membership of a group is the requirement for hosts to hold a private access key. A host does not need to be a member of a multicast group to send datagrams to it. IGMP is used to dynamically register hosts as members of a multicast group on a particular LAN. Hosts are required to maintain a data structure listing the IP addresses of all multicast groups to which they currently belong, together with each group's loopback policy, access key, and timer variables. Hosts identify themselves as members of a multicast group by periodically sending IGMP messages, encapsulated within IP datagrams with an IP protocol number of 2, to the local multicast router. These routers listen for IGMP messages, and periodically send out IGMP queries to discover which groups are active on a particular subnet. IGMP, then, is the protocol used to exchange information about multicast group membership between hosts and multicast routers on a single physical network.
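On most operating systems an application does not build IGMP messages itself. It asks the IP stack to join a group through a socket option, and the stack then handles the IGMP exchange with the local multicast router. The sketch below shows this using Python's standard socket module. The helper names make_mreq and join_group are illustrative, and the default interface address 0.0.0.0 (let the system choose) is an assumption.

```python
import socket


def make_mreq(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the 8-byte membership request: group address, then interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def join_group(sock, group: str, interface: str = "0.0.0.0") -> None:
    """Ask the IP stack to join a multicast group on behalf of this socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_mreq(group, interface))
```

A receiver would typically create a UDP socket, bind it to the port of interest, and then call join_group on it. Membership is dropped when the socket is closed or when the corresponding IP_DROP_MEMBERSHIP option is set.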
Support for IP multicasting in hosts is defined at three levels:
- Level 0 - the host has no support for IP multicasting
- Level 1 - the host may send, but not receive multicast IP datagrams
- Level 2 - the host implements full support for IP multicasting, and may create, join or leave groups
IGMP version 0 (defined in RFC 988)
IGMPv0 packet format
The type field is an 8-bit number that specifies what type of message the IGMP packet holds. The message types are described in the table below.
|IGMPv0 Message Types|
|Type (8 bits)||Description|
|1||Create group request|
|2||Create group reply|
|3||Join group request|
|4||Join group reply|
|5||Leave group request|
|6||Leave group reply|
|7||Confirm group request|
|8||Confirm group reply|
In request messages, the 8-bit code field is relevant only in the create group request message, and indicates whether the new multicast group is to be public (0) or private (1). In all other request messages, this field is set to zero (0).
In reply messages, the code field specifies the outcome of the request. The possible outcomes are described in the table below.
|IGMPv0 Reply Message Codes|
|Code (8 bits)||Description|
|0||Request granted|
|1||Request denied, no resources|
|2||Request denied, invalid code|
|3||Request denied, invalid group address|
|4||Request denied, invalid access key|
|5-255||Request pending, retry in this many seconds|
The checksum field contains the 16-bit one's complement of the one's complement sum of the IGMP message, starting with the IGMP Type field, and with the checksum field initially set to zero. When the IGMP message is received, the checksum is recalculated using the same method. If the recalculated checksum does not match the checksum in the checksum field, an error has occurred.
In a confirm group request message, the 32-bit identifier field contains zero. In all other request messages, it contains a value that distinguishes the request from other requests from the same host. In a reply message, it contains the same value as the corresponding request message.
In a create group request message, the 32-bit group address field contains zero. In all other request messages, it contains a multicast group address. In a create group reply message, it contains a newly allocated multicast group address (if the request is granted) or zero (if the request is denied). In all other reply messages, it contains the same multicast group address as the corresponding request message.
In a create group request message, the 64-bit access key field contains zero. In all other request messages, it contains the access key assigned to the multicast group identified in the group address field (this is zero for public groups). In a create group reply message, it contains either a non-zero 64-bit number (if a request for a private group is granted) or zero. In all other reply messages, it contains the same access key as the corresponding request message.
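The field descriptions above can be turned into a small packing routine. The sketch assumes the fields appear in the message in the order in which they are described (type, code, checksum, identifier, group address, access key), giving a 20-byte message. The function names are illustrative and not taken from any library.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_igmpv0(msg_type: int, code: int, identifier: int,
                 group: str = "0.0.0.0", access_key: int = 0) -> bytes:
    """Pack an IGMPv0 message; the field order is an assumption (see above)."""
    body = struct.pack("!BBHI4sQ", msg_type, code, 0, identifier,
                       socket.inet_aton(group), access_key)
    # Fill in the checksum, computed with the checksum field set to zero.
    return body[:2] + struct.pack("!H", internet_checksum(body)) + body[4:]
```

For example, build_igmpv0(1, 1, 42) produces a create group request for a private group, with the group address and access key fields left at zero as the text requires.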
IGMP version 1 (defined in RFC 1112)
IGMPv1 packet format
The version field is a 4-bit field, set to 1.
The type field is a 4-bit field that specifies the message type. IGMPv1 has only two types of message:
- Host membership query (type = 1)
- Host membership report (type = 2)
The checksum field contains the 16-bit one's complement of the one's complement sum of the 8-byte IGMP message, starting with the IGMP version field, and with the checksum field initially set to zero. When the IGMP message is received, the checksum is recalculated using the same method. If the recalculated checksum does not match the checksum in the checksum field, an error has occurred.
In a host membership query message, the 32-bit group address field contains zero. In a host membership report message, it contains the multicast group address of the group being reported.
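The 8-byte IGMPv1 message can be packed as sketched below. The sketch assumes that the version occupies the high-order four bits of the first byte and the type the low-order four bits, that the second byte is unused and set to zero, and that the checksum and group address follow. The function names are illustrative.

```python
import socket
import struct

HOST_MEMBERSHIP_QUERY = 1
HOST_MEMBERSHIP_REPORT = 2


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_igmpv1(msg_type: int, group: str = "0.0.0.0") -> bytes:
    """Pack an IGMPv1 message: version/type byte, unused byte, checksum, group."""
    first_byte = (1 << 4) | msg_type  # version 1 in the high nibble
    body = struct.pack("!BBH4s", first_byte, 0, 0, socket.inet_aton(group))
    return body[:2] + struct.pack("!H", internet_checksum(body)) + body[4:]
```

Under this layout the first byte of a query is 0x11 and the first byte of a report is 0x12.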
Multicast routers periodically send out host membership query messages in order to discover which multicast groups have members on their attached local networks. These queries are addressed to the all-hosts group (IP address 224.0.0.1) and, like other IGMP messages, have an IP time-to-live (TTL) of 1. The default interval at which host membership queries are sent is 125 seconds. Hosts respond to a query by generating a host membership report for each multicast group to which they belong. Rather than send reports immediately, one after the other, the host starts a delay timer for each group of which it is a member. Each timer is set to a random delay time of between zero and D seconds (the maximum value allowed for D is 10 seconds). As each timer expires, a report is generated for the corresponding multicast group. The reports are thus spread out over a period of up to 10 seconds.
The report message is addressed to the IP address of the multicast group being reported, so that other members of the same group on the same physical network will also receive a copy. If a host sees a report for a group to which it belongs, it will stop its own timer for that group, and will not generate a report for that group. This means that in most cases, only one report will be generated for each multicast group with members on the network (note that multicast routers need not be addressed explicitly, since they receive all IP multicast datagrams by default). When a host joins a new group, it should immediately transmit a host membership report for that group, rather than wait for a host membership query, in case it is the first member of that group on the network. The report may be re-sent once or twice after short delays to allow for the possibility that the initial report may be lost or damaged.
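The combined effect of the random delay timers and report suppression can be illustrated with a small simulation. It is a sketch under simplifying assumptions: every host hears every report instantly, and no reports are lost. The function name and the data layout (a dictionary mapping each group to the list of member hosts) are illustrative.

```python
import random


def simulate_query_response(members_by_group: dict, max_delay: float = 10.0,
                            rng=None) -> dict:
    """Return, for each group, the (host, delay) whose report is actually sent."""
    rng = rng or random.Random()
    reporters = {}
    for group, hosts in members_by_group.items():
        # Each member starts its own random timer for this group.
        timers = {host: rng.uniform(0, max_delay) for host in hosts}
        # The first timer to expire sends the report; the other members see
        # that report, stop their timers and stay silent.
        first = min(timers, key=timers.get)
        reporters[group] = (first, timers[first])
    return reporters
```

However many hosts belong to a group, the result contains a single reporter for it, which is the behaviour described above.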
Multicast routers maintain a list of multicast groups with members on each of their attached networks, together with an interval timer for each multicast group in the list. They do not maintain a list of the individual members of each group. A multicast router will associate each multicast group's IP address with the corresponding physical multicast address. If no member of an existing multicast group responds to a host membership query after a specified time-out interval, the multicast router assumes there are no group members on the local network, and will discard any multicast datagrams addressed to that group. The default time-out interval is 2 x (the query interval (125 seconds) + 10 seconds) = 270 seconds. When a router receives a host membership report it resets the interval timer for the group being reported. If the group is not currently in the router's list of multicast groups, it is added, and a new interval timer is started.
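The router-side bookkeeping described above amounts to a table of groups with a timestamp per group. The sketch below is one possible way to model it. The class name GroupTable and its methods are hypothetical, time is passed in explicitly as a number of seconds to keep the example deterministic, and only a single attached network is modelled.

```python
QUERY_INTERVAL = 125       # seconds between host membership queries
MAX_RESPONSE_DELAY = 10    # maximum report delay in seconds
TIMEOUT_SECONDS = 2 * (QUERY_INTERVAL + MAX_RESPONSE_DELAY)  # 270 seconds


class GroupTable:
    """Groups believed to have at least one member on an attached network."""

    def __init__(self):
        self._last_report = {}

    def on_report(self, group: str, now: float) -> None:
        # Adds the group if it is new; otherwise resets its interval timer.
        self._last_report[group] = now

    def has_members(self, group: str) -> bool:
        return group in self._last_report

    def expire(self, now: float) -> list:
        """Remove and return groups with no report inside the time-out interval."""
        stale = [g for g, seen in self._last_report.items()
                 if now - seen > TIMEOUT_SECONDS]
        for group in stale:
            del self._last_report[group]
        return stale
```

Note that the table records groups only, not the individual hosts that reported them, in line with the description above.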
IGMP version 2 (defined in RFC 2236)
IGMPv2 packet format
The type field is an 8-bit number that specifies what type of message the IGMPv2 packet holds. The most important message types are described in the table below.
|IGMPv2 Message Types|
|Type (8 bits)||Description|
|0x11||Group membership query (general or group-specific). A general query is used to discover groups with members on an attached network. A group-specific query is used to discover whether a specific group has any members on an attached network. The two messages are differentiated by the group address.|
|0x12||IGMPv1 membership report.|
|0x16||IGMPv2 membership report.|
|0x17||IGMPv2 leave group.|
The 8-bit maximum response time field is only used in membership query messages, and specifies the maximum time (in units of 0.1 seconds) that may elapse before a membership report must be sent by a host. In all other messages, it is set to zero and is ignored by receivers.
The checksum field contains the 16-bit one's complement of the one's complement sum of the IGMP message, starting with the IGMP type field, and with the checksum field initially set to zero. When the IGMP message is received, the checksum is recalculated using the same method. If the recalculated checksum does not match the checksum in the checksum field, an error has occurred.
In a general group membership query message, the 32-bit group address field contains zero. In a group-specific group membership query message, it contains the multicast group address of the group being queried. In a membership report or leave group message, it contains the multicast group address of the group being reported or left.
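A decoder for the 8-byte IGMPv2 message might look like the sketch below. It assumes the fields appear in the order type, maximum response time, checksum, group address, and it does not verify the checksum. The function name and the returned dictionary keys are illustrative.

```python
import socket
import struct

MESSAGE_TYPES = {
    0x11: "membership query",
    0x12: "IGMPv1 membership report",
    0x16: "IGMPv2 membership report",
    0x17: "IGMPv2 leave group",
}


def parse_igmpv2(message: bytes) -> dict:
    """Decode an IGMPv2 message (checksum verification is omitted)."""
    msg_type, max_resp, _checksum, group = struct.unpack("!BBH4s", message[:8])
    kind = MESSAGE_TYPES.get(msg_type, "unknown")
    if msg_type == 0x11:
        # A zero group address marks a general query.
        kind = "general query" if group == b"\x00" * 4 else "group-specific query"
    return {
        "kind": kind,
        # The field counts tenths of a second.
        "max_response_seconds": max_resp / 10,
        "group": socket.inet_ntoa(group),
    }
```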
IGMPv2 works in a similar fashion to IGMPv1. The main addition is the leave group message. Hosts can now explicitly inform a local multicast router of their intention to leave a group. The router responds by issuing a group-specific membership query to determine whether there are any remaining group members on the local network. If not, the router ceases to forward multicast datagrams for the group on the local network. The multicast router is thus made aware of groups that no longer have members on the local network far earlier, reducing the number of unnecessary multicasts. When a host joins a new group, it should immediately transmit a version 2 membership report for that group, rather than wait for a host membership query, in case it is the first member of that group on the network. When a host wishes to leave a group, if it was the last host to send a membership report for that group, it should send a leave group message to the all-routers multicast group (IP address 224.0.0.2). IGMPv2 includes an extension that allows for the election of a multicast querier for each LAN (in the event of there being more than one multicast router on each LAN). The router with the lowest IP address is elected as the querier.
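The outcome of the querier election can be expressed in one line, as sketched below. The addresses must be compared numerically, not as text, which is why the example converts them with Python's standard ipaddress module. The function name elect_querier is illustrative, and the sketch computes only the result of the election, not the message exchange that leads to it.

```python
import ipaddress


def elect_querier(router_addresses: list) -> str:
    """Return the numerically lowest IPv4 address among the routers on a LAN."""
    return min(router_addresses, key=ipaddress.IPv4Address)
```

A plain string comparison would rank "192.168.1.10" below "192.168.1.9", so the numeric key matters.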
IGMP version 3 (defined in RFC 3376)
IGMPv3 is not described in detail here. With version 3, however, a host may join a group and specify a subset of the group's members from which it wishes to receive multicast traffic. Leave group messages have been enhanced to enable a host to remove a subset of group members from which it no longer wishes to receive multicast traffic.