# Error Correction and Detection

In telecommunications, the detection and correction of errors is important for maintaining data integrity on "noisy" communication channels. *Error detection* is the ability to detect the presence of errors introduced to a stream of data by interference or faults in the transmission system between a transmitter and a receiver. *Error correction* is the ability to restore data in which errors have been found to its original state. If an error is found using an error detection code, the receiver can respond either by explicitly requesting retransmission of the data from the transmitter, or by not sending an acknowledgement for the corrupted data, in which case the transmitter will assume that the data has either not been received or has been rejected by the receiver, and will re-transmit the data. Error correction codes are used by the receiver, acting alone, both to detect the presence of an error in the received data and to re-construct the data in its original form using the error correction encoding. Error correction necessarily involves the transmission of a significant amount of additional (*redundant*) data. The overhead involved is usually far greater than that required for error-detection schemes, and for this reason error correction is generally only used for applications where re-transmission of the data is not practical. In some schemes, a compromise (*hybrid*) solution is used in which minor errors are corrected using error correction codes, while major errors result in a request for retransmission.

Signals from Voyager 1 now take more than fourteen hours to reach Earth

## Error detection schemes

An *error-detecting code* (or *backward error correction*) involves the addition of sufficient redundant data to the information being sent to enable the receiver to detect errors and request the receiver to retransmit the data. This approach is known as an *automatic repeat request* (ARQ) strategy. A number of commonly used error detection schemes exist, which vary considerably in their complexity. The amount of additional information sent is usually the same for a given amount of data, and the error detection information will have a relationship to the data that is determined by the application of an algorithm of some kind to the data itself. The receiver applies the same algorithm to the data it receives to obtain its own version of the error correction code, and then compares that version with the error correction code it has received. If the two codes match, the receiver can be reasonably sure that the data is correct. If not, it will assume that an error has occurred and respond in the appropriate manner (i.e. request retransmission, either explicitly or by not sending an acknowledgement). The common types of error detection scheme are listed below, together with a brief description.

*Repetition schemes*- the data to be sent is broken down into blocks of bits of a fixed length, and each block is sent a predetermined number of times. If one or more block differs from another block, it is concluded that an error has occurred. This type of scheme is simple, but inefficient in that the amount of overhead (in the form of redundant data) is very high. Also, if the same error affects each block in the same way, an error may go undetected.*Parity schemes*- the data is again broken up into blocks of bits of a fixed length, and one additional bit is added (the*parity bit*). The number of bits in each block that are set to one (i.e. as opposed to zero) are counted. If an*even parity*scheme is being used, and if the number of ones counted is even, then the parity bit is set to zero. If the number of ones counted is odd, the parity bit is set to one (to make the number of ones even once more). The receiver simply counts the number of bits that are set to one, and if an odd number results, an error has occurred. An*odd parity*scheme works in exactly the same way, except that the number of bits set to one must always be odd. The weakness of parity schemes is that they can only detect errors in which an odd number of bits have been changed.*Checksum*- an arithmetic calculation of some kind is performed on the bytes or words making up the data, and the result is appended to the data as a*checksum*. The receiver performs the same calculation on the received data, and compares the result to the received checksum. If the results match, the data is correct. If not, an error has occurred (parity schemes can, in a sense, be considered to be very simple checksum schemes).*Cyclic redundancy check (CRC)*- this is a somewhat more complex error detection scheme. To generate a n-bit checksum, or*frame check sequence*(FCS), a*generator polynomial*is used that must be of the order*n*. For a 16-bit checksum, for example, the generator polynomial*x*is commonly used. The transmitting device adds^{16}+x^{12}+x^{5}+1*n*0-bits to the data to be transmitted and divides the resulting*code polynomial*by the generating polynomial, which produces a remainder polynomial of*n*degrees. This remainder polynomial becomes the checksum. The data transmitted (the*code vector*) is the original data followed by the*n*-bit checksum. The receiver can either compute the checksum again from the data and verify that it agrees with the received checksum, or it can divide the data together with the checksum by the generator polynomial. If the remainder is found to be 0, the data is correct.

## Error correction schemes

An *error-correcting code* (ECC) or *forward error correction* (FEC) code involves the addition of sufficient redundant data to the information being sent to enable the receiver to both detect and correct errors, without needing to request the receiver to retransmit the data. The advantage of this approach is that a return path is not required. This would be a critical requirement for applications such as communication with deep space probes, for example, where the delay between sending a message and receiving a reply could be considerable (Voyager 1, launched in 1977, is now more than ten billion miles from earth, and signals received by NASA arrive over *fourteen hours* affter they have been transmitted). The disadvantage is that, in order to ensure the required degree of data integrity, a large amount of redundant data will have to be transmitted with each message, significantly increasing the bandwidth required. *Shannon's theorem* defines the *code rate* as the total number of bits divided by the number of bits of actual data, and the *coding gain* as the difference in *signal-to-noise ratio* (SNR) between encoded and un-encoded data that would be necessary for them both to exhibit the same *bit error rate* (BER). The effectiveness of the encoding scheme is measured in terms of both its code rate and its coding gain. The theorem essentially sets an upper limit on the error correction rate that can be achieved with a given level of data redundancy, and for a given minimum signal-to-noise ratio.

Error-correction codes can be divided into *block codes* and *convolutional codes*. Block codes work on blocks of data of a fixed-size (e.g. packets). Convolutional codes work for bit streams of arbitrary length. They tend to be more complex and more difficult to implement than block codes, and involve considerably more overhead per unit data. Block codes are calculated for each individual frame or packet independently of one-another, whereas convolutional codes encode the entire data stream for a message as one long code word, and *then* transmit the message in segments. Convolutional codes have very powerful error correction capabilities, and are widely used in satellite communications and for communicating with deep space exploration vehicles. Some error-correction schemes work very well *above* a certain signal-to-noise ratio, but not at all below it (depending on how closely the scheme approaches Shannon's theoretical limit). Because most errors occur in random bursts rather than evenly distributed throughout the data stream, the message data bits are often shuffled (a process known as *interleaving*) after they have been encoded. When the message is un-shuffled (*de-interleaved*) at the receiver, bursts of errors are dispersed throughout the data stream as individual bit errors, which can be easily corrected using the error correction encoding.