An Algorithm for Error Correcting Cyclic Redundancy Checks

Bill McDaniel

The Cyclic Redundancy Check algorithm has been used, for years, to determine the existance of errors in a data transmission. It turns out that CRCs can also be used to correct a single bit error in any transmission.

The first I heard about error-correcting CRCs was in a conversation I had, several years ago. (1) At the time, I thought it was general knowledge, but as I did more research, I saw no mention of CRC error correcton in the literature.

Error Correcting CRCs

The algorithm for error correcting CRCs involves division in binary (modulo 2). It is also table driven. The method works like this. A CRC checksum is calculated and is appended onto the end of the message. The message and checksum are sent. The receiver gets the the message and calculates a second checksum (of both parts). If the checksum value is 0 then the message is considered valid. If not, then the checksum value is used as a subscript in a Error Correction table to find the position of the bad bit. The method will only find 1 bit errors.

The Error Correction Table is built by the receiver using a Generator Polynomial (GP). The GP has to be carefully selected, because different GPs have different characteristics. We discuss the selection of a usable error correcting GP later in this article.

Building the Tables

One begins with a series of 0's representing correct data. One of the bits is then set to 1. Using modulo 2 division (XOR), the receiver divides the message by the GP, calculating a value that is used as a subscript in the error-correction Error Correction table. See the following algorithm.

for (i = 1; i<=Message_Length; i++) 
  { 
    set all bits in the Message to 0
    change the i'th bit to 1 
    calculate the checksum (cs) 
    EC_table[cs] = i 
  }

This table would only have to built once for each generator polynomial.

Building the EC table for 1011

In the example that follows, I have chosen a 4 bit Generator Polynomial. With 4 bit GPs, only 2 GP values can be used for error correction - decimal values 11 and 13. I chose 11. When one divides the message by the 4 bit GP, the result will be a 3 bit remainder, yielding values from 0 to 7, This implies that one can use this GP for a message that is from 1 to 4 bits. (The other 3 bits are used for the checksum). The result of the calculation of the checksums is shown below. At the bottom of the table are the decimal values of the remainders after the division.

1000000   0100000   0010000   0001000   0000100   0000010   0000001 
1011      1011      1011      1011      1011      1011      1011 
0011000   0100000   0010000   0001000   0000100   0000010   0000001 
 1011      1011      1011      1011      1011      1011      1011 
0011000   0001100   0010000   0001000   0000100   0000010   0000001 
  1011      1011      1011      1011      1011      1011      1011 
0001110   0001100   0000110   0001000   0000100   0000010   0000001 
   1011      1011      1011      1011      1011      1011      1011 
0000101   0000111   0000110   0000011   0000100   0000010   0000001 
    
      5         7         6         3         4         2         1

These values are then used to fill the Error Correction table. The EC table, in checksum order, follows.

checksum  bad bit 
      1      7 
      2      6 
      3      4 
      4      5 
      5      1 
      6      3 
      7      2

An array in C could be initialized to contain these values. In this example, EC[0] is not used.

int EC[] = {0,7,6,4,5,1,3,2};

This process makes up the test for error correcting Generator Polynomial validity. For a given number, if the entire table cannot be built (i.e. two or more numbers map to 1 slot), then the number chosen cannot be used as an Error Correcting Generator Polynomial.

In practice, when data is to be sent, one would compute the checksum (cs) and then append the checksum onto the end of the data. When the receiver receives the message combined with the checksum, the receiver computes another checksum (cs2). If it is zero, then the data is (supposedly) ok. If it is NOT zero, then EC[cs2] contains the location of the bit in error.

Sending a Message with GP = 1011

Given the following Generator Polynomial: 1101. Assume the Original Data is 1100. First append 3 checksum bits on the end. Then calculate the checksum (using XOR):

The checksum is: 010 and the string that is sent is the original data appended with the checksum - 1100010. Suppose, the following string of bits is received 1101010. (Note: numbering from left to right, bit 4 has been changed.) The receiver computes the checksum ...

The checksum is a binary 011, which is 3 in decimal. EC[3] is 4, which is the position of the bad bit!
Now, this scheme can be used for much longer messages, but the above text demonstrates the essence of the algorithm.

I wrote a program that determines whether a given bit pattern can be used as an error-correcting GP.

There are other ways of finding the bad bit without using tables. The following code is a slight modification of an algorithm presented by Fred Halsall (2) for computing an 8 bit CRC.

    hpo2 = 8;
    v = 1; 
    for (i = hpo2-1; i>=1; i--) 
      { 
        if (v == checksum) bad_bit_pos = i; 
        v = v + v; 
        if (v >= hpo2) v = v ^ GP; 
      }

If one has a lot of packets to check, table lookup would be faster.

Using a Finite State Machine

Intrigued by the idea of CRC error correction, I looked at the method from different angles, and, as a result, I have discovered a method for Error Correcting CRCs using a Finite State Machine.

Given the following Generator Polynomial: 11 (1011 in binary), I first determine the value of the left-most 1 bit (8) in the GP. I called it hpo2 and it is equal to the highest power of 2 in the GP.

Then I built a Finite State Table (FSM) for GP = 1011. This table represents states of the machine. The table has 1..hpo2-1 rows and 2 columns (0 and 1). The procedure for building an FSM is as follows:

  Let 't' = the current row number - this represents the current state. 
  Double it.  (t = t * 2) 
  Add the current column number (zero or one) to 't'. 
  If the value of 't' is >= hpo2, then exclusive-or it with the GP. 
  Place the value of t into the current row and current column of the table.

The following C code computes the values in the FSM table.

for (r = 0; r < hpo2; r++) 
  for (c = 0; c < 2; c++) 
    { 
      t = r*2 + c; 
      if (t >= hpo2) t ^= GP; 
      FSM[r][c] = t; 
    }

After running the algorithm, the FSM table (for GP = 1011) would be:

  index    values 
         ---------- 
    0   |   0  1 
    1   |   2  3 
    2   |   4  5 
    3   |   6  7 
    4   |   3  2 
    5   |   1  0 
    6   |   7  6 
    7   |   5  4

Both the Sender and the Receiver would be required to build the FSM table. Only the Receiver would build the Error Correction (EC) table, and would do so using the following code.

  pos = 7;
  cc = 1;
  for (r = 1; r < hpo2; r++)
    {
      EC[cc] = pos--;
      if ((cc*=2) >= hpo2) cc ^= gp;
    }

The CRC Error Correction table for 1011 would look like the following.

  checksum   position of 
             the bad bit 
             ---------- 
     1      |     7 
     2      |     6 
     3      |     4 
     4      |     5 
     5      |     1 
     6      |     3 
     7      |     2

Correcting a 1 bit Error Using an FSM

An example sequence of sending a message, losing a bit, then recovering the bad bit follows:

  Sender 
    Given an original message, say 1100 
    Append zeros for the checksum on the end: 1100000. 
    Traverse the FSM table... 
 
       cs = 0; 
       for (i=1; i < hpo2; i++) 
         cs = FSM[cs][msg[i]]; 
 
    ... giving a 2 (010 in binary) for the checksum (cs). 
    Send the message with a binary 2 appended: 1100 010 
 
 
  Receiver 
    Receive a message with 1 bit changed (say bit 4): 1101 010 
    Traverse the FSM table to get a value (cs2) of 3 
    if cs2 = 0 then the message is ok 
    else 
        EC[3] = 4, therefore bit 4 is bad 
        Correct the bad bit: 1100 010 
    Discard the checksum: 1100

A Sample Program

Listing 1 "crc_fsa.c" is a sample program that will build the tables, generate a random string, and change a random bit. Part 2 will determine the position of the changed bit and correct it. (The code is written with the message|checksum in an array of int. This demonstrates the logic while making the code much simpler to read.)

Listing 1

/* crc_fsa.c */ 
#include <stdio.h> 
#include <stdlib.h> 
#include <time.h>
 
#define  gp   285      /* the generator polynomial */ 
#define  hpo2 256      /* highest power of 2 in the gp */ 
 
void main() 
  { 
    int r,c, t, cs, cs2, pos; 
    int i, bb, li, 
        bim,        /* bits in the message */ 
        btbc,       /* bit to be changed */ 
        bigp,       /* bits in gp */ 
        debug = 0; 
    int msg[hpo2]; 
    int fsa_table[hpo2][2]; 
    int checksum[hpo2]; 
    time_t tt; 
 
    /* initialize the random number generator */ 
    srand((unsigned) time(&tt); 
 
    /* determine the number of bits (bigp) in the gp */ 
    li = gp >> 1; 
    bigp = 1; 
    while (li) { li = li >> 1; bigp++; } 
    /* determine the maximum number of bits in the message */ 
    bim = hpo2 - bigp; 
 
    printf("generator polynomial          %2d\n",gp); 
    printf("highest power of 2 in gp      %2d\n",hpo2); 
    printf("bits in generator polynomial  %2d\n",bigp); 
    printf("bits in message               %2d\n\n",bim); 
 
    /* Both the sender and the receiver need to build the FSM */ 
    for (r = 0; r < hpo2; r++) 
      for (c = 0; c < 2; c++) 
        { 
          t = r*2+c; 
          if (t >= hpo2) t ^= gp; 
          fsa_table[r][c] = t; 
        } 
 
    /* display the FSM */ 
    if (debug) 
      { 
        printf("FSM table for gp = %d\n\n",gp); 
        printf("index     0     1\n"); 
        printf("        -----------\n"); 
        for (r = 0; r < hpo2; r++) 
          { printf("   %2d  | ",r); 
            for (c = 0; c < 2; c++) printf("%2d  | ",fsa_table[r][c]); 
            printf("\n"); 
          } 
      } 


    /* Only the receiver will need to build the Error Correction
     * table for the GP */
    for (r = 0; r < hpo2-1; r++) checksum[r]=0;

    pos = 7;
    cc = 1;
    for (r = 1; r < hpo2; r++)
      {
        if (checksum[cc])
          printf("%d cannot be used as a GP\n",gp),exit(0);
        checksum[cc] = pos--;
        if ((cc*=2) >= hpo2) cc ^= gp;
      }


    /* display the error correction table for the gp */ 
    if (debug) 
      { 
        printf("\n\nError Correction table for gp = %d\n",gp); 
        printf("index   position of\n"); 
        printf("        the bad bit\n"); 
        printf("            --- \n"); 
        for (r=1; r < hpo2; r++) 
          printf("    %2d     | %2d |\n",r,checksum[r]); 
        printf("\n"); 
      } 
 
    /* generate a random message */ 
    for (i=0; i <= bim; i++) msg[i] = rand()%2; 
 
    /* display the message */ 
    printf("The original message         "); 
    for (i=1; i <= bim; i++) printf("%d",msg[i]); 
    printf("\n"); 
 
    /* zero out then compute the checksum */ 
    for (i=bim+1; i < hpo2; i++) msg[i] = 0; 
    cs = 0; 
    for (r = 1; r < hpo2; r++) 
      cs = fsa_table[cs][msg[r]]; 
 
    /* Put the checksum on the end of the message */ 
    for (i = bim+1; i < hpo2; i++) msg[i] = (cs >> hpo2-i-1) & 1; 
 
    /* display the message and the checksum */ 
    if (debug) 
      { 
        printf("with the checksum            "); 
        for (i=1; i < hpo2; i++) 
          printf("%d",msg[i]); 
        printf("\n"); 
      } 
 
    /* The message|checksum can now be sent.  Simulate a problem by 
       changing a random bit */ 
    btbc = rand()%(hpo2-1)+1; 
    msg[btbc] ^= 1; 
 
   /* 
    * Part 2 
    * The receiver has received a message with 1 bad bit. 
    */ 
    printf("After receiving the message, msg = "); 
    for (r = 1; r < hpo2; r++) printf("%d",msg[r]); 
    printf("\n"); 
 
    /* compute the checksum */ 
    cs2 = 0; 
    for (r = 1; r < hpo2; r++) 
      cs2 = fsa_table[cs2][msg[r]]; 
 
    if (debug) printf("the checksum is %d\n",cs2); 
 
    if (cs2 > 0) 
      { 
        /* determine the bad bit */ 
        pos = checksum[cs2]; 

        /* correct the bad bit */ 
        msg[pos] ^= 1; 
      } 
 
    /* display the (corrected) message */ 
    printf("The corrected message        "); 
    for (i=1; i <= bim; i++) printf("%d",msg[i]); printf("\n"); 
 
    printf("Normal Termination\n"); 
  }

Sidebar

The following is a list of some of the smaller Error Correcting Generator Polynomials. There are thousands more.

generator        bits      bits      total   hpo2 
polynomial        in        in       bits 
               message   remainder 
 dec     bin 
  7      111       1        2          3       4 
 11     1011       4        3          7       8 
 13     1101       4        3          7       8 
 19    10011      11        4         15      16 
 25    11001      11        4         15      16 
 37   100101      26        5         31      32 
 41   101001      26        5         31      32 
 47   101111      26        5         31      32 
 55   110111      26        5         31      32 
 59   111011      26        5         31      32 
 61   111101      26        5         31      32 
 67  1000011      57        6         63      64 
 91  1011011      57        6         63      64 
 97  1100001      57        6         63      64 
103  1100111      57        6         63      64 
109  1101101      57        6         63      64 
115  1110011      57        6         63      64

1. Conversation with Maartin van Sway, now Professor Emeritus at Kansas State University.

2. "Data Communications, Computer Networks and Open Systems", Fred Halsall Addison-Wesley, 1996, (p. 137).

Sidebar:

Applications of forward error correction.

- Satellite Tramission - If a host is sending data via a satellite, the cost of sending a regular packet is high, so the cost of a resend just doubles the price for that packet.

- High Speed Technology - In the future, there may be a tendency to push the technology. (Lets crank this baby up and see what it will do.) The faster bits move through a medium, the higher the probability of error.

- Storage Devices (hard disks) that are not working properly. Most hard drives have CRC check codes associated with each sector. These check codes could be used to correct the data that are on disks with bad sectors.

- Compression programs use CRC to validate the data. The way the programs are written now, the CRC is used for error detection, not correction.

- Error Correcting memory - "ECM RAM using some kind of error detection and correction (EDAC) scheme. The two types of memory errors in RAM (especially DRAM) are "soft" errors due to radiation-induced bit switching, and "hard" errors due to the unexpected deterioration of a memory chip. Soft errors do not indicate lasting damage to the memory board, but they do corrupt programs or data." Direct quote from "http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?error +correcting+memory" [Clearpoint, "The Designer's Guide to Add-In Memory", Third Addition]. (10-10-1995)"

With respect to RAM, there are memory chips that have extra bits for each byte of storage. Using this scheme, each 'byte' would consist of 12 bits - eight bits for the data and four to provide error-correcting capability. If the RAM has a fault then the situation is not catastrophic. Error recovery is possible, even if 1 of the bits is bad.

- PowerLine Carriers There is a growing interest in the use of PowerLine Carrier (PLC) for data communication using the intrabuilding electric power distribution circuits. Power lines were not designed for data communications and exhibit highly variable levels of impedance, signal attenuation and noise... Harmful effects of impulse noise on data communications systems can be expected. Direct quote from "http://www.metricom-corp.com/fec.html"