An Algorithm for Error Correcting Cyclic Redundancy Checks
Bill McDaniel
The Cyclic Redundancy Check algorithm has been used, for years, to
determine the existance of errors in a data transmission. It turns
out that CRCs can also be used to correct a single bit error in any
transmission.
The first I heard about error-correcting CRCs was in a conversation I had,
several years ago. (1) At the time, I thought it was general knowledge,
but as I did more research, I saw no mention of CRC error correcton in the
literature.
Error Correcting CRCs
The algorithm for error correcting CRCs involves division in binary
(modulo 2). It is also table driven. The method works like this.
A CRC checksum is calculated and is appended onto the end of the
message. The message and checksum are sent. The receiver gets the
the message and calculates a second checksum (of both parts). If the
checksum value is 0 then the message is considered valid. If not,
then the checksum value is used as a subscript in a Error Correction
table to find the position of the bad bit. The method will only find
1 bit errors.
The Error Correction Table is built by the receiver using a Generator
Polynomial (GP). The GP has to be carefully selected, because different
GPs have different characteristics. We discuss the selection of a usable
error correcting GP later in this article.
Building the Tables
One begins with a series of 0's representing correct data. One of the bits
is then set to 1. Using modulo 2 division (XOR), the receiver divides the
message by the GP, calculating a value that is used as a subscript in the
error-correction Error Correction table. See the following algorithm.
for (i = 1; i<=Message_Length; i++)
{
set all bits in the Message to 0
change the i'th bit to 1
calculate the checksum (cs)
EC_table[cs] = i
}
This table would only have to built once for each generator polynomial.
Building the EC table for 1011
In the example that follows, I have chosen a 4 bit Generator Polynomial.
With 4 bit GPs, only 2 GP values can be used for error correction -
decimal values 11 and 13. I chose 11. When one divides the message by
the 4 bit GP, the result will be a 3 bit remainder, yielding values from
0 to 7, This implies that one can use this GP for a message that is from
1 to 4 bits. (The other 3 bits are used for the checksum). The result of
the calculation of the checksums is shown below. At the bottom of the
table are the decimal values of the remainders after the division.
1000000 0100000 0010000 0001000 0000100 0000010 0000001
1011 1011 1011 1011 1011 1011 1011
0011000 0100000 0010000 0001000 0000100 0000010 0000001
1011 1011 1011 1011 1011 1011 1011
0011000 0001100 0010000 0001000 0000100 0000010 0000001
1011 1011 1011 1011 1011 1011 1011
0001110 0001100 0000110 0001000 0000100 0000010 0000001
1011 1011 1011 1011 1011 1011 1011
0000101 0000111 0000110 0000011 0000100 0000010 0000001
5 7 6 3 4 2 1
These values are then used to fill the Error Correction table. The EC table,
in checksum order, follows.
checksum bad bit
1 7
2 6
3 4
4 5
5 1
6 3
7 2
An array in C could be initialized to contain these values. In this example,
EC[0] is not used.
int EC[] = {0,7,6,4,5,1,3,2};
This process makes up the test for error correcting Generator Polynomial
validity. For a given number, if the entire table cannot be built (i.e. two
or more numbers map to 1 slot), then the number chosen cannot be used as an
Error Correcting Generator Polynomial.
In practice, when data is to be sent, one would compute the checksum (cs)
and then append the checksum onto the end of the data. When the receiver
receives the message combined with the checksum, the receiver computes
another checksum (cs2). If it is zero, then the data is (supposedly) ok.
If it is NOT zero, then EC[cs2] contains the location of the bit in error.
Sending a Message with GP = 1011
Given the following Generator Polynomial: 1101. Assume the Original
Data is 1100. First append 3 checksum bits on the end. Then calculate
the checksum (using XOR):
1100000
1011
----
0111000
1011
----
1010
1011
----
0010
The checksum is: 010 and the string that is sent is the original
data appended with the checksum - 1100010.
Suppose, the following string of bits is received 1101010. (Note:
numbering from left to right, bit 4 has been changed.) The receiver
computes the checksum ...
1101010
1011
----
01100
1011
----
1111
1011
----
1000
1011
----
011
The checksum is a binary 011, which is 3 in decimal. EC[3] is 4, which
is the position of the bad bit!
Now, this scheme can be used for much longer messages, but the above text
demonstrates the essence of the algorithm.
I wrote a program that determines whether a given bit pattern can be used
as an error-correcting GP.
There are other ways of finding the bad bit without using tables. The
following code is a slight modification of an algorithm presented by Fred
Halsall (2) for computing an 8 bit CRC.
hpo2 = 8;
v = 1;
for (i = hpo2-1; i>=1; i--)
{
if (v == checksum) bad_bit_pos = i;
v = v + v;
if (v >= hpo2) v = v ^ GP;
}
If one has a lot of packets to check, table lookup would be faster.
Using a Finite State Machine
Intrigued by the idea of CRC error correction, I looked at the method from
different angles, and, as a result, I have discovered a method for Error
Correcting CRCs using a Finite State Machine.
Given the following Generator Polynomial: 11 (1011 in binary), I first
determine the value of the left-most 1 bit (8) in the GP. I called it
hpo2 and it is equal to the highest power of 2 in the GP.
Then I built a Finite State Table (FSM) for GP = 1011. This table
represents states of the machine. The table has 1..hpo2-1 rows and 2 columns
(0 and 1). The procedure for building an FSM is as follows:
Let 't' = the current row number - this represents the current state.
Double it. (t = t * 2)
Add the current column number (zero or one) to 't'.
If the value of 't' is >= hpo2, then exclusive-or it with the GP.
Place the value of t into the current row and current column of the table.
The following C code computes the values in the FSM table.
for (r = 0; r < hpo2; r++)
for (c = 0; c < 2; c++)
{
t = r*2 + c;
if (t >= hpo2) t ^= GP;
FSM[r][c] = t;
}
After running the algorithm, the FSM table (for GP = 1011) would be:
index values
----------
0 | 0 1
1 | 2 3
2 | 4 5
3 | 6 7
4 | 3 2
5 | 1 0
6 | 7 6
7 | 5 4
Both the Sender and the Receiver would be required to build the FSM table.
Only the Receiver would build the Error Correction (EC) table, and would do
so using the following code.
pos = 7;
cc = 1;
for (r = 1; r < hpo2; r++)
{
EC[cc] = pos--;
if ((cc*=2) >= hpo2) cc ^= gp;
}
The CRC Error Correction table for 1011 would look like the following.
checksum position of
the bad bit
----------
1 | 7
2 | 6
3 | 4
4 | 5
5 | 1
6 | 3
7 | 2
Correcting a 1 bit Error Using an FSM
An example sequence of sending a message, losing a bit, then recovering the
bad bit follows:
Sender
Given an original message, say 1100
Append zeros for the checksum on the end: 1100000.
Traverse the FSM table...
cs = 0;
for (i=1; i < hpo2; i++)
cs = FSM[cs][msg[i]];
... giving a 2 (010 in binary) for the checksum (cs).
Send the message with a binary 2 appended: 1100 010
Receiver
Receive a message with 1 bit changed (say bit 4): 1101 010
Traverse the FSM table to get a value (cs2) of 3
if cs2 = 0 then the message is ok
else
EC[3] = 4, therefore bit 4 is bad
Correct the bad bit: 1100 010
Discard the checksum: 1100
A Sample Program
Listing 1 "crc_fsa.c" is a sample program that will build the tables, generate
a random string, and change a random bit. Part 2 will determine the position
of the changed bit and correct it. (The code is written with the
message|checksum in an array of int. This demonstrates the logic while
making the code much simpler to read.)
Listing 1
/* crc_fsa.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define gp 285 /* the generator polynomial */
#define hpo2 256 /* highest power of 2 in the gp */
void main()
{
int r,c, t, cs, cs2, pos;
int i, bb, li,
bim, /* bits in the message */
btbc, /* bit to be changed */
bigp, /* bits in gp */
debug = 0;
int msg[hpo2];
int fsa_table[hpo2][2];
int checksum[hpo2];
time_t tt;
/* initialize the random number generator */
srand((unsigned) time(&tt);
/* determine the number of bits (bigp) in the gp */
li = gp >> 1;
bigp = 1;
while (li) { li = li >> 1; bigp++; }
/* determine the maximum number of bits in the message */
bim = hpo2 - bigp;
printf("generator polynomial %2d\n",gp);
printf("highest power of 2 in gp %2d\n",hpo2);
printf("bits in generator polynomial %2d\n",bigp);
printf("bits in message %2d\n\n",bim);
/* Both the sender and the receiver need to build the FSM */
for (r = 0; r < hpo2; r++)
for (c = 0; c < 2; c++)
{
t = r*2+c;
if (t >= hpo2) t ^= gp;
fsa_table[r][c] = t;
}
/* display the FSM */
if (debug)
{
printf("FSM table for gp = %d\n\n",gp);
printf("index 0 1\n");
printf(" -----------\n");
for (r = 0; r < hpo2; r++)
{ printf(" %2d | ",r);
for (c = 0; c < 2; c++) printf("%2d | ",fsa_table[r][c]);
printf("\n");
}
}
/* Only the receiver will need to build the Error Correction
* table for the GP */
for (r = 0; r < hpo2-1; r++) checksum[r]=0;
pos = 7;
cc = 1;
for (r = 1; r < hpo2; r++)
{
if (checksum[cc])
printf("%d cannot be used as a GP\n",gp),exit(0);
checksum[cc] = pos--;
if ((cc*=2) >= hpo2) cc ^= gp;
}
/* display the error correction table for the gp */
if (debug)
{
printf("\n\nError Correction table for gp = %d\n",gp);
printf("index position of\n");
printf(" the bad bit\n");
printf(" --- \n");
for (r=1; r < hpo2; r++)
printf(" %2d | %2d |\n",r,checksum[r]);
printf("\n");
}
/* generate a random message */
for (i=0; i <= bim; i++) msg[i] = rand()%2;
/* display the message */
printf("The original message ");
for (i=1; i <= bim; i++) printf("%d",msg[i]);
printf("\n");
/* zero out then compute the checksum */
for (i=bim+1; i < hpo2; i++) msg[i] = 0;
cs = 0;
for (r = 1; r < hpo2; r++)
cs = fsa_table[cs][msg[r]];
/* Put the checksum on the end of the message */
for (i = bim+1; i < hpo2; i++) msg[i] = (cs >> hpo2-i-1) & 1;
/* display the message and the checksum */
if (debug)
{
printf("with the checksum ");
for (i=1; i < hpo2; i++)
printf("%d",msg[i]);
printf("\n");
}
/* The message|checksum can now be sent. Simulate a problem by
changing a random bit */
btbc = rand()%(hpo2-1)+1;
msg[btbc] ^= 1;
/*
* Part 2
* The receiver has received a message with 1 bad bit.
*/
printf("After receiving the message, msg = ");
for (r = 1; r < hpo2; r++) printf("%d",msg[r]);
printf("\n");
/* compute the checksum */
cs2 = 0;
for (r = 1; r < hpo2; r++)
cs2 = fsa_table[cs2][msg[r]];
if (debug) printf("the checksum is %d\n",cs2);
if (cs2 > 0)
{
/* determine the bad bit */
pos = checksum[cs2];
/* correct the bad bit */
msg[pos] ^= 1;
}
/* display the (corrected) message */
printf("The corrected message ");
for (i=1; i <= bim; i++) printf("%d",msg[i]); printf("\n");
printf("Normal Termination\n");
}
Sidebar
The following is a list of some of the smaller Error Correcting Generator
Polynomials. There are thousands more.
generator bits bits total hpo2
polynomial in in bits
message remainder
dec bin
7 111 1 2 3 4
11 1011 4 3 7 8
13 1101 4 3 7 8
19 10011 11 4 15 16
25 11001 11 4 15 16
37 100101 26 5 31 32
41 101001 26 5 31 32
47 101111 26 5 31 32
55 110111 26 5 31 32
59 111011 26 5 31 32
61 111101 26 5 31 32
67 1000011 57 6 63 64
91 1011011 57 6 63 64
97 1100001 57 6 63 64
103 1100111 57 6 63 64
109 1101101 57 6 63 64
115 1110011 57 6 63 64
1. Conversation with Maartin van Sway, now Professor Emeritus at Kansas
State University.
2. "Data Communications, Computer Networks and Open Systems", Fred Halsall
Addison-Wesley, 1996, (p. 137).
Sidebar:
Applications of forward error correction.
- Satellite Tramission - If a host is sending data via a satellite, the cost of
sending a regular packet is high, so the cost of a resend just doubles the
price for that packet.
- High Speed Technology - In the future, there may be a tendency to push the
technology. (Lets crank this baby up and see what it will do.) The faster
bits move through a medium, the higher the probability of error.
- Storage Devices (hard disks) that are not working properly. Most
hard drives have CRC check codes associated with each sector. These check
codes could be used to correct the data that are on disks with bad sectors.
- Compression programs use CRC to validate the data. The way the programs
are written now, the CRC is used for error detection, not correction.
- Error Correcting memory - "ECM RAM using some kind of error detection and
correction (EDAC) scheme. The two types of memory errors in RAM (especially
DRAM) are "soft" errors due to radiation-induced bit switching, and "hard"
errors due to the unexpected deterioration of a memory chip. Soft errors do
not indicate lasting damage to the memory board, but they do corrupt programs
or data."
Direct quote from "http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?error
+correcting+memory" [Clearpoint, "The Designer's Guide to Add-In Memory",
Third Addition]. (10-10-1995)"
With respect to RAM, there are memory chips that have extra bits for each
byte of storage. Using this scheme, each 'byte' would consist of 12 bits -
eight bits for the data and four to provide error-correcting capability.
If the RAM has a fault then the situation is not catastrophic. Error
recovery is possible, even if 1 of the bits is bad.
- PowerLine Carriers
There is a growing interest in the use of PowerLine Carrier (PLC) for data
communication using the intrabuilding electric power distribution circuits.
Power lines were not designed for data communications and exhibit highly
variable levels of impedance, signal attenuation and noise... Harmful effects
of impulse noise on data communications systems can be expected.
Direct quote from "http://www.metricom-corp.com/fec.html"