Technical Post

How to Calculate Checksum

September 5, 2017

Checksums of various kinds are commonly used in data communication protocols to allow the recipient of a message to determine, quickly and easily, whether the data is likely to have been corrupted in transit. If you add all the bytes of a message together and find (neglecting overflow) that the sum is 96, then you tack that number onto the message before sending it, the recipient can repeat your summation on the first N – 1 byte of the message, and compare the result to the final byte to see if it’s 96. If so, the recipient can infer that the message probably (arguably) wasn’t altered in transit.

You’ll find a wide array of checksum techniques in common use. Three of the most popular ones are the conventional checksum, LRC (longitudinal redundancy check), and CRC (cyclic redundancy check). The latter isn’t really a checksum in the usual sense but is an example of a one-way hash that falls in the “linear congruential generator” family.

Notice, earlier, I said these data-integrity techniques can tell you whether data is “likely to have been corrupted.” No checksum technique is 100% foolproof in the general case, for arbitrary-length data. But some techniques are definitely better than others.

Let’s take a look at how to calculate LRC, checksum, and CRC using JavaScript.

How to Calculate Checksum Traditionally

The conventional 8-bit checksum is just what it sounds like: a sum of all bytes values in the input, with any overflow (from carry operations) discarded. In JavaScript:

The input to this function is expected to be a hexstring that looks something like “48656C6C6F20776F726C6421” (which, in this case, is the hex version of the ASCII string “Hello world!”). If we use “48656C6C6F20776F726C6421” as input to the above function, the output will be “5d,” which is the hex value of the final 8-bit sum.

The code is very simple. We start (in Line 3) by parsing the hex input into two-nibble chunks using the regular expression /../g — which means find substrings that match the pattern “anything followed by anything” (that’s what the two dots mean), and do it globally (that’s what the ‘g’ means).” The result is an array, s, of two-digit hex values.

At Line 5, we enter a loop (using the forEach iteration construct) in which we convert the string version of a two-nibble hex value to an actual number that we can operate on. In Line 7, we do the actual summation. Note that the number sum is essentially a 32-bit integer, behind the scenes, meaning our final value could be much larger than 255. When we’re finished looping, we need to be sure to constrain the sum to an 8-bit value. We do that in Line 9, with the logical-AND against 255. At the same time, we convert back to hex using the toString() method, with an argument of 16 (meaning we want to use radix-16 or “base 16” in our final representation of the number).

In Lines 10 and 11, we need to check that the final hex value is two nibbles long. JavaScript’s toString(16) operation returns a single-digit value for values less than 10. In that case, we need to prepend ‘0’ to the answer.

If you want to try the code out, copy and paste the above code into your JS console (in Chrome, use Shift-Cmd-J to get the console), then add a line, at the bottom (outside the function), of CHECKSUM(“48656C6C6F20776F726C6421”). When you hit Enter, the console should show ‘5d’ as the return value.

How to Calculate Checksum w/LRC

Longitudinal redundancy check (LRC) is a variation on the 8-bit checksum, differing only in that the “summation” is done using XOR, instead of numerical addition.

In Line 6, you can see the XOR-in-place operator (^=).

Since XOR never overflows, we don’t have to constrain the final result to 8 bits with an AND. We simply check for a two-nibble length, then return the final value.

If you try the code in the console using the string shown further above, you should get an answer of ’21’.

Critique OF Checksum And LRC

Neither checksum nor LRC can be considered robust against message corruption. For example, consider the original message (“48656C6C6F20776F726C6421”); suppose we change the last two bytes of the message from 6421 to 6520. Both the LRC and the checksum remain unchanged! (We simply turned one bit ON in an upstream byte, and turned the same-position bit OFF downstream, creating two changes that cancel each other out at checksum time.)

Likewise, consider what happens if you invert the message (that is, you reverse the message bytewise, so that it starts with 21 and ends with 48). Again, LRC and checksum remain the same as they were for the original message. That’s because XOR and addition are commutative. A + B will always equal B + A.

Also, consider the fact that since an 8-bit LRC or checksum can only have 256 different values, there’s a 1 in 256 chance that any message will give the exact same LRC (or checksum) another message chosen at random.

In general, it’s very easy to “fool” checksum and LRC algorithms, so they’re not really very reliable for message integrity-checking.

Fortunately, there are better algorithms than LRC or checksum for integrity-checking, but they come at a cost in terms of computational overhead.

How to Calculate Checksum w/CRC

When integrity-checking really matters, generally speaking you need to use a non-commutative hash of some kind. Often, this means a cryptographic hash, like SHA-1 or MD5, but these are computationally intensive and can be considered “overkill” in many situations.

A good tradeoff of computational overhead and reliability can be found in the Cyclic Redundancy Check, which comes in various versions, although the 16-bit CRC described below is adequate (and quite popular) for short messages (up to around 4 kilobytes).

CRC is an interesting subject, but space forbids any kind of comprehensive treatment of it here. (Consult Google.) Suffice it to say, from a practicality standpoint, a two-byte CRC provides very good sensitivity to random bitflips in the data and is unlikely to give false positives with data containing multiple bitflips. Because of this, and because it’s easy to implement in hardware or software, and runs very quickly in very little memory, you’ll find it used in a variety of data communication environments, including storage-disk drivers (where disk errors are often detected using CRC), modems, and small electronic devices (including all of ID TECH’s ViVOpay-series credit card readers).

The following JavaScript code shows how to calculate a 16-bit CRC value (returned as four nibbles of hex-ASCII).

CRC implements a hash algorithm that can be described in English as:

Set the starting crc value as 0xFFFF — Line 10
Take a byte of input data (as an 8-bit number) — Line 13
Right-shift the existing crc value by 8 bits — Line 14
XOR the right-shifted crc with the input byte — Line 14
Use the resulting value j (bottom 8 bits only) as a table offset to look up a “substitution byte” from the table known as crcTable — Line 15
Shift the crc value LEFT by 8 bits, and XOR it against the “substitution byte” — Line 15
Repeat these operations, starting from Line 13, using the next byte of input
After all input has been processed this way, XOR the result with zero and retain the bottom 16 bits of the crc — Line 17

We use a small utility routine to convert the final number from an integer to a hex string:

If you load both of these functions (numToHex, and CRC) into your browser’s JS console, and execute CRC( "48656C6C6F20776F726C6421" ), you should get a CRC for our “Hello world!” input data of ‘BD22’.

As an exercise, you might want to try flipping a bit in the input, to see what happens to output. For example, if you use our “Hello world!” string as input, and change the data’s final byte from 21 to 20, the CRC changes to ‘AD03’, which bears no relationship to ‘BD22’. Changing the final two bytes to ‘6520’ gives a CRC of ‘9E32’. (Remember, this same change did not perturb LRC or checksum.)

Consider a string representing ten zero (null) bytes. The LRC value of such a string would be zero. The checksum would obviously be zero. But the CRC would be E139.

You should be able to convince yourself pretty easily that reversing the input would give an entirely different CRC than using the original-direction input. (Which was not true for LRC or checksum.) CRC is not commutative, because of the way the top 8 bits of the CRC are XORed with the input byte before shifting everything to the left (which is similar to how cipher block chaining works, except in this case the “cipher block” size is 8 bits).

Note that while CRC is a one-way hash, it’s not a cryptographic hash per se, because it’s actually quite easy to calculate a “correction value” that, if tacked onto the data, would make a given alteration of the data yield the desired final CRC. (This is not true for so-called cryptographic hashes, where it’s difficult to calculate a “correction factor” that will back-correct an altered data block to the desired hash.) The CRC is thus appropriate for detecting unintentional data corruption.

Conclusion

There’s no shortage of “integrity check” algorithms you can use to monitor data packets for corruption. In some cases, a simple checksum or LRC will do. But for situations involving non-trivial amounts of data and a strict requirement for data integrity, you almost have to look to a “linear congruential generator” type of hash. The CRC family of algorithms have been well-tuned and optimized to provide good integrity discrimination combined with ease of implementation, rapid execution, and low memory requirements, making CRC attractive for a wide variety of data-checking scenarios involving everything from hard drives to credit card readers.

Want to Learn More About Checksum? ID TECH Has You Covered!

Get Started Today!