In our review of cryptographic algorithms, we will extend our Message-Digest discussion to the SHA (pronounced “Shaw”) hashing algorithms. Remember from our previous post that hashing functions are not technically encryption algorithms. More correctly, they are a mathematical way of generating a “fingerprint” of a piece of data. They are an essential part of cryptography, along with symmetric and asymmetric encryption.

## Message-Digests as data fingerprints

Message-Digest algorithms are mathematical functions that transform a data string of arbitrary length into a fixed-length string of data. The output of the function is what we call the “fingerprint” of the original data. With this type of function, it should be computationally infeasible to find two different inputs that return the same output. Also, because hashes are “one-way” functions, it should be infeasible to recover the input value even if you know the output value. They can’t be reversed.
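To make the fingerprint idea concrete, here is a minimal sketch using Python's standard `hashlib` module. The helper name `fingerprint` is ours, chosen only for illustration, and SHA-256 is picked arbitrarily as the example function.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Inputs of very different lengths produce digests of the same length.
short_digest = fingerprint(b"a")
long_digest = fingerprint(b"a" * 1_000_000)
print(len(short_digest), len(long_digest))  # both are 64 hex characters (256 bits)

# Changing the input changes the fingerprint.
print(fingerprint(b"a") == fingerprint(b"b"))  # False
```

Note that there is no inverse helper: going from the digest back to the input is exactly what a one-way function is designed to prevent.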

### Collisions

A hashing function fails to do its job when we find a “collision.” A collision occurs when two distinct inputs generate the same hash. Because the output has a fixed length while the input does not, collisions must exist in principle; what matters is whether anyone can actually find one. Once that happens, we no longer have confidence that the hashing function produces a unique fingerprint of the input data. A function is considered broken either when someone finds an actual pair of inputs that collide, or when it is shown that a collision can be produced within a certain threshold of cost (computing resources and time). The latter notion is important for assessing whether a breach of the algorithm is feasible for a reasonably well-funded organization (an organized crime ring, for example).
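The link between digest length and the cost of finding a collision can be seen on a deliberately weakened hash. The sketch below is only an illustration: it keeps just the first two bytes (16 bits) of a SHA-256 digest, so a collision turns up quickly. The names `truncated_hash` and `find_collision` are hypothetical helpers, and nothing here attacks a real SHA function.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """Return only the first `n_bytes` of the SHA-256 digest (a toy, weak hash)."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes: int = 2):
    """Hash the messages b"0", b"1", b"2", ... until two share a truncated digest.

    Returns (first_message, second_message, number_of_messages_tried).
    """
    seen = {}  # maps truncated digest -> the message that produced it
    for i in count():
        msg = str(i).encode()
        digest = truncated_hash(msg, n_bytes)
        if digest in seen:
            return seen[digest], msg, i + 1
        seen[digest] = msg


a, b, tries = find_collision(2)
print(a, b, tries)
```

With 16 bits there are only 65,536 possible outputs, so a collision is guaranteed within 65,537 attempts and usually appears after a few hundred. Every extra bit of digest length makes this kind of search more expensive, which is why the full-length functions are judged by the cost of an attack rather than by whether collisions exist.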

In the mid-1990s, weaknesses were discovered in the (then widely used) MD5 hashing function: a collision in its compression function was published in 1996, and full collisions followed in 2004. Over the same period, the National Institute of Standards and Technology (NIST) in the USA was standardizing more secure hashing algorithms. This led to the series of Secure Hash Algorithms (SHA).

## What are all these SHA algorithms anyway?

There is a family of SHA hashing algorithms, denoted by “SHA” followed by a hyphenated number. This all started with a US government project called Capstone, which was driven by NIST and the NSA. The project sought to support publicly available strong cryptography. It was not viewed in a fully positive way by the cryptography community, largely because of its notion of keys escrowed by the government and its government-designed cryptography chips.

Anyway, the series began with SHA-0, which was withdrawn shortly after its release, and was replaced by a revision known as SHA-1.

SHA-1 produces a 160-bit hash value. It is still in wide use today in spite of being deprecated by NIST in 2011 and disallowed for digital signature generation after 2013. In 2015, researchers recommended marking SHA-1 unsafe, because the cost of creating a collision was estimated to be on the order of only $75K-120K of EC2 compute time, putting it within the capabilities of criminal syndicates.

SHA-2 is recommended by the US government as a replacement for SHA-1. It is actually a family of hash functions with digest lengths of 224, 256, 384, or 512 bits.
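The digest lengths quoted above can be checked directly. This is a small sketch using Python's standard `hashlib`; the helper `digest_bits` is our own name, and it assumes the Python build in use still exposes SHA-1.

```python
import hashlib


def digest_bits(name: str) -> int:
    """Return the digest length, in bits, of the named hashlib algorithm."""
    return hashlib.new(name).digest_size * 8


for name in ("sha1", "sha224", "sha256", "sha384", "sha512"):
    print(f"{name}: {digest_bits(name)} bits")
```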

And if you’re wondering, of course there is also a SHA-3. NIST announced a public competition in 2007 to create a new hashing function standard. The goal was not to replace SHA-2, but to provide an alternative, dissimilar cryptographic hashing function. SHA-3 has been an official NIST hashing standard since 2015. A notable “dissimilarity” of SHA-3 is its use of a sponge function, which is unlike the construction of the earlier SHA algorithms.
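From the caller's point of view SHA-3 looks much like SHA-2, as this sketch with Python's standard `hashlib` suggests. It also uses SHAKE128, the extendable-output function defined in the same standard as SHA-3, because its caller-chosen output length is one visible consequence of the sponge design. The helper name `shake` and the sample message are ours.

```python
import hashlib

msg = b"hello"

# Same message, same 256-bit digest length, but unrelated outputs.
sha2_digest = hashlib.sha256(msg).hexdigest()
sha3_digest = hashlib.sha3_256(msg).hexdigest()
print(sha2_digest)
print(sha3_digest)


def shake(data: bytes, n_bytes: int) -> str:
    """Return `n_bytes` of SHAKE128 output for `data` as a hex string."""
    return hashlib.shake_128(data).hexdigest(n_bytes)


# The caller picks the output length; a shorter output is a prefix of a longer one.
print(shake(msg, 16))
print(shake(msg, 32))
```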

## Uses of SHA hashing algorithms

So when do you use which SHA algorithm? The Federal Information Processing Standard (FIPS) that defines them specifies the following message-size limits. SHA-1, SHA-224, and SHA-256 are for messages less than 2^{64} bits in length. SHA-384 and SHA-512 are for messages less than 2^{128} bits in length.

The value of digital fingerprints is straightforward, and as we’ve shown there are many hashing algorithms to choose from. When choosing one, you may face a tradeoff between collision resistance and processing speed.

The hashing algorithms will consume data processing resources of one form or another. The chart below comes courtesy of Javamex and shows the differences in processing time for the various hashing algorithms.

*Comparison of Hashing Algorithm Speeds*
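If you want to see the speed differences on your own machine, a rough measurement is easy to sketch with Python's standard `hashlib` and `timeit`. The function names, the 1 MB buffer, and the repeat count below are arbitrary choices of ours, and the absolute numbers will vary with hardware and with how your Python was built.

```python
import hashlib
import timeit

ALGORITHMS = ("sha1", "sha256", "sha512", "sha3_256")


def time_hash(name: str, data: bytes, repeats: int = 20) -> float:
    """Return the average time, in seconds, to hash `data` once with `name`."""
    total = timeit.timeit(lambda: hashlib.new(name, data).digest(), number=repeats)
    return total / repeats


def benchmark(size: int = 1_000_000) -> dict:
    """Time each algorithm in ALGORITHMS on a buffer of `size` zero bytes."""
    data = b"\x00" * size
    return {name: time_hash(name, data) for name in ALGORITHMS}


for name, seconds in benchmark().items():
    print(f"{name}: {seconds * 1000:.3f} ms per MB")
```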

SHA-1 and SHA-2 have many applications that depend on demonstrating message integrity, including password storage, file verification, and digital signatures. They are used in common Internet protocols and applications such as TLS and SSL, PGP, SSH, S/MIME, and IPsec. SHA-2 is widely used for authentication of software packages and digital media. SHA-256 and SHA-512 have been proposed for use in DNSSEC and also for Unix and Linux password hashing. SHA-256 is used for Bitcoin transaction verification.
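File verification is the easiest of these uses to try yourself. The sketch below assumes a publisher has given you the expected SHA-256 digest of a download as a hex string; the function names and the chunk size are our own choices, not part of any particular tool.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so that large files never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA-256 digest matches the published value."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

A call such as `verify_file("package.tar.gz", published_digest)` (both names are placeholders) returns `True` only if the file on disk has the published digest.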

One of the most common places you’re likely to encounter hashing algorithms is your ATM card, which holds the hash of your PIN for comparison to the hash of the PIN you type at the terminal.