Hashing and encrypting sensitive data protects it from unwanted access by people with ill intentions. Even though the two techniques serve the same purpose, they are very different, and each is suited to a specific form of protecting your valuable resources. Knowing these differences makes it easier to verify that you have the correct option in place for the right scenario.
What is Hashing?
Hashing maps strings of varying lengths to a value that is always the same length, usually smaller. The important part about hashing is that it is a one-way mapping, i.e. you cannot get the original value from the hashed value. It is, however, possible for multiple input values to hash to the same output value; the best hash algorithms make these collisions as unlikely as possible. This link explains minimizing collisions in more detail.
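As a small illustration of the fixed-length property, the sketch below hashes inputs of very different sizes with Python's standard hashlib module. SHA-256 is chosen here only as an example; the sample inputs are made up for the demonstration.

```python
import hashlib

# Inputs of very different lengths (arbitrary sample data).
inputs = [b"a", b"hello world", b"x" * 10_000]

for data in inputs:
    digest = hashlib.sha256(data).hexdigest()
    # SHA-256 always yields 32 bytes, shown here as 64 hex characters.
    print(len(data), len(digest), digest[:16])

# Collect the digest lengths to confirm they are all identical.
digest_lengths = {len(hashlib.sha256(d).hexdigest()) for d in inputs}
```

Whatever the input size, the digest has the same length, and nothing in the digest lets you read the input back out.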
Proper hashing algorithms also make the hash very hard (virtually impossible) to reverse by processing the input over many iterations. MD5 runs 64 of them for each 512-bit chunk of data, and each iteration feeds its result into the next to produce the final hash.
To decode the hash, you would first have to separate it back into the 64 individual steps, and then reverse each of those 64 steps.
“Now, to explain why this is VERY hard, imagine trying to deduce a and b from the following formula: 10 = a + b. There are 10 positive combinations of a and b that can work. Now loop over that a bunch of times: tmp = a + b; a = b; b = tmp. For 64 iterations, you’d have over 10^64 possibilities to try. And that’s just a simple addition where some state is preserved from iteration to iteration. Real hash functions do a lot more than 1 operation (MD5 does about 15 operations on 4 state variables). And since the next iteration depends on the state of the previous and the previous is destroyed in creating the current state, it’s all but impossible to determine the input state that led to a given output state (for each iteration no less). Combine that, with the large number of possibilities involved, and decoding even an MD5 will take a near infinite (but not infinite) amount of resources. So many resources that it’s actually significantly cheaper to brute-force the hash if you have an idea of the size of the input (for smaller inputs) than it is to even try to decode the hash.” 1
One way attackers avoid the effort of computing every possible hash themselves is to use pre-computed rainbow tables. This is where salting the hash comes in handy for the good guys. There is already an article on our blog about salted hashes here.
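To see why a salt defeats a pre-computed table, here is a minimal sketch. The helper name salted_hash, the sample password, and the use of SHA-256 with a 16-byte random salt are assumptions made for this illustration; a real system would normally use a deliberately slow password-hashing function rather than a single hash pass.

```python
import hashlib
import os

def salted_hash(password: str, salt: bytes) -> str:
    # Prepend the salt so the same password hashes differently per salt.
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

password = "correct horse"  # sample value for the demonstration
salt_a = os.urandom(16)     # random salt for one user
salt_b = os.urandom(16)     # a different random salt for another user

# What a pre-computed table of plain hashes would contain for this password.
unsalted = hashlib.sha256(password.encode("utf-8")).hexdigest()

hash_a = salted_hash(password, salt_a)
hash_b = salted_hash(password, salt_b)
```

The two salted hashes differ from each other and from the unsalted one, so a table built from unsalted hashes does not match either stored value, even though both users picked the same password.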
What is Encrypting?
Encrypting sensitive data hides the true value so it can’t be seen while being stored or transported, but allows for decryption later so the value can be viewed and used again by the proper person. The important piece about an encryption algorithm is that, for a given key, there is always a one-to-one mapping between the input and the output values. Two different inputs never produce the same encrypted output, because otherwise the output could not be decrypted unambiguously. The output length varies with the input length, as opposed to the fixed-length output of hashing. There is always a specific way to reverse the encryption by using a well-defined method and the right key. Properly encrypted data cannot be distinguished from random noise, while a hash always has a consistent format.
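The round trip can be illustrated with a toy one-time pad built from the Python standard library. This is only a sketch for showing reversibility: the helper name xor_bytes and the sample plaintext are made up here, and production code should rely on a vetted encryption library rather than this.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    # XOR each data byte with the matching key byte.
    # Applying the same key a second time restores the original bytes.
    if len(key) != len(data):
        raise ValueError("key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"account 12345678"                # sample value to protect
key = secrets.token_bytes(len(plaintext))      # random key, used only once
ciphertext = xor_bytes(plaintext, key)         # encrypt
recovered = xor_bytes(ciphertext, key)         # decrypt with the same key
```

Unlike a hash, the ciphertext grows with the plaintext, and whoever holds the key can recover the exact original value.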
Storing the hash of a password works well because the plain text of the password cannot be retrieved from the storage area. Any password entered by a user is put through the hash function and the resulting hash is compared against the stored one to authenticate the user.
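A minimal sketch of that login flow might look like the following. The function names, the sample password, and the iteration count are assumptions for this example; it uses PBKDF2 from Python's hashlib with a per-user salt, which is one common choice and not the only one.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this sketch

def hash_password(password: str, salt: bytes) -> bytes:
    # Derive a hash from the password and salt using PBKDF2-HMAC-SHA256.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def register(password: str) -> tuple[bytes, bytes]:
    # Store only the salt and the hash, never the plain text password.
    salt = os.urandom(16)
    return salt, hash_password(password, salt)

def authenticate(attempt: str, salt: bytes, stored_hash: bytes) -> bool:
    # Hash the attempt the same way and compare in constant time.
    return hmac.compare_digest(hash_password(attempt, salt), stored_hash)

salt, stored_hash = register("s3cret!")
```

Calling authenticate("s3cret!", salt, stored_hash) succeeds while any other attempt fails, and at no point does the stored record reveal the password itself.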
You can also validate input data using a hash function: instead of doing a lengthy comparison of two large files, hash both files, and if the hashes match, the files are almost certainly the same.
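As a sketch of the file comparison, the helpers below read each file in chunks so that large files never have to fit in memory. The names file_sha256 and same_content, the chunk size, and the choice of SHA-256 are assumptions for this example.

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    # Feed the file to the hash in fixed-size chunks.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_content(path_a: str, path_b: str) -> bool:
    # Matching digests mean the contents are almost certainly identical.
    return file_sha256(path_a) == file_sha256(path_b)
```

This pays off most when one digest is computed once and stored, for example to check a download against a published checksum.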
“The probability of a collision is astronomical for small input sizes (assuming a good hash function). That’s why it’s recommended for passwords. For passwords up to 32 characters, md5 has 4 times the output space. Sha1 has 6 times the output space (about). Sha512 has about 16 times the output space. You don’t really care what the password was, you care if it’s the same as the one that was stored. That’s why you should use hashes for passwords”.1
Encryption of a value is necessary when the plain text of the value will be needed when the item is looked up in storage. If you are storing your bank account numbers, for example, you will need to see the plain text version to use them. The same applies to data that is sent securely from one location to another and needs to be viewed by the receiving party.