Data Security

What Is Hashing? Working, Types, and Applications

Hashing assigns a numeric value to a string using hash functions and algorithms to make data retrieval faster and enable its encryption.

Chiradeep BasuMallick Technical Writer

Last Updated: June 12, 2023

Symbolic imagery depicting the different hash algorithms

Hashing is defined as the process of assigning a numeric value to an alphanumeric string by first converting it into another numeric value and storing it in an indexed table to make data retrieval faster and/or masking the data for encryption, performed by a hash function.
This article explains how hashing works, its types, and its critical functions.

What Is Hashing?
How Does Hashing Work?
Types of Hashing
Functions of Hashing

What Is Hashing?

Hashing is the process of assigning a numeric value to an alphanumeric string by first converting it into another numeric value and storing it in an indexed table to make data retrieval faster and/or masking the data for encryption, performed by a hash function.

How Hashing Converts Strings Into Numeric Values

Source: khalilstemmler.comOpens a new window

Hashing is used to transform a key or character string into another value. This is often reflected by a smaller, fixed-length value or variable that reflects the original string and makes it simpler to locate or use.

The most common use of hashing is the creation of hash tables. A hash table holds key-value pairs in a collection that can be accessed by its index. Given that key-value pairings are infinite, the hash function may match the values to the table size. The hash value is then used as the pointer for a particular element.

A hash function creates new values based on a mathematical hashing technique, often called a hash value or hash. A good hash always employs a one-way hashing technique to avoid the conversion of the hash to the original key.

Hashing is applicable to data searching and retrieving, digital signatures, cybersecurity, and cryptography, among many other applications.

Why is hashing necessary?

Every day, the amount of data on the internet increases exponentially, and preserving this data effectively is always a challenge. This quantity of data may not be very large for routine programming. However, it must still be saved, retrieved, and processed easily and efficiently. The array data model is a relatively frequently used data structure for this purpose.

A collection of things stored in contiguous memory regions constitutes an array. The aim is to group together several entries that fall into one category.

So the question arises about why a new data structure was necessary if array data models already existed! This question is answered with the term ‘efficiency.’ Although storing in an array requires O(1) time, searching inside it requires at least O(log n) time.

O(1) denotes constant time, which indicates that constant time algorithms would always run for the same amount of time. Constant time complexity describes an algorithm whose execution time is independent of the number of inputs. In contrast, O(log n) time shows that the total amount of operations rises slowly as the input length increases. This means that time increases linearly while ‘n’ grows exponentially.

O(log n) appears to be a small amount of time; however, with a large data collection, it may create many problems, rendering the array data model inefficient.

Searching for a data model that could retain the data and perform searches in constant time, or O(1) time, was necessary. This is why hashing data structure was introduced. With the development of the hash data model, it’s now feasible to keep and retrieve information constantly with relative ease.

See More: What Is Data Transformation? Types, Tools, and Importance

What is a hash data structure?

A hash architecture is a tabular data model for associatively storing data. Data is kept in an array format in the table. However, unlike a typical array, each data element is assigned a unique index number. Once we possess the index of the needed data, information access becomes incredibly quick.

As such, it develops into a data model wherein addition and retrieval actions are swift regardless of the data size. It utilizes an array, such as a storage facility, and employs a hashing algorithm to build an index to insert or locate an element.

For example, if the key value is Jane and the content is the phone number, we pass it as follows when providing the key value to the hash function:

Hash(key) = index;

When we process the key via the hash function, it generates the index.

Hash(Jane) = 2461;

The above example adds the key, Jane, at index 2461.

The entire hashing process works using three components:

Key: A key can be any text or number provided as input to the hash function, which determines an item’s index or storage position within a data structure. In this scenario, Jane is the key.
Hash Function: The hash function accepts a key as input and produces the index of a component within an array known as a hash table. The index is referred to as a hash index.
Hash Table: A hash table refers to a data model that links keys to values via a hash function. Hash holds the data associatively in an array with a unique index for each data value.

What is hashing used for?

Hashing has several key applications, such as:

1. Hashing in cybersecurity

Hashing is used by several encryption techniques to improve cybersecurity. Without a decryption key, hackers cannot interpret hashed messages and inputs. For instance, if hackers access a database and discover information such as ‘Jane Doe, Phone number 2461’, they may instantly utilize it for malicious purposes. A hashed value such as ‘b4gh8’ is meaningless to threat actors without the decryption key. Hence, hashing protects passwords saved in a database.

2. Hashing for faster data operations

Hashing is the mapping of object data to a representation value using functions or algorithms. A hash can be utilized to restrict searches when seeking these objects on the object data map. This significantly expedites data processes. For instance, with hash tables, programmers store data as key-value pairs, such as a customer record. The key identifies the data and functions like an input, whereas the hash code or integer is mapped to generate the output.

3. Hashing in cryptography

To safeguard data, the science of cryptography employs numerous hash algorithms. The secure hash algorithm (SHA), for instance, is a typical technique used to generate 160-bit message digests (a fixed-size numerical replica of the substance of a message). SHA-2 is employed to generate a bigger message digest (224 bits). SHA-3 is the successor to SHA-2, and all these algorithms are built on hashing.

4. Hashing in digital signatures

One of the fundamental components of digital signature systems is hashing. Hash functions can be used with encryption to offer a hash value (message digest), which serves as a distinct digital fingerprint. This indicates that any modification to the input data will result in an entirely different outcome (hash value). Consequently, this aids in validating digital certificates and their sources.

5. Hashing in cryptocurrency

Bitcoin mining utilizes a double SHA-256 hash function. When a transaction has been completed, the particular blockchain node is assigned two randomly selected numbers. The nonce, a 32-bit integer, is embedded first. This creates a hash, or 256-bit number, which contains information about the instance, including the time, location, and author (relayed by). Before adding a block to a chain, miners should create a Proof-of-Work. This is how Bitcoin works.

See More: Five Best SSL Certificate Providers To Consider in 2022

How Does Hashing Work?

To comprehend how hashing functions, see the following example. Assume we want to store a collection of strings ‘ab,’ ‘cd,’ and ‘efg’ in a table.

The purpose of hashing, in this case, is to quickly search or update the data contained in the table in O(1) time, and the order of strings in the table is irrelevant. The specified collection of strings can serve as a key, and the string on its own will serve as the key’s value. How do we save the value matching to the key is the question. Here, we shall employ a simple mathematical formula: mod.

Mod is a computing operation that calculates the remainder when one input is divided by the other. For instance, 14 mod 6 = 2. In our example, we will use mod as the algorithm to calculate the key-value pair. However, real-world algorithms use highly complex formulas that are difficult to decode or decrypt.

We know that hash functions (which is a mathematical formula) are used to arrive at the hash value. For our formula, we have selected mod.
So let’s assign ‘a’ = 1, ‘b’= 2, etc., to all alphabetical characters.

The numerical value by summation of all characters of the string is calculated as follows:

‘bc’ = 2 + 3 = 5

‘cd’ = 3 + 4 = 7

‘fh’ = 6 + 8 = 14

Assume now that we have a 6-cell table to hold these strings. The hash function used in this instance is the total of the digits in (key) mod 6 (table size). By calculating the numeric total of each string in the set, divided by 6 to arrive at the remainder, we can figure out the position of the string in the array.
We then calculate the table location where we will store each string value:

‘ab’ is 5 mod 6 = 6

‘cd’ is 7 mod 6 = 1

‘fh’ is 14 mod 6 = 2

The hash data structure (which is a table) will then look like this:

1	2	3	4	5	6
cd	fh				ab

Now, what if you had to store fg instead of fh? The hashing would work as follows:

6 + 7 = 13

Therefore, fg is 13 mod 6 = 1.

However, the cd string has already occupied 1 in the hashing table. Such a scenario is known as collision, resulting from a poorly thought-out algorithmic function.

What is collision in the working of hashing?

A hash collision or clash occurs in computer science when two bits of data in a hash table have the same hash value. Although hash algorithms were designed to be collision-resistant, they can sometimes translate distinct data to the same hash. The reason for this is the pigeonhole principle.

The pigeonhole principle asserts that if ‘n’ things are placed in ‘m’ containers and n is more than m, a minimum of one container should hold more than one item. This can sometimes result in unexpected outcomes. Assuming that the population of New York City is more than the maximum number of hair strands that can exist on a human’s head, the pigeonhole principle dictates that there must be at least two individuals in London with the same number of hair strands.

Users with malicious intent might use the hash collision phenomena to imitate, access, or modify data. To counteract this and guarantee the integrity of encrypted results, cybersecurity experts might include random numbers in the hash algorithm. This technique, known as ‘salting,’ ensures a distinct output even if the data is similar (like fg = 1 and cd = 1 in our example).

See More: Top 10 SQL Courses in 2022

Types of Hashing

Hashing can be of the following four types:

Hashing Types

1. Trivial hashing

Information is directly translated to an index inside a hash table in trivial hashing, often called index mapping. This approach commonly employs the identity hash function, which translates any input data toward itself. In this instance, the data key is utilized as an index within the hash table while the corresponding value is saved at that position.

The naive hashing algorithm would simply translate the key ‘b’ to the index ‘2’ in the hash table and save the value ‘apple’ at that position.

The simplicity of trivial hashing is among its primary benefits. The hash function is simple to comprehend and implement, and retrieving data using the key is straightforward. However, it also has certain restrictions. Due to the requirement that the length of the hash table remains equal to the number of keys, it is restricted to use with tiny data sets. In addition, collisions are not handled. This means that if two keys translate to an identical index, one of the values will be overwritten.

2. Double hashing

Hash tables use double hashing as a collision resolution approach. Using two hash functions creates two distinct hash results for a given key. The first hash function is employed for determining the initial hash value, while the second is utilized to determine the step length of the probing sequence (the checklist of locations to be looked at as alternatives in the event of collisions).

Since it employs two hash functions to calculate the hash value and the step size, double hashing provides the potential for a low collision rate. This means that the likelihood of a collision is lower than with other collision resolution methods.

Yet, double hashing comes with some disadvantages. It necessitates using two hash algorithms, which might raise the computational effort of insert and search operations. Secondly, to obtain excellent speed, a suitable selection of hash functions is necessary. The collision rate could continue to be large if the hash algorithms are poorly constructed.

3. Chained hashing

Chaining is a strategy to prevent collisions. With chained hashing, every slot in the hash table serves as a head node for data, which will be inserted afterward. Consequently, if the node is vacant, data is added to the root node. Instead, if data already exists, the incoming data is attached or inserted after the existing head node. In our example, fg would be added to cd as a linked list or ‘chain.’

4. Closed hashing

Closed hashing, often called open addressing, is a technique used to solve collisions in hash tables. This approach resolves a hash collision by exploring (or scanning) alternate positions inside the array (the probe sequence) till the target index or an unused slot is located.

This probing can occur in the form of:

Linear probing, where the interval among probes is typically specified at 1 unit.
Quadratic probing, where the distance between probes grows quadratically (as outlined by a quadratic function).
Double hashing, wherein the interval among probes is specified for every record but is calculated using another hash algorithm, as discussed above.

See More: What Is a Decision Tree? Algorithms, Template, Examples, and Best Practices

Functions of Hashing

A hash function is a mathematical algorithm compressing a numerical input value. The input length to the hash function is flexible, while the output length is always fixed. The message digest and hash values are the values generated by a hash algorithm.

A few typical hash functions are:

Functions of Hashing

1. Division method

Suppose one has a hash table of the size ‘S’ and would like to record a (key, value) pair in it. According to the division approach, the hash function will be H(k) = key mod N. N is a positive integer used to calculate the hash value, which has to be bigger than S. Sometimes, S is substituted for N, as in this case.

2. Mid-square method

The computation of the hash value requires two steps. With a (key, value) pair, a hash function is computed by calculating the key’s square, or key*key. The code then selects some digits from the center of the integer to generate the hash value.

3. Digital folding method

This type of hashing function involves two steps. Initially, we partition the key-value k and create several parts: k1, k2, k3, k4, …, kn, where every part has the same number of digits except for the final part, which might include shorter digits. After this, the individual components are added. The hash value is determined by disregarding the previous carryover, if any.

4. Multiplication method

This type of function has several steps, unlike the first three.

Pick a constant value A so that 0 is less than A and greater than 1.
Multiply the value of the key by A.
Calculate (k*A) mod 1 to get the fractional or decimal portion of k*A.
Multiply the outcome of the preceding step by the hash table’s size, like M.
The hash value is generated by taking the floor (i.e., the complete integer value instead of the decimal) of the result from step 4.

The formula for this function looks like this: h(K) = floor (M (kA mod 1)), where h(K) is the hash key.

See More: Will Symmetric and Asymmetric Encryption Withstand the Might of Quantum Computing?

Takeaway

Hashing is one of the fundamental concepts that power cryptography and encryption today. It was invented in the 1950s when an IBM engineer realized that placing all the data in a singular, indexed bucket would make search and retrieval much faster. Today, hashing is a technique used by search engines, cloud services, and most other consumer and enterprise applications that involve data operations.

Are you now clear with how hashing works? Tell us on FacebookOpens a new window , TwitterOpens a new window , and LinkedInOpens a new window . We’d love to hear from you!

Image source: Shutterstock

MORE ON DATA

Cybersecurity Encryption

Chiradeep BasuMallick

Technical Writer

opens a new window

Chiradeep is a content marketing professional, a startup incubator, and a tech journalism specialist. He has over 11 years of experience in mainline advertising, marketing communications, corporate communications, and content marketing. He has worked with a number of global majors and Indian MNCs, and currently manages his content marketing startup based out of Kolkata, India. He writes extensively on areas such as IT, BFSI, healthcare, manufacturing, hospitality, and financial analysis & stock markets. He studied literature, has a degree in public relations and is an independent contributor for several leading publications.

Do you still have questions? Head over to the Spiceworks Community to find answers.

What Is Hashing? Working, Types, and Applications

Table of Contents