YouTip LogoYouTip

Dsa Hash Table

Hash Table

\\n\\n

A hash table (Hash Table), also known as a scatter table, is an extremely efficient data structure.

\\n\\n

Hash tables can perform data insertion, deletion, and lookup with a time complexity close to O(1) on average, which is much faster than array traversal (O(n)) and binary search tree search (O(log n)).

\\n\\n\\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n
OperationArray/ListHash Table
SearchO(n)O(1)
InsertO(n)O(1)
DeleteO(n)O(1)
Space EfficiencyHighModerate
\\n\\n

Hash Table is like an intelligent locker system:

\\n\\n
    \\n
  • Traditional Locker: You need to remember which locker each item is placed in (linear search)
  • \\n
  • Smart Locker: You tell the attendant the item name, and they directly tell you the locker number (hash search)
  • \\n
\\n\\n

Core Idea:

\\n\\n

Use a hash function to convert a Key into an array index to achieve fast access.

\\n\\n

Real-life Examples:

\\n\\n
    \\n
  • Dictionary Lookup: Quickly locate the page based on the first letter of the word
  • \\n
  • Phonebook: Sorted by surname pinyin, quickly find contacts
  • \\n
  • Library Call Number: Directly find the shelf location based on the classification number
  • \\n
\\n\\n
\\n

Imagine you have a huge library with thousands of books.

\\n

If you want to find a specific book, such as "Harry Potter", what would you do? The dumbest way is to start from the first book on the first shelf, searching one by one until you find it. This could take hours or even days!

\\n

What would a smart librarian do? They would use an index system: based on some rule of the book title or author name (such as the first letter), quickly locate the general area where the book is located, and then only need to search within that small area. The idea of this index system is the core of the hash table.

\\n
\\n\\n
\\n\\n

Basic Concepts

\\n\\n

What is a Hash Table?

\\n\\n

A hash table is a data structure that directly accesses values (Value) through keys (Key). It uses a magic formula called a hash function to convert keys of any size (such as a string, a number, or an object) into a fixed-size number. This number is called a hash value or hash code. Then, using this hash value as an index for the array, the value is stored at the array position corresponding to that index.

\\n\\n

This process is like assigning a unique seat number (hash value) to each key, and then telling it: your data is placed in the seat with this number. When you want to find this data, you just need to calculate the seat number using the same rule again, walk directly over to get it, without asking seat by seat.

\\n\\n

Core Components

\\n\\n

A hash table mainly consists of the following three core parts:

\\n\\n
    \\n
  1. Key: The identifier used when storing and searching for data. For example, in a phonebook, the person's name is the key.
  2. \\n
  3. Value: The actual data associated with the key. In a phonebook, the phone number is the value.
  4. \\n
  5. Hash Function: A mathematical function that maps keys to array indices. It is the heart of the hash table.
  6. \\n
\\n\\n

Working Principle Diagram

\\n\\n

Let's explain with a simple example. Suppose we want to create a phonebook to map names to phone numbers.

\\n\\n

Image 1

\\n\\n

The figure above clearly shows the working flow of a hash table: whether inserting or searching, it starts with a key, calculates the index through the hash function, and then directly operates on that position of the array, thereby achieving high-speed access.

\\n\\n

Hash Function Design

\\n\\n

Characteristics of a Good Hash Function

\\n\\n\\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n
CharacteristicDescriptionImportance
DeterminismSame input always produces the same output⭐⭐⭐⭐⭐
Uniform DistributionOutput is uniformly distributed within the hash table range⭐⭐⭐⭐⭐
Efficient CalculationCalculation process is simple and fast⭐⭐⭐⭐
Avalanche EffectSmall changes in input cause huge changes in output⭐⭐⭐
\\n\\n

Common Hash Functions

\\n\\n\\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n
TypeExampleApplicable Scenario
Division Modulo Methodhash = key % table_sizeInteger Keys
Multiplication Rounding Methodhash = floor(key * A % 1 * table_size)Float Keys
Digit Analysis MethodAnalyze the distribution pattern of numbersSpecific Pattern Data
Mid-Square Methodhash = (key * key) // 10^(n/2) % table_sizeNumber Keys
String HashingProcess character by characterString Keys
\\n\\n

Collision Resolution Methods

\\n\\n

Separate Chaining

\\n\\n\\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n
CharacteristicDescriptionTime Complexity
PrincipleEach hash bucket maintains a linked list
InsertAdd to the head of the corresponding linked listO(1)
SearchSearch in the corresponding linked listO(k), k is the length of the linked list
DeleteDelete in the corresponding linked listO(k)
Space OverheadRequires additional pointers storageLarger
\\n\\n

Open Addressing

\\n\\n\\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n \\n
TypeProbing SequenceAdvantagesDisadvantages
Linear Probingh, h+1, h+2, ...Simple and easy to implementEasy to cluster
Quadratic Probingh, h+1Β², h+2Β², ...Reduces clusteringMay not probe all positions
Double Hashingh, h+hash2(key), h+2*hash2(key), ...Uniform distributionComplex calculation
\\n\\n

Hash Table Structure Diagram

\\n\\n

Image 2

\\n\\n

Comparison of Collision Resolution Methods

\\n\\n

Image 3

\\n\\n
\\n\\n

Deep Dive into Hash Functions

\\n\\n

Hash functions are the key to hash table efficiency. A good hash function should have the following characteristics:

\\n\\n
    \\n
  • Determinism: Same keys must always produce the same hash value.
  • \\n
  • Efficiency: Calculation speed must be fast.
  • \\n
  • Uniform Distribution: Can map different keys as evenly as possible to the entire array space, reducing collisions.
  • \\n
\\n\\n

A Simple Hash Function Example

\\n\\n

Suppose our keys are strings, and the array length is 10. A simple hash function can be to sum the ASCII codes of each character in the string, then take the remainder modulo the array length.

\\n\\n

Example

\\n\\n
def simple_hash(key, array_size):\\n\\n"""\\n  A simple string hash function.\\n  :param key: Input key (string)\\n  :param array_size: Size of the hash table array\\n  :return: Calculated array Index\\n"""\\n\\n total =0\\nfor char in key:\\n# Convert characters to their ASCII code values and sum them up\\n total +=ord(char)\\n# Take modulo of array size to ensure Index is within array bounds\\nreturn total % array_size\\n\\n# Test our hash function\\n keys_to_test =["Alice","Bob","Charlie","David"]\\n array_size =10\\nprint("Simple Hash Function Test Results:")\\nfor key in keys_to_test:\\n hash_index = simple_hash(key, array_size)\\nprint(f" Key '{key}' -> Hash Index:{hash_index}")
\\n\\n

Output Result:

\\n\\n

Simple Hash Function Test Results:Key 'Alice' -> Hash Index:0 Key 'Bob' -> Hash Index:8 Key 'Charlie' -> Hash Index:5 Key 'David' -> Hash Index: 3

\\n\\n

This function is very simple, but it may not be the best, because different strings may produce the same sum (e.g., ab and ba), leading to hash collisions.

\\n\\n
\\n\\n

Hash Collisions and Resolution Methods

\\n\\n

Hash Collision refers to two or more different keys producing the same array index after being calculated by the hash function. It's like two moviegoers being assigned the same seat number, which is obviously a problem. Collisions are inevitable because the range of hash values (array size) is usually limited, while the possible keys are infinite.

\\n\\n

There are two classic methods to resolve collisions:

\\n\\n

Method 1: Separate Chaining

\\n\\n

This is the most commonly used method. Instead of placing data directly in each position of the array, it stores a linked list (or other data structures, such as red-black trees) at each array position. When a collision occurs, the new key-value pair is added to the linked list at that position.

\\n\\n

Process Description:

\\n\\n

Calculate the hash value of the key to find the array index.

\\n\\n
    \\n
  • If the linked list at that index position is empty, create a new node to store the key-value pair.
  • \\n
  • If the linked list is not empty (collision occurs), traverse this linked list.
  • \\n
  • If the same key is found, update its value (or handle according to requirements).
  • \\n
  • If the same key is not found, add the new key-value pair to the end of the linked list.
  • \\n
\\n\\n

Example

\\n\\n
class HashTableChaining:\\n\\n"""Hash table implementation using separate chaining to resolve collisions."""\\n\\ndef __init__ (self, size=10):\\n"""Initialize the hash table.\\n\\n :param size: Initial size of the underlying array\\n"""\\nself.size= size\\n# Create a size of `size` list, with each element initialized to an empty list (representing a linked list)\\nself.table=[[]for _ in range(size)]\\n\\ndef _hash(self, key):\\n"""Internal hash function. Here uses Python's built-in hash function and takes the modulo."""\\nreturn hash(key) % self.size\\n\\ndef insert(self, key, value):\\n"""Insert a key-value pair into the hash table."""\\n index =self._hash(key)\\n bucket =self.table# Get the corresponding item for this Index"Bucket"(Linked list)\\n# Traverse the bucket to check if the key already exists\\nfor i,(k, v)in enumerate(bucket):\\nif k == key:\\n# KeyAlready exists, update the value\\n bucket=(key, value)\\nreturn\\n# key does not exist,Add to the end of the linked list\\n bucket.append((key, value))\\n\\ndef get(self, key):\\n"""Get the value based on the key. If the key does not exist, return None."""\\n index =self._hash(key)\\n bucket =self.table\\nfor k, v in bucket:\\nif k == key:\\nreturn v\\nreturn None# key does not exist\\n\\ndef delete(self, key):\\n"""Delete the key-value pair based on the key."""\\n index =self._hash(key)\\n bucket =self.table\\nfor i,(k, v)in enumerate(bucket):\\nif k == key:\\ndel bucket# Remove this node from the linked list\\nreturn True# Deletion successful\\nreturn False# key does not exist,Deletion failed\\n\\ndef display(self):\\n"""Print all contents of the hash table."""\\nfor i, bucket in enumerate(self.table):\\nprint(f"Index {i}: {bucket}")\\n\\n# Usage example\\nprint("\\\\n--- Separate Chaining Hash Table Example ---")\\n phone_book =
← Dsa GraphDsa Basic Algorithm β†’