Hash Table
\\n\\nA hash table (Hash Table), also known as a scatter table, is an extremely efficient data structure.
\\n\\nHash tables can perform data insertion, deletion, and lookup with a time complexity close to O(1) on average, which is much faster than array traversal (O(n)) and binary search tree search (O(log n)).
| Operation | \\nArray/List | \\nHash Table | \\n
|---|---|---|
| Search | \\nO(n) | \\nO(1) | \\n
| Insert | \\nO(n) | \\nO(1) | \\n
| Delete | \\nO(n) | \\nO(1) | \\n
| Space Efficiency | \\nHigh | \\nModerate | \\n
Hash Table is like an intelligent locker system:
\\n\\n- \\n
- Traditional Locker: You need to remember which locker each item is placed in (linear search) \\n
- Smart Locker: You tell the attendant the item name, and they directly tell you the locker number (hash search) \\n
Core Idea:
\\n\\nUse a hash function to convert a Key into an array index to achieve fast access.
\\n\\nReal-life Examples:
\\n\\n- \\n
- Dictionary Lookup: Quickly locate the page based on the first letter of the word \\n
- Phonebook: Sorted by surname pinyin, quickly find contacts \\n
- Library Call Number: Directly find the shelf location based on the classification number \\n
\\n\\n\\nImagine you have a huge library with thousands of books.
\\nIf you want to find a specific book, such as "Harry Potter", what would you do? The dumbest way is to start from the first book on the first shelf, searching one by one until you find it. This could take hours or even days!
\\nWhat would a smart librarian do? They would use an index system: based on some rule of the book title or author name (such as the first letter), quickly locate the general area where the book is located, and then only need to search within that small area. The idea of this index system is the core of the hash table.
\\n
\\n\\n
Basic Concepts
\\n\\nWhat is a Hash Table?
\\n\\nA hash table is a data structure that directly accesses values (Value) through keys (Key). It uses a magic formula called a hash function to convert keys of any size (such as a string, a number, or an object) into a fixed-size number. This number is called a hash value or hash code. Then, using this hash value as an index for the array, the value is stored at the array position corresponding to that index.
\\n\\nThis process is like assigning a unique seat number (hash value) to each key, and then telling it: your data is placed in the seat with this number. When you want to find this data, you just need to calculate the seat number using the same rule again, walk directly over to get it, without asking seat by seat.
\\n\\nCore Components
\\n\\nA hash table mainly consists of the following three core parts:
\\n\\n- \\n
- Key: The identifier used when storing and searching for data. For example, in a phonebook, the person's name is the key. \\n
- Value: The actual data associated with the key. In a phonebook, the phone number is the value. \\n
- Hash Function: A mathematical function that maps keys to array indices. It is the heart of the hash table. \\n
Working Principle Diagram
\\n\\nLet's explain with a simple example. Suppose we want to create a phonebook to map names to phone numbers.
\\n\\n
The figure above clearly shows the working flow of a hash table: whether inserting or searching, it starts with a key, calculates the index through the hash function, and then directly operates on that position of the array, thereby achieving high-speed access.
\\n\\nHash Function Design
\\n\\nCharacteristics of a Good Hash Function
\\n\\n| Characteristic | \\nDescription | \\nImportance | \\n
|---|---|---|
| Determinism | \\nSame input always produces the same output | \\nβββββ | \\n
| Uniform Distribution | \\nOutput is uniformly distributed within the hash table range | \\nβββββ | \\n
| Efficient Calculation | \\nCalculation process is simple and fast | \\nββββ | \\n
| Avalanche Effect | \\nSmall changes in input cause huge changes in output | \\nβββ | \\n
Common Hash Functions
\\n\\n| Type | \\nExample | \\nApplicable Scenario | \\n
|---|---|---|
| Division Modulo Method | \\nhash = key % table_size | \\n Integer Keys | \\n
| Multiplication Rounding Method | \\nhash = floor(key * A % 1 * table_size) | \\n Float Keys | \\n
| Digit Analysis Method | \\nAnalyze the distribution pattern of numbers | \\nSpecific Pattern Data | \\n
| Mid-Square Method | \\nhash = (key * key) // 10^(n/2) % table_size | \\n Number Keys | \\n
| String Hashing | \\nProcess character by character | \\nString Keys | \\n
Collision Resolution Methods
\\n\\nSeparate Chaining
\\n\\n| Characteristic | \\nDescription | \\nTime Complexity | \\n
|---|---|---|
| Principle | \\nEach hash bucket maintains a linked list | \\n\\n |
| Insert | \\nAdd to the head of the corresponding linked list | \\nO(1) | \\n
| Search | \\nSearch in the corresponding linked list | \\nO(k), k is the length of the linked list | \\n
| Delete | \\nDelete in the corresponding linked list | \\nO(k) | \\n
| Space Overhead | \\nRequires additional pointers storage | \\nLarger | \\n
Open Addressing
\\n\\n| Type | \\nProbing Sequence | \\nAdvantages | \\nDisadvantages | \\n
|---|---|---|---|
| Linear Probing | \\nh, h+1, h+2, ... | \\n Simple and easy to implement | \\nEasy to cluster | \\n
| Quadratic Probing | \\nh, h+1Β², h+2Β², ... | \\n Reduces clustering | \\nMay not probe all positions | \\n
| Double Hashing | \\nh, h+hash2(key), h+2*hash2(key), ... | \\n Uniform distribution | \\nComplex calculation | \\n
Hash Table Structure Diagram
\\n\\n
Comparison of Collision Resolution Methods
\\n\\n
\\n\\n
Deep Dive into Hash Functions
\\n\\nHash functions are the key to hash table efficiency. A good hash function should have the following characteristics:
\\n\\n- \\n
- Determinism: Same keys must always produce the same hash value. \\n
- Efficiency: Calculation speed must be fast. \\n
- Uniform Distribution: Can map different keys as evenly as possible to the entire array space, reducing collisions. \\n
A Simple Hash Function Example
\\n\\nSuppose our keys are strings, and the array length is 10. A simple hash function can be to sum the ASCII codes of each character in the string, then take the remainder modulo the array length.
\\n\\nExample
\\n\\ndef simple_hash(key, array_size):\\n\\n"""\\n A simple string hash function.\\n :param key: Input key (string)\\n :param array_size: Size of the hash table array\\n :return: Calculated array Index\\n"""\\n\\n total =0\\nfor char in key:\\n# Convert characters to their ASCII code values and sum them up\\n total +=ord(char)\\n# Take modulo of array size to ensure Index is within array bounds\\nreturn total % array_size\\n\\n# Test our hash function\\n keys_to_test =["Alice","Bob","Charlie","David"]\\n array_size =10\\nprint("Simple Hash Function Test Results:")\\nfor key in keys_to_test:\\n hash_index = simple_hash(key, array_size)\\nprint(f" Key '{key}' -> Hash Index:{hash_index}")\\n\\nOutput Result:
\\n\\nSimple Hash Function Test Results:Key 'Alice' -> Hash Index:0 Key 'Bob' -> Hash Index:8 Key 'Charlie' -> Hash Index:5 Key 'David' -> Hash Index: 3
\\n\\nThis function is very simple, but it may not be the best, because different strings may produce the same sum (e.g., ab and ba), leading to hash collisions.
\\n\\n\\n\\n
Hash Collisions and Resolution Methods
\\n\\nHash Collision refers to two or more different keys producing the same array index after being calculated by the hash function. It's like two moviegoers being assigned the same seat number, which is obviously a problem. Collisions are inevitable because the range of hash values (array size) is usually limited, while the possible keys are infinite.
\\n\\nThere are two classic methods to resolve collisions:
\\n\\nMethod 1: Separate Chaining
\\n\\nThis is the most commonly used method. Instead of placing data directly in each position of the array, it stores a linked list (or other data structures, such as red-black trees) at each array position. When a collision occurs, the new key-value pair is added to the linked list at that position.
\\n\\nProcess Description:
\\n\\nCalculate the hash value of the key to find the array index.
\\n\\n- \\n
- If the linked list at that index position is empty, create a new node to store the key-value pair. \\n
- If the linked list is not empty (collision occurs), traverse this linked list. \\n
- If the same key is found, update its value (or handle according to requirements). \\n
- If the same key is not found, add the new key-value pair to the end of the linked list. \\n
Example
\\n\\nclass HashTableChaining:\\n\\n"""Hash table implementation using separate chaining to resolve collisions."""\\n\\ndef __init__ (self, size=10):\\n"""Initialize the hash table.\\n\\n :param size: Initial size of the underlying array\\n"""\\nself.size= size\\n# Create a size of `size` list, with each element initialized to an empty list (representing a linked list)\\nself.table=[[]for _ in range(size)]\\n\\ndef _hash(self, key):\\n"""Internal hash function. Here uses Python's built-in hash function and takes the modulo."""\\nreturn hash(key) % self.size\\n\\ndef insert(self, key, value):\\n"""Insert a key-value pair into the hash table."""\\n index =self._hash(key)\\n bucket =self.table# Get the corresponding item for this Index"Bucket"οΌLinked list)\\n# Traverse the bucket to check if the key already exists\\nfor i,(k, v)in enumerate(bucket):\\nif k == key:\\n# KeyAlready exists, update the value\\n bucket=(key, value)\\nreturn\\n# key does not existοΌAdd to the end of the linked list\\n bucket.append((key, value))\\n\\ndef get(self, key):\\n"""Get the value based on the key. If the key does not exist, return None."""\\n index =self._hash(key)\\n bucket =self.table\\nfor k, v in bucket:\\nif k == key:\\nreturn v\\nreturn None# key does not exist\\n\\ndef delete(self, key):\\n"""Delete the key-value pair based on the key."""\\n index =self._hash(key)\\n bucket =self.table\\nfor i,(k, v)in enumerate(bucket):\\nif k == key:\\ndel bucket# Remove this node from the linked list\\nreturn True# Deletion successful\\nreturn False# key does not existοΌDeletion failed\\n\\ndef display(self):\\n"""Print all contents of the hash table."""\\nfor i, bucket in enumerate(self.table):\\nprint(f"Index {i}: {bucket}")\\n\\n# Usage example\\nprint("\\\\n--- Separate Chaining Hash Table Example ---")\\n phone_book =
YouTip