YouTip LogoYouTip

Python Anagram Strings

## Python: How to Check If Two Strings Are Anagrams An **anagram** is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once. For example, the words **"listen"** and **"silent"** are anagrams of each other because they contain the exact same characters with the same frequencies, just in a different order. In this tutorial, we will explore how to implement an anagram checker in Python, analyze the underlying logic, and discuss alternative high-performance approaches. --- ## 1. Core Concept & Logic To determine if two strings are anagrams, we must verify that: 1. They contain the exact same characters. 2. Each character appears the same number of times in both strings. 3. (Optional but recommended) The comparison ignores case sensitivity and whitespace, depending on your requirements. --- ## 2. Standard Implementation: Sorting Method The most intuitive way to check for anagrams is to clean the strings (remove spaces and convert to lowercase), sort their characters, and compare the sorted results. If the sorted lists of characters are identical, the strings are anagrams. ### Code Example ```python def is_anagram(str1, str2): # Remove whitespace and convert to lowercase for a case-insensitive comparison str1 = str1.replace(" ", "").lower() str2 = str2.replace(" ", "").lower() # If the lengths of the cleaned strings are different, they cannot be anagrams if len(str1) != len(str2): return False # Sort both strings and compare them return sorted(str1) == sorted(str2) # Test Cases print(is_anagram("listen", "silent")) # Output: True print(is_anagram("apple", "pale")) # Output: False print(is_anagram("Astronomer", "Moon starer")) # Output: True ``` ### Code Explanation 1. **Preprocessing (`replace()` and `lower()`)**: `str1.replace(" ", "").lower()` removes all spaces and converts the string to lowercase. This ensures that phrases like `"Moon starer"` and `"Astronomer"` are correctly identified as anagrams. 2. **Length Check (`len()`)**: If the two processed strings do not have the same length, they cannot contain the same set of characters. Checking this upfront allows the function to return `False` immediately, saving computation time. 3. **Sorting and Comparison (`sorted()`)**: The `sorted()` function returns a sorted list of characters from the string. If `sorted(str1)` equals `sorted(str2)`, it means both strings share the exact same character composition. ### Output ```text True False True ``` --- ## 3. Alternative High-Performance Method: Frequency Counting While the sorting method is simple and elegant, its time complexity is $O(N \log N)$ due to the sorting step. For a more optimal $O(N)$ time complexity, we can count the occurrences of each character using Python's built-in `collections.Counter`. ### Code Example ```python from collections import Counter def is_anagram_optimized(str1, str2): # Clean the strings str1 = str1.replace(" ", "").lower() str2 = str2.replace(" ", "").lower() # Compare character frequency counts return Counter(str1) == Counter(str2) # Test Cases print(is_anagram_optimized("conversation", "voices rant on")) # Output: True print(is_anagram_optimized("hello", "billion")) # Output: False ``` ### Why use `Counter`? * **Time Complexity**: $O(N)$, where $N$ is the length of the strings. We only traverse the strings to count character frequencies. * **Space Complexity**: $O(K)$, where $K$ is the number of unique characters stored in the hash map (dictionary). --- ## 4. Considerations & Best Practices When implementing an anagram checker in production, keep the following edge cases in mind: * **Case Sensitivity**: Decide whether `"Listen"` and `"silent"` should be considered anagrams. If yes, always apply `.lower()` or `.casefold()` before comparison. * **Whitespace and Punctuation**: In wordplay, anagrams often ignore spaces and punctuation (e.g., `"A gentleman"` and `"Elegant man"`). Use `.replace(" ", "")` or regular expressions (`re`) to strip out non-alphanumeric characters if necessary. * **Unicode and Diacritics**: If your application supports multiple languages, characters with accents (like `Γ©` vs `e`) might need normalization using Python's `unicodedata` module to ensure accurate comparisons.
← Python List IntersectionPython Reverse List Elements β†’