Python Anagram Check
## Python Anagram Check: How to Determine if Two Strings are Anagrams
An **anagram** is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once. For example, the words **"listen"** and **"silent"** are anagrams of each other.
In Python, we can determine if two strings are anagrams by comparing their character compositions. This tutorial covers the most common and efficient ways to perform an anagram check in Python.
---
## Method 1: Using the Sorting Approach
The most intuitive way to check if two strings are anagrams is to sort both strings and compare them. If they contain the exact same characters with the exact same frequencies, their sorted representations will be identical.
### Code Example
```python
def is_anagram(str1, str2):
# Remove whitespace and convert strings to lowercase
str1 = str1.replace(" ", "").lower()
str2 = str2.replace(" ", "").lower()
# If lengths are different, they cannot be anagrams
if len(str1) != len(str2):
return False
# Sort the characters of both strings and compare
return sorted(str1) == sorted(str2)
# Test cases
print(is_anagram("listen", "silent")) # Output: True
print(is_anagram("triangle", "integral")) # Output: True
print(is_anagram("apple", "pale")) # Output: False
```
### Code Explanation
1. **Normalization**: `str1.replace(" ", "").lower()` removes any spaces and converts the string to lowercase. This ensures that the comparison is case-insensitive and ignores spacing (which is useful for multi-word anagrams like "New York" and "nowhere").
2. **Length Check**: `if len(str1) != len(str2):` acts as a guard clause. If the two processed strings do not have the same length, they cannot be anagrams, allowing the function to return `False` early.
3. **Sorting and Comparison**: `sorted(str1) == sorted(str2)` converts the strings into sorted lists of characters (e.g., `"listen"` becomes `['e', 'i', 'l', 'n', 's', 't']`). If the sorted lists are identical, the function returns `True`.
### Output
```text
True
True
False
```
---
## Method 2: Using `collections.Counter` (Optimal Performance)
While the sorting method is simple and clean, its time complexity is $O(N \log N)$ due to the sorting step.
For a more efficient $O(N)$ linear time complexity solution, you can use Python's built-in `Counter` class from the `collections` module. This counts the frequency of each character in both strings and compares the counts.
### Code Example
```python
from collections import Counter
def is_anagram_optimized(str1, str2):
# Remove whitespace and convert to lowercase
str1 = str1.replace(" ", "").lower()
str2 = str2.replace(" ", "").lower()
# Compare character frequencies
return Counter(str1) == Counter(str2)
# Test cases
print(is_anagram_optimized("astronomer", "moon starer")) # Output: True
print(is_anagram_optimized("hello", "billion")) # Output: False
```
### Why use `Counter`?
* **Time Complexity**: $O(N)$ because it only requires a single pass through each string to count character occurrences.
* **Space Complexity**: $O(K)$ where $K$ is the number of unique characters in the strings.
---
## Considerations & Best Practices
When implementing an anagram check in real-world applications, consider the following edge cases:
* **Case Sensitivity**: Decide whether "Listen" and "Silent" should be considered anagrams. Standard practice is to convert both strings to lowercase using `.lower()` or `.casefold()` before comparison.
* **Whitespace and Punctuation**: Phrases like "Dormitory" and "Dirty room" are anagrams. You should strip out spaces and punctuation marks if you want to support multi-word phrase anagrams.
* **Character Encoding**: If you are dealing with Unicode characters (e.g., accented letters like `Γ©` or non-Latin scripts), ensure proper normalization using the `unicodedata` module if necessary.
YouTip