
Python Word Length

## Python: How to Calculate Word Lengths in a String

In Python, analyzing text data is a common task in natural language processing (NLP), data cleaning, and text analytics. One fundamental operation is splitting a string into individual words and calculating the length of each word. This tutorial demonstrates how to write a clean, efficient Python program that counts the length of each word in a given string and maps the results into a structured format.

---

### Method Overview

To calculate the length of each word in a string, we follow a simple three-step process:

1. **Tokenization**: Split the input string into a list of individual words using the `.split()` method.
2. **Length Calculation**: Iterate through the list of words and calculate the length of each word using the built-in `len()` function.
3. **Mapping**: Pair each word with its corresponding length. We can store this mapping in a dictionary where the keys are the words and the values are their respective lengths.

---

### Code Example

Here is a complete Python implementation that uses a list comprehension together with the built-in `zip()` and `dict()` functions:

```python
def get_word_lengths(text_string):
    # Split the string into a list of words based on whitespace
    words = text_string.split()

    # Calculate the length of each word
    lengths = [len(word) for word in words]

    # Combine words and lengths into a dictionary
    return dict(zip(words, lengths))


# Example string
text = "Hello world this is a test"
result = get_word_lengths(text)
print(result)
```

#### Output:

```python
{'Hello': 5, 'world': 5, 'this': 4, 'is': 2, 'a': 1, 'test': 4}
```

---

### Code Explanation

1. **`text_string.split()`**: When called with no arguments, `.split()` splits a string on runs of consecutive whitespace (spaces, tabs, newlines) and ignores leading and trailing whitespace. This converts our raw string into a list of words: `['Hello', 'world', 'this', 'is', 'a', 'test']`.
2. **`[len(word) for word in words]`**: This is a **list comprehension**.
   It iterates through each word in the `words` list, calculates its length using `len()`, and returns a new list of integers: `[5, 5, 4, 2, 1, 4]`.
3. **`zip(words, lengths)`**: The `zip()` function pairs elements from the `words` list and the `lengths` list into tuples: `('Hello', 5)`, `('world', 5)`, and so on. In Python 3, `zip()` returns a lazy iterator of these tuples rather than a list.
4. **`dict(...)`**: The `dict()` constructor converts the zipped pairs into a dictionary of key-value pairs, making it easy to look up the length of any specific word.

---

### Alternative & Optimized Approaches

While the `zip()` method is highly readable, Python offers other elegant ways to achieve the same result.

#### 1. Using Dictionary Comprehension (Recommended)

You can combine the splitting and mapping steps into a single, concise line using a **dictionary comprehension**. This also avoids building the intermediate `lengths` list:

```python
text = "Hello world this is a test"

# One-liner dictionary comprehension
word_lengths = {word: len(word) for word in text.split()}
print(word_lengths)
# Output: {'Hello': 5, 'world': 5, 'this': 4, 'is': 2, 'a': 1, 'test': 4}
```

#### 2. Handling Punctuation

In real-world scenarios, strings often contain punctuation marks (like commas, periods, or exclamation points). If you do not strip them, they are counted as part of the word length (e.g., `"world,"` has a length of 6 instead of 5). You can clean the text with `str.translate()` and the `string.punctuation` constant from the `string` module:

```python
import string


def clean_word_lengths(text_string):
    # Build a translation table whose third argument lists the characters
    # to delete (all ASCII punctuation), then apply it to the text
    cleaned_text = text_string.translate(str.maketrans('', '', string.punctuation))

    # Generate the word length dictionary
    return {word: len(word) for word in cleaned_text.split()}


text_with_punctuation = "Hello, world! This is a test."
print(clean_word_lengths(text_with_punctuation))
# Output: {'Hello': 5, 'world': 5, 'This': 4, 'is': 2, 'a': 1, 'test': 4}
```

---

### Considerations

* **Duplicate Words**: Because dictionary keys must be unique, a word that appears multiple times in the input string gets only a single entry in the dictionary. Its length is the same each time, so no length information is lost, but the number of occurrences is.
* **Case Sensitivity**: Words with different casing (e.g., `"Test"` and `"test"`) are treated as separate keys in the dictionary. For case-insensitive results, convert the string to lowercase with `.lower()` before splitting: `text.lower().split()`.
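Both considerations can be addressed with small variations on the dictionary comprehension. Here is a minimal sketch of one possible approach. The helper names `word_length_pairs` and `case_insensitive_lengths` are hypothetical and chosen only for illustration.

- A list of `(word, length)` tuples keeps every occurrence, including duplicates, in their original order.
- Lowercasing the text before splitting merges words that differ only in case.

```python
def word_length_pairs(text_string):
    """Hypothetical helper: one (word, length) tuple per word, keeping duplicates and order."""
    return [(word, len(word)) for word in text_string.split()]


def case_insensitive_lengths(text_string):
    """Hypothetical helper: lowercase first so "Test" and "test" share one key."""
    return {word: len(word) for word in text_string.lower().split()}


text = "Test this test and this"

print(word_length_pairs(text))
# [('Test', 4), ('this', 4), ('test', 4), ('and', 3), ('this', 4)]

print(case_insensitive_lengths(text))
# {'test': 4, 'this': 4, 'and': 3}
```

The list of tuples preserves repeats, but finding a specific word means scanning the whole list. The dictionary remains the better fit when you only need to look up a word's length.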