## C Program to Remove All Characters Except Alphabets from a String
In C programming, manipulating strings is a fundamental skill. A common task is filtering a string to remove unwanted characters (such as numbers, spaces, and special symbols), leaving only alphabetic characters ($a-z$ and $A-Z$).
This tutorial demonstrates how to remove non-alphabetic characters from a string using two different approaches:
1. **The In-Place Shifting Method** (as shown in the classic example).
2. **The Two-Pointer (Read/Write) Method** (a highly optimized, industry-standard approach).
---
## Method 1: In-Place Shifting (Classic Approach)
This method iterates through the string. When it encounters a non-alphabetic character, it shifts all subsequent characters one position to the left to overwrite the unwanted character.
### C Source Code
```c
#include <stdio.h>

int main(void)
{
    char line[150]; /* buffer size chosen for illustration */
    int i, j;

    printf("Enter a string: ");
    // Using fgets to safely read string input including spaces
    if (fgets(line, sizeof(line), stdin) == NULL)
    {
        return 1; // no input could be read
    }

    for (i = 0; line[i] != '\0'; ++i)
    {
        // While the current character is not a letter and not the null terminator
        while (!((line[i] >= 'a' && line[i] <= 'z') || (line[i] >= 'A' && line[i] <= 'Z') || line[i] == '\0'))
        {
            // Shift all characters after the current index one position to the left
            for (j = i; line[j] != '\0'; ++j)
            {
                line[j] = line[j + 1];
            }
            line[j] = '\0';
        }
    }

    printf("Output string: ");
    puts(line);
    return 0;
}
```
### Output
```text
Enter a string: run4#$1oob
Output string: runoob
```
### How It Works
1. **Input Reading**: `fgets(line, sizeof(line), stdin)` reads a line of text from standard input safely, preventing buffer overflow.
2. **Outer Loop (`i`)**: Traverses the string character by character.
3. **Condition Check**: The `while` loop checks if the character at `line[i]` is **not** a lowercase letter, **not** an uppercase letter, and **not** the null terminator (`\0`).
4. **Shifting Loop (`j`)**: If a non-alphabetic character is found, the inner `for` loop shifts all subsequent characters one position to the left, effectively overwriting the invalid character.
5. **Re-evaluation**: Because the character at index `i` was replaced by the next character, the `while` loop checks the new character at index `i` again before the outer loop increments `i`.
---
## Method 2: Two-Pointer Method (Optimized Approach)
While Method 1 is easy to conceptualize, shifting the entire tail of the string repeatedly results in a worst-case time complexity of $\mathcal{O}(N^2)$.
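To make the quadratic cost concrete, the sketch below re-implements the shifting idea as a function that also counts how many character copies it performs. The name `shift_filter_moves` and the counter are illustrative assumptions, not part of the original program, and it uses `isalpha()` in place of the explicit range checks.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: filters s in place with the shifting method
   and returns the number of single-character copies performed. */
size_t shift_filter_moves(char *s)
{
    assert(s != NULL);
    size_t moves = 0;
    size_t i = 0;

    while (s[i] != '\0')
    {
        if (isalpha((unsigned char)s[i]))
        {
            ++i; /* keep this letter and move on */
            continue;
        }
        /* Overwrite s[i] by shifting the whole tail, terminator included */
        for (size_t j = i; s[j] != '\0'; ++j)
        {
            s[j] = s[j + 1];
            ++moves;
        }
    }
    return moves;
}
```

For a string of four digits such as `"1111"`, this performs 4 + 3 + 2 + 1 = 10 copies. In general, a string of $N$ non-alphabetic characters costs $N(N+1)/2$ copies, which is where the quadratic bound comes from.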
A more efficient, industry-standard approach uses **two pointers** (a `read` index and a `write` index). This processes the string in a single pass with a time complexity of $\mathcal{O}(N)$.
### C Source Code
```c
#include <stdio.h>
#include <ctype.h>

int main(void)
{
    char line[150]; /* buffer size chosen for illustration */
    int write_index = 0;

    printf("Enter a string: ");
    if (fgets(line, sizeof(line), stdin) == NULL)
    {
        return 1; // no input could be read
    }

    // Process the string using a single pass
    for (int read_index = 0; line[read_index] != '\0'; read_index++)
    {
        // isalpha() from <ctype.h> checks if a character is alphabetic (a-z, A-Z)
        if (isalpha((unsigned char)line[read_index]))
        {
            line[write_index] = line[read_index];
            write_index++;
        }
    }

    // Null-terminate the newly filtered string
    line[write_index] = '\0';

    printf("Output string: ");
    puts(line);
    return 0;
}
```
### Why Method 2 is Preferred
* **Performance**: It runs in $\mathcal{O}(N)$ time because it visits each character exactly once.
* **Standard Library Helpers**: It uses `isalpha()` from `<ctype.h>`, which makes the code cleaner and more readable. The result follows the current C locale; in the default `"C"` locale it accepts only `a-z` and `A-Z`.
* **No Redundant Writes**: It avoids shifting blocks of memory repeatedly.
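If the filter is needed in more than one place, the same two-pointer logic can be wrapped in a function. The following is a minimal sketch; the name `keep_alpha` and the choice to return the new length are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: keeps only alphabetic characters in s, in place,
   and returns the length of the filtered string. */
size_t keep_alpha(char *s)
{
    assert(s != NULL);
    size_t write_index = 0;

    for (size_t read_index = 0; s[read_index] != '\0'; read_index++)
    {
        if (isalpha((unsigned char)s[read_index]))
        {
            /* write_index never overtakes read_index, so no data is lost */
            s[write_index++] = s[read_index];
        }
    }
    s[write_index] = '\0';
    return write_index;
}
```

Calling `keep_alpha(line)` on a buffer holding `"run4#$1oob\n"` leaves `"runoob"` in the buffer and returns 6.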
---
## Key Considerations
1. **Handling Newline Characters**:
When you press Enter, `fgets()` includes the newline character (`\n`) at the end of the string. Both methods above automatically strip `\n` because it is not an alphabetic character.
2. **Buffer Overflow Prevention**:
Always use `fgets()` instead of `gets()`. `gets()` does not check for buffer limits and is highly vulnerable to security exploits.
3. **The `isalpha()` Cast**:
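When using functions from `<ctype.h>`, it is best practice to cast the argument to `(unsigned char)`. On platforms where `char` is signed, bytes above 127 (e.g., extended ASCII characters) become negative values, and passing a negative value other than `EOF` to these functions is undefined behavior.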
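The first and third considerations can be sketched as two small helpers. Both names (`strip_newline`, `is_letter`) are hypothetical. `strip_newline` is useful when a program needs the raw line without the trailing `\n`, for example to echo the input back before filtering it.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: removes the trailing newline left by fgets(), if any. */
void strip_newline(char *s)
{
    assert(s != NULL);
    /* strcspn() returns the index of the first '\n', or strlen(s) if none */
    s[strcspn(s, "\n")] = '\0';
}

/* Hypothetical wrapper: applies the unsigned char cast in one place,
   so callers can pass any char value, including negative ones. */
int is_letter(char c)
{
    return isalpha((unsigned char)c) != 0;
}
```

With these in place, a call such as `strip_newline(line)` right after `fgets()` leaves the rest of the line untouched, and `is_letter(line[i])` can replace the explicit range checks in Method 1.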