Python Longest Substring
## Finding the Longest Common Substring in Python
The **Longest Common Substring (LCS)** problem is a classic computer science challenge. Given two strings, the goal is to find the longest sequence of characters that appears in both strings in the exact same order and consecutively.
Unlike *subsequences*, which do not require characters to be adjacent, *substrings* must be contiguous.
In this tutorial, we will explore how to solve this problem efficiently in Python using **Dynamic Programming (DP)**.
---
## The Dynamic Programming Approach
Dynamic programming solves complex problems by breaking them down into simpler subproblems. For the Longest Common Substring problem, we can define a 2D array (or table) `dp` where:
* `dp` represents the length of the longest common suffix of the substrings `s1[0...i-1]` and `s2[0...j-1]`.
### State Transition Formula
1. If the characters at the current positions match (`s1 == s2`), we extend the common substring found so far:
$$\text{dp} = \text{dp} + 1$$
2. If the characters do not match (`s1 != s2`), the continuity is broken, so we reset the length:
$$\text{dp} = 0$$
By keeping track of the maximum value in this `dp` table and its position, we can easily extract the longest common substring.
---
## Python Implementation
Below is the complete Python implementation using the dynamic programming approach:
```python
def longest_common_substring(s1: str, s2: str) -> str:
m = len(s1)
n = len(s2)
# Create a 2D array initialized with 0s to store lengths of common substrings
dp = [ * (n + 1) for _ in range(m + 1)]
max_length = 0 # Keeps track of the length of the longest common substring
end_pos = 0 # Keeps track of the ending index of the longest common substring in s1
# Fill the DP table
for i in range(1, m + 1):
for j in range(1, n + 1):
if s1 == s2:
dp = dp + 1
# Update the maximum length and ending position if a longer substring is found
if dp > max_length:
max_length = dp
end_pos = i
else:
dp = 0
# Extract and return the longest common substring using slicing
return s1[end_pos - max_length:end_pos]
# Test the function
s1 = "abcdef"
s2 = "zbcdf"
result = longest_common_substring(s1, s2)
print("The longest common substring is:", result)
```
### Output
```text
The longest common substring is: bcd
```
---
## Code Explanation
1. **DP Table Initialization**: We initialize a 2D list `dp` of size `(m + 1) x (n + 1)` with zeros. The extra row and column handle the base cases where one of the strings is empty.
2. **Nested Loops**: We iterate through each character of `s1` (index `i`) and `s2` (index `j`).
3. **Character Comparison**:
* If `s1 == s2`, we add `1` to the diagonal value `dp`.
* If they do not match, we set `dp` to `0`.
4. **Tracking the Maximum**: The variable `max_length` stores the length of the longest match found so far, and `end_pos` stores the ending index in `s1`.
5. **String Slicing**: Finally, we use Python's slice notation `s1[end_pos - max_length : end_pos]` to extract and return the actual substring.
---
## Complexity Analysis
* **Time Complexity**: $\mathcal{O}(m \times n)$, where $m$ and $n$ are the lengths of the two strings. We use nested loops to traverse the DP table of size $(m+1) \times (n+1)$.
* **Space Complexity**: $\mathcal{O}(m \times n)$ to store the 2D DP table.
> **Tip:** If you only need the *length* of the longest common substring and not the substring itself, the space complexity can be optimized to $\mathcal{O}(n)$ by keeping track of only the current and previous rows of the DP table.
YouTip