Repeated Substring Pattern: Leetcode tough KMP and Robin Karp Algorithm

1. Knuth-Morris-Pratt Algorithm and

2. Divisors + Rabin-Karp

Given a non-empty string check if it can be constructed by taking a substring of it and appending multiple copies of the substring together. You may assume the given string consists of lowercase English letters only and its length will not exceed 10000.

Input: "abab"
Output: True
Explanation: It's the substring "ab" twice.
Input: "aba"
Output: False
Input: "abcabcabcabc"
Output: True
Explanation: It's the substring "abc" four times. (And the substring "abcabc" twice.)
class Solution:
def repeatedSubstringPattern(self, s: str) -> bool:
# 11:00 am 11:14 am

if not s: return True
if len(s) == 1: return False

# find the index of the last character:
susp = ""
for i, x in enumerate(s):
if s[-1] == x:
susp = s[:i+1]
break
if susp == "":
return False
i = 0
while i < len(s):
if s[i:i+len(susp)] != susp:
return False
else:
i += len(susp)

return True

Solution

Overview

The problem could be solved in many ways. Easy approaches have O(N2)O(N2)time complexity, though one could improve it by using one of string searching algorithms.

Approach 3: Find Divisors + Rabin-Karp

Rabin-Karp

  • Check hash of the string in the sliding window.
  • the second string is L..2L
  • the last string is N - L..N

Side note. The good practice is to verify the equality of two substrings after the hash match. This logic is not hard to add, and it could bring you kudos during the interview.

Implementation

class Solution:
def repeatedSubstringPattern(self, s: str) -> bool:
n = len(s)
if n < 2:
return False
if n == 2:
return s[0] == s[1]

for i in range(int(n**0.5), 0, -1):
if n % i == 0:
divisors = [i]
if i != 1:
divisors.append(n // i)
for l in divisors:
first_hash = curr_hash = hash(s[:l])
start = l
while start != n and curr_hash == first_hash:
curr_hash = hash(s[start:start + l])
start += l
if start == n and curr_hash == first_hash:
return True

return False
  • Space complexity: up to O(sqrt(N)) to keep a copy of each substring during the hash computation.

Approach 4: Knuth-Morris-Pratt Algorithm (KMP)

Algorithm : https://www.youtube.com/watch?v=GTJr8OvyEVQ and the Longest prefix array algorithm is https://www.youtube.com/watch?v=KG44VoDtsAA

Rabin-Karp is the best fit for the multiple pattern search, whereas KMP is typically used for the single pattern search.

The key to KMP is the partial match table (PMT), often called lookup table(LP, or failure function table or longest prefix array. It stores the length of the longest prefix that is also a suffix.

First use case of a pattern
2nd use case of the pattern
Repeated pattern string
Not a repeated pattern string.
class Solution:
def repeatedSubstringPattern(self, s: str) -> bool:
n = len(s)
dp = [0] * n
# Construct partial match table (lookup table).
# It stores the length of the proper prefix that is also a proper suffix.
# ex. ababa --> [0, 0, 1, 2, 1]
# ab --> the length of common prefix / suffix = 0
# aba --> the length of common prefix / suffix = 1
# abab --> the length of common prefix / suffix = 2
# ababa --> the length of common prefix / suffix = 1
for i in range(1, n):
j = dp[i - 1]
while j > 0 and s[i] != s[j]:
j = dp[j - 1]
if s[i] == s[j]:
j += 1
dp[i] = j
l = dp[n - 1]
# check if it's repeated pattern string
return l != 0 and n % (n - l) == 0
class Solution:
def repeatedSubstringPattern(self, s: str) -> bool:
# 11:00 am 11:14 am
# took me like 3 hours to learn the algorithm first and then see how can I apply it here
longest_prefix_array = [0] * len(s)

# the substring length will be the value of longest_prefix_array[-1]
# following algorihtm https://www.youtube.com/watch?v=KG44VoDtsAA
j = 0
i = 1
while i < len(s):
if s[j] == s[i]:
longest_prefix_array[i] = j + 1
j += 1
i += 1
else:
if j != 0:
j = longest_prefix_array[j -1]
else:
longest_prefix_array[i] = 0
i += 1
# length of the substring is
l = longest_prefix_array[-1]
print("substring" , s[0:l])
n = len(s)

# check if it's repeated pattern string
# if this length is nonzero and n - l is a divisor of n
print(len(s) % (len(s) - l) == 0)

return l != 0 and n % (n - l) == 0
  • Space complexity: O(N) to keep the lookup table.

Other approaches :

Approach 1: Regex

To use regex during the interviews is like to use built-in functions, the community has no single opinion about it yet, and it’s a sort of risk.

import re
class Solution:
def repeatedSubstringPattern(self, s: str) -> bool:
pattern = re.compile(r'^(.+)\1+$')
return pattern.match(s)
  • The difference between the greedy and the non-greedy match is the following:
  • the non-greedy match will try to match as few repetitions of the quantified pattern as possible.
  • the greedy match will try to match as many repetitions as possible.
  • The worst-case situation here is to check all possible pattern lengths from N to 1 that would result in O(N-square) time complexity.
  • Space complexity: O(1). We don’t use any additional data structures, and everything depends on internal regex implementation, which is evolving quite fast nowadays. If you’re interested to dig depeer, here is a famous article by Russ Cox which inspired a lot of discussions and code changes in Python community.

Approach 2: Concatenation

Repeated pattern string looks like PatternPattern, and the others like Pattern1Pattern2.

class Solution:
def repeatedSubstringPattern(self, s: str) -> bool:
return s in (s + s)[1: -1]
  • Space complexity: O(N), the space is implicitly used to keep s + sstring.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store