Pattern Search in String Using Python – Naive Method
The following post deals with pattern search in a string i.e looking for the occurrences of a substring in a given string and displaying the starting index. It uses the naive method for the implementation.
Prerequisites: Basic idea of Python Strings and Loops
Naive Method – Pattern Search
The naive method is simply a brute force method of searching for the given substring in the main string.
The method is to start looking for each letter in the main string. If the first letter of the supplied substring matches, we start an inner loop to check if all elements from substring match with the consecutive elements in the main string. That is, we simply see if the entire substring is present or not. If it is present, we return the starting index in the main string. The algorithm does well for small strings but consumes too much time for longer ones. Nevertheless, it helps us understand the basic idea of pattern search and is a good place to start.
Naive Method Implementation in Python
Consider the following program,
def match(string,sub): l = len(string) ls = len(sub) start = sub for k in range(l-ls+1): if start==string[k]: i,j = 1,k+1 while i<ls: if sub[i]==string[j]: i += 1 j += 1 else: break else: print "Found at index",k match("AABAACAADAABAABA","AABA")
In the above program, ‘string’ is the main string and ‘sub’ is the pattern to be matched.
We start off with a for loop that goes from index 0 to l-ls index as if the first letter of substring is not found within this index, there will not be enough space to accommodate the complete substring and we can rule out the possibility. This is a very small improvement of the naive method.
If the first letters match, then we use a while loop to check if the other letter of the pattern also matches using i as the index for the pattern and j for the string. Notice the use of else for the while loop. This block is executed when the loop terminates naturally i.e. due to its condition becoming false and not because of a break statement. If the loop exits because of the condition becomes false, it means that all the letters have matched. If it ended due to the break statement, it means there was a mismatch somewhere.
Hence, under the else part, we print the index k, where the first element was found to match. Given below is the output for the above program.
Using Python’s Built-In Functions
Python offers a large number of built-in string functions. It is extremely easy to implement the above-said problem by just using them. The following code illustrates such a method,
def match(string,sub): if sub in string: ind = string.find(sub) while ind!=-1: print "Found at index",ind ind = string.find(sub,ind+1) match("AABAACAADAABAABA","AABA")
The same function, match() has been redefined using an in-built function – find()
When find() is used as <string_var>.find(<substr>,[start[,end]]) it returns the first starting index where the substring is found. It returns -1 if it is not found. start and end are used to specify the starting and ending indices of the range in the main string within which we want to find the substring.
We, therefore, check whether the substring is present at least once, using the if statement and keep finding the next occurrence by specifying start as one position after the previously found index. This is continued until the function returns -1.
The output will be the same!
Drawbacks of the Naive Method
The naive method, as mentioned before, is a brute force method and is very time consuming for long strings. It is particularly slow for cases where the substring is long and in cases like this ->main string – “AAAAAAAAAB” and pattern – “AAAAA”. For the last 5 elements, the inner loop does 4 iterations and ends up rejecting the index. Imagine the time wasted if the substring was, say 100 characters long and the mismatch is only at the last position!
More efficient algorithms exist. Refer Pattern Search in String with Rabin-Karp Algorithm in Python for instance.
Feel free to leave behind any sort of feedback, suggestions, doubts below.