How to implement Navie String Searching Algorithm in Python
In this post, we will study about finding a pattern in the text. There will be the main text a substring. The goal is to find how many times at what positions the substring occurs in the text. This technique of pattern-finding helps when there is a huge text and we have to find the occurrences of some keywords or especially given words. Here, we will talk about the most basic ‘Naive String Matching Algorithm in Python’ and will further improvise it through better and shorter codes.
Naive Algorithms as the word ‘naive’ itself suggest algorithms that are very basic and simple to implement. These algorithms perform the most simple and obvious techniques to perform work just like how a child would. These methods are good to start with for the newbies before proceeding towards more efficient and complicated algorithms. A naive string searching algorithm is also one of them. It is the simplest method among other string matching/pattern-finding algorithms.
The method starts by matching the string letter by letter. It checks for the first character in the main text and the first character in the substring. If it matches it moves ahead checking the next character of both the strings. If at any place the characters don’t match the loop breaks and it starts again from the next character of the main text string.
Python code for Naive String matching algorithm
Below is the code for the Naive String matching algorithm.
def naive(txt,wrd): lt=len(txt)#length of the string lw=len(wrd)/3length of the substring(pattern) for i in range(lt-lw+1): j=0 while(j<lw): if txt[i+j]==wrd[j]: j+=1 else: break else: print('found at position',i)
In the code above the function, ‘naive’ takes two arguments txt(the main string from which the pattern is to searched) and ward(the pattern to be searched). A loop is taken from 0 to (length of string-length of substring+1) since at least the substring’s length should be left to be matched towards the end. Each character is extracted from the string through the ‘for’ loop(txt[i]). Then there is an inner while loop that matches that character with the subsequent character of the substring unless the entire substring is matched. If it is not found the loop breaks and the next iteration as in the very next character is taken out for the process. As soon as the entire substring is found the while condition gets false and the else part is executed and the position is printed. It is to be noted carefully that one else is inside the loop which gets executed only when the if the condition is false whereas the other else gets executed when the while loop condition becomes False.
Let’s try the code for multiple inputs-
- String- “”AABAACAADAABAABA”
found at position 0 found at position 9 found at position 12
- String- “1011101110”
found at position 2 found at position 6
Best Case- The best case of this method occurs when the first character of the pattern doesn’t match and hence the entire string is rejected there.
Worst Case- When all the characters or only the last character of the string and substring are different. eg-
String-“AAAAAAAAAA” & Substring-“AAA” or “AAAB”
- Using the ‘find’ in-built function in Python. This feature finds the position of the substring in the string. The second parameter given here denotes the index position it will begin its search from. Every time the string is found at a position ‘j’ is incremented and the matching starts from the next position till the entire length of the string.
def optimized_naive2(txt,wrd): if wrd in txt: j=txt.find(wrd) while(j!=-1): print("found at position",j) j=txt.find(wrd,j+1)
Both of the programs generate the same output as the previous one.