# Explain the Aho-Corasick Algorithm for Pattern Searching in Python

Aho-Corasick algorithm, a type of Dictionary-matching algorithm. This algorithm can be helpful to find word or words appearing from a set of keywords or data we feed. Aho-Corasick algorithm is a fast and efficient way of finding word and location. As because we want things to be found easily and efficiently so it finds the easy way of finding through the dictionary we provide.

### Introduction: More about Aho-Corasick Algorithm

Aho-Corasick is a string searching algorithm and was invented by Alfred V. Aho and Margaret J. Corasick.

Aho-Corasick algorithm makes a state machine and also uses a TRIE concept.

The algorithm is implemented by using a tree data structure. Once we make the tree, what it does is, it coverts or try to convert tree in the form of automaton which helps to take linear time to complete or make the search.

Aho-Corasick Algorithm is subdivided into three major phases:

• Go(go to the main function and set points and the main root)
• Output(The results or outcome after the condition matches and availability matches)
• Failure (If the keyword does not match then it failed to count it)

Go stage: It forms the tree with the help of provided keywords, the data which we feed in as recognisable pattern/design.

Failure stage: It seeks to look for the backward transformation to get appropriate appendix of keywords.

Output stage: It searches all word which ends at a particular state. Suppose it ends at the state ‘a’ for each state ‘a’ of the automaton.

### Time complexity

If we talk about the time complexity of this algorithm.

Let L is the length of entered Text, let B is the length of keywords (dataset), let X be the number of possible matches or matches made.

Then, the time complexity of Algorithm is: O(L+B+X)

### Input and Output

How it actually works:

Input:

Sample set : {‘you’, ‘are’, ‘good’}

Searched string: “youarenotkindbutgoodboy”

Output:

The word ‘you’ found at position :0

The word ‘are’ found ate position :3

The word ‘good’ found at position: 16

In Python, it works with the help of a Trie. We can also learn it from the technique of a Trie.

### Code (Python)

The CODE is mainly divided into four sections:

1. We create the tree (defining function)
2. We create the state transition (defining function)
3. We now find the string given as input (defining function)
4. The main section where we give the design (patterns) that is the data set we provide the system and the input string to search and then we call the function.

Below is our Python program for Aho-Corasick Algorithm for Pattern Searching:

```print "Hello, World!"
class Ahomain:    #We create class for Aho-Corasick
def __init__(self):   #constructor with its
self.go = {}
self.out = []
self.breaks = None

def aho_treeform(list1):  #creating the tree
main = Ahomain()      # Object of Aho class

for way in list1:
point = main
for sym in way:
point = point.go.setdefault(sym,
Ahomain())
point.out.append(way)
return main
def aho_state_transition(list1):  #state transition
main = aho_treeform(list1)    # machine
queue = []
for point in main.go.itervalues():
queue.append(point)
point.breaks = main

while len(queue) > 0:
rightpoint = queue.pop(0)

for clue,uniquepoint in rightpoint.go.iteritems():
queue.append(uniquepoint)
firstpoint = rightpoint.breaks
while firstpoint != None and not firstpoint.go.has_key(clue):
firstpoint = firstpoint.breaks
uniquepoint.breaks = firstpoint.go[clue] if firstpoint else main
uniquepoint.out += uniquepoint.breaks.out

return main

def aho_search(y, main, call):  #searching the input
point = main

for i in xrange(len(y)):
while point != None and not point.go.has_key(y[i]):
point = point.breaks
if point == None:
point = main
continue
point = point.go[y[i]]
call(i - len(design) + 1, design)
def found(loc, list1):    #printing the results
print "The Design found at position %s, found­ pattern: %s" % (loc, list1)

list1 = ['a', 'ab', 'aa', 'abc', 'bc', 'bca', 'cc', 'c', 'cba', 'cab']
y = "abcbaacab"
main = aho_state_transition(list1)
aho_search(y, main, found)```

### Output :

```Hello, World!
The design found at position 0, found pattern: a
The design found at position 0, found pattern: ab
The Design found at position 0, found pattern: abc
The design found at position 1, found pattern: bc
The Design found at position 2, found pattern: c
The Design found at position 2, found pattern: cba
The Design found at position 4, found pattern: a
The Design found at position 4, found pattern: aa
The Design found at position 5, found pattern: a
The Design found at position 6, found pattern: c
The Design found at position 7, found pattern: a
The Design found at position 6, found pattern: cab
The Design found at position 7, found pattern: ab```

We can also use the input function after we have written the found function. We can take input for the list1 by the help of loop, enter the data and it will append in the list.

We can also use input for the ‘y’ from the user.

y=input (“Enter the string to search”). In this case, we used a, b, c. But we can add more to the list and search it from the input.

If this code does not run on your Python IDLE or computer.

Just make some changes.

1. Instead of using d.itervalues and d. iteritems , we can also write iter(d.values()) and iter(d.items()) . This happens because some version nay does not support some function of Python.

2. Instead of using .has_key we can use ‘ in ‘ . Example: go.has_key(clue) can be written as go in(clue).

3. We can use a range instead of writing range.

4. You may face the error of indentation or outer scope than you should declare list 1 and main as global.

And rest we can change variable names, no more changes required

If you want to know more visit the Python documentation site, If you face version compatibility errors.  https://docs.python.org/3.1/whatsnew/3.0.html

### 3 responses to “Explain the Aho-Corasick Algorithm for Pattern Searching in Python”

1. hinette says:

as much as this is helpful , I wish you commented the code for more details 🙁

• Taanvi goyal says:

Hello, I didn’t understand your query. Is there any error in the code? Should we make some changes?