# Aho-Corasick Algorithm for Pattern Searching in C++

Hello everyone, today we are going to perform the search for a pattern in a string using Aho-Corasick Algorithm in C++.

So, why is this algorithm better than other available ways to search for a pattern? That’s because this algorithm provides a much better time complexity as opposed to others. The Aho-Corasick Algorithm can operate at the complexity of :

O(length of original string+no.of words to search+word occurrence frequency).

Thus, compared to other algorithms, it provides a much better time complexity, and thus speeds up the search.

## Aho-Corasick Algorithm for Pattern Searching in C++

Let us begin the code for the algorithm now:

As usual, begin by declaring the header files and namespace to define the scope. We may also pre-define some constants for easier readability.

```#include <bits/stdc++.h>
using namespace std;

const int S =500;
const int C =26;
int op[S],int nf[S], int gt[S];```

The constant declaration S is used to specify the length total of the letters that we wish to check. C stands for the no. of English alphabet letters. We also declare two 1-D arrays and one 2D array, where ‘op’ is for the output, ‘nf’ is for the failure and ‘gt’ is to go-to function.

Next, we define a function ‘matcher’, which makes the matching mechanism and then returns the no. of states.

First we declare the function and then review what it means.

```int matcher(string letter[], int n)
{
memset(op, 0, sizeof op);
memset(gt, -1, sizeof gt);
int states = 1;

for (int i = 0; i < n; ++i)
{ const string &word = letter[i];
int cs = 0;
for (int j = 0; j < word.size(); ++j)
{
int ch = word[j] - 'a';

if (gt[cs][ch] == -1)
gt[cs][ch] = states++;

cs = gt[cs][ch];
}
op[cs] = (1 << i);
}

for (int ch = 0; ch < C; ++ch)
if (gt[ch] == -1)
gt[ch] = 0;

memset(nf, -1, sizeof nf);

queue<int> q;

for (int ch = 0; ch < C; ++ch)
{
if (gt[ch] != 0)
{
nf[gt[ch]] = 0;
q.push(gt[ch]);
}
}

while (q.size())
{
int state = q.front();
q.pop();

for (int ch = 0; ch <= C; ++ch)
{
if (gt[state][ch] != -1)
{
int fail = nf[state];

while (gt[fail][ch] == -1)
fail = nf[fail];

fail = gt[fail][ch];
nf[gt[state][ch]] = fail;
op[gt[state][ch]] |= op[fail];

q.push(gt[state][ch]);
}
}
}

return states;
}```

The ‘memeset’ is an inbuilt function which assigns a memory block with a desired value. Thus, we preset all output values with 0 and go-to values with -1.

We then fill the go-to function, take the characters and insert it into the input and assign new nodes to the ‘ch’ variable.  Once all done, we initialize the not found function ‘nf ‘ with -1 and use a queue data structure to compute using Breadth-First Search. All nodes of depth 1 is found to possess nf=0. We remove the front state, find nf, and proceed further to finally return the states.

We next define a function to take the current state and a character input, and return the state the code goes into.

```int findnew(int cs, char ip)
{       int ans = cs;
int ch = ip - 'a';
while (gt[ans][ch] == -1)
ans = nf[ans];
return gt[ans][ch];
}
```

Now comes the most important function. This checks if the required substring is present or not.

```void search(string letter[], int k, string input)
{
matcher(letter, k);
int cs = 0;
for (int i = 0; i < input.size(); ++i)
{
cs = findnew(cs, input[i]);
if (op[cs] == 0)
continue;
for (int index = 0; index < k; ++index)
{
if (op[cs] & (1 << index))
{ cout << "The required word you want - " << letter[index] << " -starts from "<< i - letter[index].size() + 1 << " until " << i << endl;
} } } }
```

The matcher is the function we had defined above. We initialize a variable cs=0, and use a loop to scan through the main string to find the occurrences. If the substring is present, we output using a loop, else we move forward to the next character.

Once all done, we declare the main() to call the functions and run the code.

```int main()
{
string letter[] = {"co", "speed", "code", "abcd"};
string input = "codespeedy";
int n = sizeof(letter)/sizeof(letter);

search(letter, n, input);

return 0;
}```

We enter the main string ‘codespeedy’ and try to search for the words co/speed/ code / abcd.

OUTPUT

```The required word you want - co -starts from 0 until 1
The required word you want - code -starts from 0 until 3
The required word you want - speed -starts from 4 until 8

```

Now, on checking the output, we find that the words co/speed/code to be available. It says starts from 0, as an array/string starts from index 0. Since ‘abcd’ is not present, we get no output.

Thus, by using the Aho-Corasick Algorithm, we found matching patterns. I hope the tutorial is of your help.