Program to Demonstrate Huffman Coding in C++

In this tutorial, we are going to learn about the Program to Demonstrate Huffman Coding in C++. Firstly there is an introduction of Huffman coding. Then implementation of the program using c++.

Introduction

It is a technique of lossless data encoding algorithm. It works on sorting numerical values from a set order of frequency. The least frequent numbers are gradually removed via the Huffman tree, which adds the two lowest frequencies from the sorted list in every new “branch”. Then sum replaces the two eliminated lower frequency values in the sorted array. Each time a new branch is created, it moves the general direction of the tree either to the right (for higher values) or the left (for lower values). When the sorted list is complete and the tree is complete, the final value is zero if the tree terminates on a left number, or it is one if it terminates on the right. This is a brief introduction to Huffman coding in the next section we are going to see the code.

Huffman Coding in C++

#include<iostream>
#include<vector>
#include<string>
using namespace std;
struct node
{

node * leftChild;
node * rightChild;
double frequency;
string content;
string code;

};

vector<node> nodeArray;

node  extractMin()
{
double temp = (double) INT_MAX;
vector<node>::iterator i1,pos;
for(i1 = nodeArray.begin();i1!=nodeArray.end();i1++)
{

if(temp>(*i1).frequency)
{
pos = i1;
temp = (*i1).frequency;
}
}

node tempNode = (*pos);
nodeArray.erase(pos);

return tempNode;
}

node getHuffmanTree()
{

while(!nodeArray.empty())
{

node * tempNode = new node;
node * tempNode1 = new node;
node * tempNode2 = new node;
*tempNode1 = extractMin();
*tempNode2 = extractMin();

tempNode->leftChild = tempNode1;
tempNode->rightChild = tempNode2;
tempNode->frequency = tempNode1->frequency+tempNode2->frequency;
nodeArray.push_back(*tempNode);
if(nodeArray.size() == 1)//only the root node exsits
{
break;
}
}
return nodeArray;
}

void BFS(node * temproot,string s)
{
node * root1 = new node;
root1 = temproot;

root1->code = s;
if(root1 == NULL)
{

}
else if(root1->leftChild == NULL && root1->rightChild == NULL)
{

cout<<"the content is "<<root1->content<<endl;
cout<<"and its corresponding code is "<<root1->code<<endl;
}
else
{

root1->leftChild->code = s.append("0");
s.erase(s.end()-1);
root1->rightChild->code = s.append("1");
s.erase(s.end()-1);

BFS(root1->leftChild,s.append("0"));
s.erase(s.end()-1);
BFS(root1->rightChild,s.append("1"));
s.erase(s.end()-1);
}

}

void getHuffmanCode()
{
int size,i;
double tempDouble;
string tempString = "";

cout<<"please input the number of things you want to encode!"<<endl;
cin>>size;

for(i = 0;i<size;i++)
{
cout<<"please input the things you want to encoded and their frequencies!"<<endl;
node tempNode;
cin>>tempString;
cin>>tempDouble;

tempNode.frequency = tempDouble;
tempNode.content = tempString;
tempNode.leftChild = NULL;
tempNode.rightChild = NULL;
nodeArray.push_back(tempNode);
}

node root = getHuffmanTree();

BFS(&root,"");

}

int main()
{

vector<int> test;
test.push_back(1);
test.push_back(2);
test.push_back(3);
test.push_back(4);
vector<int>::iterator i1 = test.begin();
test.erase(i1);

getHuffmanCode();
return 0;
}

INPUT and OUTPUT:

Enter the number of inputs:
4
Charater you want to encode with its frequencies:
a 100
Charater you want to encode with its frequencies:
b 50
Charater you want to encode with its frequencies:
c 20
Charater you want to encode with its frequencies:
d 10
Character : d
Code :  000
Character : c
Code :  001
Character : b
Code :  01
Character : a
Code :  1

Explanation:

Firstly, the program takes the number of characters to be taken in the input.

Secondly, characters and their frequency is entered. Frequency is the number of time the character appears.

The way to save memory in this program is to substitute each occurrence of the character with a binary code. The character with max. occurrences are replaced with the smallest code. By this process, memory used by the code is saved.

Therefore Huffman coding is very popular because it compresses data without any loss.

Finally, the output shows the character with there binary code.

Also Checkout:

Boundary traversal of the Binary tree in C++