Fetch all email IDs from a text file in C++

Hello, in this tutorial we will learn how to fetch all the email IDs present in a text (‘.txt’) file in C++ and store the fetched data in a ‘.csv’ file.

For this purpose, we’ll be using regular expressions. A regular expression is a pattern made up of a sequence of characters, mostly used for searching text. To use regular expressions in C++, we need to include the ‘<regex>’ header, introduced in C++11.
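Before building the email pattern, here is a minimal sketch of how the ‘<regex>’ header is typically used. The helper names (matches_whole, contains_match, check_match_vs_search) are my own and only for illustration. It shows the difference between std::regex_match, which requires the whole string to match, and std::regex_search, which looks for a match anywhere in the string.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns true only if the WHOLE string matches the pattern.
bool matches_whole(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// Returns true if ANY part of the string matches the pattern.
bool contains_match(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}

// Small self-checks that illustrate the difference (hypothetical helper).
void check_match_vs_search() {
    assert(matches_whole("abc_123", "\\w+"));    // only word characters
    assert(!matches_whole("abc 123", "\\w+"));   // the space breaks a full match
    assert(contains_match("abc 123", "\\d+"));   // "123" is found inside the text
}
```

Since we want to find addresses that sit inside longer lines of text, searching (rather than whole-string matching) is the behaviour we need for this task.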

Fetching all email IDs from a text file in C++

Let’s start by writing the regular expression pattern that we’ll use for the search. An email address generally consists of some text (lower-case, upper-case and numeric characters, possibly including the underscore (‘_’) and the dot (‘.’) character), followed by an at (‘@’) character and some more text that includes at least one dot (‘.’) character. In the regex, we’ll use the ‘\w’ meta character, which matches any single character from ‘A-Z’, ‘a-z’, ‘0-9’ or the underscore (‘_’) character. Inside a C++ string literal it is written as ‘\\w’, because the backslash itself has to be escaped.

std::regex reg("(\\w+)(\\.)?(\\w+)@[\\w+]+(\\.(\\w+))+");

This pattern consists of three sub-parts, where the first part ‘(\\w+)(\\.)?(\\w+)’ checks the text before the at (‘@’) character. Breaking this down, ‘(\\w+)’ checks that the email ID contains at least one character from ‘A-Z’, ‘a-z’, ‘0-9’ or an underscore (‘_’) character. The ‘(\\.)?’ pattern allows the dot (‘.’) character to appear zero times or once. Again, ‘(\\w+)’ checks for at least one character from ‘A-Z’, ‘a-z’, ‘0-9’ or an underscore (‘_’) character. This means that the email ID must have at least 2 ‘\\w’ characters before the at (‘@’) character.
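To make the breakdown of the first sub-part concrete, here is a small sketch that reads the capture groups of the pattern for a single address. The struct LocalPart and the functions split_local_part and check_local_part are hypothetical names introduced just for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Holds the pieces captured before the '@' character (illustrative type).
struct LocalPart {
    std::string first;   // text captured by the first (\w+)
    bool has_dot;        // true if the optional (\.)? group took part in the match
    std::string second;  // text captured by the second (\w+)
};

// Returns true and fills `out` if the whole string matches the pattern.
bool split_local_part(const std::string& address, LocalPart& out) {
    static const std::regex reg("(\\w+)(\\.)?(\\w+)@[\\w+]+(\\.(\\w+))+");
    std::smatch match;
    if (!std::regex_match(address, match, reg)) {
        return false;
    }
    out.first = match[1].str();
    out.has_dot = match[2].matched;
    out.second = match[3].str();
    return true;
}

// Self-checks showing how the groups are filled (hypothetical helper).
void check_local_part() {
    LocalPart p;
    assert(split_local_part("alice.smith@example.com", p));
    assert(p.first == "alice" && p.has_dot && p.second == "smith");

    // Without a dot, the first group gives back one character to the second.
    assert(split_local_part("bob99@mail.org", p));
    assert(p.first == "bob9" && !p.has_dot && p.second == "9");

    // A single character before '@' cannot satisfy both (\w+) groups.
    assert(!split_local_part("a@b.com", p));
}
```

The last check is worth noting: because both ‘(\\w+)’ groups need at least one character, an address with only one character before the ‘@’ is not matched by this pattern.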

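The second sub-part is simply the literal at (‘@’) character. The third sub-part ‘[\\w+]+(\\.(\\w+))+’ checks the text after it. Here ‘[\\w+]+’ matches one or more characters, each of which is either a ‘\\w’ character or a literal plus (‘+’) sign, because inside square brackets the ‘+’ is an ordinary character and not a quantifier. The ‘(\\.(\\w+))+’ part then matches one or more groups made of a dot (‘.’) character followed by at least one ‘\\w’ character.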
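Here is a minimal sketch that applies the complete pattern to a few sample strings, so you can see what it does and does not pick up. The function names find_emails and check_find_emails and all the sample addresses are made up for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collects every substring of `text` that matches the article's pattern.
std::vector<std::string> find_emails(const std::string& text) {
    static const std::regex reg("(\\w+)(\\.)?(\\w+)@[\\w+]+(\\.(\\w+))+");
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), reg), last;
         it != last; ++it) {
        found.push_back(it->str());
    }
    return found;
}

// Self-checks on made-up sample text (hypothetical helper).
void check_find_emails() {
    std::vector<std::string> r = find_emails(
        "Contact alice.smith@example.com or bob99@mail.example.org today.");
    assert(r.size() == 2);
    assert(r[0] == "alice.smith@example.com");
    assert(r[1] == "bob99@mail.example.org");

    // Only one dot is allowed before '@', so the match starts at "y".
    r = find_emails("x.y.z@site.com");
    assert(r.size() == 1 && r[0] == "y.z@site.com");

    // A single character before '@' is not matched at all.
    assert(find_emails("a@b.com").empty());
}
```

As the second check shows, the pattern is a simplification: addresses with more than one dot before the ‘@’ are only partially matched, so treat it as a starting point rather than a full email validator.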

Algorithm:

  • Open the input text file as well as an output ‘.csv’ file
  • Read the input text file line by line
  • Search each line for the regex pattern and write every match to the output file as it is found
  • Close both the input and the output file

Code:

#include <fstream>
#include <iostream>
#include <regex>
#include <string>

int main() {
    // Pattern for an email address: text, '@', then text with at least one dot.
    std::regex reg("(\\w+)(\\.)?(\\w+)@[\\w+]+(\\.(\\w+))+");
    std::string line, emailfilename;
    std::cout << "Enter the name of the input file: ";
    std::cin >> emailfilename;
    std::ifstream emailfile(emailfilename);
    std::ofstream resultfile("output_file.csv");

    if (emailfile.is_open() && resultfile.is_open()) {
        // Read the input file one line at a time.
        while (std::getline(emailfile, line)) {
            // Iterate over every match of the pattern in the current line.
            std::sregex_iterator current(line.begin(), line.end(), reg);
            std::sregex_iterator last;  // a default-constructed iterator marks the end

            while (current != last) {
                std::smatch match = *current;
                // Write one address per row, followed by a comma.
                resultfile << match.str() << ",\n";
                ++current;
            }
        }
        std::cout << "Data successfully saved in 'output_file.csv'" << std::endl;
        emailfile.close();
        resultfile.close();
    }
    else {
        // Report the failure on the error stream and signal it to the caller.
        std::cerr << "File not opened." << std::endl;
        return 1;
    }

    return 0;
}
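As a variation on the program above, here is a minimal sketch that moves the search loop into a function working on generic streams instead of files. This is my own restructuring, not part of the original program, and the names extract_emails and check_extract_emails as well as the sample addresses are hypothetical. The idea is that the same logic can then be tried out on an in-memory string without creating any files.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <regex>
#include <sstream>
#include <string>

// Reads `in` line by line, writes each match followed by ",\n" to `out`,
// and returns the number of addresses written.
std::size_t extract_emails(std::istream& in, std::ostream& out) {
    static const std::regex reg("(\\w+)(\\.)?(\\w+)@[\\w+]+(\\.(\\w+))+");
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        for (std::sregex_iterator it(line.begin(), line.end(), reg), last;
             it != last; ++it) {
            out << it->str() << ",\n";
            ++count;
        }
    }
    return count;
}

// Self-check using in-memory streams and made-up text (hypothetical helper).
void check_extract_emails() {
    std::istringstream in(
        "first line: ann.lee@example.com\n"
        "no address here\n"
        "second: bob42@test.org\n");
    std::ostringstream out;
    assert(extract_emails(in, out) == 2);
    assert(out.str() == "ann.lee@example.com,\nbob42@test.org,\n");
}
```

Because std::ifstream and std::ofstream are also streams, the file objects from the program above could be passed to such a function directly.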

Execution:

Input file

This is a sample text file containing a email address: [email protected]
This is next line having an example email address [email protected]

Output file

[email protected],
[email protected],
