Fetch all email ids from a text file in C++
Hello, in this post we will learn how to fetch all the email ids present in a text (‘.txt’) file in C++ and store the fetched data in a ‘.csv’ file.
For this purpose, we’ll be using regular expressions. Regular expressions are patterns consisting of a sequence of characters, mostly used for searching text. To use regular expressions in C++, we need to include the ‘regex’ header, which was introduced in C++11.
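As a quick illustration of the header in use, here is a minimal sketch that is unrelated to email matching. The helper name first_number is made up for this example; it returns the first run of digits that std::regex_search finds in a string.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first run of digits found in `text`,
// or an empty string when there is none.
std::string first_number(const std::string& text) {
    static const std::regex digits("\\d+");  // one or more digits
    std::smatch match;
    if (std::regex_search(text, match, digits)) {
        return match.str();  // the whole matched substring
    }
    return "";
}
```

Note the doubled backslash: the C++ string literal "\\d+" reaches the regex engine as \d+.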
Fetching all email ids from a text file in C++
Let’s start by writing the regular expression pattern that we’ll use for the search. An email id generally consists of some text (lower-case, upper-case and numeric characters, plus the underscore(‘_’) and the dot(‘.’) character), followed by an at(‘@’) character and some more text that includes at least one dot(‘.’) character. In the regex, we’ll use the ‘\\w’ meta character, which matches any single character from ‘A-Z’, ‘a-z’, ‘0-9’ or the underscore(‘_’). The backslash is doubled because the pattern is written inside a C++ string literal; the regex engine itself sees ‘\w’.
std::regex reg("(\\w+)(\\.)?(\\w+)@[\\w+]+(\\.(\\w+))+");
This pattern consists of three sub-parts. The first part, ‘(\\w+)(\\.)?(\\w+)’, checks the text before the at(‘@’) character. Breaking this down, the first ‘(\\w+)’ requires at least one character from ‘A-Z’, ‘a-z’, ‘0-9’ or an underscore(‘_’). The ‘(\\.)?’ pattern allows the dot(‘.’) character zero or one time. The second ‘(\\w+)’ again requires at least one character from the same set. This means that the email id must have at least 2 ‘\\w’ characters before the at(‘@’) character, with at most one dot between them. The second sub-part is the literal at(‘@’) character itself.
The third sub-part, ‘[\\w+]+(\\.(\\w+))+’, checks the text after the at(‘@’) character: at least one character from the class ‘[\\w+]’, followed by one or more groups that each consist of a dot(‘.’) character and at least one ‘\\w’ character. Note that inside square brackets the ‘+’ is a literal plus sign rather than a quantifier, so this class also accepts a ‘+’ in that position.
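To see how the pattern behaves on concrete input, the sketch below runs it over a single string and collects every match. The function name find_emails is an assumption made for this example and is not part of the program in this post.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every substring of `text` that matches
// the email pattern from this post, in order of appearance.
std::vector<std::string> find_emails(const std::string& text) {
    static const std::regex reg("(\\w+)(\\.)?(\\w+)@[\\w+]+(\\.(\\w+))+");
    std::vector<std::string> found;
    // A default-constructed sregex_iterator acts as the end marker.
    for (std::sregex_iterator it(text.begin(), text.end(), reg), last;
         it != last; ++it) {
        found.push_back(it->str());
    }
    return found;
}
```

With this sketch, find_emails("contact john.doe@example.com today") yields the single entry john.doe@example.com, while find_emails("a@b.c") yields nothing because only one ‘\\w’ character precedes the at(‘@’) character. An address with two dots before the at(‘@’) character, such as first.middle.last@mail.example.org, is picked up only from middle.last onwards, because the pattern permits a single dot in that part.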
Algorithm:
- Open the input text file as well as an output ‘.csv’ file
- Read the input text file line by line
- Search each line for the regex pattern and write every match to the output file as it is found
- Close both the input and the output file
Code:
#include <iostream>
#include <regex>
#include <fstream>
#include <string>

int main() {
    // Pattern for an email id, as explained above.
    std::regex reg("(\\w+)(\\.)?(\\w+)@[\\w+]+(\\.(\\w+))+");
    std::string line, emailfilename;

    std::cout << "Enter the name of the input file: ";
    std::cin >> emailfilename;

    std::ifstream emailfile(emailfilename);
    std::ofstream resultfile("output_file.csv");

    if (emailfile.is_open() && resultfile.is_open()) {
        // Read the input file line by line.
        while (std::getline(emailfile, line)) {
            // Walk over every match of the pattern in the current line.
            std::sregex_iterator current(line.begin(), line.end(), reg);
            std::sregex_iterator last;  // default-constructed end marker
            while (current != last) {
                std::smatch match = *current;
                resultfile << match.str() << ",\n";
                ++current;
            }
        }
        std::cout << "Data successfully saved in 'output_file.csv'" << std::endl;
        emailfile.close();
        resultfile.close();
    } else {
        std::cout << "File not opened." << std::endl;
    }
    return 0;
}
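The program above reads from and writes to files on disk, which makes it awkward to try out on small inputs. As one possible variation, the sketch below moves the same line-by-line loop into a function that works on any pair of streams. The names extract_emails and extract_emails_to_csv are assumptions introduced for this example; the output format (each match followed by a comma and a newline) is kept the same as in the program above.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: copies every pattern match found in `in` to `out`,
// one per line with a trailing comma, and returns the number of matches.
std::size_t extract_emails(std::istream& in, std::ostream& out) {
    static const std::regex reg("(\\w+)(\\.)?(\\w+)@[\\w+]+(\\.(\\w+))+");
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        std::sregex_iterator current(line.begin(), line.end(), reg);
        std::sregex_iterator last;  // default-constructed end marker
        for (; current != last; ++current) {
            out << current->str() << ",\n";
            ++count;
        }
    }
    return count;
}

// Hypothetical convenience wrapper for in-memory text.
std::string extract_emails_to_csv(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    extract_emails(in, out);
    return out.str();
}
```

Since std::ifstream and std::ofstream derive from std::istream and std::ostream, the same extract_emails function could also be called with the file streams used in the program above.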
Execution:
Input file
This is a sample text file containing a email address: [email protected]
This is next line having an example email address [email protected]
Output file
[email protected],
[email protected],