Tokenizing a string in C++

In this tutorial, we are going to learn how to tokenize a string in C++. The process of splitting a string into tokens with respect to a given delimiter is known as tokenizing.

Example:
String: "Welcome to CodeSpeedy"
Delimiter: ' '
Tokens:

Welcome
to
CodeSpeedy

A simple method for tokenizing a string in C++

First, we create an empty string str to hold the token being built and a vector v to store the finished tokens. Next, we traverse the characters of the given string one by one. If the current character is not a delimiter, we append it to str. If it is a delimiter, we push str into the vector v and then reset str to an empty string so that it can hold the next token. Once the last character has been processed, whatever is left in str is the final token, so it is pushed into v as well.

Note: An empty string should not be treated as a token. When two delimiters appear one after the other, str is still empty when the second delimiter is reached. The same happens when the string starts with a delimiter. In these cases str must not be pushed into the vector containing the tokens.

Implementation:

#include <iostream>
#include <string>
#include <vector>
using namespace std;
vector<string> tokenize(const string& s, char delimiter){
    string str;        // the token that is currently being built
    vector<string> v;  // the tokens found so far
    for(char c : s){
        if(c != delimiter){
            /* If the character is not a delimiter
               then add it to the string str */
            str += c;
        }
        else if(!str.empty()){
            /* If the character is a delimiter and the
               string str is not empty then push str into
               the vector containing tokens. Also make the
               string str empty after pushing it */
            v.push_back(str);
            str.clear();
        }
    }
    /* As there are no characters left, if the string
       str is not empty then push it into the vector */
    if(!str.empty()){
        v.push_back(str);
    }
    return v;
}
int main(){
    string str = "Tokenizing a string in C++";
    char delimiter = ' ';
    vector<string> v = tokenize(str, delimiter);
    for(size_t i = 0; i < v.size(); i++){
        cout << v[i] << endl;
    }
}

Output:

Tokenizing
a
string
in
C++

In the above code, the tokenize() function takes a string and a delimiter as arguments and returns a vector containing the tokens.
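
The same result can also be reached with standard library streams. The following is only a minimal sketch of that alternative, not the method used above: it reads pieces with std::getline() from a std::istringstream and applies the same rule of skipping empty strings. The function name tokenize_with_getline is a hypothetical name chosen for this illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the tutorial's code): splits s on
// delimiter with std::getline() and drops empty pieces.
std::vector<std::string> tokenize_with_getline(const std::string& s, char delimiter) {
    std::vector<std::string> tokens;
    std::istringstream stream(s);
    std::string piece;
    // std::getline() reads up to the next delimiter and discards the delimiter
    while (std::getline(stream, piece, delimiter)) {
        // Two consecutive delimiters produce an empty piece; it is not a token
        if (!piece.empty()) {
            tokens.push_back(piece);
        }
    }
    return tokens;
}
```

For example, tokenize_with_getline("a,,b,", ',') returns a vector holding "a" and "b". The empty pieces between and after the commas are skipped, as described in the note above.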

Tokenizing a string using strtok() function

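strtok() is a C standard library function that is also available in C++ through the <cstring> header. It takes a C-style string (a modifiable, null-terminated char array) and a string of delimiter characters as arguments, and it returns one token at a time. Each call returns a pointer to the first character of the next token, or a NULL pointer if there are no tokens left. Note that strtok() modifies the string it is given: it overwrites the delimiter that ends each token with a null character.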

#include <cstring>
#include <iostream>
#include <vector>
using namespace std;
int main()
{
    // strtok() modifies its input, so str must be a writable char array
    char str[] = "Welcome to CodeSpeedy";
    char delimiter[] = " ";
    vector<char*> v;
    // Getting the first token
    char *token = strtok(str, delimiter);
    while (token != NULL)
    {
        v.push_back(token);
        // Getting the next token
        // If there are no tokens left, NULL is returned
        token = strtok(NULL, delimiter);
    }
    for(size_t i = 0; i < v.size(); i++){
        cout << v[i] << endl;
    }
}

 

Output:

Welcome
to
CodeSpeedy
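
The pointers returned by strtok() point into the original character array, they are not copies. The sketch below tries to make this visible by recording how far each token starts from the beginning of the array. The function name token_offsets is a hypothetical name used only for this illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: returns the offset of each token inside buffer.
// buffer must be a writable, null-terminated char array, because
// strtok() overwrites the delimiter that ends a token with '\0'.
std::vector<std::size_t> token_offsets(char* buffer, const char* delimiters) {
    std::vector<std::size_t> offsets;
    for (char* token = std::strtok(buffer, delimiters); token != nullptr;
         token = std::strtok(nullptr, delimiters)) {
        // token points into buffer, so the difference is the start index
        offsets.push_back(static_cast<std::size_t>(token - buffer));
    }
    return offsets;
}
```

For the array "Welcome to CodeSpeedy" and the delimiter " ", the offsets are 0, 8 and 11. After the calls, the array itself reads as just "Welcome" when treated as a C-style string, because the first space has been replaced by a null character.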

To get all the tokens, strtok() has to be called multiple times. The first call receives the string itself, and every later call receives NULL so that the function continues from where it stopped. In the above code, the later calls are made inside a loop that keeps executing until strtok() returns a NULL pointer.
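
Because strtok() needs a writable char array, it cannot be applied directly to a std::string that should stay unchanged. One possible way around this, shown here only as a sketch, is to run the same loop on a copy of the characters and store each token as a std::string. The function name tokenize_with_strtok is a hypothetical name, and the second argument is treated as a set of delimiter characters.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: tokenizes a std::string with strtok().
// Every character in delimiters counts as a delimiter.
std::vector<std::string> tokenize_with_strtok(const std::string& s, const char* delimiters) {
    // strtok() modifies its input, so work on a null-terminated copy
    std::vector<char> buffer(s.begin(), s.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    // First call: pass the buffer itself
    char* token = std::strtok(buffer.data(), delimiters);
    while (token != nullptr) {
        tokens.emplace_back(token);  // copy the token out of the buffer
        // Later calls: pass a null pointer to continue in the same buffer
        token = std::strtok(nullptr, delimiters);
    }
    return tokens;
}
```

For example, tokenize_with_strtok("Welcome to, CodeSpeedy", " ,") returns "Welcome", "to" and "CodeSpeedy", since both the space and the comma are treated as delimiters. Keep in mind that strtok() remembers its position in internal state between calls, so two tokenizing loops should not be interleaved.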

 

We hope that you got a clear idea of how to tokenize a string in C++.

 

 
