Tokenizing a string in C++
In this tutorial, we are going to learn how to tokenize a string in C++. The process of splitting a string into tokens with respect to a given delimiter is known as tokenizing.
Example:
String: "Welcome to CodeSpeedy"
Delimiter: ' '
Tokens:
Welcome
to
CodeSpeedy
A simple method for tokenizing a string in C++
First, create an empty string str and a vector v to store the tokens. Then traverse the characters of the given string. If the current character is not the delimiter, append it to str. If it is the delimiter, push str into the vector v and then reset str to an empty string so that it can hold the next token. After the last character has been processed, push whatever remains in str as the final token.
Note: An empty string should not be treated as a token. When the string contains two consecutive delimiters, or begins or ends with a delimiter, str is empty at the point where it would be pushed. In that case it should not be pushed into the vector containing the tokens.
Implementation:
```cpp
#include <iostream>
#include <string>
#include <vector>
using namespace std;

vector<string> tokenize(const string& s, char delimiter) {
    string str;        // characters of the token being built
    vector<string> v;  // completed tokens
    for (char c : s) {
        if (c != delimiter) {
            // Not a delimiter: add the character to the current token.
            str += c;
        } else if (!str.empty()) {
            // Delimiter: push the token if it is not empty, then
            // make str empty so that it can store the next token.
            v.push_back(str);
            str.clear();
        }
    }
    // No characters are left: push the last token if it is not empty.
    if (!str.empty()) {
        v.push_back(str);
    }
    return v;
}

int main() {
    string str = "Tokenizing a string in C++";
    char delimiter = ' ';
    vector<string> v = tokenize(str, delimiter);
    for (const string& token : v) {
        cout << token << endl;
    }
}
```
Output:
Tokenizing
a
string
in
C++
In the above code, the tokenize() function takes a string and a delimiter as arguments and returns a vector containing the tokens.
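The same rule of skipping empty tokens can also be sketched with std::getline reading from a std::istringstream. This is an alternative to the character loop above, not the method used in this tutorial, and the function name split_skip_empty is a hypothetical one chosen for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the tutorial's code): splits s on a single
// delimiter character and drops empty pieces, following the rule that an
// empty string is not a token.
std::vector<std::string> split_skip_empty(const std::string& s, char delimiter) {
    std::vector<std::string> tokens;
    std::istringstream stream(s);
    std::string piece;
    // std::getline reads up to the next delimiter and discards the delimiter.
    while (std::getline(stream, piece, delimiter)) {
        if (!piece.empty()) {
            tokens.push_back(piece);
        }
    }
    return tokens;
}
```

For example, calling split_skip_empty("a,,b,", ',') should give the two tokens a and b, because the empty pieces produced by the doubled and trailing commas are dropped.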
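Tokenizing a string using the strtok() function
strtok() is a C standard library function that is available in C++ through the <cstring> header. It takes a C-style string and a string of delimiter characters as arguments, and it returns one token per call. The first call receives the string to tokenize, and each later call receives NULL to continue with the same string. Every call returns a pointer to the first character of the next token, or a NULL pointer if there are no tokens left. Note that strtok() modifies the string it tokenizes by overwriting delimiters with null characters.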
```cpp
#include <cstring>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    // strtok() modifies its input, so str must be a writable array
    char str[] = "Welcome to CodeSpeedy";
    char delimiter[] = " ";
    vector<char*> v;
    // Getting the first token
    char* token = strtok(str, delimiter);
    while (token != NULL) {
        v.push_back(token);
        // Getting the next token
        // If there are no tokens left, NULL is returned
        token = strtok(NULL, delimiter);
    }
    for (size_t i = 0; i < v.size(); i++) {
        cout << v[i] << endl;
    }
}
```
Output:
Welcome
to
CodeSpeedy
strtok() must be called repeatedly to get all the tokens. So in the above code, it is called inside a loop that keeps executing until strtok() returns a NULL pointer.
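Because strtok() writes null characters into the string it scans, it cannot be applied directly to a std::string that has to stay unchanged. The sketch below shows one possible way to handle this by tokenizing a private copy. The wrapper name tokenize_with_strtok is hypothetical. It also shows that every character in the delimiter string acts as a separate delimiter.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical wrapper (not from the tutorial): tokenizes a std::string with
// strtok() while leaving the caller's string untouched.
std::vector<std::string> tokenize_with_strtok(const std::string& s,
                                              const char* delimiters) {
    // strtok() overwrites delimiters with '\0', so work on a mutable,
    // null-terminated copy of the input.
    std::vector<char> buffer(s.begin(), s.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* token = std::strtok(buffer.data(), delimiters);
         token != nullptr;
         token = std::strtok(nullptr, delimiters)) {
        tokens.emplace_back(token);  // copy the token out of the buffer
    }
    return tokens;
}
```

With the delimiter string ", ;", a call such as tokenize_with_strtok("red, green;blue", ", ;") should produce the tokens red, green and blue. strtok() keeps its position in internal static state, so this sketch assumes it is not called from several threads at once.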
We hope that you got a clear idea of how to tokenize a string in C++.