Tokenizing a string in C++
In this tutorial, we are going to learn how to tokenize a string in C++. The process of splitting a string into tokens with respect to a given delimiter is known as tokenizing.
Example:
String: "Welcome to CodeSpeedy"
Delimiter: ' '
Tokens:
Welcome
to
CodeSpeedy
A simple method for tokenizing a string in C++
First, create an empty string str and a vector v to store the tokens. Then traverse the characters of the given string. If the current character is not the delimiter, append it to str. If it is the delimiter, push str into the vector v and then reset str to an empty string so that it can hold the next token.
Note: An empty string should not count as a token. When two delimiters appear consecutively, or when the string starts with a delimiter, str is still empty at the moment the delimiter is reached. In that case str should not be pushed into the vector containing the tokens.
Implementation:
#include <iostream>
#include <string>
#include <vector>
using namespace std;

vector<string> tokenize(const string& s, char delimiter) {
    string str;
    vector<string> v;
    for (size_t i = 0; i < s.size(); i++) {
        if (i != s.size() - 1) {
            /* If the i-th character is not a delimiter
               then add it to the string str */
            if (s[i] != delimiter) {
                str += s[i];
            }
            else {
                /* If the i-th character is a delimiter
                   and the string str is not empty then
                   push str into the vector containing
                   tokens. Also make the string str empty
                   after pushing it into the vector */
                if (!str.empty()) {
                    v.push_back(str);
                    str.clear();
                }
            }
        }
        else {
            /* If the i-th character is the last character
               of the string and it is not a delimiter
               then add it to the string str */
            if (s[i] != delimiter) {
                str += s[i];
            }
            /* As there are no characters left, if the string
               str is not empty then push it into the vector */
            if (!str.empty()) {
                v.push_back(str);
            }
        }
    }
    return v;
}

int main() {
    string str = "Tokenizing a string in C++";
    char delimiter = ' ';
    vector<string> v = tokenize(str, delimiter);
    // Print one token per line
    for (size_t i = 0; i < v.size(); i++) {
        cout << v[i] << endl;
    }
}
Output:
Tokenizing
a
string
in
C++
In the above code, the tokenize() function takes a string and a delimiter as arguments and returns a vector containing the tokens.
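The special case for the last character in tokenize() can be avoided by flushing the pending token once after the loop ends. The following is only a minimal sketch of that idea; the name split_compact is hypothetical and not part of the original code, and the behavior is intended to match tokenize(), including skipping empty tokens.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original tutorial): same algorithm as
// tokenize(), but the last character needs no separate branch.
std::vector<std::string> split_compact(const std::string& s, char delimiter) {
    std::vector<std::string> tokens;
    std::string current;
    for (char c : s) {
        if (c != delimiter) {
            current += c;               // still inside a token
        } else if (!current.empty()) {
            tokens.push_back(current);  // a delimiter ends a non-empty token
            current.clear();
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);      // flush the token that ends the string
    }
    return tokens;
}
```

For instance, calling split_compact("a,,b,", ',') should give the two tokens a and b, because the empty strings between and after the commas are skipped.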
Tokenizing a string using strtok() function
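strtok() is a C standard library function that is also available in C++ through the <cstring> header. It takes a character array and a string of delimiter characters as arguments and returns one token per call. On the first call we pass the string to be tokenized, and on every later call we pass NULL to continue with the same string. Each call returns a pointer to the first character of the next token, or a NULL pointer if there are no tokens left. Note that strtok() modifies the input array by overwriting delimiters with null characters, so it must not be used on a string literal or a const string.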
#include <cstring>
#include <iostream>
#include <vector>
using namespace std;

int main()
{
    // strtok() modifies its input, so str must be a writable array
    char str[] = "Welcome to CodeSpeedy";
    char delimiter[] = " ";
    vector<char*> v;
    // Getting the first token
    char* token = strtok(str, delimiter);
    while (token != NULL)
    {
        v.push_back(token);
        // Getting the next token
        // If there are no tokens left, NULL is returned
        token = strtok(NULL, delimiter);
    }
    for (size_t i = 0; i < v.size(); i++) {
        cout << v[i] << endl;
    }
}
Output:
Welcome
to
CodeSpeedy
strtok() must be called repeatedly to get all the tokens. In the above code, it is called inside a loop that keeps executing until strtok() returns a NULL pointer.
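Because strtok() writes into the buffer it scans and returns pointers into that same buffer, it can be convenient to wrap the loop in a function that works on a private copy and returns std::string tokens. The sketch below is one possible way to do that; the name split_with_strtok is hypothetical, and the code assumes C++11 or later, where a std::string's storage is contiguous and null-terminated.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical wrapper (not from the original tutorial). text is taken by
// value on purpose: std::strtok overwrites delimiters in the buffer with '\0',
// so the caller's string is left untouched.
std::vector<std::string> split_with_strtok(std::string text, const char* delimiters) {
    std::vector<std::string> tokens;
    // First call: pass the buffer to scan.
    char* token = std::strtok(&text[0], delimiters);
    while (token != nullptr) {
        tokens.emplace_back(token);                 // copy the token out of the buffer
        token = std::strtok(nullptr, delimiters);   // later calls: pass a null pointer
    }
    return tokens;
}
```

Every character in the delimiters argument acts as a separator, so split_with_strtok("a, b;;c", ",; ") should give the tokens a, b and c. Like strtok() itself, this wrapper keeps hidden state between calls and is therefore not safe to use from several threads at once.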
We hope that you got a clear idea of how to tokenize a string in C++.