split
The code below splits a given string s by any character in delim:
#include <iostream>
#include <string>
#include <vector>
#include <cstring>
using namespace std;

// s: the string that needs splitting.
// delim: the delimiters. Each character it contains (not delim as a whole) is used to split s.
// Returns a vector that contains all the tokens.
vector<string> split(const string& s, string delim){
    // Alternative: tokenize a duplicate instead of s itself.
    //char* dup = strdup(s.c_str());
    //char* pch = strtok(dup, delim.c_str());
    // Note: strtok writes '\0' over each delimiter it finds, so this cast
    // lets it modify the buffer owned by s.
    char* pch = strtok(const_cast<char*>(s.c_str()), delim.c_str());
    vector<string> tokens;
    while(pch != NULL){
        string str(pch);
        tokens.push_back(str);
        pch = strtok(NULL, delim.c_str());
    }
    //free(dup); // must be called if strdup is used; free the duplicate, not pch (which is NULL here)
    return tokens;
}
I use strtok(char* str, const char* delimiters), which is declared in <cstring>. It takes a char* as its first parameter because it writes '\0' terminators into the string as it tokenizes. However, string::c_str() returns a pointer of type const char*. There are two ways to address the mismatch:
- use strdup(), which accepts a const char* and returns a char*
- use const_cast<> to cast const char* to char*
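For comparison with the const_cast version shown earlier, here is a minimal sketch of the strdup() variant. The name split_dup is hypothetical and not from the code above; the sketch assumes a POSIX system where strdup() is declared in <string.h>. Note that the pointer passed to free() is the duplicate itself, since the strtok cursor is already null when the loop ends.

```cpp
#include <cstdlib>    // std::free
#include <string.h>   // strtok, strdup (strdup is POSIX, not ISO C++)
#include <string>
#include <vector>

// Hypothetical name for illustration: the strdup-based variant of split.
std::vector<std::string> split_dup(const std::string& s, const std::string& delim) {
    std::vector<std::string> tokens;
    char* copy = strdup(s.c_str());        // private, writable duplicate of s
    if (copy == nullptr) return tokens;    // allocation failed
    for (char* pch = strtok(copy, delim.c_str()); pch != nullptr;
         pch = strtok(nullptr, delim.c_str())) {
        tokens.push_back(pch);             // copies the token into a std::string
    }
    std::free(copy);                       // release the duplicate, not the cursor
    return tokens;
}
```

Because strtok only ever touches the duplicate, the caller's string is left unchanged.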
I tested these two methods. I split “this is for test lalal” by whitespace 1000 times with each of the two methods. Here is the total elapsed time (both including file I/O):
I also copied the program 1000 times into a file and split each line in that file by whitespace. Here is the result:
It seems that for small strings, using strdup() is more efficient. However, strdup() is not a standard C++ function but a well-known POSIX function. What’s more, strdup() makes a duplicate of the original string, which means that you should use free() to release the memory afterwards.
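If the non-standard strdup() and the manual free() are a concern, one possible alternative (not benchmarked here) is to copy the string into a std::vector<char> and let strtok work on that buffer. The name split_copy is hypothetical; this is only a sketch of the idea using standard C++ facilities.

```cpp
#include <cstring>   // std::strtok
#include <string>
#include <vector>

// Hypothetical name for illustration: tokenizes a writable copy held in a vector.
std::vector<std::string> split_copy(const std::string& s, const std::string& delim) {
    // Copy the characters plus the terminating '\0' into a writable buffer.
    std::vector<char> buf(s.c_str(), s.c_str() + s.size() + 1);
    std::vector<std::string> tokens;
    for (char* pch = std::strtok(buf.data(), delim.c_str()); pch != nullptr;
         pch = std::strtok(nullptr, delim.c_str())) {
        tokens.emplace_back(pch);
    }
    return tokens;   // buf is released automatically when it goes out of scope
}
```

The vector owns the copy, so there is nothing to free by hand and the original string is never modified.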
I tested these two versions further and found that their performance is quite close. I think the previous results may have been biased. Either way, choose whichever one you prefer.
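One way to rerun such a comparison is sketched below. The helper time_split and its parameters are hypothetical and are not the harness used for the measurements above. It builds a fresh string on every iteration, because strtok modifies its input: a splitter that uses const_cast and is called repeatedly on the same string would see only the first token after the first call.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: calls split_fn on a fresh copy of `input` `iterations`
// times and returns the elapsed wall-clock time in microseconds.
// `total_tokens` receives the summed token count so the work is observable.
template <typename SplitFn>
long long time_split(SplitFn split_fn, const std::string& input, int iterations,
                     std::size_t& total_tokens) {
    using clock = std::chrono::steady_clock;
    total_tokens = 0;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::string fresh(input);   // each run starts from unmodified text
        total_tokens += split_fn(fresh, " ").size();
    }
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

The copy adds the same cost to every splitter being measured, so it should not favor either version. Passing each split function to time_split with the same input and iteration count gives directly comparable numbers.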