You want to split a delimited string into multiple strings. For example, you may want to split the string "Name|Address|Phone" into three separate strings, "Name", "Address", and "Phone", with the delimiter removed.
Use basic_string’s find member function to advance from one occurrence of the delimiter to the next, and substr to copy each substring out of the original string. You can use any standard sequence to hold the results; Example 4-10 uses a vector.
Example 4-10. Split a delimited string
#include <string>
#include <vector>
#include <iostream>

using namespace std;

void split(const string& s, char c, vector<string>& v) {
   string::size_type i = 0;
   string::size_type j = s.find(c);

   while (j != string::npos) {
      v.push_back(s.substr(i, j - i)); // the text before this delimiter
      i = ++j;                         // skip past the delimiter
      j = s.find(c, j);                // look for the next one
   }
   v.push_back(s.substr(i));           // the text after the last delimiter
                                       // (the whole string if there is none)
}

int main() {
   vector<string> v;
   string s = "Account Name|Address 1|Address 2|City";

   split(s, '|', v);

   for (vector<string>::size_type i = 0; i < v.size(); ++i) {
      cout << v[i] << '\n';
   }
}

Making the example above a function template that accepts any kind of character is trivial: parameterize the character type, change references to string to basic_string<T>, and, because basic_string<T>::size_type depends on T, prefix it with typename:

template<typename T>
void split(const basic_string<T>& s, T c, vector<basic_string<T> >& v) {
   typename basic_string<T>::size_type i = 0;
   typename basic_string<T>::size_type j = s.find(c);

   while (j != basic_string<T>::npos) {
      v.push_back(s.substr(i, j - i));
      i = ++j;
      j = s.find(c, j);
   }
   v.push_back(s.substr(i));
}
The logic is identical.
Tip
Notice, though, that I put an extra space between the last two right-angle brackets on the last line of the function header. Before C++11, you have to do this to tell the compiler that it’s not reading a right-shift operator; C++11 and later compilers parse >> correctly in that position.
Example 4-10 splits a string using a simple algorithm. Starting at the beginning, it looks for the first occurrence of the delimiter c, treats everything from the current position up to that delimiter as the next meaningful chunk of text, and then resumes the search just past the delimiter. The example uses the find member function to locate the next occurrence of a character starting at a particular index in the original string, and substr to copy the characters in a range to a new string, which is pushed onto a vector. This is the same behavior as the split function in most scripting languages, and is actually a special case of tokenizing a stream of text, which is described in Recipe 4.7.
Splitting strings on single-character delimiters is a common requirement, and it probably won’t surprise you that the Boost String Algorithms library supports it. Boost’s split function is easy to use; Example 4-11 shows how to split a string with it.
Example 4-11. Splitting a string with Boost
#include <iostream>
#include <string>
#include <list>
#include <boost/algorithm/string.hpp>

using namespace std;
using namespace boost;

int main() {
   string s = "one,two,three,four";
   list<string> results;

   split(results, s, is_any_of(","));  // Note this is boost::split

   for (list<string>::const_iterator p = results.begin();
        p != results.end(); ++p) {
      cout << *p << endl;
   }
}
split is a function template that takes three arguments, plus an optional fourth. Its declaration looks like this:

template<typename Seq, typename Coll, typename Pred>
Seq& split(Seq& s, Coll& c, Pred p,
           token_compress_mode_type e = token_compress_off);
The types Seq, Coll, and Pred represent the types of the result sequence, the input collection, and the predicate that will be used to determine whether something is a delimiter. The sequence argument is a sequence, in the C++ standard’s sense of the term, whose elements can hold pieces of what is in the input collection. So, for example, in Example 4-11 I used a list<string>, but you could use something else, like a vector<string>.
The collection argument is the type of the input sequence. A collection is a nonstandard
concept that is similar to a sequence, but with fewer requirements (see the Boost
documentation at www.boost.org for specifics).
The predicate argument is a unary function object or function pointer that returns a bool indicating whether its argument is a delimiter. It is invoked on each element of the input collection in the form f(*it), where it is an iterator that refers to an element in the collection.
is_any_of is a convenient function template that comes with the String Algorithms library and makes your life easier if you are using multiple delimiters. It constructs a unary function object that returns true if the argument you pass in is a member of the set. In other words:
bool b = is_any_of("abc")('a'); // b = true
This makes it easy to test for multiple delimiters without having to write the function object yourself.