libostd
Strings

Provides string processing extensions. More...

Files

file  format.hh
 APIs for type safe formatting using C-style format strings.
 
file  string.hh
 String slice implementation as well as other utilities.
 

Classes

struct  ostd::format_error
 Thrown when format string does not properly match the arguments. More...
 
struct  ostd::format_traits< typename >
 Specialize this to format custom objects. More...
 
struct  ostd::format_spec
 A structure implementing type safe C-style formatting. More...
 
struct  ostd::basic_char_range< T >
 A string slice type. More...
 
struct  ostd::ranged_traits< std::basic_string< T, TR, A > >
 Mutable range integration for std::basic_string. More...
 
struct  ostd::ranged_traits< std::basic_string< T, TR, A > const >
 Immutable range integration for std::basic_string. More...
 
struct  ostd::utf::utf_error
 Thrown on UTF-8 decoding failure. More...
 
struct  std::hash< ostd::basic_char_range< T > >
 Standard std::hash integration for string slices. More...
 

Typedefs

using ostd::char_range = basic_char_range< char >
 A mutable slice over char.
 
using ostd::wchar_range = basic_char_range< wchar_t >
 A mutable slice over wchar_t.
 
using ostd::char16_range = basic_char_range< char16_t >
 A mutable slice over char16_t.
 
using ostd::char32_range = basic_char_range< char32_t >
 A mutable slice over char32_t.
 
using ostd::string_range = basic_char_range< char const >
 An immutable slice over char. More...
 
using ostd::wstring_range = basic_char_range< wchar_t const >
 An immutable slice over wchar_t. More...
 
using ostd::u16string_range = basic_char_range< char16_t const >
 An immutable slice over char16_t. More...
 
using ostd::u32string_range = basic_char_range< char32_t const >
 An immutable slice over char32_t. More...
 
using ostd::utf::wchar_fixed_t = std::conditional_t< sizeof(wchar_t)==sizeof(char32_t), char32_t, std::conditional_t< sizeof(wchar_t)==sizeof(char16_t), char16_t, char > >
 A Unicode type of the same size as wchar_t. More...
 
template<std::size_t N>
using ostd::utf::unicode_t = typename detail::unicode_t_base< N >::type
 The UTF unit type according to utf::unit_bits. More...
 
template<typename T >
using ostd::utf::unicode_base_t = unicode_t< unit_bits< T > >
 A normalized Unicode type for the given character type. More...
 

Enumerations

enum  ostd::format_flags {
  ostd::FMT_FLAG_DASH = 1 << 0, ostd::FMT_FLAG_ZERO = 1 << 1, ostd::FMT_FLAG_SPACE = 1 << 2, ostd::FMT_FLAG_PLUS = 1 << 3,
  ostd::FMT_FLAG_HASH = 1 << 4, ostd::FMT_FLAG_AT = 1 << 5
}
 An enumeration defining flags for C-style formatting marks. More...
 

Functions

template<typename R , typename ... A>
R && ostd::format (R &&writer, string_range fmt, A const &...args)
 Formats into an output range using a format string and arguments. More...
 
template<typename R , typename ... A>
R && ostd::format (R &&writer, std::locale const &loc, string_range fmt, A const &...args)
 Formats into an output range using a format string and arguments. More...
 
bool ostd::operator== (string_range lhs, string_range rhs) noexcept
 Like !lhs.compare(rhs).
 
bool ostd::operator!= (string_range lhs, string_range rhs) noexcept
 Like lhs.compare(rhs).
 
bool ostd::operator< (string_range lhs, string_range rhs) noexcept
 Like lhs.compare(rhs) < 0.
 
bool ostd::operator> (string_range lhs, string_range rhs) noexcept
 Like lhs.compare(rhs) > 0.
 
bool ostd::operator<= (string_range lhs, string_range rhs) noexcept
 Like lhs.compare(rhs) <= 0.
 
bool ostd::operator>= (string_range lhs, string_range rhs) noexcept
 Like lhs.compare(rhs) >= 0.
 
bool ostd::operator== (u16string_range lhs, u16string_range rhs) noexcept
 Like !lhs.compare(rhs).
 
bool ostd::operator!= (u16string_range lhs, u16string_range rhs) noexcept
 Like lhs.compare(rhs).
 
bool ostd::operator< (u16string_range lhs, u16string_range rhs) noexcept
 Like lhs.compare(rhs) < 0.
 
bool ostd::operator> (u16string_range lhs, u16string_range rhs) noexcept
 Like lhs.compare(rhs) > 0.
 
bool ostd::operator<= (u16string_range lhs, u16string_range rhs) noexcept
 Like lhs.compare(rhs) <= 0.
 
bool ostd::operator>= (u16string_range lhs, u16string_range rhs) noexcept
 Like lhs.compare(rhs) >= 0.
 
bool ostd::operator== (u32string_range lhs, u32string_range rhs) noexcept
 Like !lhs.compare(rhs).
 
bool ostd::operator!= (u32string_range lhs, u32string_range rhs) noexcept
 Like lhs.compare(rhs).
 
bool ostd::operator< (u32string_range lhs, u32string_range rhs) noexcept
 Like lhs.compare(rhs) < 0.
 
bool ostd::operator> (u32string_range lhs, u32string_range rhs) noexcept
 Like lhs.compare(rhs) > 0.
 
bool ostd::operator<= (u32string_range lhs, u32string_range rhs) noexcept
 Like lhs.compare(rhs) <= 0.
 
bool ostd::operator>= (u32string_range lhs, u32string_range rhs) noexcept
 Like lhs.compare(rhs) >= 0.
 
bool ostd::operator== (wstring_range lhs, wstring_range rhs) noexcept
 Like !lhs.compare(rhs).
 
bool ostd::operator!= (wstring_range lhs, wstring_range rhs) noexcept
 Like lhs.compare(rhs).
 
bool ostd::operator< (wstring_range lhs, wstring_range rhs) noexcept
 Like lhs.compare(rhs) < 0.
 
bool ostd::operator> (wstring_range lhs, wstring_range rhs) noexcept
 Like lhs.compare(rhs) > 0.
 
bool ostd::operator<= (wstring_range lhs, wstring_range rhs) noexcept
 Like lhs.compare(rhs) <= 0.
 
bool ostd::operator>= (wstring_range lhs, wstring_range rhs) noexcept
 Like lhs.compare(rhs) >= 0.
 
bool ostd::starts_with (string_range a, string_range b) noexcept
 Checks if a string slice starts with another slice.
 
bool ostd::starts_with (u16string_range a, u16string_range b) noexcept
 Checks if a string slice starts with another slice.
 
bool ostd::starts_with (u32string_range a, u32string_range b) noexcept
 Checks if a string slice starts with another slice.
 
bool ostd::starts_with (wstring_range a, wstring_range b) noexcept
 Checks if a string slice starts with another slice.
 
OSTD_EXPORT bool ostd::utf::decode (string_range &r, char32_t &ret) noexcept
 
OSTD_EXPORT bool ostd::utf::decode (u16string_range &r, char32_t &ret) noexcept
 
OSTD_EXPORT bool ostd::utf::decode (u32string_range &r, char32_t &ret) noexcept
 
OSTD_EXPORT bool ostd::utf::decode (wstring_range &r, char32_t &ret) noexcept
 
template<typename C , typename R >
std::size_t ostd::utf::encode (R &sink, char32_t ch)
 
template<typename C , typename OR , typename IR >
std::size_t ostd::utf::encode (OR &sink, IR &r)
 Encode a Unicode code point from a string in the given encoding. More...
 
template<typename C , typename R >
std::size_t ostd::utf::replace (R &sink)
 Write a Unicode replacement character into the sink. More...
 
OSTD_EXPORT std::size_t ostd::utf::length (string_range r, string_range &cont) noexcept
 Get the number of Unicode code points in a string. More...
 
OSTD_EXPORT std::size_t ostd::utf::length (u16string_range r, u16string_range &cont) noexcept
 Get the number of Unicode code points in a string. More...
 
OSTD_EXPORT std::size_t ostd::utf::length (u32string_range r, u32string_range &cont) noexcept
 Get the number of Unicode code points in a string. More...
 
OSTD_EXPORT std::size_t ostd::utf::length (wstring_range r, wstring_range &cont) noexcept
 Get the number of Unicode code points in a string. More...
 
OSTD_EXPORT std::size_t ostd::utf::length (string_range r) noexcept
 Get the number of Unicode code points in a UTF-8 string. More...
 
OSTD_EXPORT std::size_t ostd::utf::length (u16string_range r) noexcept
 Get the number of Unicode code points in a UTF-16 string. More...
 
OSTD_EXPORT std::size_t ostd::utf::length (u32string_range r) noexcept
 Get the number of Unicode code points in a UTF-32 string. More...
 
OSTD_EXPORT std::size_t ostd::utf::length (wstring_range r) noexcept
 Get the number of Unicode code points in a wide string. More...
 
template<typename C , typename R >
auto ostd::utf::iter_u (R &&str)
 Iterate a Unicode string as a different encoding. More...
 
OSTD_EXPORT bool ostd::utf::isalnum (char32_t c) noexcept
 Check whether a code point is alphanumeric. More...
 
OSTD_EXPORT bool ostd::utf::isalpha (char32_t c) noexcept
 Check whether a code point is alphabetic. More...
 
OSTD_EXPORT bool ostd::utf::isblank (char32_t c) noexcept
 Check whether a code point is a blank. More...
 
OSTD_EXPORT bool ostd::utf::iscntrl (char32_t c) noexcept
 Check whether a code point is a control character. More...
 
OSTD_EXPORT bool ostd::utf::isdigit (char32_t c) noexcept
 Check whether a code point is a digit. More...
 
OSTD_EXPORT bool ostd::utf::isgraph (char32_t c) noexcept
 Check whether a code point is graphic. More...
 
OSTD_EXPORT bool ostd::utf::islower (char32_t c) noexcept
 Check whether a code point is lowercase. More...
 
OSTD_EXPORT bool ostd::utf::isprint (char32_t c) noexcept
 Check whether a code point is printable. More...
 
OSTD_EXPORT bool ostd::utf::ispunct (char32_t c) noexcept
 Check whether a code point is punctuation. More...
 
OSTD_EXPORT bool ostd::utf::isspace (char32_t c) noexcept
 Check whether a code point is a whitespace. More...
 
OSTD_EXPORT bool ostd::utf::istitle (char32_t c) noexcept
 Check whether a code point is titlecase. More...
 
OSTD_EXPORT bool ostd::utf::isupper (char32_t c) noexcept
 Check whether a code point is uppercase. More...
 
OSTD_EXPORT bool ostd::utf::isvalid (char32_t c) noexcept
 Check whether a code point is a valid character. More...
 
OSTD_EXPORT bool ostd::utf::isxdigit (char32_t c) noexcept
 Check whether a code point is a hexadecimal digit. More...
 
OSTD_EXPORT char32_t ostd::utf::tolower (char32_t c) noexcept
 Convert a Unicode code point to lowercase. More...
 
OSTD_EXPORT char32_t ostd::utf::toupper (char32_t c) noexcept
 Convert a Unicode code point to uppercase. More...
 
int ostd::utf::compare (string_range s1, string_range s2) noexcept
 Compare two UTF-8 strings. More...
 
int ostd::utf::compare (u16string_range s1, u16string_range s2) noexcept
 Compare two UTF-16 strings. More...
 
int ostd::utf::compare (u32string_range s1, u32string_range s2) noexcept
 Compare two UTF-32 strings. More...
 
int ostd::utf::compare (wstring_range s1, wstring_range s2) noexcept
 Compare two wide strings. More...
 
OSTD_EXPORT int ostd::utf::case_compare (string_range s1, string_range s2) noexcept
 Compare two UTF-8 strings as case insensitive. More...
 
OSTD_EXPORT int ostd::utf::case_compare (u16string_range s1, u16string_range s2) noexcept
 Compare two UTF-16 strings as case insensitive. More...
 
OSTD_EXPORT int ostd::utf::case_compare (u32string_range s1, u32string_range s2) noexcept
 Compare two UTF-32 strings as case insensitive. More...
 
OSTD_EXPORT int ostd::utf::case_compare (wstring_range s1, wstring_range s2) noexcept
 Compare two wide strings as case insensitive. More...
 
string_range ostd::literals::string_literals::operator"" _sr (char const *str, std::size_t len) noexcept
 A custom literal for string ranges. More...
 
size_type ostd::basic_char_range< T >::length () const noexcept
 Gets the number of code points in the slice. More...
 
size_type ostd::basic_char_range< T >::length (basic_char_range &cont) const noexcept
 Gets the number of code points in the slice. More...
 
template<typename C >
auto ostd::basic_char_range< T >::iter_u () const
 Iterate over the Unicode units of the given type. More...
 
int ostd::basic_char_range< T >::case_compare (basic_char_range< value_type const > s) const noexcept
 Compares two slices in a case insensitive manner. More...
 

Detailed Description

Provides string processing extensions.

As libostd provides a range system, it represents string slices as contiguous ranges of characters. This has many advantages, such as being able to use them with generic algorithms. The string slices are not zero terminated, so creating subslices is very fast: it is essentially just pointer arithmetic.

Integration with existing string handling facilities is ensured, so you can incorporate libostd into any existing project and still benefit from the new features.

A simple example:

#include <ostd/string.hh>
#include <ostd/io.hh>
int main() {
ostd::string_range x = "hello world";
auto p1 = x.slice(0, 5);
auto p2 = x.slice(6);
ostd::writeln(p1); // hello
ostd::writeln(p2); // world
}

An example of using libostd string formatting:

#include <unordered_map>
#include <vector>
#include <tuple>
#include <clocale>
#include <ostd/range.hh>
#include <ostd/io.hh>
using namespace ostd;
using namespace ostd::string_literals;
struct Foo {
};
/* implementing formatting for custom objects */
template<>
struct format_traits<Foo> {
template<typename R>
static void to_format(Foo const &, R &writer, format_spec const &fs) {
switch (fs.spec()) {
case 'i':
range_put_all(writer, "Foo_i"_sr);
break;
default:
range_put_all(writer, "Foo_s"_sr);
break;
}
if (fs.flags() & FMT_FLAG_AT) {
range_put_all(writer, "_esc"_sr);
}
}
};
int main() {
std::vector<int> x = { 5, 10, 15, 20 };
/* prints [5|10|15|20] (using | as the delimiter and %s for each item),
* the syntax for ranges is %(CONTENTS%) where CONTENTS is a sequence
* up until and including the last format mark followed by a delimiter,
* so for example "%s, " has "%s" for formatting and ", " for the delimiter
* and "%d: %s, " has "%d: %s" for format and ", " for the delimiter; if
* you need to specify a complicated manual delimiter, you can use the
* "FORMAT%|DELIMITER" syntax, where %(%s, %) equals %(%s%|, %)
*/
writeln("-- range format --");
writefln("[%(%s|%)]", x);
/* prints a range with default format {item, item, item, ...}
* you can enable item escaping by passing the @ flag
*/
writeln("\n-- range default format --");
writefln("%s", x);
int y[] = { 2, 4, 8, 16, 32 };
/* prints { 2, 4, 8, 16, 32 } using ", " as the delimiter */
writeln("\n-- range format of static array --");
writefln("{ %(%s, %) }", y);
/* nested range printing - prints each item of the main
* range with [ %(%s, %) ] and ",\n" as the delimiter
*/
writeln("\n-- range format of nested range --");
writefln("[\n%([ %(%s, %) ]%|,\n%)\n]", map(range(10), [](int v) {
return range(v + 1);
}));
std::unordered_map<std::string, int> m = {
{ "foo", 5 },
{ "bar", 10 },
{ "baz", 15 }
};
/* prints something like { "baz": 15, "bar": 10, "foo": 5 }, note that
* the tuple is expanded into two formats (using the # flag) and the
* items are escaped with the @ flag (applies to strings and chars)
*/
writeln("\n-- range format of hash table --");
writefln("{ %#(%@s: %d, %) }", m);
/* not escaped, you get { baz: 15, bar: 10, foo: 5 } */
writeln("\n-- range format of hash table (no escape) --");
writefln("{ %#(%s: %d, %) }", m);
/* no expansion of the items, print entire tuple with default format,
* gets you something like { <"baz", 15>, <"bar", 10>, <"foo", 5> }
* because the default tuple format is <item, item, item, ...>
*/
writeln("\n-- range format of hash table (no item expansion) --");
writefln("{ %(%@s, %) }", m);
/* as the @ flag enables escaping on strings and chars,
* you can use it standalone outside of range/tuple format
*/
writeln("\n-- format item escaping --");
writefln("not escaped: %s, escaped: %@s", "foo", "bar");
std::tuple<std::string, int, float, std::string> tup{
"hello world", 1337, 3.14f, "test"
};
/* you can expand tuples similarly to ranges, with %<CONTENTS%> where
* CONTENTS is a regular format string like if the tuple was formatted
* separately with each item of the tuple passed as a separate argument
*/
writeln("\n-- tuple format --");
writefln("the tuple contains %<%@s, %d, %f, %s%>.", tup);
writeln("\n-- tuple default format --");
writefln("auto tuple: %s", tup);
writeln("\n-- tuple default format (escaped) --");
writefln("auto tuple with escape: %@s", tup);
std::tuple<int, float, char const *> xt[] = {
std::make_tuple(5, 3.14f, "foo"),
std::make_tuple(3, 1.23f, "bar"),
std::make_tuple(9, 8.66f, "baz")
};
/* formatting a range of tuples, with each tuple expanded using #
*/
writeln("\n-- range of tuples format --");
writefln("[ %#(<%d|%f|%@s>%|, %) ]", xt);
/* formatting custom objects, the information about the format mark
* is passed into the to_format function and the object can read it
*/
writeln("\n-- custom object format --");
writefln("%s", Foo{});
writefln("%i", Foo{});
writefln("%@s", Foo{});
writefln("%@i", Foo{});
auto s = appender<std::string>();
/* formatting into a string sink (can be any output range, but
* appender makes sure the capacity is unlimited so it's safe)
*/
writeln("\n-- format into a string --");
format(s, "hello %s", "world");
writeln(s.get());
/* locale specific formatting */
writeln("\n-- number format with C locale --");
writefln(
"\"%d\", \"%f\", \"%X\"", 123456789, 12345.6789123, 0x123456789ABCDEF
);
cout.imbue(std::locale{""});
writeln("\n-- number format with system locale --");
writefln(
"\"%d\", \"%f\", \"%X\"", 123456789, 12345.6789123, 0x123456789ABCDEF
);
}
/* output:
-- range format --
[5|10|15|20]
-- range default format --
{5, 10, 15, 20}
-- range format of static array --
{ 2, 4, 8, 16, 32 }
-- range format of nested range --
[
[ 0 ],
[ 0, 1 ],
[ 0, 1, 2 ],
[ 0, 1, 2, 3 ],
[ 0, 1, 2, 3, 4 ],
[ 0, 1, 2, 3, 4, 5 ],
[ 0, 1, 2, 3, 4, 5, 6 ],
[ 0, 1, 2, 3, 4, 5, 6, 7 ],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8 ],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ]
]
-- range format of hash table --
{ "baz": 15, "bar": 10, "foo": 5 }
-- range format of hash table (no escape) --
{ baz: 15, bar: 10, foo: 5 }
-- range format of hash table (no item expansion) --
{ <"baz", 15>, <"bar", 10>, <"foo", 5> }
-- format item escaping --
not escaped: foo, escaped: "bar"
-- tuple format --
the tuple contains "hello world", 1337, 3.140000, test.
-- tuple default format --
auto tuple: <hello world, 1337, 3.14, test>
-- tuple default format (escaped) --
auto tuple with escape: <"hello world", 1337, 3.14, "test">
-- range of tuples format --
[ <5|3.140000|"foo">, <3|1.230000|"bar">, <9|8.660000|"baz"> ]
-- custom object format --
Foo_s
Foo_i
Foo_s_esc
Foo_i_esc
-- format into a string --
hello world
-- number format with C locale --
"123456789", "12345.678912", "123456789ABCDEF"
-- number format with system locale --
"123 456 789", "12 345,678912", "123 456 789 ABC DEF"
*/

See the examples provided with the library for further information.

Typedef Documentation

◆ string_range

using ostd::string_range = basic_char_range<char const>

An immutable slice over char.

This is used in most libostd APIs that read strings. More or less anything is convertible to it, including mutable slices, so it's a perfect fit as long as modifications are not necessary.

◆ u16string_range

using ostd::u16string_range = basic_char_range<char16_t const>

An immutable slice over char16_t.

Included for basic UTF-16 compatibility.

◆ u32string_range

using ostd::u32string_range = basic_char_range<char32_t const>

An immutable slice over char32_t.

Can represent UTF-32 strings.

◆ unicode_base_t

template<typename T >
using ostd::utf::unicode_base_t = unicode_t<unit_bits<T> >

A normalized Unicode type for the given character type.

This will map to itself for all character types but wchar_t, which maps to utf::wchar_fixed_t.

◆ unicode_t

template<std::size_t N>
using ostd::utf::unicode_t = typename detail::unicode_t_base<N>::type

The UTF unit type according to utf::unit_bits.

For 8 this will be char, for 16 char16_t, for 32 char32_t.
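
The bit-count mapping described above can be pictured as a set of template specializations. This is only a sketch of how such a mapping might be written, not the library's actual definition: the `sketch` namespace and the bodies of the specializations are assumptions, and only the names `unicode_t` and `detail::unicode_t_base` are taken from the signature above.

```cpp
#include <cstddef>
#include <type_traits>

namespace sketch {
namespace detail {
    // Primary template is declared but not defined, so an unsupported
    // bit count is a compile-time error.
    template<std::size_t N>
    struct unicode_t_base;

    template<> struct unicode_t_base<8>  { using type = char; };
    template<> struct unicode_t_base<16> { using type = char16_t; };
    template<> struct unicode_t_base<32> { using type = char32_t; };
} // namespace detail

// Maps a code unit width in bits to the matching character type.
template<std::size_t N>
using unicode_t = typename detail::unicode_t_base<N>::type;
} // namespace sketch

// The mapping stated in the description, checked at compile time.
static_assert(std::is_same_v<sketch::unicode_t<8>, char>);
static_assert(std::is_same_v<sketch::unicode_t<16>, char16_t>);
static_assert(std::is_same_v<sketch::unicode_t<32>, char32_t>);
```

With a mapping of this shape, `sketch::unicode_t<24>` would fail to compile, which seems a reasonable way to reject widths that do not correspond to a UTF encoding.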

◆ wchar_fixed_t

using ostd::utf::wchar_fixed_t = std::conditional_t< sizeof(wchar_t) == sizeof(char32_t), char32_t, std::conditional_t< sizeof(wchar_t) == sizeof(char16_t), char16_t, char > >

A Unicode type of the same size as wchar_t.

This can be an alias for char32_t, char16_t or char, representing UTF-32, UTF-16 or UTF-8 respectively. It is a Unicode character type whose encoding is chosen per platform from these three. The choice follows that order: if wchar_t is the size of char32_t, it is UTF-32; otherwise UTF-16 is tried, with UTF-8 as the fallback.

On most platforms, and on all platforms that we care about, this does the right thing: UTF-32 on most Unix-like systems (with 32-bit wchar_t) and UTF-16 on Windows (with 16-bit wchar_t).
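
To see which encoding the alias selects on a given platform, the definition can be reproduced in isolation. The sketch below copies the alias from the signature above into a hypothetical `sketch` namespace; the helper `wide_encoding_name()` is an illustration and not part of libostd.

```cpp
#include <type_traits>

namespace sketch {
// Same selection logic as the alias documented above.
using wchar_fixed_t = std::conditional_t<
    sizeof(wchar_t) == sizeof(char32_t), char32_t,
    std::conditional_t<sizeof(wchar_t) == sizeof(char16_t), char16_t, char>
>;

// Hypothetical helper: names the encoding the alias stands for.
constexpr char const *wide_encoding_name() {
    if constexpr (std::is_same_v<wchar_fixed_t, char32_t>) {
        return "UTF-32";
    } else if constexpr (std::is_same_v<wchar_fixed_t, char16_t>) {
        return "UTF-16";
    } else {
        return "UTF-8";
    }
}
} // namespace sketch

// Either the alias has the same size as wchar_t, or it fell back to char.
static_assert(
    sizeof(sketch::wchar_fixed_t) == sizeof(wchar_t) ||
    std::is_same_v<sketch::wchar_fixed_t, char>
);
```

On a typical Linux toolchain `sketch::wide_encoding_name()` would be expected to yield "UTF-32", and "UTF-16" with a 16-bit wchar_t, though the result depends entirely on the compiler's wchar_t.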

◆ wstring_range

using ostd::wstring_range = basic_char_range<wchar_t const>

An immutable slice over wchar_t.

Included primarily for compatibility with other APIs.

Enumeration Type Documentation

◆ format_flags

An enumeration defining flags for C-style formatting marks.

Used inside ostd::format_spec. The C-style formatting mark has a flags section and each of these enum items represents one. They can be combined using the standard bitwise operators.

Enumerator
FMT_FLAG_DASH 

The dash (-) flag.

FMT_FLAG_ZERO 

The zero (0) flag.

FMT_FLAG_SPACE 

The space ( ) flag.

FMT_FLAG_PLUS 

The plus (+) flag.

FMT_FLAG_HASH 

The hash (#) flag.

FMT_FLAG_AT 

The at (@) flag.
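
Because each enumerator occupies its own bit, a set of flags fits into a single integer and can be tested with a bitwise AND. The sketch below copies the enumerator values from the declaration above into a hypothetical `sketch` namespace so that it compiles without the library; `has_flag()` is an illustrative helper, not a libostd function.

```cpp
namespace sketch {
// Enumerator values as listed in the declaration above.
enum format_flags {
    FMT_FLAG_DASH  = 1 << 0,
    FMT_FLAG_ZERO  = 1 << 1,
    FMT_FLAG_SPACE = 1 << 2,
    FMT_FLAG_PLUS  = 1 << 3,
    FMT_FLAG_HASH  = 1 << 4,
    FMT_FLAG_AT    = 1 << 5
};

// Hypothetical helper: true when flag `f` is set in the combined value.
constexpr bool has_flag(int flags, format_flags f) {
    return (flags & f) != 0;
}

// A combined value carrying both the dash and the at flag.
constexpr int dash_and_at = FMT_FLAG_DASH | FMT_FLAG_AT;
} // namespace sketch

static_assert(sketch::has_flag(sketch::dash_and_at, sketch::FMT_FLAG_AT));
static_assert(!sketch::has_flag(sketch::dash_and_at, sketch::FMT_FLAG_ZERO));
```

This is the same kind of test the formatting example performs with `fs.flags() & FMT_FLAG_AT`.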

Function Documentation

◆ case_compare() [1/5]

template<typename T >
int ostd::basic_char_range< T >::case_compare ( basic_char_range< value_type const >  s) const
inline noexcept

Compares two slices in a case insensitive manner.

Works exactly the same as compare(), but in a case insensitive way, i.e. it lowercases the characters and compares them after that.

For UTF-8, it decodes the string on the fly, then lowercases the decoded code points and uses their difference (without encoding them back). If the decoding fails, the failing code unit is used as-is, so this function never fails. Identical treatment is given to UTF-16.

◆ case_compare() [2/5]

OSTD_EXPORT int ostd::utf::case_compare ( string_range  s1,
string_range  s2 
)
noexcept

Compare two UTF-8 strings as case insensitive.

The case insensitive comparison is done by advancing by code points and converting each code point to lowercase using utf::tolower() before doing the comparison, with invalid code units being compared as they are (so this function never fails).

See also
basic_char_range::case_compare()

◆ case_compare() [3/5]

OSTD_EXPORT int ostd::utf::case_compare ( u16string_range  s1,
u16string_range  s2 
)
noexcept

Compare two UTF-16 strings as case insensitive.

The case insensitive comparison is done by advancing by code points and converting each code point to lowercase using utf::tolower() before doing the comparison, with invalid code units being compared as they are (so this function never fails).

See also
basic_char_range::case_compare()

◆ case_compare() [4/5]

OSTD_EXPORT int ostd::utf::case_compare ( u32string_range  s1,
u32string_range  s2 
)
noexcept

Compare two UTF-32 strings as case insensitive.

The case insensitive comparison is done by converting each code point to lowercase using utf::tolower() before doing the comparison, with invalid code points being compared as they are (so this function never fails).

See also
basic_char_range::case_compare()
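
The UTF-32 case is the simplest to picture, since every code unit is already a code point. The following sketch shows the general shape of such a comparison using only the standard library. It is not the library's implementation: `ascii_tolower()` is a stand-in for utf::tolower() that only handles ASCII, and the function returns -1, 0 or 1 rather than whatever exact value the library produces.

```cpp
#include <cstddef>
#include <string_view>

namespace sketch {
// Stand-in for utf::tolower(): ASCII only (an assumption made to keep
// the sketch self-contained). Anything else is returned unchanged,
// which is also how invalid code points end up being compared as-is.
constexpr char32_t ascii_tolower(char32_t c) {
    return (c >= U'A' && c <= U'Z')
        ? static_cast<char32_t>(c + (U'a' - U'A')) : c;
}

// Lowercase both sides code point by code point, then compare.
constexpr int case_compare(std::u32string_view s1, std::u32string_view s2) {
    std::size_t n = (s1.size() < s2.size()) ? s1.size() : s2.size();
    for (std::size_t i = 0; i < n; ++i) {
        char32_t a = ascii_tolower(s1[i]);
        char32_t b = ascii_tolower(s2[i]);
        if (a != b) {
            return (a < b) ? -1 : 1;
        }
    }
    // Common prefix is equal; the shorter string sorts first.
    if (s1.size() == s2.size()) {
        return 0;
    }
    return (s1.size() < s2.size()) ? -1 : 1;
}
} // namespace sketch

static_assert(sketch::case_compare(U"Hello", U"hELLO") == 0);
static_assert(sketch::case_compare(U"abc", U"ABD") < 0);
```

Since lowercasing never fails and unknown values pass through untouched, a comparison built this way has no error path, which matches the "never fails" guarantee stated above.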

◆ case_compare() [5/5]

OSTD_EXPORT int ostd::utf::case_compare ( wstring_range  s1,
wstring_range  s2 
)
noexcept

Compare two wide strings as case insensitive.

The case insensitive comparison is done by advancing by code points and converting each code point to lowercase using utf::tolower() before doing the comparison, with invalid code units being compared as they are (so this function never fails).

The internal behavior of this function is platform specific depending on the size of wchar_t.

See also
basic_char_range::case_compare()

◆ compare() [1/4]

int ostd::utf::compare ( string_range  s1,
string_range  s2 
)
inline noexcept

Compare two UTF-8 strings.

Basically returns s1.compare(s2), so for detailed documentation please refer to basic_char_range::compare().

◆ compare() [2/4]

int ostd::utf::compare ( u16string_range  s1,
u16string_range  s2 
)
inline noexcept

Compare two UTF-16 strings.

Basically returns s1.compare(s2), so for detailed documentation please refer to basic_char_range::compare().

◆ compare() [3/4]

int ostd::utf::compare ( u32string_range  s1,
u32string_range  s2 
)
inline noexcept

Compare two UTF-32 strings.

Basically returns s1.compare(s2), so for detailed documentation please refer to basic_char_range::compare().

◆ compare() [4/4]

int ostd::utf::compare ( wstring_range  s1,
wstring_range  s2 
)
inline noexcept

Compare two wide strings.

Basically returns s1.compare(s2), so for detailed documentation please refer to basic_char_range::compare().

◆ encode()

template<typename C , typename OR , typename IR >
std::size_t ostd::utf::encode ( OR &  sink,
IR &  r 
)
inline

Encode a Unicode code point from a string in the given encoding.

This is the same as utf::encode() but takes the number of bits as in utf::unit_bits as an input instead of a type, mapping 8 to char, 16 to char16_t and 32 to char32_t.

◆ format() [1/2]

template<typename R , typename ... A>
R&& ostd::format ( R &&  writer,
string_range  fmt,
A const &...  args 
)
inline

Formats into an output range using a format string and arguments.

Uses the default constructed std::locale (the current global locale) for locale specific formatting. There is also a version that takes an explicit locale.

This is just a simple wrapper, equivalent to:

return ostd::format_spec{fmt}.format(std::forward<R>(writer), args...);
Examples:
format.cc.

◆ format() [2/2]

template<typename R , typename ... A>
R&& ostd::format ( R &&  writer,
std::locale const &  loc,
string_range  fmt,
A const &...  args 
)
inline

Formats into an output range using a format string and arguments.

This version uses loc as a locale. There is also a version that uses the global locale by default.

This is just a simple wrapper, equivalent to:

return ostd::format_spec{fmt, loc}.format(std::forward<R>(writer), args...);

◆ isalnum()

OSTD_EXPORT bool ostd::utf::isalnum ( char32_t  c)
noexcept

Check whether a code point is alphanumeric.

This is true for either utf::isalpha() or utf::isdigit(). Also equivalent to std::isalnum().

◆ isalpha()

OSTD_EXPORT bool ostd::utf::isalpha ( char32_t  c)
noexcept

Check whether a code point is alphabetic.

This is like std::isalpha() but strictly Unicode and works on the entire code point range. Returns true for alphabetic characters, false for others.

The categories considered alphabetic are L*.

◆ isblank()

OSTD_EXPORT bool ostd::utf::isblank ( char32_t  c)
noexcept

Check whether a code point is a blank.

This is like std::isblank() but strictly Unicode and works on the entire code point range. Returns true for blanks, false for others.

The blank characters are only space (U+20) and tab (U+9).

◆ iscntrl()

OSTD_EXPORT bool ostd::utf::iscntrl ( char32_t  c)
noexcept

Check whether a code point is a control character.

This is like std::iscntrl() but strictly Unicode and works on the entire code point range. Returns true for control characters, false for others.

The category considered control characters is Cc.

◆ isdigit()

OSTD_EXPORT bool ostd::utf::isdigit ( char32_t  c)
noexcept

Check whether a code point is a digit.

This is like std::isdigit() but strictly Unicode and works on the entire code point range. Returns true for digit characters, false for others.

The category considered a digit is Nd.

◆ isgraph()

OSTD_EXPORT bool ostd::utf::isgraph ( char32_t  c)
noexcept

Check whether a code point is graphic.

This is true when the input is not utf::isspace() and is utf::isprint(). Also equivalent to std::isgraph().

◆ islower()

OSTD_EXPORT bool ostd::utf::islower ( char32_t  c)
noexcept

Check whether a code point is lowercase.

This is like std::islower() but strictly Unicode and works on the entire code point range. Returns true for lowercase characters, false for others.

The category considered lowercase is Ll.

◆ isprint()

OSTD_EXPORT bool ostd::utf::isprint ( char32_t  c)
noexcept

Check whether a code point is printable.

Equivalent to std::isprint() but for Unicode. This is true for all characters that are not utf::iscntrl() and that are not U+2028, U+2029, U+FFF9, U+FFFA, U+FFFB.

◆ ispunct()

OSTD_EXPORT bool ostd::utf::ispunct ( char32_t  c)
noexcept

Check whether a code point is punctuation.

This is like std::ispunct() but strictly Unicode and works on the entire code point range. Returns true for punctuation characters, false for others. Punctuation characters are those that satisfy utf::isgraph() but are not utf::isalnum().

◆ isspace()

OSTD_EXPORT bool ostd::utf::isspace ( char32_t  c)
noexcept

Check whether a code point is a whitespace.

This is like std::isspace() but strictly Unicode and works on the entire code point range. Returns true for whitespace, false for others.

The categories considered whitespace are Z* with the B, S and WS bidirectional categories.

◆ istitle()

OSTD_EXPORT bool ostd::utf::istitle ( char32_t  c)
noexcept

Check whether a code point is titlecase.

This has no standard ctype equivalent. Returns true for titlecase characters, false for others.

The category considered titlecase is Lt.

◆ isupper()

OSTD_EXPORT bool ostd::utf::isupper ( char32_t  c)
noexcept

Check whether a code point is uppercase.

This is like std::isupper() but strictly Unicode and works on the entire code point range. Returns true for uppercase characters, false for others.

The category considered uppercase is Lu.

◆ isvalid()

OSTD_EXPORT bool ostd::utf::isvalid ( char32_t  c)
noexcept

Check whether a code point is a valid character.

This covers all code points within the Unicode range (up to utf::max_unicode) that are not surrogate code points (U+D800 to U+DFFF), non-characters (U+FDD0 to U+FDEF) or end-of-plane characters (U+FFFE and U+FFFF).

This is Unicode specific and has no standard ctype equivalent.
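
The rules above translate into a handful of range checks. The sketch below is an illustration rather than the library's code, and it makes two assumptions that the text does not spell out: that utf::max_unicode is U+10FFFF, and that the end-of-plane rule applies to the last two code points of every plane, not only the first.

```cpp
namespace sketch {
// Assumed value of utf::max_unicode.
constexpr char32_t max_unicode = 0x10FFFF;

constexpr bool isvalid(char32_t c) {
    if (c > max_unicode) {
        return false; // outside the Unicode range
    }
    if (c >= 0xD800 && c <= 0xDFFF) {
        return false; // surrogate code points
    }
    if (c >= 0xFDD0 && c <= 0xFDEF) {
        return false; // non-characters
    }
    if ((c & 0xFFFE) == 0xFFFE) {
        return false; // U+xxFFFE and U+xxFFFF (assumed: in every plane)
    }
    return true;
}
} // namespace sketch

static_assert(sketch::isvalid(U'A'));
static_assert(!sketch::isvalid(0xD800));
static_assert(!sketch::isvalid(0xFFFF));
```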

◆ isxdigit()

OSTD_EXPORT bool ostd::utf::isxdigit ( char32_t  c)
noexcept

Check whether a code point is a hexadecimal digit.

This only considers the ASCII character range, returning true for digits (U+30 to U+39) as well as letters A to F in lowercase and uppercase (U+41 to U+46, U+61 to U+66).

Behaves exactly the same as std::isxdigit() in the C locale, but unlike std::isxdigit() it never changes behavior, i.e. it cannot support codepage extensions, being Unicode only.
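
The three ranges listed above are all there is to the check, so it can be written down directly. This is a sketch in a hypothetical `sketch` namespace, built only from the ranges given in the description.

```cpp
namespace sketch {
// True for 0-9, A-F and a-f; false for every other code point.
constexpr bool isxdigit(char32_t c) {
    return (c >= 0x30 && c <= 0x39)   // digits 0 to 9
        || (c >= 0x41 && c <= 0x46)   // uppercase A to F
        || (c >= 0x61 && c <= 0x66);  // lowercase a to f
}
} // namespace sketch

static_assert(sketch::isxdigit(U'7'));
static_assert(sketch::isxdigit(U'c'));
static_assert(!sketch::isxdigit(U'g'));
```

Because the ranges are fixed, the result cannot vary with the active locale, which is the property the description emphasizes.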

◆ iter_u() [1/2]

template<typename T >
template<std::size_t N>
auto ostd::basic_char_range< T >::iter_u ( ) const
inline

Iterate over the Unicode units of the given type.

Like utf::iter_u().

◆ iter_u() [2/2]

template<typename C , typename R >
auto ostd::utf::iter_u ( R &&  str)
inline

Iterate a Unicode string as a different encoding.

Like ostd::iter_u but taking bits to specify the encoding.

This returns a forward range (ostd::forward_range_tag) that iterates over the given Unicode character range as a different UTF encoding. The UTF encoding is specified using the given type C, being UTF-8 for char, UTF-16 for char16_t, UTF-32 for char32_t and platform specific for wchar_t.

This uses utf::unicode_t with N to call utf::iter_u().

◆ length() [1/10]

template<typename T >
std::size_t ostd::basic_char_range< T >::length ( ) const
inline noexcept

Gets the number of code points in the slice.

Effectively the same as utf::length().

◆ length() [2/10]

template<typename T >
std::size_t ostd::basic_char_range< T >::length ( basic_char_range< T > &  cont) const
inline noexcept

Gets the number of code points in the slice.

Effectively the same as utf::length().

◆ length() [3/10]

OSTD_EXPORT std::size_t ostd::utf::length ( string_range  r,
string_range &  cont 
)
noexcept

Get the number of Unicode code points in a string.

This function keeps reading Unicode code points while it can and once it can't it returns the number of valid ones with the rest of the input string range being in cont. That means if the entire string is a valid UTF-8 string, cont will be empty, otherwise it will begin at the first invalid UTF-8 code unit.

If you're sure the string is valid or you don't need to handle the error, you can use the more convenient overload below.
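
The stop-at-first-error behavior can be illustrated with a small standalone counter over std::string_view. This is a sketch under stated assumptions, not the library's implementation: validation is simplified (overlong forms, encoded surrogates and values above U+10FFFF are not rejected), the remainder is returned in a struct instead of through a reference parameter, and all names in the `sketch` namespace are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

namespace sketch {
// Byte length of the UTF-8 sequence at the front of `s`, or 0 when the
// sequence is malformed or truncated (simplified validation).
constexpr std::size_t sequence_length(std::string_view s) {
    if (s.empty()) {
        return 0;
    }
    unsigned char lead = static_cast<unsigned char>(s[0]);
    std::size_t n = 0;
    if (lead < 0x80) {
        n = 1;
    } else if ((lead & 0xE0) == 0xC0) {
        n = 2;
    } else if ((lead & 0xF0) == 0xE0) {
        n = 3;
    } else if ((lead & 0xF8) == 0xF0) {
        n = 4;
    } else {
        return 0; // stray continuation byte or invalid lead byte
    }
    if (s.size() < n) {
        return 0; // truncated sequence
    }
    for (std::size_t i = 1; i < n; ++i) {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
            return 0; // expected a continuation byte
        }
    }
    return n;
}

struct length_result {
    std::size_t count;     // number of code points read successfully
    std::string_view cont; // unread remainder; empty if all input was valid
};

// Counts code points until the input ends or a sequence fails to decode.
constexpr length_result length(std::string_view r) {
    std::size_t count = 0;
    while (!r.empty()) {
        std::size_t n = sequence_length(r);
        if (n == 0) {
            break;
        }
        r.remove_prefix(n);
        ++count;
    }
    return {count, r};
}
} // namespace sketch

// "h", "é" (two bytes), "l", "l", "o": five code points, nothing left over.
static_assert(sketch::length("h\xC3\xA9llo").count == 5);
static_assert(sketch::length("h\xC3\xA9llo").cont.empty());
```

For an input such as "ab\xFFzz" the sketch reports two code points and leaves a three-byte remainder starting at the invalid byte, so the caller can decide how to recover.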

◆ length() [4/10]

OSTD_EXPORT std::size_t ostd::utf::length ( u16string_range  r,
u16string_range &  cont 
)
noexcept

Get the number of Unicode code points in a string.

This function keeps reading Unicode code points while it can and once it can't it returns the number of valid ones with the rest of the input string range being in cont. That means if the entire string is a valid UTF-16 string, cont will be empty, otherwise it will begin at the first invalid UTF-16 code unit.

If you're sure the string is valid or you don't need to handle the error, you can use the more convenient overload below.

◆ length() [5/10]

OSTD_EXPORT std::size_t ostd::utf::length ( u32string_range  r,
u32string_range &  cont 
)
noexcept

Get the number of Unicode code points in a string.

This function reads Unicode code points for as long as the input is valid. When it stops, it returns the number of valid code points read and stores the unread remainder of the input string range in cont. If the entire string is valid UTF-32, cont will be empty; otherwise it will begin at the first invalid code point.

If you're sure the string is valid or you don't need to handle the error, you can use the more convenient overload below.

◆ length() [6/10]

OSTD_EXPORT std::size_t ostd::utf::length ( wstring_range  r,
wstring_range &  cont 
)
noexcept

Get the number of Unicode code points in a string.

This function reads Unicode code points for as long as the input is valid. When it stops, it returns the number of valid code points read and stores the unread remainder of the input string range in cont. If the entire string is a valid Unicode wide string, cont will be empty; otherwise it will begin at the first invalid code unit.

If you're sure the string is valid or you don't need to handle the error, you can use the more convenient overload below.

The behavior of this function is platform dependent as wide characters represent different things on different systems.
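The platform dependence comes from the width of wchar_t. The following sketch reuses the definition of utf::wchar_fixed_t from the typedef list to pick the fixed-width character type that matches wchar_t; wide_is_utf32 and wide_units_for_emoji are hypothetical names added for illustration. It assumes the compiler encodes wide literals as UTF-32 when wchar_t is 32 bits wide and as UTF-16 when it is 16 bits wide, which is the usual case on Linux and Windows respectively.

```cpp
#include <cstddef>
#include <string_view>
#include <type_traits>

// Same definition as the wchar_fixed_t typedef listed for ostd::utf.
using wchar_fixed_t = std::conditional_t<
    sizeof(wchar_t) == sizeof(char32_t), char32_t,
    std::conditional_t<sizeof(wchar_t) == sizeof(char16_t), char16_t, char>
>;

// True when wide strings are handled like UTF-32 on this platform.
constexpr bool wide_is_utf32 = std::is_same_v<wchar_fixed_t, char32_t>;

// Number of wchar_t code units the compiler used for U+1F600, a code
// point outside the Basic Multilingual Plane.
inline std::size_t wide_units_for_emoji() {
    return std::wstring_view{L"\U0001F600"}.size();
}
```

The same single code point therefore occupies a different number of code units depending on the platform, which is why a code point count has to decode the wide string rather than return its size.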

◆ length() [7/10]

OSTD_EXPORT std::size_t ostd::utf::length ( string_range  r)
noexcept

Get the number of Unicode code points in a UTF-8 string.

If an invalid UTF-8 sequence is encountered, it's considered 1 character and therefore the resulting length will be the number of valid code points plus the number of invalid code units as if they were replaced with valid code points.

If you need to stop at an invalid code unit and get the continuation string, use the overload above.

◆ length() [8/10]

OSTD_EXPORT std::size_t ostd::utf::length ( u16string_range  r)
noexcept

Get the number of Unicode code points in a UTF-16 string.

If an invalid UTF-16 sequence is encountered, it's considered 1 character and therefore the resulting length will be the number of valid code points plus the number of invalid code units as if they were replaced with valid code points.

If you need to stop at an invalid code unit and get the continuation string, use the overload above.

◆ length() [9/10]

OSTD_EXPORT std::size_t ostd::utf::length ( u32string_range  r)
noexcept

Get the number of Unicode code points in a UTF-32 string.

This, like the above overloads for multibyte encodings, treats invalid values as code points, so this function effectively just returns the size of the given range.

◆ length() [10/10]

OSTD_EXPORT std::size_t ostd::utf::length ( wstring_range  r)
noexcept

Get the number of Unicode code points in a wide string.

If an invalid sequence is encountered, it's considered 1 character and therefore the resulting length will be the number of valid code points plus the number of invalid code units as if they were replaced with valid code points.

If you need to stop at an invalid code unit and get the continuation string, use the overload above.

The behavior of this function is platform dependent as wide characters represent different things on different systems.

◆ operator"" _sr()

string_range ostd::literals::string_literals::operator"" _sr ( char const *  str,
std::size_t  len 
)
inline noexcept

A custom literal for string ranges.

You need to enable this explicitly by using this namespace. It's not enabled by default to ensure compatibility with existing code.
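The opt-in mechanism can be sketched with a standalone literal operator. Everything in the demo namespace below is hypothetical and returns std::string_view instead of string_range; only the pattern of a using-directive enabling the suffix mirrors what is described here.

```cpp
#include <cstddef>
#include <string_view>

namespace demo {
namespace literals {
namespace string_literals {
    // Hypothetical stand-in for the _sr literal. The length comes from
    // the compiler, so embedded NUL characters are preserved.
    constexpr std::string_view operator""_sr(
        char const *str, std::size_t len
    ) noexcept {
        return std::string_view{str, len};
    }
} // namespace string_literals
} // namespace literals
} // namespace demo

inline std::size_t sr_example() {
    // Without this using-directive the suffix would not be found.
    using namespace demo::literals::string_literals;
    return "hello\0world"_sr.size();
}
```

Because the operator lives in its own nested namespace, code that already defines a different _sr suffix keeps compiling unless it explicitly pulls this one in.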

◆ replace()

template<typename C , typename R >
std::size_t ostd::utf::replace ( R &  sink)
inline

Write a Unicode replacement character into the sink.

Depending on the input type, this encodes the replacement character into the sink as UTF-8, UTF-16 or UTF-32. All 4 character types are allowed.

This is the same as utf::replace() but takes the number of bits as in utf::unit_bits as an input instead of a type, mapping 8 to char, 16 to char16_t and 32 to char32_t.
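As a rough illustration of type-driven encoding, the sketch below writes U+FFFD (the Unicode replacement character) as three bytes for 8-bit character types and as a single code unit otherwise. put_replacement is a hypothetical name, the sink is assumed to be any container with push_back, and returning the number of code units written is an assumption about what the std::size_t result means.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch, not the libostd implementation.
template<typename C, typename Sink>
std::size_t put_replacement(Sink &sink) {
    if constexpr (sizeof(C) == 1) {
        // U+FFFD encoded as UTF-8 is the byte sequence EF BF BD.
        sink.push_back(static_cast<C>(0xEF));
        sink.push_back(static_cast<C>(0xBF));
        sink.push_back(static_cast<C>(0xBD));
        return 3;
    } else {
        // U+FFFD fits in one UTF-16 or UTF-32 code unit; this branch
        // also covers wchar_t when it is 16 or 32 bits wide.
        sink.push_back(static_cast<C>(0xFFFD));
        return 1;
    }
}
```

For example, calling put_replacement<char>(s) on an empty std::string leaves it holding the three bytes EF BF BD, while put_replacement<char16_t>(t) on an empty std::u16string leaves the single unit 0xFFFD.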

◆ tolower()

OSTD_EXPORT char32_t ostd::utf::tolower ( char32_t  c)
noexcept

Convert a Unicode code point to lowercase.

Like std::tolower() but works with Unicode code points. If the code point is already lowercase or has no lowercase equivalent, this just returns the input unchanged, otherwise it returns the matching lowercase variant.
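The contract (map if a lowercase variant exists, otherwise return the input unchanged) can be shown with a deliberately tiny sketch. lower_sketch and lower_all are hypothetical names, and the mapping covers only ASCII, the Latin-1 Supplement and the basic Greek capitals; a real implementation would consult the full Unicode case mapping data.

```cpp
#include <string>

// Hypothetical, heavily simplified lowercase mapping for illustration.
inline char32_t lower_sketch(char32_t c) {
    if (c >= U'A' && c <= U'Z') {
        return c + 0x20; // ASCII
    }
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7) {
        return c + 0x20; // Latin-1 capitals; U+00D7 is the multiplication sign
    }
    if (c >= 0x391 && c <= 0x3A9 && c != 0x3A2) {
        return c + 0x20; // Greek capitals; U+03A2 is unassigned
    }
    return c; // already lowercase, or no lowercase equivalent known here
}

// Apply the mapping to every code point of a UTF-32 string.
inline std::u32string lower_all(std::u32string s) {
    for (char32_t &c : s) {
        c = lower_sketch(c);
    }
    return s;
}
```

Working on char32_t code points rather than bytes is what lets the mapping handle non-ASCII letters, which is the reason such a function is offered alongside std::tolower().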

◆ toupper()

OSTD_EXPORT char32_t ostd::utf::toupper ( char32_t  c)
noexcept

Convert a Unicode code point to uppercase.

Like std::toupper() but works with Unicode code points. If the code point is already uppercase or has no uppercase equivalent, this just returns the input unchanged, otherwise it returns the matching uppercase variant.