This header file provides a UTF-8 compliant string class in C++. It offers an easy-to-use interface for manipulating and processing UTF-8 encoded strings, while ensuring proper handling of multibyte characters.
- Efficient handling of UTF-8 strings
- Basic string operations: concatenation, comparison, and access
- Support for UTF-8 character boundaries, ensuring correct indexing and substring operations
- Helper functions to check for uppercase letters, symbols, and codepoints
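To make the last feature concrete, here is a rough sketch of what codepoint-based checks can look like. It is not the library's implementation: the names `is_uppercase_sketch` and `is_ascii_symbol_sketch` are hypothetical, the uppercase check covers only three hand-picked ranges (Basic Latin, Greek, Cyrillic), and treating "symbol" as ASCII punctuation is an assumption.

```cpp
#include <cstdint>

// Hypothetical uppercase check on a decoded codepoint.
// Covers only three illustrative ranges; a full check would need Unicode
// character property data.
inline bool is_uppercase_sketch(std::uint32_t cp) {
    if (cp >= 0x0041 && cp <= 0x005A) return true;          // Latin A..Z
    if (cp >= 0x0391 && cp <= 0x03A9) return cp != 0x03A2;  // Greek capitals (U+03A2 is unassigned)
    if (cp >= 0x0410 && cp <= 0x042F) return true;          // Cyrillic capitals
    return false;
}

// Hypothetical symbol check: printable ASCII that is neither a letter,
// a digit, nor a space. What the library counts as a symbol is an assumption.
inline bool is_ascii_symbol_sketch(std::uint32_t cp) {
    return (cp >= 0x21 && cp <= 0x2F) || (cp >= 0x3A && cp <= 0x40) ||
           (cp >= 0x5B && cp <= 0x60) || (cp >= 0x7B && cp <= 0x7E);
}
```

Working on codepoints rather than raw bytes keeps such checks independent of how many bytes a character occupies.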
- operator[](size_t index): Returns the character at the given UTF-8 index.
- substr(size_t pos, size_t len): Returns a substring starting from the UTF-8 character at position pos.
- size(): Returns the number of UTF-8 characters in the string.
- operator+=: Supports concatenation of UTF-8 strings and regular std::string values.
- find(const std::string& substring): Finds a substring within the UTF-8 string and returns its position.
- erase(size_t index): Removes a UTF-8 character at the specified index.
- codepoint(size_t index): Returns the Unicode codepoint for the character at the specified index.
- is_uppercase(size_t index): Checks if the character at the specified index is an uppercase letter.
- is_symbol(size_t index): Checks if the character at the specified index is a symbol.
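The index-based members listed above take character positions, not byte offsets. The following is a minimal sketch of how such a lookup could work over a plain std::string, assuming well-formed UTF-8 input. The names `seq_len_sketch`, `byte_offset_sketch`, `char_at_sketch`, and `codepoint_sketch` are hypothetical and are not part of the header.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Length in bytes of the sequence that starts with `lead`.
// Assumes `lead` is a valid UTF-8 lead byte.
inline std::size_t seq_len_sketch(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    return 4;
}

// Byte offset of the character at character index `index`.
// Throws std::out_of_range when the index is past the last character.
inline std::size_t byte_offset_sketch(const std::string& s, std::size_t index) {
    std::size_t off = 0;
    for (std::size_t k = 0; k < index; ++k) {
        if (off >= s.size()) throw std::out_of_range("character index");
        off += seq_len_sketch(static_cast<unsigned char>(s[off]));
    }
    if (off >= s.size()) throw std::out_of_range("character index");
    return off;
}

// The bytes that make up the character at `index`.
inline std::string char_at_sketch(const std::string& s, std::size_t index) {
    std::size_t off = byte_offset_sketch(s, index);
    return s.substr(off, seq_len_sketch(static_cast<unsigned char>(s[off])));
}

// Unicode codepoint of the character at `index`.
inline std::uint32_t codepoint_sketch(const std::string& s, std::size_t index) {
    std::size_t off = byte_offset_sketch(s, index);
    unsigned char lead = static_cast<unsigned char>(s[off]);
    std::size_t n = seq_len_sketch(lead);
    if (off + n > s.size()) throw std::out_of_range("truncated sequence");
    if (n == 1) return lead;
    // Keep the payload bits of the lead byte: 5, 4, or 3 bits for n = 2, 3, 4.
    std::uint32_t cp = lead & (0xFF >> (n + 1));
    // Each continuation byte contributes its low 6 bits.
    for (std::size_t k = 1; k < n; ++k) {
        cp = (cp << 6) | (static_cast<unsigned char>(s[off + k]) & 0x3F);
    }
    return cp;
}
```

With this approach every lookup walks the string from the start, so indexing is linear in the character index. Whether the header caches offsets to avoid that cost is not stated here.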
This library handles UTF-8 encoded strings by using the char_bytes() helper function to determine how many bytes each character occupies. It supports characters of 1 to 4 bytes and performs safety checks to prevent out-of-bounds access and invalid character manipulation.
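As an illustration of the byte-length idea, a helper like char_bytes() could classify a lead byte by its high bits. The sketch below is an assumption about the general technique, not a copy of the header's code; `char_bytes_sketch` and `count_chars_sketch` are hypothetical names, and the sketch does not reject overlong or otherwise ill-formed sequences.

```cpp
#include <cstddef>
#include <string>

// Number of bytes in the UTF-8 sequence that starts with `lead`.
// Returns 0 when `lead` cannot start a sequence.
inline std::size_t char_bytes_sketch(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // continuation byte or invalid lead
}

// Counts characters by hopping from lead byte to lead byte.
// Stops at the first invalid lead byte or truncated sequence.
inline std::size_t count_chars_sketch(const std::string& s) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        std::size_t n = char_bytes_sketch(static_cast<unsigned char>(s[i]));
        if (n == 0 || i + n > s.size()) break;  // bounds check before advancing
        i += n;
        ++count;
    }
    return count;
}
```

The bounds check before each advance is what keeps a truncated trailing sequence from causing an out-of-range read.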
The library supports streaming UTF-8 strings to a std::ostream using operator<<.
#include <iostream>
#include <string>
#include "utf8_string.hpp"

int main() {
    utf8::string str = "Hάel😎 lo, 世界!";
    std::cout << str << '\n';

    // Remove the character at UTF-8 index 10.
    str.erase(10);

    // Mix std::string, utf8::string, and string literals in one expression.
    std::string stdstr = "κόσμος ";
    str = stdstr + str + "mama mia";

    // size() and operator[] count characters, not bytes.
    std::cout << "Length: " << str.size() << '\n';
    std::cout << "Character at index 11: " << str[11] << '\n';
    std::cout << str << '\n';

    // find() returns std::string::npos when the substring is absent.
    size_t pos = str.find("😎");
    if (pos != std::string::npos) {
        std::cout << "Found '😎' at position: " << pos << '\n';
    }

    std::cout << "Is symbol? `" << str.at(11) << "` "
              << (str.is_symbol(11) ? "YES\n" : "NO\n");
    std::cout << "Is uppercase? `И` "
              << (utf8::is_uppercase("И") ? "YES\n" : "NO\n");
}
This header is designed to be lightweight and does not require external dependencies. Simply include the header in your project:
#include "utf8_string.hpp"