
A simple header-only C++ library: a std::string wrapper that provides UTF-8 Unicode operations.


Ioannis-Markos-Angelidakis/cpp_utf8


UTF-8 String Handling Library

This header file provides a UTF-8-aware string class for C++. It offers an easy-to-use interface for manipulating and processing UTF-8 encoded strings while correctly handling multibyte characters.

Features

  • Efficient handling of UTF-8 strings
  • Basic string operations: concatenation, comparison, and access
  • Support for UTF-8 character boundaries, ensuring correct indexing and substring operations
  • Helper functions to retrieve a character's Unicode codepoint and to check whether it is an uppercase letter or a symbol

Key Methods and Operators

  • operator[](size_t index) - Returns the character at the given index, counted in UTF-8 characters rather than bytes.
  • substr(size_t pos, size_t len) - Returns a substring starting from the UTF-8 character at position pos.
  • size() - Returns the number of UTF-8 characters in the string.
  • operator+= - Supports concatenation of UTF-8 strings and regular std::string values.
  • find(const std::string& substring) - Finds a substring within the UTF-8 string and returns its position.
  • erase(size_t index) - Removes a UTF-8 character at the specified index.
  • codepoint(size_t index) - Returns the Unicode codepoint for the character at the specified index.
  • is_uppercase(size_t index) - Checks if the character at the specified index is an uppercase letter.
  • is_symbol(size_t index) - Checks if the character at the specified index is a symbol.
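The methods above index by character rather than by byte. A minimal sketch of how that kind of indexing can be layered over a plain std::string follows, assuming the input is already valid UTF-8. The helpers `is_continuation`, `char_count`, `char_offset`, and `char_substr` are hypothetical names for illustration only; they are not part of this library's API, and treating `len` as a character count is an assumption of the sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not the library's API): true for UTF-8
// continuation bytes, which have the bit pattern 10xxxxxx.
inline bool is_continuation(unsigned char byte) {
    return (byte & 0xC0) == 0x80;
}

// Hypothetical helper: counts characters (code points) by counting every
// byte that is NOT a continuation byte. Assumes valid UTF-8 input.
inline std::size_t char_count(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if (!is_continuation(byte)) {
            ++count;
        }
    }
    return count;
}

// Hypothetical helper: byte offset where the character with index `index`
// begins, or std::string::npos if the index is out of range.
inline std::size_t char_offset(const std::string& s, std::size_t index) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (!is_continuation(static_cast<unsigned char>(s[i]))) {
            if (seen == index) {
                return i;
            }
            ++seen;
        }
    }
    return std::string::npos;
}

// Hypothetical helper: up to `len` characters starting at character index
// `pos`; returns an empty string if `pos` is out of range.
inline std::string char_substr(const std::string& s, std::size_t pos, std::size_t len) {
    const std::size_t begin = char_offset(s, pos);
    if (begin == std::string::npos) {
        return {};
    }
    std::size_t end = begin;
    std::size_t taken = 0;
    while (end < s.size() && taken < len) {
        ++end;  // step past the lead byte
        while (end < s.size() && is_continuation(static_cast<unsigned char>(s[end]))) {
            ++end;  // step past its continuation bytes
        }
        ++taken;
    }
    return s.substr(begin, end - begin);
}
```

Because every lookup in this sketch scans from the start of the string, character-indexed access costs O(n) in the byte length. That is the usual trade-off of keeping UTF-8 bytes in an unmodified std::string instead of a fixed-width buffer.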

UTF-8 Compliance

This library handles UTF-8 encoded strings by determining the byte length of each character with the char_bytes() helper function. It supports characters of 1 to 4 bytes and includes safety checks to prevent out-of-bounds access and invalid character manipulation.
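The byte length of a UTF-8 character is fully determined by its lead byte, and the continuation bytes that follow carry 6 payload bits each. The sketch below shows that rule together with bounds-checked decoding to a codepoint. `sequence_length` and `decode_at` are hypothetical stand-ins written for illustration; the library's own char_bytes() may have a different signature and behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for a char_bytes()-style helper: the length of a
// UTF-8 sequence derived from its lead byte. Returns 0 for bytes that cannot
// start a sequence (continuation bytes, 0xC0-0xC1, 0xF5-0xFF).
inline std::size_t sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;  // 0xxxxxxx: ASCII
    if (lead < 0xC2) return 0;  // continuation byte or overlong 2-byte lead
    if (lead < 0xE0) return 2;  // 110xxxxx
    if (lead < 0xF0) return 3;  // 1110xxxx
    if (lead < 0xF5) return 4;  // 11110xxx, up to U+10FFFF
    return 0;                   // invalid lead byte
}

// Hypothetical helper: decodes the codepoint starting at byte offset `pos`.
// Returns std::nullopt for truncated or malformed input instead of reading
// past the end of the string.
inline std::optional<std::uint32_t> decode_at(const std::string& s, std::size_t pos) {
    if (pos >= s.size()) return std::nullopt;
    const auto lead = static_cast<unsigned char>(s[pos]);
    const std::size_t len = sequence_length(lead);
    if (len == 0 || pos + len > s.size()) return std::nullopt;
    if (len == 1) return lead;

    // Payload bits of the lead byte: 5, 4, or 3 bits for 2-, 3-, 4-byte forms.
    std::uint32_t cp = lead & (0x7F >> len);
    for (std::size_t i = 1; i < len; ++i) {
        const auto byte = static_cast<unsigned char>(s[pos + i]);
        if ((byte & 0xC0) != 0x80) return std::nullopt;  // expected 10xxxxxx
        cp = (cp << 6) | (byte & 0x3F);
    }

    // Reject overlong encodings, UTF-16 surrogates, and values past U+10FFFF.
    static const std::uint32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < min_for_len[len] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
        return std::nullopt;
    }
    return cp;
}
```

Returning std::optional keeps malformed input visible to the caller rather than silently substituting a replacement character; a library could reasonably choose either policy.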

Stream Insertion

The library supports writing UTF-8 strings to any std::ostream through operator<<.

Example Usage


```cpp
#include <iostream>
#include <string>

#include "utf8_string.hpp"

int main() {
    utf8::string str = "Hάel😎 lo, 世界!";

    std::cout << str << '\n';

    // Remove the character at character index 10.
    str.erase(10);

    // Concatenate with a std::string and a string literal.
    std::string stdstr = "κόσμος ";
    str = stdstr + str + "mama mia";

    std::cout << "Length: " << str.size() << '\n';
    std::cout << "Character at index 11: " << str[11] << '\n';

    std::cout << str << '\n';

    size_t pos = str.find("😎");
    if (pos != std::string::npos) {
        std::cout << "Found '😎' at position: " << pos << '\n';
    }

    std::cout << "Is symbol? `" << str.at(11) << "` " << (str.is_symbol(11) ? "YES\n" : "NO\n");
    std::cout << "Is uppercase? `И` " << (utf8::is_uppercase("И") ? "YES\n" : "NO\n");
}
```

Compilation

This header is designed to be lightweight and does not require external dependencies. Simply include the header in your project:

```cpp
#include "utf8_string.hpp"
```
