From d6c8fec7764981bcbb830a103aa3700879da5fd5 Mon Sep 17 00:00:00 2001
From: Thomas Goyne
Date: Wed, 22 May 2024 14:24:38 -0700
Subject: [PATCH] Fix UB in Tokenizer

---
 CHANGELOG.md            | 1 +
 src/realm/tokenizer.cpp | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index eb1f557040a..34ec3eb576a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,7 @@
 * Encrypted files on Windows had a maximum size of 2GB even on x64 due to internal usage of `off_t`, which is a 32-bit type on 64-bit Windows ([PR #7698](https://github.com/realm/realm-core/pull/7698), since the introduction of encryption support on Windows in v3.0.0).
 * The encryption code no longer behaves differently depending on the system page size, which should entirely eliminate a recurring source of bugs related to copying encrypted Realm files between platforms with different page sizes. One known outstanding bug was ([RNET-1141](https://github.com/realm/realm-dotnet/issues/3592)), where opening files on a system with a larger page size than the writing system would attempt to read sections of the file which had never been written to ([PR #7698](https://github.com/realm/realm-core/pull/7698)).
 * There were several complicated scenarios which could result in stale reads from encrypted files in multiprocess scenarios. These would typically lead to crash, either due to an assertion failure or DecryptionFailure being thrown ([PR #7698](https://github.com/realm/realm-core/pull/7698), since v13.9.0).
+* Tokenizing strings for full-text search could pass values outside the range [-1, 255] to `isspace()`, which is undefined behavior ([PR #7698](https://github.com/realm/realm-core/pull/7698), since the introduction of FTS in v13.0.0).
 
 ### Breaking changes
 * None.
diff --git a/src/realm/tokenizer.cpp b/src/realm/tokenizer.cpp
index f6bc42604cc..401be2fc4c6 100644
--- a/src/realm/tokenizer.cpp
+++ b/src/realm/tokenizer.cpp
@@ -61,7 +61,7 @@ std::pair<std::set<std::string>, std::set<std::string>> Tokenizer::get_search_to
         }
     };
     for (; m_cur_pos != m_end_pos; m_cur_pos++) {
-        if (isspace(*m_cur_pos)) {
+        if (isspace(static_cast<unsigned char>(*m_cur_pos))) {
             add_token();
         }
         else {
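
Note for reviewers: the change above matters because `isspace()` is only defined for arguments that are representable as `unsigned char` or equal to `EOF`. On platforms where plain `char` is signed, any byte of a multi-byte UTF-8 sequence is negative when read through a `char` iterator, so passing it directly is undefined behavior. The following is a minimal sketch of the same cast in isolation; `is_space_safe` and `split_on_space` are hypothetical names used only for illustration and are not part of realm-core, and the splitting logic is a simplified stand-in for the real tokenizer loop.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not realm-core API): converts the byte to
// unsigned char first, so the value handed to std::isspace is always
// in the range the C standard requires.
inline bool is_space_safe(char c)
{
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical, simplified stand-in for the tokenizer loop: splits the
// input on whitespace and keeps every other byte, including non-ASCII ones.
std::vector<std::string> split_on_space(const std::string& text)
{
    std::vector<std::string> tokens;
    std::string current;
    for (char c : text) {
        if (is_space_safe(c)) {
            // End of a token: store it if it is non-empty.
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        }
        else {
            current.push_back(c);
        }
    }
    // Flush the trailing token, if any.
    if (!current.empty())
        tokens.push_back(current);
    return tokens;
}
```

With this sketch, a call such as `split_on_space("caf\xC3\xA9 au\tlait")` never hands a negative value to `std::isspace`, even though the bytes `0xC3` and `0xA9` are negative as signed `char`.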