Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

<regex>: Fix integer overflow in _Buf and implement geometric buffer expansion #5175

Merged
merged 6 commits into from
Jan 14, 2025

Conversation

muellerj2
Copy link
Contributor

Fixes an undetected integer overflow in _Buf::_Insert(_Elem) when increasing the buffer size.

It's unlikely anyone ever ran into this bug in practice, since the overflow could only happen following about $2^{28}$ reallocations, but if one were to wait long enough, _Buf would write to and read from unallocated memory.

Since fixing the overflow bug meant rewriting the size calculations anyway, I also quickly added three lines to implement geometric expansion of the buffer, ensuring that inserting a new character runs in amortized constant rather than linear time. The selected growth factor is 1.5, same as vector's. (The geometric expansion kicks in when more than 48 characters are inserted into a character buffer. I think most practically used regular expressions don't even get close to adding 48 characters to one of these buffers.)

Finally, this PR makes _Buf throw a regex_error with error code error_space on allocation failure, which I think is the more appropriate exception when running out of memory while parsing a regular expression and building the corresponding NFA. But feel free to change this to bad_alloc etc. if you prefer these exceptions instead.

No tests added since they would require about 8 GB of virtual memory and run for several minutes, but here is a small x64 test program to see that overflow is handled correctly now:

#include <cassert>
#include <regex>
#include <string>

using namespace std;

int main() {
    {
        string s(static_cast<size_t>(static_cast<unsigned int>(-1)), 'a');
        regex regex(s);
    }

    {
        string s(static_cast<size_t>(static_cast<unsigned int>(-1)) + 1, 'a');
        try {
            regex regex(s);
            assert(false);
        } catch (const regex_error& ex) {
            assert(ex.code() == regex_constants::error_space);
        }
    }
    return 0;
}

@muellerj2 muellerj2 requested a review from a team as a code owner December 9, 2024 13:58
@StephanTLavavej StephanTLavavej added the bug Something isn't working label Dec 9, 2024
@StephanTLavavej StephanTLavavej self-assigned this Dec 9, 2024
@StephanTLavavej StephanTLavavej added the regex Everyone's favorite header label Jan 8, 2025
stl/inc/regex Outdated Show resolved Hide resolved
stl/inc/regex Outdated Show resolved Hide resolved
stl/inc/regex Show resolved Hide resolved
@StephanTLavavej
Copy link
Member

StephanTLavavej commented Jan 12, 2025

Thanks! I am astounded that we had non-geometric growth here. 🙀

I pushed minor stylistic changes.

@StephanTLavavej
Copy link
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit 3b6d29c into microsoft:main Jan 14, 2025
39 checks passed
@StephanTLavavej
Copy link
Member

📈 😸 🎉

@muellerj2 muellerj2 deleted the regex-fix-integer-overflow branch January 14, 2025 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working regex Everyone's favorite header
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants