memory leak in regex #51001

Open
llvmbot opened this issue Aug 28, 2021 · 8 comments
Labels
bugzilla Issues migrated from bugzilla libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. regex Issues related to regex

Comments

@llvmbot
Member

llvmbot commented Aug 28, 2021

Bugzilla Link 51659
Version 11.0
OS Linux
Reporter LLVM Bugzilla Contributor
CC @mclow

Extended Description

The following program leaks memory (using clang 11 on Debian Bullseye, Debian clang version 11.0.1-2):

paul@machine:~/code/stdfuzz/build$ cat problem.cpp
#include <regex>
int
main()
{
std::regex{ R"(()*)",
std::regex_constants::icase | std::regex_constants::nosubs |
std::regex::optimize | std::regex::collate | std::regex::grep };
}

paul@machine:~/code/stdfuzz/build$ clang++-11 --stdlib=libc++ problem.cpp -fsanitize=leak -g
paul@simdjson:~/code/stdfuzz/build$ ./a.out

=================================================================
==18364==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x4172e8 in operator new(unsigned long) (/home/paul/code/stdfuzz/build/a.out+0x4172e8)
#1 0x44cb02 in std::__1::basic_regex<char, std::__1::regex_traits<char> >::__push_loop(unsigned long, unsigned long, std::__1::__owns_one_state<char>*, unsigned long, unsigned long, bool) /usr/lib/llvm-11/bin/../include/c++/v1/regex:4699:23
#2 0x44c962 in std::__1::basic_regex<char, std::__1::regex_traits<char> >::__push_greedy_inf_repeat(unsigned long, std::__1::__owns_one_state<char>*, unsigned int, unsigned int) /usr/lib/llvm-11/bin/../include/c++/v1/regex:2863:10
#3 0x44ddbd in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse_RE_dupl_symbol<char const*>(char const*, char const*, std::__1::__owns_one_state<char>*, unsigned int, unsigned int) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3578:13
#4 0x44dc4b in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse_simple_RE<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3259:23
#5 0x44db1c in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse_RE_expression<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3239:35
#6 0x436aff in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse_basic_reg_exp<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3133:23
#7 0x436cdb in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse_grep<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:4617:9
#8 0x4366fd in char const* std::__1::basic_regex<char, std::__1::regex_traits<char> >::__parse<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3107:19
#9 0x4363e1 in void std::__1::basic_regex<char, std::__1::regex_traits<char> >::__init<char const*>(char const*, char const*) /usr/lib/llvm-11/bin/../include/c++/v1/regex:3077:31
#10 0x43617f in std::__1::basic_regex<char, std::__1::regex_traits<char> >::basic_regex(char const*, std::__1::regex_constants::syntax_option_type) /usr/lib/llvm-11/bin/../include/c++/v1/regex:2556:9
#11 0x43609f in main /home/paul/code/stdfuzz/build/problem.cpp:3:1
#12 0x7f808b6bdd09 in __libc_start_main csu/../csu/libc-start.c:308:16

SUMMARY: LeakSanitizer: 16 byte(s) leaked in 1 allocation(s).

It also reproduces on Compiler Explorer with clang 12. Clang trunk is not working there at the moment.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 11, 2021
@michaelbprice

michaelbprice commented Dec 15, 2021

template <class _CharT, class _Traits>
void
basic_regex<_CharT, _Traits>::__push_loop(size_t __min, size_t __max,
        __owns_one_state<_CharT>* __s, size_t __mexp_begin, size_t __mexp_end,
        bool __greedy)
{
    unique_ptr<__empty_state<_CharT> > __e1(new __empty_state<_CharT>(__end_->first()));
    __end_->first() = nullptr; // <<<<<<<< LEAKS HERE
    unique_ptr<__loop<_CharT> > __e2(new __loop<_CharT>(__loop_count_,
                __s->first(), __e1.get(), __mexp_begin, __mexp_end, __greedy,
                __min, __max));
    __s->first() = nullptr;
    __e1.release();
    __end_->first() = new __repeat_one_loop<_CharT>(__e2.get());
    __end_ = __e2->second();
    __s->first() = __e2.release();
    ++__loop_count_;
}

It looks like a patch might be to modify __has_one_state::__first_ to be a unique_ptr and update call sites accordingly.
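
To illustrate the ownership idea only, here is a minimal sketch using toy types. `Node`, `g_live_nodes`, `wrap_successor`, and `live_nodes_after_demo` are hypothetical names made up for this example; they are not the libc++ classes and this is not a proposed patch. It just shows that when the link itself is a `unique_ptr`, rewiring a chain by moving the link cannot drop a node on the floor.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy model only: these are NOT the libc++ regex node types.
int g_live_nodes = 0;  // number of Node objects currently alive

struct Node {
    // Owning link to the next node (assumed analogue of an owning __first_).
    std::unique_ptr<Node> first;

    explicit Node(std::unique_ptr<Node> next = nullptr) : first(std::move(next)) {
        ++g_live_nodes;
    }
    ~Node() { --g_live_nodes; }
};

// Hypothetical helper: splice a new node between `tail` and its successor.
// Ownership of the old successor moves into the new node, so there is no
// window in which a raw pointer is the only reference to it.
void wrap_successor(Node& tail) {
    std::unique_ptr<Node> wrapper(new Node(std::move(tail.first)));
    tail.first = std::move(wrapper);
}

// Builds a short chain, rewires it, and returns how many nodes outlive it.
int live_nodes_after_demo() {
    {
        Node head{std::unique_ptr<Node>(new Node())};
        wrap_successor(head);
        assert(g_live_nodes == 3);  // head, wrapper, original successor
    }
    return g_live_nodes;
}
```

Calling `live_nodes_after_demo()` should return 0, since destroying `head` releases the whole chain. Whether the real call sites in `<regex>` can be converted this cleanly is a separate question.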

@michaelbprice

I was considering how to add a regression test for this. Would the right place be something like libcxx/test/std/re/re.leaks/issue_51001.cpp? When I put a test there, llvm-lit did not pick it up. There would also be the matter of writing a lit config file in that directory to add LeakSanitizer to the command line.

@mordante
Member

mordante commented Apr 7, 2022

The re/re.foo naming matches the sections in the Standard. Something in libcxx/test/std/re/re.const/re.matchflag seems more appropriate.

Lit requires two extensions to identify a test, so the file should be named foo.pass.cpp. (Other suffixes in place of pass select different kinds of lit tests.)
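
As a rough sketch of what the body of such a foo.pass.cpp test might contain: the function name `construct_reported_regex` is hypothetical, and the pattern assumes the escaped form `\(\)*` from the original Bugzilla report. In a real lit test this would be called from `main`, and the leak would only be flagged when the test is built with LeakSanitizer enabled.

```cpp
#include <regex>

// Hypothetical test body, not an existing libc++ test.
// Constructs the regex from the report; returns true once construction
// has completed. Any leak is detected by the sanitizer, not by this code.
bool construct_reported_regex() {
    std::regex re{ R"(\(\)*)",
                   std::regex_constants::icase | std::regex_constants::nosubs |
                   std::regex::optimize | std::regex::collate | std::regex::grep };
    (void)re;  // the object is only built and destroyed
    return true;
}
```

Without a sanitizer in the build flags this passes trivially, so how the sanitizer gets onto the command line still needs to be sorted out.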

@philnik777
Contributor

What am I missing? I can't reproduce it with https://godbolt.org/z/87xdP1jYx.

@fhahn
Contributor

fhahn commented May 12, 2022

@philnik777 it looks like the shared godbolt uses address sanitizer instead of leak sanitizer.

Here's an updated version that should use leak sanitizer: https://godbolt.org/z/Y3hvs5hah

It also doesn't reproduce there, so I am going ahead and closing the issue. Please double check and re-open if this still reproduces on your end with a recent Clang/libc++ version.

@fhahn fhahn closed this as completed May 12, 2022
@pauldreik

Hi, original bug submitter here. The formatting changed when the issue was transferred from Bugzilla to GitHub: two backslashes in the regex were lost, and they are important. The problem is still there:

godbolt

@fhahn or @philnik777 could you please reopen this?

@pauldreik

Perhaps @mordante could reopen this?

@EugeneZelenko EugeneZelenko reopened this Aug 27, 2023
@philnik777
Contributor

Actual reproducer:

#include <regex>

int main() {
  std::regex{ R"(\(\)*)",
  std::regex_constants::icase | std::regex_constants::nosubs |
  std::regex::optimize | std::regex::collate | std::regex::grep };
}

@philnik777 philnik777 added the regex Issues related to regex label Aug 27, 2023