Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logic cleanups for basic_string #3830

Closed
wants to merge 29 commits into from
Closed

Conversation

achabense
Copy link
Contributor

@achabense achabense commented Jun 24, 2023

(Updated)

  1. Makes string(const T*) and string(const T*, const Alloc&) adjacent. They work as a whole to resolve to string(const T*, const Alloc&={}).
  2. Rectify usages of small/large string mode. These changes will eliminate all direct comparisons with _BUF_SIZE in basic_string.
  • Introduces _SMALL_STRING_CAPACITY, which equals _BUF_SIZE - 1, and replace every _BUF_SIZE - 1 in basic_string with it.
  • Introduces _Entails_large_string(_Size/_Cap), which decides whether newly planned size/capacity requires large string mode.
  • Refines _Large_string_engaged logic, and fixes a relevant misuse in operator=(const string&) (which will break invariant, and should be _Entails_large_string(_Mysize) instead).
  • Introduces _LEAST_ALLOCATION_SIZE for _Move_assign_from_buffer and _Release_to_buffer.
  • These changes will eliminate all direct comparisons with _BUF_SIZE in basic_string.
  1. Rectify the relation between capacity and allocation size.
  • Introduces _Allocate_for_capacity, which does stl check and lifetime activation, and takes capacity as input. This will eliminate all (al.allocate(res+1) ... _Start_element_lifetimes) pattern.
  • Introduces _Deallocate_for_capacity counterpart.
  • These two changes will remove all direct usages of al.deallocate, and only one wild al.allocate(without lifetime startup) is left in the interaction part with basic_stringbuf.

2 and 3 will make it easier to keep the following invariant:
capacity always== _SMALL_STRING_CAPACITY in small mode && capacity always> _SMALL_STRING_CAPACITY in large mode

@achabense achabense requested a review from a team as a code owner June 24, 2023 09:19
@achabense

This comment was marked as resolved.

@achabense

This comment was marked as resolved.

make the behavior in `operator=(const basic_string&::if (_Al != _Right_al)` normal
@achabense

This comment was marked as outdated.

_Tidy_deallocate();
_Traits::copy(_Mypair._Myval2._Bx._Buf, _Right_ptr, _Right_size + 1);
_Mypair._Myval2._Mysize = _Right_size;
_Mypair._Myval2._Myres = _SMALL_STRING_CAPACITY;
Copy link
Contributor Author

@achabense achabense Jun 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think dropping _Memcpy_val_from optimization (was in _Copy_assign_val_from_small) is ok here.
Otherwise we have to deal with three cases:
right is in small mode(res) and in non-constexpr context (and !defined(_INSERT_STRING_ANNOTATION)) -> can do _Memcpy_val_from
right is small(size) -> copy str without allocation
right is large(size) -> allocate and copy str

Hm... seems not hard to add the optimization back?

@@ -3165,21 +3149,24 @@ public:
auto&& _Right_alproxy = _GET_PROXY_ALLOCATOR(_Alty, _Right_al);
_Container_proxy_ptr<_Alty> _New_proxy(_Right_alproxy, _Leave_proxy_unbound{}); // throws

if (_Right._Mypair._Myval2._Large_string_engaged()) {
const auto _New_size = _Right._Mypair._Myval2._Mysize;
const auto _New_capacity = _Calculate_growth(_New_size, 0, _Right.max_size());
Copy link
Contributor Author

@achabense achabense Jun 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part was broken. It is a misuse of _Large_string_engaged, and can cause allocation smaller than BUF_SIZE. (e.g. consider when large_string_engaged(), but size() == 0)

@achabense achabense marked this pull request as draft June 25, 2023 16:18
@achabense achabense marked this pull request as ready for review June 26, 2023 03:27
@StephanTLavavej

This comment was marked as resolved.

@StephanTLavavej

This comment was marked as resolved.

@StephanTLavavej StephanTLavavej added the enhancement Something can be improved label Jun 30, 2023
const _Elem* const _Right_ptr = _Right._Mypair._Myval2._Myptr();
if (_Entails_large_string(_Right_size)) {
const auto _New_capacity =
_Calculate_growth(_Right_size, _SMALL_STRING_CAPACITY, _Right.max_size());
Copy link
Contributor Author

@achabense achabense Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this fix is adopted, the second parameter of _Calculate_growth(x,y,z) will be uniformly _SMALL_STRING_CAPACITY and further rectification will be possible.

There is also a special use of _Calculate_growth(newsize) in _Construct_from_iter. According to impl logic it is performing capacity*=1.5 logic. What about replacing the grow logic here with _Reallocate_grow_by or push_back?

const size_type _New_capacity = _Calculate_growth(_My_data._Mysize);

stl/inc/xstring Outdated
@@ -2242,10 +2242,10 @@ public:
return _Result;
}

_NODISCARD static _CONSTEXPR20 bool _Large_string_engaged(size_type _Strcap) noexcept {
// decides whether string is in large mode; has no implication on largeness of string's size
_STL_INTERNAL_CHECK(_Strcap >= _SMALL_STRING_CAPACITY);
Copy link
Contributor Author

@achabense achabense Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was this chec that were causing lots of test failures, as

  1. _Myres is(was) initialized to 0 on construction.
  2. In several places(like _Construct), _Large_string_engaged is called early before assignment of _Myres.

After making _Myres initialized to _SMALL_STRING_CAPACITY, I made some checks passed, however there were some errors in clang checks due to constexpr evaluation hit maximum step limit.

Here I decide to remove this check to pass those clang tests. However, I'm uncertain wether this check is helpful. Does the check make sense? Should I add it back?

size_type _Myres = 0; // current storage reserved for string
// invariant: _Myres >= _Mysize, and _Myres >= _SMALL_STRING_CAPACITY
size_type _Mysize = 0; // current length of string (size)
size_type _Myres = _SMALL_STRING_CAPACITY; // current storage reserved for string (capacity)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaving _Myres initialized to _SMALL_STRING_CAPACITY, as _String_val begins with SSO mode. This change will make some assignments redundant: (Candidates: (1) (2) (3); they are unchanged now)

@achabense achabense requested a review from StephanTLavavej July 3, 2023 17:25
@StephanTLavavej StephanTLavavej self-assigned this Jul 4, 2023
@StephanTLavavej
Copy link
Member

I am generally concerned about the magnitude of these changes; there are a lot of transformations happening here, and having them all in a single PR (without well-structured fine-grained commits) makes the validity of each transformation difficult to review. I think it's reviewable but I'll need some time.

@StephanTLavavej
Copy link
Member

I think I can group up these changes. Do I need to reorganize commits and reupload this pr?

If you have time, that would be great! I started doing that, but I ran out of time today (and will need to work on a Patch Tuesday toolset update tomorrow before I can resume reviewing this PR). I pushed a conflict-free merge with main plus a couple of changes for issues that I identified; please pick up these changes (assuming you agree with them, of course!).

You can see main...StephanTLavavej:STL:fine-grained for an idea of what commits would be easiest to review. When each commit is focused on a single transformation whose correctness can be verified at each location, or is focused on a single fix, then they can be reviewed sequentially much easier without the risk of missing anything.

We usually discourage force-pushes for PRs that are under review, since GitHub doesn't make it easy to see what changed. In this case, because I've paused reviewing and the other maintainers are all busy, it's your call - you can either force-push this PR, or you can close this PR and open a new one. Either way, I recommend verifying that your commit restructuring isn't introducing unintentional changes - you can simply git diff OLD_BRANCH NEW_BRANCH to verify that they're ending up with the same content.

Thanks! 😻 Your changes look like a great refactoring, I just want to be really careful about reviewing it given how complicated string is, and how absolutely critical its correctness is.

@StephanTLavavej StephanTLavavej removed their assignment Jul 11, 2023
@achabense
Copy link
Contributor Author

Ok, I will do that. I'm going to close this pr and make a new one.

@achabense achabense closed this Jul 11, 2023
@achabense achabense deleted the _For_string2 branch August 11, 2023 02:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Something can be improved
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants