Skip to content

Conversation

@davidwendt
Copy link
Contributor

Description

Fixes cudf::strings::zfill to match Python's zfill behavior. This will match Pandas 1.5 zfill as well.
The new behavior correctly skips the leading sign character when applying the '0' character fill.
Updates gtests and added more test data.
The pytest was updated to xfail for test data with leading sign characters until Pandas 1.5 is supported.
The Java tests did not include any test data with sign characters.

Closes #11632

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added bug Something isn't working 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. strings strings issues (C++ and Python) breaking Breaking change labels Aug 31, 2022
@davidwendt davidwendt self-assigned this Aug 31, 2022
@github-actions github-actions bot added the Python Affects Python cuDF API. label Aug 31, 2022
@codecov
Copy link

codecov bot commented Aug 31, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.10@8ad0290). Click here to learn what that means.
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-22.10   #11634   +/-   ##
===============================================
  Coverage                ?   86.41%           
===============================================
  Files                   ?      145           
  Lines                   ?    22993           
  Branches                ?        0           
===============================================
  Hits                    ?    19869           
  Misses                  ?     3124           
  Partials                ?        0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@davidwendt davidwendt added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Sep 1, 2022
@davidwendt davidwendt marked this pull request as ready for review September 1, 2022 16:57
@davidwendt davidwendt requested review from a team as code owners September 1, 2022 16:57
Comment on lines +182 to +186
// if the string starts with a sign, output the sign first
if (!d_str.empty() && (*in_ptr == '-' || *in_ptr == '+')) {
*out_ptr++ = *in_ptr++;
d_str = string_view{in_ptr, d_str.size_bytes() - 1};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is the actual meat of this PR, right? The rest just looks like ancillary cleanup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. Sometimes I cannot help myself. Especially in cleaning up code I wrote awhile back.

["³", "⅕", ""],
pytest.param(
["hello", "there", "world", "+1234", "-1234", None, "accént", ""],
marks=pytest.mark.xfail(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once this PR is updated, @galipremsagar can you remove this in #11617?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@galipremsagar
Copy link
Contributor

@davidwendt is this ready to merge?

@davidwendt
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit dc0d8d1 into rapidsai:branch-22.10 Sep 2, 2022
@davidwendt davidwendt deleted the zfill-like-python branch September 2, 2022 15:11
@galipremsagar galipremsagar mentioned this pull request Sep 12, 2022
14 tasks
rapids-bot bot pushed a commit that referenced this pull request Sep 21, 2022
This PR introduces `pandas-1.5` support in `cudf`. The changes include:

- [x] Requires `group_keys` support in `groupby` for `dask_cudf` to work: #11659
- [x] Requires `zfill` updates to match `pandas-1.5` behavior: #11634
- [x] `where` API: Ability to inspect a scalar value if it can be fit into the existing dtype, similar to: pandas-dev/pandas#48373
- [x] Switches `ValueError` to `TypeError` when an unknown category is being set to a `CategoricalColumn`
- [x] Handles breaking change of an `ArrowIntervalType` related import that has resulted in `cudf` to error on import itself.
- [x] Fix an issue with `IntervalColumn.to_pandas`.
- [x] Raises error when an object of `boolean` dtype is being set to a `NumericalColumn`.
- [x] Raises error when `pat` is None in `Series.str.startswith` & `Series.str.endswith`.
- [x] Add `IntervalDtype.to_pandas` with appropriate versioning.
- [x] Handle `get_window_bounds` signature changes.
- [x] Fix and version a bunch of pytests.

```python
branch-22.10:

== 4275 failed, 79837 passed, 2049 skipped, 1193 xfailed, 1923 xpassed, 6597 warnings, 4 errors in 1103.52s (0:18:23) ==
== 803 failed, 106 passed, 14 skipped, 14 xfailed, 324 warnings, 17 errors in 148.46s (0:02:28) ==

This PR:

== 84041 passed, 2049 skipped, 1199 xfailed, 1710 xpassed, 6599 warnings in 359.27s (0:05:59) ==
== 954 passed, 14 skipped, 7 xfailed, 3 xpassed, 580 warnings in 54.75s ==
```

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Ashwin Srinath (https://github.com/shwina)
  - Matthew Roeschke (https://github.com/mroeschke)
  - Mark Sadang (https://github.com/msadang)

URL: #11617
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3 - Ready for Review Ready for review by team breaking Breaking change bug Something isn't working libcudf Affects libcudf (C++/CUDA) code. Python Affects Python cuDF API. strings strings issues (C++ and Python)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] zfill needs to align with python standard library in some cases

4 participants