Remove references to Unicode objects being ready #130790

Open
picnixz opened this issue Mar 3, 2025 · 4 comments
Labels
extension-modules (C modules in the Modules dir), interpreter-core (Objects, Python, Grammar, and Parser dirs), type-feature (A feature request or enhancement)

Comments

@picnixz
Member

picnixz commented Mar 3, 2025

We have some code referencing the readiness of Unicode objects, but this property is deprecated (see #129894 (comment)). Following @encukou's advice, we should address each module with a separate PR so that experts can review them separately.

unicodedata.c

/* result is guaranteed to be ready, as it is compact. */
kind = PyUnicode_KIND(result);
data = PyUnicode_DATA(result);

result = nfd_nfkd(self, input, k);
if (!result)
    return NULL;
/* result will be "ready". */
kind = PyUnicode_KIND(result);
data = PyUnicode_DATA(result);
len = PyUnicode_GET_LENGTH(result);

_io/textio.c

kind = PyUnicode_KIND(modified);
out = PyUnicode_DATA(modified);
PyUnicode_WRITE(kind, out, 0, '\r');
memcpy(out + kind, PyUnicode_DATA(output), kind * output_len);
Py_SETREF(output, modified); /* output remains ready */
self->pendingcr = 0;
output_len++;
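The `out + kind` and `kind * output_len` arithmetic in the snippet above relies on the kind constants (`PyUnicode_1BYTE_KIND`, `PyUnicode_2BYTE_KIND`, `PyUnicode_4BYTE_KIND`) being equal to the character width in bytes (1, 2, or 4), not on any readiness state. Below is a minimal stand-alone sketch of that idea. It is not CPython code: `write_char` and `prepend_cr` are hypothetical names that only model `PyUnicode_WRITE` and the `memcpy` call, and the `KIND_*` constants are local stand-ins.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-ins for the kind values: each equals the per-character
   width in bytes. */
enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

/* Hypothetical model of PyUnicode_WRITE: store code point `ch` at
   character position `index` in a buffer of the given kind. */
void write_char(int kind, void *data, size_t index, uint32_t ch)
{
    switch (kind) {
    case KIND_1BYTE:
        ((uint8_t *)data)[index] = (uint8_t)ch;
        break;
    case KIND_2BYTE:
        ((uint16_t *)data)[index] = (uint16_t)ch;
        break;
    default:
        ((uint32_t *)data)[index] = ch;
        break;
    }
}

/* Return a newly allocated buffer holding '\r' followed by the `len`
   characters of `src`. The caller frees the result. */
void *prepend_cr(int kind, const void *src, size_t len)
{
    assert(kind == KIND_1BYTE || kind == KIND_2BYTE || kind == KIND_4BYTE);
    char *out = malloc((size_t)kind * (len + 1));
    if (out == NULL) {
        return NULL;
    }
    write_char(kind, out, 0, '\r');
    /* Skip one character (kind bytes), then copy len characters. */
    memcpy(out + kind, src, (size_t)kind * len);
    return out;
}
```

For example, calling `prepend_cr(KIND_2BYTE, src, 2)` on a two-character `uint16_t` buffer yields a three-character buffer whose first element is `'\r'`.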

cpython/Modules/_io/textio.c

Lines 1821 to 1824 in a85eeb9

/* decoded_chars is guaranteed to be "ready". */
avail = (PyUnicode_GET_LENGTH(self->decoded_chars)
         - self->decoded_chars_used);

Parser files

/* Verify that the identifier follows PEP 3131.
   All identifier strings are guaranteed to be "ready" unicode objects.
 */
static int
verify_identifier(struct tok_state *tok)

cpython/Parser/pegen.c

Lines 505 to 513 in a85eeb9

PyObject *
_PyPegen_new_identifier(Parser *p, const char *n)
{
    PyObject *id = PyUnicode_DecodeUTF8(n, (Py_ssize_t)strlen(n), NULL);
    if (!id) {
        goto error;
    }
    /* PyUnicode_DecodeUTF8 should always return a ready string. */
    assert(PyUnicode_IS_READY(id));

tracemalloc.c

if (!PyUnicode_IS_READY(filename)) {
    /* Don't make a Unicode string ready to avoid reentrant calls
       to tracemalloc_alloc() or tracemalloc_realloc() */
#ifdef TRACE_DEBUG
    tracemalloc_error("filename is not a ready unicode string");
#endif
    return;
}

This one seems to be dead code (cc @vstinner)
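As far as I understand the C API documentation, `PyUnicode_IS_READY()` always returns 1 since Python 3.12, which is why the guard above can never fire. Here is a small stand-alone model of that reasoning. It is not CPython code: `model_is_ready`, `use_filename_guarded`, and `use_filename_unguarded` are hypothetical names, and the stub only assumes the documented "always 1" behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PyUnicode_IS_READY(): models the documented
   behaviour of always returning 1. Not the CPython definition. */
unsigned int model_is_ready(const void *unicode)
{
    (void)unicode;
    return 1;
}

/* Shaped like the tracemalloc snippet: returns 0 when it bails out
   early, 1 when it goes on to use the filename. */
int use_filename_guarded(const void *filename)
{
    if (!model_is_ready(filename)) {
        return 0;   /* never taken in this model */
    }
    return 1;
}

/* The same function with the guard removed. */
int use_filename_unguarded(const void *filename)
{
    (void)filename;
    return 1;
}

/* Both versions agree for any input, so the guard is dead code
   under the assumption stated above. */
void check_guard_is_dead(const void *filename)
{
    assert(use_filename_guarded(filename) == use_filename_unguarded(filename));
}
```

Under that assumption, removing the `if` block changes no behaviour. Whether the surrounding comment about reentrant calls still needs to be kept is a separate question for review.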


picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs), extension-modules (C modules in the Modules dir), and type-bug (An unexpected behavior, bug, or error) labels Mar 3, 2025
@vstinner
Member

vstinner commented Mar 3, 2025

If it's just a few lines in each C file, a single PR would be more convenient.

@sergey-miryanov
Contributor

@picnixz Do you plan to work on this, or can I take it?

@picnixz
Member Author

picnixz commented Mar 3, 2025

I was planning to do it tomorrow, but feel free to take it. I just don't know whether we should split this into multiple PRs or do it in one. I haven't looked at the code closely enough to see whether we can safely remove the comments/code, so there might be some conditions that still need to be checked.

@sergey-miryanov
Contributor

@picnixz I took a bit more liberty and also updated the comments to refer to the public API. Is this OK?

picnixz added the type-feature (A feature request or enhancement) label and removed the type-bug (An unexpected behavior, bug, or error) label Mar 3, 2025