Remove references to Unicode objects being ready #130790

picnixz · 2025-03-03T13:07:52Z

We have some code referencing the readiness of Unicode objects, but this property is deprecated (see #129894 (comment)). Following @encukou's advice, we should address each module in a separate PR so that experts can review them separately.

`unicodedata.c`

cpython/Modules/unicodedata.c

Lines 594 to 596 in a85eeb9

    
    /* result is guaranteed to be ready, as it is compact. */
    kind = PyUnicode_KIND(result);
    data = PyUnicode_DATA(result);

cpython/Modules/unicodedata.c

Lines 655 to 661 in a85eeb9

    
    result = nfd_nfkd(self, input, k);
    if (!result)
        return NULL;
    /* result will be "ready". */
    kind = PyUnicode_KIND(result);
    data = PyUnicode_DATA(result);
    len = PyUnicode_GET_LENGTH(result);

`_io/textio.c`

cpython/Modules/_io/textio.c

Lines 357 to 363 in a85eeb9

    
    kind = PyUnicode_KIND(modified);
    out = PyUnicode_DATA(modified);
    PyUnicode_WRITE(kind, out, 0, '\r');
    memcpy(out + kind, PyUnicode_DATA(output), kind * output_len);
    Py_SETREF(output, modified); /* output remains ready */
    self->pendingcr = 0;
    output_len++;
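
For readers unfamiliar with this snippet: `memcpy(out + kind, ...)` works because the value of `kind` is also the width of one character in bytes, so `out + kind` skips exactly the `'\r'` just written. A minimal standalone sketch of that idea follows. It does not use the CPython API; `model_write` and `model_prepend_cr` are hypothetical names introduced only for illustration, and the sketch assumes `kind` is 1, 2, or 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PyUnicode_WRITE (not the real macro):
   store code point `ch` at character index `index` in a buffer whose
   characters are `kind` bytes wide. */
void
model_write(int kind, void *data, size_t index, uint32_t ch)
{
    assert(kind == 1 || kind == 2 || kind == 4);
    switch (kind) {
    case 1:
        ((uint8_t *)data)[index] = (uint8_t)ch;
        break;
    case 2:
        ((uint16_t *)data)[index] = (uint16_t)ch;
        break;
    default:
        ((uint32_t *)data)[index] = ch;
        break;
    }
}

/* Hypothetical helper mirroring the textio.c snippet: write '\r' as the
   first character of `out`, then copy `len` characters from `in` behind it.
   `out` must have room for len + 1 characters of width `kind`. */
void
model_prepend_cr(int kind, void *out, const void *in, size_t len)
{
    model_write(kind, out, 0, '\r');
    /* `kind` is the byte width of one character, so advancing by `kind`
       bytes skips exactly the character written above. */
    memcpy((char *)out + kind, in, (size_t)kind * len);
}
```

Calling `model_prepend_cr(2, out, in, 2)` with `uint16_t` buffers places `'\r'` in `out[0]` and the two input characters in `out[1]` and `out[2]`. None of this depends on a "ready" state, which is why only the comment in the original snippet needs to change.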

cpython/Modules/_io/textio.c

Lines 1821 to 1824 in a85eeb9

    
    /* decoded_chars is guaranteed to be "ready". */
    avail = (PyUnicode_GET_LENGTH(self->decoded_chars)
             - self->decoded_chars_used);

`Parser` files

cpython/Parser/lexer/lexer.c

Lines 311 to 315 in a85eeb9

    
    /* Verify that the identifier follows PEP 3131.
       All identifier strings are guaranteed to be "ready" unicode objects.
     */
    static int
    verify_identifier(struct tok_state *tok)

cpython/Parser/pegen.c

Lines 505 to 513 in a85eeb9

    
    PyObject *
    _PyPegen_new_identifier(Parser *p, const char *n)
    {
        PyObject *id = PyUnicode_DecodeUTF8(n, (Py_ssize_t)strlen(n), NULL);
        if (!id) {
            goto error;
        }
        /* PyUnicode_DecodeUTF8 should always return a ready string. */
        assert(PyUnicode_IS_READY(id));

`tracemalloc.c`

cpython/Python/tracemalloc.c

Lines 252 to 259 in a85eeb9

    
        if (!PyUnicode_IS_READY(filename)) {
            /* Don't make a Unicode string ready to avoid reentrant calls
               to tracemalloc_alloc() or tracemalloc_realloc() */
    #ifdef TRACE_DEBUG
            tracemalloc_error("filename is not a ready unicode string");
    #endif
            return;
        }

This one seems to be dead code (cc @vstinner)

Linked PRs

gh-130790: Remove references about unicode's readiness from comments #130801


vstinner · 2025-03-03T15:07:16Z

If it's just a few lines in each C file, a single PR would be more convenient.

sergey-miryanov · 2025-03-03T17:32:23Z

@picnixz Do you plan to work on this, or can I take it?

picnixz · 2025-03-03T17:35:38Z

I was planning to do it tomorrow, but feel free to take it. I just don't know whether we should split this into multiple PRs or use a single one. I haven't looked closely enough at the code to tell whether each comment or code path can be safely removed, so there may be conditions that still need to be checked.

sergey-miryanov · 2025-03-03T18:36:18Z

@picnixz I took a bit more liberty and updated the comments to refer to the public API. Is this OK?


picnixz added interpreter-core (Objects, Python, Grammar, and Parser dirs) extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error labels Mar 3, 2025

bedevere-app bot mentioned this issue Mar 3, 2025

gh-130790: Remove references about unicode's readiness from comments #130801

Merged

picnixz added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Mar 3, 2025

vstinner pushed a commit that referenced this issue Mar 3, 2025

gh-130790: Remove references about unicode's readiness from comments (#…

3a7f17c

…130801)
