Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unit test for glob() and partially fix Python 3.13 case. #476

Open
wants to merge 5 commits into
base: python-3-13
Choose a base branch
from

Conversation

richardxia
Copy link

This is a WIP, but I'm submitting it in case if the unit tests are still useful. As reported in #470 (comment), the glob() method no longer works in Python 3.13, returning an empty result regardless of the actual contents in Artifactory.

I haven't gotten this 100% working yet, but I've gotten a little bit closer to fixing it in Python 3.13. It looks like the pathlib.Path class changed a lot in Python 3.13 w.r.t. globbing behavior, where in Python 3.12 and lower, there was a full reimplementation of globbing inside the pathlib module, but in Python 3.13, it now delegates to the actual glob package.

One specific issue is that pathlib._abc.PurePathBase depends on the abstract glob._Globber class, but pathlib.PurePath overrides it to depend on glob._StringGlobber. _StringGlobber is problematic because it explicitly uses os.lstat and os.scandir instead of delegating to the Path subclass. What this means is that in Python 3.13, when you call ArtifactoryPath.glob(), you eventually end up calling os.scandir(), which fails because the path does not actually exist on disk locally.

The reason that ArtifactoryPath.glob() returns the empty list rather than throwing an exception is because in Python 3.13, any OSErrors that occur while calling glob() are ignored.

I have a partial fix, where I modified ArtifactoryPath to explicitly override the glob() method and delegate to pathlib._abc.PathBase's implementation rather than pathlib.Path's implementation (at least in Python 3.13). In the unit test I added, I tried to create mock Artifactory API response data that would correspond to a file tree that looks like:

.index/
com/
    foo
    bar

My unit test tries to run glob("**/*"), and in all Pythons < 3.13, it returns the four results of .index/, com/, com/foo, and com/bar. However, in Python 3.13, it currently is only returning the two files and not the directories on their own. I am not sure if my mock data is incorrect yet, or if there is some other Path API that I need to override.

I'm not sure if I'll have time to work on this for the rest of the day (US Pacific time), but I should have time to continue looking in the coming days. In the meantime, I encourage anyone else to take a look at the failures as well.

artifactory.py Outdated
if IS_PYTHON_3_13_OR_NEWER:
import glob

_globber = glob._Globber
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this used ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Python 3.13, the pathlib._abc.PurePathBase class uses it in the _glob_selector() method, and the _glob_selector() method is called by the glob() method. The pathlib.Path class also calls _glob_selector() in its own definition of the glob() method.

PurePathBase defines the variable in https://github.com/python/cpython/blob/v3.13.2/Lib/pathlib/_abc.py#L121, and Path overrides it in https://github.com/python/cpython/blob/v3.13.2/Lib/pathlib/_local.py#L105.

If we don't want to depend on a private API like this, then I'm afraid the only option in Python 3.13 would be to completely reimplement the glob() method from scratch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add the comment here then ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've added a comment, but I also ended up figuring out the remaining issue, which required me to change this to a custom subclass of _Globber, so I've changed line a bit more significantly in 1bcbd59

@richardxia
Copy link
Author

I believe I've figured out the remaining issue on Python 3.13, and I've pushed up a fix in 1bcbd59.

Unfortunately, this required subclassing the private glob._Globber class and overriding one of the methods with a copy of the original implementation but with a slight tweak to the math. I've tried to document this in some very detailed comments, but summarize the issue, the upstream implementation of _Globber doesn't work properly for non-string paths, since it wants to be able to add a trailing slash for the purposes of computing where a regex match should start looking for child files/directories, and when you try to add that slash to a Path subclass, it ends up stripping it away. I had to copy the upstream implementation of the method but modify to add 1 to the regex start position in order for the rest of the code to work properly.

Please let me know if this approach is OK or if there is some other way we should resolve this problem. On the plus side, the unit test I added is now passing for Python 3.13, so this at least works for that test case.

Copy link
Contributor

@offa offa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have test this PR on Python 3.13 and 3.12. The sorting of the results is different (shouldn't be relevant), but the result list contains the same entries. Thanks! 👍

Copy link
Member

@allburov allburov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It becomes more complex and complex to support all the version :(
Need to think and check how others do it 🤔

# isn't an issue in the actaul use of _Globber in Python, since it converts all paths to
# strings, and the add_slash() will literally append a slash character to the string
# path.
class _ArtifactoryGlobber(glob._Globber if IS_PYTHON_3_13_OR_NEWER else object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we inherit it from object always - will it act as usual?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class will be completely unused in the case of Python < 3.13. The conditional parent class is only to avoid hitting an import error in Python 3.12 and lower.

# strings, and the add_slash() will literally append a slash character to the string
# path.
class _ArtifactoryGlobber(glob._Globber if IS_PYTHON_3_13_OR_NEWER else object):
def recursive_selector(self, part, parts):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment where from you get the code for reference in the comment pls?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, just added it in 70cf96c

@richardxia
Copy link
Author

Thanks again for all the review. I believe I've addressed all comments, so please let me know if there's anything I else I can do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants