Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor: performance improvements #17

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

JonathanFeenstra
Copy link

@JonathanFeenstra JonathanFeenstra commented Oct 5, 2024

This refactor makes checking for plugins with form version 43 or lower a bit more efficient by:

  • Removing the unnecessary check to ensure the managed game is Skyrim Special Edition (this is already handled by the game dependency requirement)
  • Storing the full list of invalid plugins during the activeProblems check, so it doesn't need to be done again while creating the fullDescription
  • Using os.path for simple file operations like checking for existence and getting the base name, instead of creating full pathlib.Path objects to do so

I also added an extra 0 to the VersionInfo as the sub-sub-minor version and changed the release type to final (resolves #7) to make it less confusing. Due to a bug in the VersionInfo constructor overloads, this actually results in the same version as the current one. I suspect this is because the release type is converted to an int in Python and mobase.ReleaseType.PRE_ALPHA has a value of 0, so the release type is then misinterpreted as the sub-sub-minor version.

Of course, the actual performance impact is minimal, but I did a comparison out of curiosity:

Before:

before

After:

after

It does appear to have improved as a result of the refactor, especially fullDescription.

@AnyOldName3
Copy link
Member

Getting rid of Pathlib is something we'd consider bad. If Pathlib's too slow, then using Python in the first place is too slow. As this isn't performance-critical code, it's most important that it's readable and maintainable, and micro-optimisations usually make those properties worse.

Excluding the doing-the-check-twice part, the difference is only tens of microseconds, which is too small to get a statistically significant result from with the kinds of timers available to user processes without thousands of repeats, and for something like this, the time taken is likely to be dominated by non-code things like moving the relevant data from disk to memory and memory to cache as it won't be cached already, which would change if you were to try doing it loads of times in a loop. This isn't to say that os.path isn't faster, it's just that it'll be such a small difference that it won't be possible to measure.

For sequences like

        path = Path(file)
        if path.is_file():
            with path.open(mode="rb") as fp:

versus

        if os.path.exists(file):
            with open(file, mode="rb") as fp:

there's a good chance Pathlib would be faster if there's any difference. The part where Pathlib will suffer (creating the Path object and normalising it) only happens once, whereas both os.path.exists and open will need to convert the string from Python's internal representation (7-bit ASCII if all the characters fit, or if not, 16-bit UCS2 if all the characters fit, or if not, 32-bit UCS4) to Windows' internal representation (UTF-16 all the time), and that'll take a comparable amount of time to creating the Path object. There's not really any point thinking about it, as the real cost will be dominated by the actual disk access (unless you're checking the same file thousands of times in a loop), and that'll be the same for both approaches.

@JonathanFeenstra JonathanFeenstra force-pushed the refactor-performance-improvements branch from e6d5f15 to bfdf3bb Compare October 5, 2024 15:18
@JonathanFeenstra
Copy link
Author

I see, I haven't really looked into the differences between the pathlib and os.path operations in-depth. I just had the impression that creating Path objects would be slower since they store more information than strings and I don't see much of a difference in terms of readability/maintainability. Anyway, my main goal with this PR was only to remove the unnecessary game check, the rest is just some extra refactoring and is not that important to me. The time measurements were just to quickly check if my changes had any noticeable impact at first glance. With such small numbers, a full benchmark isn't worth the effort.

@JonathanFeenstra JonathanFeenstra force-pushed the refactor-performance-improvements branch from bfdf3bb to fb0d9be Compare October 5, 2024 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

We're not pre-alpha
2 participants