Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve preprocessing performance #10534

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

parsonsmatt
Copy link
Collaborator

@parsonsmatt parsonsmatt commented Nov 6, 2024

This PR changes the behavior of the preprocessFile logic. Currently, it attempts to find a file matching a module name for every known preprocessor, before attempting to find a file with a plain Haskell suffix. The overwhelmingly common case should be a plain Haskell file. This results in a ton of extraneous lookups.

On Mercury's package with 11k modules, the existing preprocessFile takes around 1.41s according to the --verbosity="debug +timestamps" output. With this patch, we're at 0.59s.

Curiously, the impact is greater in situ - once configure is out of the way, plain cabal takes about 5s to do a no-op ghci eval, while this patch takes about 3.1s, almost 2s faster. Given that about 2 of the remaining seconds are actually running GHCi, that may be pretty close to optimal.


Please read Github PR Conventions and then fill in one of these two templates.


Template Α: This PR modifies behaviour or interface

Include the following checklist in your PR:

  • Patches conform to the coding conventions.
  • Any changes that could be relevant to users have been recorded in the changelog.
  • The documentation has been updated, if necessary.
  • Manual QA notes have been included.
  • Tests have been added. (Ask for help if you don’t know how to write them! Ask for an exemption if tests are too complex for too little coverage!)

Template B: This PR does not modify behaviour or interface

E.g. the PR only touches documentation or tests, does refactorings, etc.

Include the following checklist in your PR:

  • Patches conform to the coding conventions.
  • Is this a PR that fixes CI? If so, it will need to be backported to older cabal release branches (ask maintainers for directions).

@@ -614,8 +614,11 @@ generateCode
-> Verbosity
-> IO (SymbolicPath Pkg (Dir Source), [ModuleName.ModuleName])
generateCode codeGens nm pdesc bi lbi clbi verbosity = do
debug verbosity $ "generateCode: " <> prettyShow (package pdesc)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of extra debug statements here to help me identify where we were losing time. I can remove those if desired.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please can you remove these, I think you can see in the cost centre profile there is a hot-spot in findCwdWithExtension, which is also what you identified.

@@ -292,10 +300,11 @@ preprocessFile
-- ^ fail on missing file
-> IO ()
preprocessFile mbWorkDir searchLoc buildLoc forSDist baseFile verbosity builtinSuffixes handlers failOnMissing = do
debug verbosity $ "preprocessFile: " <> prettyShow baseFile
bsrcFiles <- findFileCwdWithExtension mbWorkDir builtinSuffixes (searchLoc ++ [buildAsSrcLoc]) baseFile
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, we try and find a regular Haskell file.

We also add buildAsSrcLoc] to the end of the list, since the locations are tried in-order. Most likely, we're going to be in the first directory we're searching in, so this should eliminate a lot of file lookups.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is a change in behaviour? Before, if you had a .hs and a .y file in the directory, would it prefer to regenerate the .hs from the .y file?

It seems that it would then prefer the generated .y file rather than the .hs file in the directory, because it searched the build directory first?

Could we have some tests to check this behaviour?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question - what is the expected behavior of having multiple files with different extensions in directory? Rather, what should happen if we have a directory project like this:

src/Foo.hs
src/Foo.y

The old code would ignore src/Foo.hs in favor of src/Foo.y. The new code ignores src/Foo.y in favor of src/Foo.hs.

This is definitely a change in behavior, but it would require a project to be poorly structured - somehow having Foo.hs and Foo.y together, with the expectation that Foo.hs is never used.

@parsonsmatt parsonsmatt force-pushed the mattp/preprocessing-speed branch from b6ae84f to e7f4251 Compare November 6, 2024 23:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants