-
Notifications
You must be signed in to change notification settings - Fork 723
Comment parser #11252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Comment parser #11252
Conversation
This token is not needed, we will later use the position information to pad each token.
... which is not handling at all for the time being
|
Thank you for your comment @andreabedini :)
That looks very interesting, but how would I deal with files that are just comments? To the point of view of readFields they should be valid yet we would have no I do think your model is very interesting so if you have the time to, please show working PR against mine so we can simply merge it in 🙏
Could you elaborate which packages are these? I would love to have more insight on how people solve similar problems. Are there other design issues that needs to be addressed ? |
Are they valid Cabal files if there is nothing but comments? I don't think so.
For instance, annotateFieldsWithSource :: ByteString -> [Field Position] -> [Field ByteString]which annotates each field with its source including all adjacent comments. It can be modified to return |
| match _ = Nothing | ||
|
|
||
| -- | Collect comments into a map. The second field of the output will have no comment | ||
| extractComments :: Ord ann => [Field ann] -> (Map.Map ann ByteString, [Field ann]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you possibly outline how the output of this function is supposed to be used? Now that we detached comments from fields, how do we reconstruct the original document? How do we do it if [Field ann] is programmatically updated (say, adding or removing elements)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you possibly outline how the output of this function is supposed to be used?
This function recursively extracts the comments from the fields (the updated design variant does the same thing, albeit being more concise).
It is used to extract the comments out right before running the parseFieldGrammar parser in src/Distribution/PackageDescription/Parsec.hs. This allows us to not change the grammar and be able to directly inject the comments into the GenericPackageDescription.
Now that we detached comments from fields, how do we reconstruct the original document?
We can't yet. Following Jappie's proposal, we need to store the exact position of each field as a map (called exactPositions) indexed by the path in the rose tree (typed [NameSpace], a list of path segments). And then we will have the right information to construct the exact printer.
How do we do it if [Field ann] is programmatically updated (say, adding or removing elements)?
To quote Jappie:
The issue for addition is that you now have to invent exact positions. for removal, if it involves a line, you've to fix up all following lines, (and it has to know something was removed).
I haven't thought about this thoroughly because it's out of scope of this PR, but it should be very much feasible.
Pretty sure you need at minimum |
|
Hi friends, thanks for all your responses. Leana needs some time to read up on the exact proposal to see how it all fits together before replying. |
|
Cabal is one of the toughest code bases I ever worked on, so I'm quite amazed by Leana making progress so quickly! |
We need to look into how to wire the output for it to hold the comments in the right position.
|
Thank you Bodigrim and Jappie for your kind words! That means a lot to me, I'm glad to be on the right track. I have started (and completed) to rewrite my PR using Andrea's approach.
Good to know. Currently the top level parser drops the comments consumed if there are no fields to attach them to. Let me know what you think about the change :) |
This fixes builds for old GHC
|
Here are the benchmark results. The baseline has been rerun because I did these ones on a VPS machine, and they are not comparable to the last ones I ran on my machine. # baseline
leana@Ubuntu-2404-noble-amd64-base:~/cabal$ hyperfine './validate.sh --partial-hackage-tests'
Benchmark 1: ./validate.sh --partial-hackage-tests
Time (mean ± σ): 253.353 s ± 9.520 s [User: 196.642 s, System: 60.881 s]
Range (min … max): 241.649 s … 271.207 s 10 runs
# this PR
leana@Ubuntu-2404-noble-amd64-base:~/cabal$ hyperfine --setup "./validate.sh --partial-hackage-tests" "./validate.sh --partial-hackage-tests"
Benchmark 1: ./validate.sh --partial-hackage-tests
Time (mean ± σ): 253.163 s ± 7.451 s [User: 196.432 s, System: 58.507 s]
Range (min … max): 239.373 s … 266.656 s 10 runs |
In my prototype I have replaced Where the extra annotation is for anything coming after the last field.
In addition to @Bodigrim's
I am available to discuss and support her effort. @leana8959 I'll reach out privately.
I warmly second this! |
|
@leana8959, @andreabedini: how is the private communication going? We are interested too! Could we help somehow? |
|
for clarity: These parser changes are ready for review as far as leana and me are concerned, meanwhile we've moved over to import stanza retention in GenericPackageDescription (they currently get merged into the stanza's). This is an independent change of the parser changes here. After that we can start on exact print propper. |
Please don't. A lot of code wants an elaborated (= stripped down of syntactic convenience) representation of package description, and GPD serves that now. Your parsing changes leak down the pipeline where they shouldn't. E.g. things like solver works with GPD, and it really shouldn't care about whether import stanzas were used to declare a package or not. |
|
This PR isn't about that, |
This is the first part of the exact print parser. In this PR I changed the lexer so instead of dropping the comments it emits them to the parser which is further stored in
GenericPackageDescription.Please let me know your thoughts!
Checklist below:
This PR modifies behaviour or interface
Include the following checklist in your PR:
significance: significantin the changelog file.