[BC] Split parser into php_only parser and php parser (combined php/html) #180

calebdw · 2023-06-26T16:51:56Z

Hello!

This is a rough draft at splitting the parser into two separate parsers: one for just PHP (which allows top level statements, start/end tags are optional), and one that parsers both PHP and HTML (which reuses the PHP-only parser to avoid duplication).

This is necessary because most modern PHP projects do not have monolithic HTML/PHP files but rather most likely follow some sort of MVC pattern/framework (I'm approaching this with Laravel in mind) where the HTML is stored in separate files (e.g., "views") and a templating engine (e.g., Blade, twig, etc.) is used to process the templates into cached php/html files. However, we need to be able to inject PHP into nodes that do not have the php tags---see #168 (provides both Blade and markdown examples), and EmranMR/tree-sitter-blade#5. This is currently not possible with the existing parser as the text is too hungry and there's not a good way of determining if something should be text or a top level statement.

I've tried to follow the same split-parser pattern that the markdown parser uses---at this point I could certainly use some feedback / help on things I'm not too familiar with (like ensuring all of the bindings are properly setup). Things like node/parser names can certainly be updated if necessary (e.g., php_only could be renamed to phpo, php_inline, or php_base...whichever makes the most sense).

Thoughts @maxbrunsfeld? Is this the best path forward (in order to accept top level statements and being able to inject PHP into nodes that do not contain php start/end tags) or is there pattern you'd like to see?

Thanks!

Checklist:

All tests pass in CI
There are sufficient tests for the new fix/feature
Grammar rules have not been renamed unless absolutely necessary (x rules renamed)
The conflicts section hasn't grown too much (x new conflicts)
The parser size hasn't grown too much (master: STATE_COUNT, PR: STATE_COUNT) (check the value of STATE_COUNT in src/parser.c)

cfroystad · 2023-07-04T11:31:51Z

Just popping by to say that I applaud the effort, but it will be a few weeks before I'll find the time to do a proper review (someone else might be faster of course). Just to manage expectations 🙂

EmranMR · 2023-08-21T09:27:49Z

Hi @calebdw I am not a C Programmer and I have no idea how the Makefile works or gets structured. But, I reckon, we need one in each folder? one for tree-sitter-php and one for tree-sitter-php-only?

The current Makfile also utilises the git repo URL to declare the name. This will default the PARSER_NAME for both folders php, producing conflicting name for the .dylib?

Regardless, I couldn't get this Makefile work using the compile_parser.sh provided by Nova anyways. It output an unusual name. So I just used their complimentary template Makefile instead, hardcoded the php_only as PARSER_NAME and it finally output me a functioning .dylib . 👍

Both of which can be found here for examination (The link downloads the zip file containing Makefile & compile_parser.sh supplied by Panic/Nova)

Nova tree-sitter Compiling Docs: here

cfroystad · 2023-10-21T07:46:07Z

Sorry for the extended delay. Life and all of that happened...

I really like that this approach will allow supporting more use cases and make it easier to reuse the parser across project. I suppose this could make life easier for the docblock parser as well.

What gives me pause is that it would be really nice to be able to do this without creating a new parser, since it's both basically PHP. However, I don't currently see a way around it. Though, I should be noted that I'm in no way an authority on the injection system in tree-sitter.

Another thing to consider is that this is a big change in how the parser is used, and will require quite a few projects to adapt.

I wonder if not the approach used by tree-sitter-typescript would be more beneficial here? That way I think we can share a common grammar for the base and have two different parsers where one would behave identically to the current parser (without needing two parse passes to obtain the result) and one that can be used inline.

calebdw · 2023-10-22T02:28:15Z

@cfroystad, no worries! My wife just had a baby so I totally understand :)

I agree that it would be best to not have to create another parser but I'm not sure if it's possible... I tried to allow top level statements similar to the Go parser (#177) but it turns out that it's really difficult to tell if something should be text or PHP without using the PHP tags---which is really no surprise since PHP was originally a templating language.

Thanks for pointing to the TS grammer! I haven't seen that pattern before---basically the grammar file is parameterized on the parser name which seems like it should work here...the only difference between the two parsers is the handling of the text.

I'll try to implement something similar in a separate branch

calebdw · 2023-10-22T02:43:45Z

@cfroystad, do you have any preference on the name of the new parser? Some thoughts are:

php_only/phpo
php_base
php_inline

cfroystad · 2023-10-22T18:07:43Z

Congratulations on the baby! 🙂

hm, naming is always difficult... First I leaned towards php_base, but it's not really a base. Then I thought php_inline sounded good, but it's also supporting php not being inline. I think php_only would be best.

In an ideal world, I'd like to name the "php_only" version "php" and the one including html "php_html" or something like that. But php_only would be my preference all things considered.

Thank you for working on this! It's much apprechiated!

calebdw · 2023-10-23T15:34:58Z

Closing---moving discussion to #192

calebdw · 2023-10-23T15:40:10Z

The current Makfile also utilises the git repo URL to declare the name. This will default the PARSER_NAME for both folders php, producing conflicting name for the .dylib?

@EmranMR, I never updated the Makefile in this branch---however, the git repo name is only used if a parser name is not provided. In the new branch you can use the following:

PARSER_NAME=php_only make

calebdw added 8 commits June 24, 2023 14:22

Split parser into php_only and php

5ff34b6

Fix php_only tests

c30f790

Fixed php parser

9c8e2aa

Add notes for swift bindings

c41072b

rebuilt parser

65d56b4

clean up php grammar

db34db8

update php node to php_only

7e18b24

update rust bindings

71b5dad

aryx requested review from maxbrunsfeld, patrickt and cfroystad June 28, 2023 14:03

EmranMR mentioned this pull request Jul 2, 2023

Neovim highlights and injections EmranMR/tree-sitter-blade#7

Closed

EmranMR mentioned this pull request Aug 19, 2023

little issue with sintax color EmranMR/tree-sitter-blade#26

Closed

This comment was marked as off-topic.

Sign in to view

calebdw mentioned this pull request Oct 23, 2023

Split parser attempt #2 #192

Merged

5 tasks

calebdw closed this Oct 23, 2023

HerringtonDarkholme mentioned this pull request Jan 28, 2024

[feature] use php-only-language in php pattern ast-grep/ast-grep#900

Open

calebdw deleted the split_parsers branch June 3, 2024 23:13

calebdw mentioned this pull request Aug 25, 2024

bug: opening PHP required for highlighting to work #253

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BC] Split parser into php_only parser and php parser (combined php/html) #180

[BC] Split parser into php_only parser and php parser (combined php/html) #180

calebdw commented Jun 26, 2023 •

edited

Loading

cfroystad commented Jul 4, 2023

EmranMR commented Aug 21, 2023

This comment was marked as off-topic.

cfroystad commented Oct 21, 2023

calebdw commented Oct 22, 2023

calebdw commented Oct 22, 2023

cfroystad commented Oct 22, 2023

calebdw commented Oct 23, 2023

calebdw commented Oct 23, 2023

[BC] Split parser into php_only parser and php parser (combined php/html) #180

[BC] Split parser into php_only parser and php parser (combined php/html) #180

Conversation

calebdw commented Jun 26, 2023 • edited Loading

cfroystad commented Jul 4, 2023

EmranMR commented Aug 21, 2023

This comment was marked as off-topic.

cfroystad commented Oct 21, 2023

calebdw commented Oct 22, 2023

calebdw commented Oct 22, 2023

cfroystad commented Oct 22, 2023

calebdw commented Oct 23, 2023

calebdw commented Oct 23, 2023

calebdw commented Jun 26, 2023 •

edited

Loading