Consider using tree-sitter for syntax highlighting? #658

Open
Gleek opened this issue Apr 7, 2021 · 10 comments

Comments

@Gleek

Gleek commented Apr 7, 2021

emacs-tree-sitter uses tree-sitter grammar files to do syntax highlighting incrementally. It also supports PHP via php-grammar. Since grammars are more expressive than regular expressions, they should provide more accurate syntax highlighting than their regex-based counterparts.

In my testing it also gives a much smoother experience for large files than the default php-mode highlighting.
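For anyone who wants to reproduce this setup, here is a minimal sketch of enabling emacs-tree-sitter highlighting. It assumes the `tree-sitter` and `tree-sitter-langs` packages are installed and that the bundled grammars include PHP; it is one common configuration, not necessarily the exact one used for the test above.

```elisp
;; Assumes the `tree-sitter' and `tree-sitter-langs' packages are installed.
(require 'tree-sitter)
(require 'tree-sitter-langs)

;; Turn on tree-sitter parsing in every buffer whose major mode has a grammar.
(global-tree-sitter-mode)

;; Once a parser is active in a buffer, switch highlighting over to tree-sitter.
(add-hook 'tree-sitter-after-on-hook #'tree-sitter-hl-mode)
```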

On my machine, in large files, the call chain php-syntax-propertize-hash-line-comment > (move-beginning-of-line 2) > (line-move) takes a lot of time during normal typing.
And php-syntax-propertize-extend-region sometimes takes minutes when a stray quote is added.
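A call chain like the one above can be collected with Emacs's built-in profiler. The sketch below shows one way to do that; it is an assumption that the numbers in this report were gathered this way.

```elisp
(require 'profiler)

;; Start sampling CPU usage.
(profiler-start 'cpu)

;; ... now type in the large PHP buffer for a few seconds ...

;; Show the call tree collected so far, then stop sampling.
(profiler-report)
(profiler-stop)
```

In the report buffer, expanding the heaviest entries shows which functions dominate while typing.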

These problems don't exist with tree-sitter. Disabling the problematic functions and relying only on tree-sitter gives a typing latency of about 50 ms, regardless of file size. That isn't real-time either, but these aren't clean benchmarks: I had other applications running in the background at the time. For reference, typing latency in fundamental mode was about 25 ms on the same file. With default php-mode, latency is low in small files, but large files showed about 600 ms to 2.5 s. This does not include typing a quote ('), which freezes Emacs completely for multiple seconds, if not a minute or two.

These are the changes I made to the php-mode functions after enabling tree-sitter to get the results above.

(defun return-false (&rest _)
  "Return nil regardless of the arguments.
Useful as an override that turns a function into a no-op."
  nil)

;; Disable php-mode's syntax-propertize machinery so that only tree-sitter highlights.
(setq php-syntax-propertize-functions nil)
(advice-add 'php-syntax-propertize-extend-region :override #'return-false)
(remove-hook 'syntax-propertize-extend-region-functions #'php-syntax-propertize-extend-region)
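If the workaround causes trouble, the advice can be removed again in the running session. This is a minimal sketch that only undoes the `advice-add` call above.

```elisp
;; Remove the override so php-syntax-propertize-extend-region runs normally again.
(advice-remove 'php-syntax-propertize-extend-region #'return-false)
```

The `setq` and `remove-hook` changes are not reverted by this. Restarting Emacs without the workaround is the simplest way to restore those.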

I don't know how feasible or complex it would be to integrate these two packages, but I thought we could start a discussion about it, considering the benefits it might yield.

Debug info

--- PHP-MODE DEBUG BEGIN ---
versions: GNU Emacs 28.0.50 (build 2, x86_64-apple-darwin19.6.0, NS appkit-1894.60 Version 10.15.7 (Build 19H114))
 of 2021-01-31; PHP Mode 1.24.0; Cc Mode 5.35.1)
package-version: 20210310.1724
major-mode: php-mode
minor-modes: (shell-dirtrack-mode lsp-diagnostics-mode lsp-modeline-workspace-status-mode lsp-modeline-diagnostics-mode lsp-modeline-code-actions-mode lsp-ui-mode lsp-ui-doc-mode lsp-completion-mode dap-tooltip-mode dap-ui-many-windows-mode dap-ui-controls-mode dap-ui-mode treemacs-filewatch-mode treemacs-follow-mode treemacs-git-mode treemacs-fringe-indicator-mode dap-auto-configure-mode dap-mode lsp-managed-mode lsp-mode ws-butler-mode yas-minor-mode auto-insert-mode org-wild-notifier-mode ivy-rich-mode tree-sitter-hl-mode tree-sitter-mode ivy-mode smooth-scroll-mode show-paren-mode which-key-mode smartparens-mode undo-tree-mode persistent-scratch-autosave-mode save-place-mode git-gutter-mode eros-mode highlight-numbers-mode company-box-mode company-mode flycheck-posframe-mode origami-mode hl-line-mode display-line-numbers-mode whitespace-mode projectile-mode flycheck-mode subword-mode selected-minor-mode +popup-mode recentf-mode doom-modeline-mode solaire-mode key-chord-mode tooltip-mode eldoc-mode electric-indent-mode mouse-wheel-mode tab-bar-mode file-name-shadow-mode font-lock-mode auto-composition-mode auto-encryption-mode auto-compression-mode size-indication-mode column-number-mode line-number-mode transient-mark-mode abbrev-mode)
variables: ((indent-tabs-mode nil) (tab-width 4))
custom variables: ((php-executable /usr/local/bin/php) (php-site-url https://php.net/) (php-manual-url en) (php-search-url nil) (php-class-suffix-when-insert ::) (php-namespace-suffix-when-insert \) (php-default-major-mode php-mode) (php-html-template-major-mode web-mode) (php-blade-template-major-mode web-mode) (php-template-mode-alist ((\.blade . web-mode) (\.phpt\' . php-mode) (\.phtml\' . web-mode))) (php-mode-maybe-hook nil) (php-default-builtin-web-server-port 3939) (php-re-detect-html-tag php-re-detect-html-tag-default) (php-search-documentation-browser-function nil))
c-indentation-style: symfony2
c-style-variables: ((c-basic-offset 4) (c-comment-only-line-offset 0) (c-indent-comment-alist ((anchored-comment column . 0) (end-block space . 1) (cpp-end-block space . 2))) (c-indent-comments-syntactically-p t) (c-block-comment-prefix * ) (c-comment-prefix-regexp ((pike-mode . //+!?\|\**) (awk-mode . #+) (other . //+\|\**))) (c-cleanup-list (scope-operator)) (c-hanging-braces-alist ((brace-list-open) (brace-entry-open) (statement-cont) (substatement-open after) (block-close . c-snug-do-while) (extern-lang-open after) (namespace-open after) (module-open after) (composition-open after) (inexpr-class-open after) (inexpr-class-close before) (arglist-cont-nonempty))) (c-hanging-colons-alist nil) (c-hanging-semi&comma-criteria (c-semi&comma-inside-parenlist)) (c-backslash-column 48) (c-backslash-max-column 72) (c-special-indent-hook nil) (c-label-minimum-indentation 1))
c-doc-comment-style: ((java-mode . javadoc) (pike-mode . autodoc) (c-mode . gtkdoc) (c++-mode . gtkdoc))
c-offsets-alist: ((inexpr-class . 0) (inexpr-statement . +) (lambda-intro-cont . +) (inlambda . 0) (template-args-cont c-lineup-template-args +) (incomposition . +) (inmodule . +) (innamespace . +) (inextern-lang . +) (composition-close . 0) (module-close . 0) (namespace-close . 0) (extern-lang-close . 0) (composition-open . 0) (module-open . 0) (namespace-open . 0) (extern-lang-open . 0) (objc-method-call-cont c-lineup-ObjC-method-call-colons c-lineup-ObjC-method-call +) (objc-method-args-cont . c-lineup-ObjC-method-args) (objc-method-intro . [0]) (friend . 0) (cpp-define-intro c-lineup-cpp-define +) (cpp-macro-cont . +) (cpp-macro . [0]) (inclass . +) (stream-op . c-lineup-streamop) (arglist-cont-nonempty first php-lineup-cascaded-calls php-c-lineup-arglist) (arglist-cont first php-lineup-cascaded-calls 0) (comment-intro . 0) (catch-clause . 0) (else-clause . 0) (do-while-closure . 0) (access-label . -) (case-label . +) (substatement . +) (statement-case-intro . +) (statement . 0) (brace-entry-open . 0) (brace-list-entry . 0) (brace-list-close . 0) (block-close . 0) (block-open . 0) (inher-cont . c-lineup-multi-inher) (inher-intro . +) (member-init-cont . c-lineup-multi-inher) (member-init-intro . +) (annotation-var-cont . +) (annotation-top-cont . 0) (topmost-intro . 0) (knr-argdecl . 0) (func-decl-cont . +) (inline-close . 0) (class-close . 0) (class-open . 0) (defun-block-intro . +) (defun-close . 0) (defun-open . 0) (c . c-lineup-C-comments) (string . c-lineup-dont-change) (topmost-intro-cont first php-lineup-cascaded-calls +) (brace-list-intro . +) (brace-list-open . 0) (inline-open . 0) (arglist-close . php-lineup-arglist-close) (arglist-intro . php-lineup-arglist-intro) (statement-cont . php-lineup-hanging-semicolon) (statement-case-open . 0) (label . +) (substatement-label . 2) (substatement-open . 0) (knr-argdecl-intro . +) (statement-block-intro . +))
buffer: (:length 11655)
@cjohansson
Member

If you use this plugin, https://github.com/cjohansson/emacs-phps-mode, syntax highlighting is done asynchronously and follows the PHP 8.0 lexical analyzer, and it is implemented in pure elisp.

@zonuexe
Member

zonuexe commented Apr 23, 2021

@Gleek Thank you for the suggestion.

Next week is a Japanese holiday, so I'll consider other syntax highlighting issues as well.

@phil-s
Contributor

phil-s commented Nov 12, 2021

It's highly probable that https://archive.casouri.cat/note/2021/emacs-tree-sitter/ will be part of Emacs 29.

I suggest testing with that, and offering feedback if you're able.

See also https://www.reddit.com/r/emacs/comments/pxpq8d/rfc_emacs_treesitter_integration/

@claytonrcarter

I'm a bit new to the ecosystem, but my understanding is that tree-sitter will in fact be part of Emacs 29. Indeed, emacs-devel is pushing to update built-in major modes for Emacs 29: https://lists.gnu.org/archive/html/emacs-devel/2022-10/msg00707.html

What does php-mode need to do to prepare for compatibility w/ emacs 29? If I understand correctly, php-mode will continue to work as is, but it won't take advantage of what tree-sitter offers.
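As a starting point for that question, here is a small sketch of how a package or user could detect at runtime whether the built-in tree-sitter support and a PHP grammar are present. `treesit-available-p` and `treesit-language-available-p` are part of Emacs 29 and later. How php-mode would actually use such a check is an open question and not something decided in this thread.

```elisp
;; The `fboundp' guard keeps this safe on Emacs versions older than 29.
(cond
 ((not (and (fboundp 'treesit-available-p) (treesit-available-p)))
  (message "This Emacs has no built-in tree-sitter support"))
 ((treesit-language-available-p 'php)
  (message "tree-sitter and a PHP grammar are available"))
 (t
  (message "tree-sitter is available, but no PHP grammar was found")))
```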

I'm a bit familiar with tree-sitter from other editors, and I'd be delighted to help out on this, if so desired.

@claytonrcarter

BTW one thing I recall from my past work with tree-sitter-php is that it literally only supports PHP, so any support for HTML included in a PHP file would be lost (probably not a huge deal), as would highlighting of phpDoc comments (which probably is a big deal) and anything else that's not strictly PHP.

tree-sitter handles such things by handing them off to other tree-sitter parsers via what they call "injections". These are pretty easy to work with, as I recall, but they require that a single buffer can be highlighted by multiple modes. (Again, I'm new around here, so maybe this won't be an issue, but I see several other open issues about mmm/poly-mode, etc., so maybe it will be?) Thanks again!

@KaranAhlawat

Any updates on this? Any plans and such? Tree-sitter based parsing also plays well with packages like Combobulate.

@piotrkwiecinski
Contributor

@KaranAhlawat the initial work around tree-sitter started here https://github.com/emacs-php/php-ts-mode but it's going to take some time.

@KaranAhlawat

This is great news! And I also understand it takes time and effort, both of which aren't free. I'm developing a TS mode myself, for Scala.

@yugaego

yugaego commented Jun 14, 2024

For reference, work is underway to ship php-ts-mode as part of Emacs 30. Progress can be tracked on the Emacs bug tracker.

@phil-s
Contributor

phil-s commented Jan 7, 2025

I'm trying out php-ts-mode in Emacs 30.0.93 at present, and first impressions are good.

I do have a generated PHP file comprising a single 10,000 line function, and using that as a test case I see that performance suffers significantly in some situations -- the observed case being typing RET with electric-indent-mode enabled, which takes about 1 second. This is some consequence of it all being one function -- if I turn that same file into many functions, the performance issue goes away.

That edge-case aside, it seems pretty speedy.
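For anyone wanting to reproduce the RET measurement, a rough sketch follows. It times a single interactive-style `newline` (which runs the electric-indent hook when `electric-indent-mode` is enabled) using the built-in `benchmark-run` macro. It is only an approximation of real typing, since redisplay time is not included.

```elisp
;; Evaluate with M-: while point is inside the large function.
;; Returns a list (ELAPSED-SECONDS GC-COUNT GC-SECONDS).
;; This inserts one newline into the buffer, so undo afterwards.
(benchmark-run 1 (newline nil t))
```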

Commentary says:

This package provides `php-ts-mode' which is a major mode
for editing PHP files with embedded HTML, JavaScript, CSS and phpdoc.
Tree Sitter is used to parse each of these languages.

Please note that this package requires `html-ts-mode', which
registers itself as the major mode for editing HTML.

This package is compatible and has been tested with the following
tree-sitter grammars:

Features

  • Indent
  • IMenu
  • Navigation
  • Which-function
  • Flymake
  • Tree-sitter parser installation helper
  • PHP built-in server support
  • Shell interaction: execute PHP code in a inferior PHP process
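To try php-ts-mode in place of php-mode, one possible init-file sketch is shown below. It assumes an Emacs 30 build with tree-sitter support and that the required grammars are already installed; `major-mode-remap-alist` is available from Emacs 29 onward.

```elisp
;; Only remap when tree-sitter and a PHP grammar are actually usable.
(when (and (fboundp 'treesit-available-p)
           (treesit-available-p)
           (treesit-language-available-p 'php))
  ;; Buffers that would have used php-mode open in php-ts-mode instead.
  (add-to-list 'major-mode-remap-alist '(php-mode . php-ts-mode)))
```

The guard means the remap is simply skipped on machines where the grammar is missing, so the same configuration keeps working with plain php-mode.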
