Improve file detection with signature check capabilities #2819

JoeKar · 2023-05-04T22:06:40Z

This allows more complex detection upon regex rules for a certain amount of lines and not just in the first. A similar approach is chosen by VIM as well.

Fix #2041, Fix #2166

JoeKar · 2023-05-15T16:44:25Z

@zyedidia
Did you already had some time to have a look at this?

JoeKar · 2023-06-08T21:15:53Z

@zyedidia:
In case this would be merged sometime I highly recommend to merge the plugin detectindent (many thanks to @dmaluka) into micro directly (with option to de-/activate it). This would have the advantage to detect the indentation in the same cycle instead of parsing the content a second and/or third time.

JoeKar · 2023-06-22T18:48:48Z

@zyedidia:
One more example failing without this PR:
A patch created with git format-patch isn't detect as patch, but as mail instead (starts with From ) due to alphabetical order of syntax files and the header-check of the same priority as the filename-check.
Therefore this PR gives the filename-check precedence.

JoeKar · 2023-08-18T21:02:33Z

Fix #2894

JoeKar · 2023-09-07T17:52:51Z

Unfortunately this PR didn't hit the v2.0.12, but maybe there is a chance for the v2.0.13.

@dmaluka:
Can I kindly ask for a review from your perspective too? :)

And in case you're in the right mood, there are more...
https://github.com/zyedidia/micro/pulls/JoeKar
:P

internal/buffer/buffer.go

pkg/highlight/ftdetect.go

runtime/syntax/objc.yaml

dmaluka · 2023-09-08T14:17:54Z

runtime/help/options.md

@@ -392,6 +392,11 @@ Here are the available options:

 	default value: `4`

+* `detectlimit`: if this is not set to 0, it will limit the automatic parsed


IMHO I'd be rather reluctant to adding new options unless we really know there is a real need for it. (I.e. in this particular case, I'd rather stay with the hardcoded value of 100 for the time being.) Or the user will end up with a zillion options and get lost in them.

To be honest I'm more a friend of having the choice rather been patronised. Just imagine a file with a very large comment block or other fancy header (larger than the e.g. hard coded 100) in front of the important decision mark. At the end the guess could be wrong. That was the reason for the option to give the choice to decide between more precision or faster parse time.

Yeah, I can easily imagine a large comment block in the beginning of a file. Good point.

Having a choice is great, but I also think we should try our best to make software do the right thing from the beginning, rather than leave this burden to the user.

In this regard, now I'm thinking (independently of the question whether this line limit should be configurable or not): wouldn't it be a good idea to increase the default value from 100 to e.g. 500 right away? I actually doubt it would cause any real slowdown in practice (but if we concerned about that, we can always verify it with profiling, micro now has -profile command-line flag for that).

BTW do you know how does it work in vim? i.e. how many lines are checked and how is it configured?

Vim does it the same, but the limit(s) per different checks is/are hard coded (see https://github.com/vim/vim/blob/master/runtime/autoload/dist/ft.vim). The idea was good, but I thought going one step ahead is maybe better and we don't want to be Vim...it's micro. ;)

internal/buffer/buffer.go

pkg/highlight/ftdetect.go

runtime/help/options.md

internal/config/settings.go

runtime/help/options.md

pkg/highlight/parser.go

dmaluka · 2023-09-09T13:34:14Z

@JoeKar regarding your proposal regarding detectindent:

@zyedidia: In case this would be merged sometime I highly recommend to merge the plugin detectindent (many thanks to @dmaluka) into micro directly (with option to de-/activate it). This would have the advantage to detect the indentation in the same cycle instead of parsing the content a second and/or third time.

This would mean essentially reimplementing it in Go to make it tightly integrated into micro. I'm not sure it's a good idea: micro's source code is already pretty complicated as it is, so in cases when we are able to separate some complexity away via a plugin (not at the cost of performance, correctness etc), I think we should take advantage of this possibility. It seems that detectindent is actually such a case.

But if you have some observations suggesting that having detectindent integrated with filetype detect would have a real performance advantage (e.g. if you can already see a noticeable slowdown caused by detectindent in some cases, or if you have profiling data showing that detectindent time is a considerable fraction of micro startup time or buffer open time), please let me know.

pkg/highlight/parser.go

internal/buffer/buffer.go

runtime/help/options.md

runtime/syntax/make_headers.go

dmaluka · 2023-09-09T15:39:06Z

While reviewing this PR I've found a couple of small bugs in regex error handling for syntax files. Fixed them in #2913.

JoeKar · 2023-09-09T21:06:28Z

Thanks a lot for your enduring review! :)

JoeKar · 2023-09-09T21:17:15Z

But if you have some observations suggesting that having detectindent integrated with filetype detect would have a real performance advantage (e.g. if you can already see a noticeable slowdown caused by detectindent in some cases, or if you have profiling data showing that detectindent time is a considerable fraction of micro startup time or buffer open time), please let me know.

No, not till now.

It only came to my mind since it does an additional loop on opening the buffer, which could be prevent in case it would use the same loop. So, just an optimistic optimization saving some CPU cycles. Most probably you're right, that the necessary effort isn't worth the result/try...at least so far. And we definitely have larger performance impacts with the syntax highlighting itself, which should receive the higher priority in case of possible optimizations.

dmaluka · 2023-09-10T09:29:07Z

Now that #2913 has been merged, you need to rebase your PR.

This allows more complex detection upon regex rules for a certain amount of lines.

…ture check

JoeKar · 2023-11-29T19:26:44Z

Fixes #3054

JoeKar force-pushed the fix/file-detection branch 3 times, most recently from e179540 to 8c8e092 Compare June 6, 2023 20:07

JoeKar mentioned this pull request Aug 18, 2023

pascal source recognized as json filetype #2894

Closed

JoeKar force-pushed the fix/file-detection branch from 8c8e092 to 6e4230f Compare August 18, 2023 21:08