Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve file detection with signature check capabilities #2819

Merged
merged 6 commits into from
Mar 14, 2024

Conversation

JoeKar
Copy link
Collaborator

@JoeKar JoeKar commented May 4, 2023

This allows more complex detection upon regex rules for a certain amount of lines and not just in the first. A similar approach is chosen by VIM as well.

Fix #2041, Fix #2166

@JoeKar
Copy link
Collaborator Author

JoeKar commented May 15, 2023

@zyedidia
Did you already had some time to have a look at this?

@JoeKar JoeKar force-pushed the fix/file-detection branch 3 times, most recently from e179540 to 8c8e092 Compare June 6, 2023 20:07
@JoeKar
Copy link
Collaborator Author

JoeKar commented Jun 8, 2023

@zyedidia:
In case this would be merged sometime I highly recommend to merge the plugin detectindent (many thanks to @dmaluka) into micro directly (with option to de-/activate it). This would have the advantage to detect the indentation in the same cycle instead of parsing the content a second and/or third time.

@JoeKar
Copy link
Collaborator Author

JoeKar commented Jun 22, 2023

@zyedidia:
One more example failing without this PR:
A patch created with git format-patch isn't detect as patch, but as mail instead (starts with From ) due to alphabetical order of syntax files and the header-check of the same priority as the filename-check.
Therefore this PR gives the filename-check precedence.

@JoeKar
Copy link
Collaborator Author

JoeKar commented Aug 18, 2023

Fix #2894

@JoeKar
Copy link
Collaborator Author

JoeKar commented Sep 7, 2023

Unfortunately this PR didn't hit the v2.0.12, but maybe there is a chance for the v2.0.13.

@dmaluka:
Can I kindly ask for a review from your perspective too? :)

And in case you're in the right mood, there are more...
https://github.com/zyedidia/micro/pulls/JoeKar
:P

internal/buffer/buffer.go Outdated Show resolved Hide resolved
internal/buffer/buffer.go Show resolved Hide resolved
internal/buffer/buffer.go Outdated Show resolved Hide resolved
internal/buffer/buffer.go Outdated Show resolved Hide resolved
internal/buffer/buffer.go Outdated Show resolved Hide resolved
pkg/highlight/ftdetect.go Outdated Show resolved Hide resolved
runtime/syntax/objc.yaml Outdated Show resolved Hide resolved
@@ -392,6 +392,11 @@ Here are the available options:

default value: `4`

* `detectlimit`: if this is not set to 0, it will limit the automatic parsed
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO I'd be rather reluctant to adding new options unless we really know there is a real need for it. (I.e. in this particular case, I'd rather stay with the hardcoded value of 100 for the time being.) Or the user will end up with a zillion options and get lost in them.

Copy link
Collaborator Author

@JoeKar JoeKar Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest I'm more a friend of having the choice rather been patronised. Just imagine a file with a very large comment block or other fancy header (larger than the e.g. hard coded 100) in front of the important decision mark. At the end the guess could be wrong. That was the reason for the option to give the choice to decide between more precision or faster parse time.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can easily imagine a large comment block in the beginning of a file. Good point.

Having a choice is great, but I also think we should try our best to make software do the right thing from the beginning, rather than leave this burden to the user.

In this regard, now I'm thinking (independently of the question whether this line limit should be configurable or not): wouldn't it be a good idea to increase the default value from 100 to e.g. 500 right away? I actually doubt it would cause any real slowdown in practice (but if we concerned about that, we can always verify it with profiling, micro now has -profile command-line flag for that).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW do you know how does it work in vim? i.e. how many lines are checked and how is it configured?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vim does it the same, but the limit(s) per different checks is/are hard coded (see https://github.com/vim/vim/blob/master/runtime/autoload/dist/ft.vim). The idea was good, but I thought going one step ahead is maybe better and we don't want to be Vim...it's micro. ;)

pkg/highlight/ftdetect.go Outdated Show resolved Hide resolved
pkg/highlight/ftdetect.go Outdated Show resolved Hide resolved
pkg/highlight/ftdetect.go Outdated Show resolved Hide resolved
runtime/help/options.md Outdated Show resolved Hide resolved
internal/config/settings.go Outdated Show resolved Hide resolved
runtime/help/options.md Outdated Show resolved Hide resolved
pkg/highlight/parser.go Outdated Show resolved Hide resolved
@dmaluka
Copy link
Collaborator

dmaluka commented Sep 9, 2023

@JoeKar regarding your proposal regarding detectindent:

@zyedidia: In case this would be merged sometime I highly recommend to merge the plugin detectindent (many thanks to @dmaluka) into micro directly (with option to de-/activate it). This would have the advantage to detect the indentation in the same cycle instead of parsing the content a second and/or third time.

This would mean essentially reimplementing it in Go to make it tightly integrated into micro. I'm not sure it's a good idea: micro's source code is already pretty complicated as it is, so in cases when we are able to separate some complexity away via a plugin (not at the cost of performance, correctness etc), I think we should take advantage of this possibility. It seems that detectindent is actually such a case.

But if you have some observations suggesting that having detectindent integrated with filetype detect would have a real performance advantage (e.g. if you can already see a noticeable slowdown caused by detectindent in some cases, or if you have profiling data showing that detectindent time is a considerable fraction of micro startup time or buffer open time), please let me know.

pkg/highlight/parser.go Outdated Show resolved Hide resolved
pkg/highlight/parser.go Outdated Show resolved Hide resolved
internal/buffer/buffer.go Outdated Show resolved Hide resolved
runtime/help/options.md Outdated Show resolved Hide resolved
runtime/help/options.md Outdated Show resolved Hide resolved
runtime/help/options.md Outdated Show resolved Hide resolved
runtime/syntax/make_headers.go Show resolved Hide resolved
runtime/syntax/make_headers.go Show resolved Hide resolved
@dmaluka
Copy link
Collaborator

dmaluka commented Sep 9, 2023

While reviewing this PR I've found a couple of small bugs in regex error handling for syntax files. Fixed them in #2913.

@JoeKar
Copy link
Collaborator Author

JoeKar commented Sep 9, 2023

Thanks a lot for your enduring review! :)

@JoeKar
Copy link
Collaborator Author

JoeKar commented Sep 9, 2023

But if you have some observations suggesting that having detectindent integrated with filetype detect would have a real performance advantage (e.g. if you can already see a noticeable slowdown caused by detectindent in some cases, or if you have profiling data showing that detectindent time is a considerable fraction of micro startup time or buffer open time), please let me know.

No, not till now.

It only came to my mind since it does an additional loop on opening the buffer, which could be prevent in case it would use the same loop. So, just an optimistic optimization saving some CPU cycles. Most probably you're right, that the necessary effort isn't worth the result/try...at least so far. And we definitely have larger performance impacts with the syntax highlighting itself, which should receive the higher priority in case of possible optimizations.

@dmaluka
Copy link
Collaborator

dmaluka commented Sep 10, 2023

Now that #2913 has been merged, you need to rebase your PR.

@JoeKar
Copy link
Collaborator Author

JoeKar commented Nov 29, 2023

Fixes #3054

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deterministic file type detection Nondeterministic filetype detection (c, c++, objective-c) for *.h files
2 participants