Improve indent-style auto-detection heuristic on small files #9556

Open
the-mikedavis opened this issue Feb 7, 2024 · 5 comments
Labels
A-core Area: Helix core improvements A-indent Area: Indentation C-enhancement Category: Improvements E-easy Call for participation: Experience needed to fix: Easy / not much

Comments

@the-mikedavis
Member

the-mikedavis commented Feb 7, 2024

As @pascalkuthe mentions (#9082 (comment)), the indent-style auto-detection (helix_core::indent::auto_detect_indent_style) can be finicky for small files:

Indent style detection for very short files ... doesn't work well. ... I think the exact ratios need to be adjusted in our detection heuristic (potentially we need to add some absolute requirements in addition to values, ...).

@the-mikedavis the-mikedavis added C-enhancement Category: Improvements A-core Area: Helix core improvements labels Feb 7, 2024
@pascalkuthe pascalkuthe added the A-indent Area: Indentation label Feb 7, 2024

@kirawi kirawi added the E-easy Call for participation: Experience needed to fix: Easy / not much label Mar 31, 2024
@kanielrkirby
Contributor

Hey there, I've been looking into this for a little while and would like to help improve the heuristic. I have a couple of ideas I'd like to bounce off you all.

The first thought I had was simply to raise the baseline below which we return None (and therefore fall back to whatever languages.toml recommends, or tabs, when the heuristic has only a few entries to work with).
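
A minimal sketch of that first idea, assuming a histogram indexed by indent width and a hypothetical `min_samples` floor. The helper name `pick_with_floor` and the threshold are illustrative only and are not taken from the Helix source:

```rust
/// Hypothetical helper (not part of Helix): pick the most frequent indent
/// width, but refuse to guess when there is too little evidence.
///
/// `histogram[w]` is the number of lines whose indentation grew by `w` columns.
/// Index 0 is skipped here because it carries no width information in this sketch.
fn pick_with_floor(histogram: &[usize], min_samples: usize) -> Option<usize> {
    let (width, &count) = histogram
        .iter()
        .enumerate()
        .skip(1)
        .max_by_key(|&(_, &count)| count)?;

    // Returning None lets the caller fall back to the language default.
    if count >= min_samples {
        Some(width)
    } else {
        None
    }
}
```

With a floor of 3, a file that contributes a single sample yields `None`, so the caller would use the configured default rather than trusting one data point.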

Another thought is to prefer lower values over higher ones (something like this: histogram[i] *= (MAX_INDENT + i) / MAX_INDENT, while keeping tabs as *= 2), so that there is an order of preference when the only candidates are a huge indent of 12 and something more reasonable, like 2 or 4. That might require building the histogram with floats, or scaling the counts by some factor, to allow more granularity.
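
A rough sketch of the weighting idea, using floats as suggested. Note that the factor as written, `(MAX_INDENT + i) / MAX_INDENT`, grows with `i`; to match the stated intent of favouring smaller widths, this sketch assumes a factor that shrinks with `i` instead. The value of `MAX_INDENT`, the exact weight formula, and the name `weighted_best` are all assumptions made for illustration:

```rust
/// Assumed upper bound on detectable widths; chosen so an indent of 12 fits.
const MAX_INDENT: usize = 16;

/// Hypothetical scoring pass: index 0 stands for tabs, index `w` for `w` spaces.
/// Returns the index with the highest weighted score, or None if there is no data.
fn weighted_best(histogram: &[usize; MAX_INDENT + 1]) -> Option<usize> {
    let mut best: Option<(usize, f64)> = None;
    for (i, &count) in histogram.iter().enumerate() {
        if count == 0 {
            continue;
        }
        // Tabs keep their 2x bonus; space widths get a bonus that decreases
        // linearly from just under 2x (width 1) down to 1x (width MAX_INDENT).
        let weight = if i == 0 {
            2.0
        } else {
            1.0 + (MAX_INDENT - i) as f64 / MAX_INDENT as f64
        };
        let score = count as f64 * weight;
        if best.map_or(true, |(_, s)| score > s) {
            best = Some((i, score));
        }
    }
    best.map(|(i, _)| i)
}
```

Under these assumed weights, one sample at width 4 outscores one sample at width 12, while three samples at width 12 still win over a single sample at width 4.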

Lastly, we could, of course, change how indents are counted, though I'm not sure this is a data problem. The only potential improvement I can think of is in how we count indents that are identical and adjacent, or indents that follow a larger indent, but even that is a bit iffy, considering this is almost certainly desirable in larger files.

I bring all this up to ask: is this a good direction for a PR, or were you thinking of something else?

(trying to make my first PR in Helix, and this looked like a good problem! Love the project by the way, it feels very interesting to look through Helix source in Helix haha)

@janos-r
Contributor

janos-r commented Sep 17, 2024

I think this would benefit from a clear example of a bug.
After a lot of testing, I never got a crash or a totally nonsensical indent_style response.

To summarize a few points:

  • This does not concern formatting.
  • This does not concern o, O, and Ret. They correctly follow the current line's indentation, not indent-style.
  • The indent-style is decided when opening the file / reload, not save.
  • The indent-style is used for Tab and indent < >

The rules, as far as I could deduce from testing:
If all the found indentation is consistent, it will be used instead of the default:

- zero
   - three
      - six
# results in indent-style 3
- zero
   - three
       - seven
# results in the default indent-style 2

I think this is a very good solution when not using formatting, like when editing another person's file.
I couldn't reproduce a bug and I am worried that trying to fix something that is not broken will create more problems.
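
The rule deduced above (consistent indentation wins, otherwise fall back to the default) can be illustrated with a simplified sketch. This is not the code of helix_core::indent::auto_detect_indent_style: it handles spaces only, and the names `detect_indent_width` and `leading_spaces`, the `MAX_WIDTH` bound, and the 0.66 runner-up ratio are assumptions chosen for illustration:

```rust
/// Count the leading spaces of a line (tabs are ignored in this sketch).
fn leading_spaces(line: &str) -> usize {
    line.chars().take_while(|&c| c == ' ').count()
}

/// Simplified, hypothetical detector: histogram the indent *increases* between
/// consecutive non-blank lines and accept the most common one only if it
/// clearly dominates the runner-up. None means "use the language default".
fn detect_indent_width(text: &str) -> Option<usize> {
    const MAX_WIDTH: usize = 16; // assumed bound
    const MAX_RUNNER_UP_RATIO: f64 = 0.66; // assumed dominance threshold

    let mut histogram = [0usize; MAX_WIDTH + 1];
    let mut prev = 0usize;
    for line in text.lines().filter(|l| !l.trim().is_empty()) {
        let cur = leading_spaces(line);
        if cur > prev && cur - prev <= MAX_WIDTH {
            histogram[cur - prev] += 1;
        }
        prev = cur;
    }

    // Track the best (width, count) pair and the runner-up count.
    let mut best = (0usize, 0usize);
    let mut second = 0usize;
    for (width, &count) in histogram.iter().enumerate().skip(1) {
        if count > best.1 {
            second = best.1;
            best = (width, count);
        } else if count > second {
            second = count;
        }
    }

    if best.1 == 0 {
        return None; // no indentation increases seen at all
    }
    if (second as f64) / (best.1 as f64) < MAX_RUNNER_UP_RATIO {
        Some(best.0)
    } else {
        None
    }
}
```

On the two samples above, this sketch returns `Some(3)` for the 0/3/6 file and `None` for the 0/3/7 file, where increases of 3 and 4 tie and the caller would fall back to the default.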
