Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Account for front matter when calculating sourcepos #494

Merged
merged 3 commits into from
Nov 29, 2024

Conversation

SamWilsn
Copy link
Contributor

Hopefully this is sufficient to account for the front matter when calculating sourcepos for subsequent nodes.

start: nodes::LineColumn { line: 1, column: 1 },
end: nodes::LineColumn {
line: lines,
column: delimiter.chars().count(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line I'm particularly unsure about. Are columns supposed to be bytes, characters, or actual columns (i.e. accounting for ZWJ, combining diacritics, etc.)?

Copy link
Owner

@kivikakk kivikakk Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This question gave me pause too. Instead of me trying to infer it from some of my own code, let's try to get it from the source (pun intended (video unrelated)). If we ask cmark for sourcepos of some text with minimally non-byte-oriented text …

$ echo 'テスト test' | build/src/cmark --sourcepos
<p data-sourcepos="1:1-1:14">テスト test</p>

… we see it doesn't make any attempt to interpret UTF-8 whatsoever. This is a bit ¯\_(ツ)_/¯ But it accords with our current behaviour, which makes sense given we modelled on it totally:

$ echo 'テスト test' | cargo run -- --sourcepos
<p data-sourcepos="1:1-1:14">テスト test</p>

So, for consistency (and ease of your implementation), I'd say let's start with bytes, and I'll open an issue to think about improving this (#495). There's a few places in parser/mod.rs where we actually use self.column and treat it as a byte count, and while I'm not sure it'd come into play here exactly (they're mostly around leading indent), it's another vote for bytes.

@SamWilsn SamWilsn marked this pull request as draft November 29, 2024 15:00
@SamWilsn
Copy link
Contributor Author

Narrator: It was not sufficient.

@SamWilsn SamWilsn force-pushed the sourcepos-front-matter branch from 1807051 to 73925f5 Compare November 29, 2024 16:00
@SamWilsn SamWilsn marked this pull request as ready for review November 29, 2024 16:01
Copy link
Owner

@kivikakk kivikakk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much!

start: nodes::LineColumn { line: 1, column: 1 },
end: nodes::LineColumn {
line: lines,
column: delimiter.chars().count(),
Copy link
Owner

@kivikakk kivikakk Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This question gave me pause too. Instead of me trying to infer it from some of my own code, let's try to get it from the source (pun intended (video unrelated)). If we ask cmark for sourcepos of some text with minimally non-byte-oriented text …

$ echo 'テスト test' | build/src/cmark --sourcepos
<p data-sourcepos="1:1-1:14">テスト test</p>

… we see it doesn't make any attempt to interpret UTF-8 whatsoever. This is a bit ¯\_(ツ)_/¯ But it accords with our current behaviour, which makes sense given we modelled on it totally:

$ echo 'テスト test' | cargo run -- --sourcepos
<p data-sourcepos="1:1-1:14">テスト test</p>

So, for consistency (and ease of your implementation), I'd say let's start with bytes, and I'll open an issue to think about improving this (#495). There's a few places in parser/mod.rs where we actually use self.column and treat it as a byte count, and while I'm not sure it'd come into play here exactly (they're mostly around leading indent), it's another vote for bytes.

@kivikakk
Copy link
Owner

I'll apply the bytes change (and test adjustment) and merge. :)

@kivikakk kivikakk enabled auto-merge November 29, 2024 20:59
@kivikakk kivikakk merged commit 0c6e6e6 into kivikakk:main Nov 29, 2024
19 checks passed
@SamWilsn SamWilsn deleted the sourcepos-front-matter branch November 29, 2024 21:32
@SamWilsn
Copy link
Contributor Author

it doesn't make any attempt to interpret UTF-8 whatsoever

I totally understand!

This is just me dreading my next step and I absolutely don't expect any effort on your behalf, but it's going to be a pain to adapt this to work with annotate-snippets, which does do unicode interpretation 😿

@kivikakk
Copy link
Owner

kivikakk commented Dec 4, 2024

Yes, fair T_T I hope it's not too painful — let me know if I can (try to!) elucidate anything else for you about the base design.

Also, if you ever want a release made with your changes in it, just give me a ping!

@SamWilsn
Copy link
Contributor Author

SamWilsn commented Dec 4, 2024

Appreciate it! I have a few more pull requests to put in, so no rush on a new release.

@ryanpeach
Copy link

@kivikakk if this is working, my PR ryanpeach/mdlinker#58 would love to see a new release.

@kivikakk
Copy link
Owner

Sure thing!

ryanpeach added a commit to ryanpeach/mdlinker that referenced this pull request Dec 17, 2024
ryanpeach added a commit to ryanpeach/mdlinker that referenced this pull request Dec 17, 2024
ryanpeach added a commit to ryanpeach/mdlinker that referenced this pull request Dec 17, 2024
@ryanpeach
Copy link

Working for me :)

ryanpeach added a commit to ryanpeach/mdlinker that referenced this pull request Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants