Account for front matter when calculating sourcepos
#494
src/parser/mod.rs
Outdated
start: nodes::LineColumn { line: 1, column: 1 },
end: nodes::LineColumn {
    line: lines,
    column: delimiter.chars().count(),
This line I'm particularly unsure about. Are columns supposed to be bytes, characters, or actual columns (i.e. accounting for ZWJ, combining diacritics, etc.)?
This question gave me pause too. Instead of me trying to infer it from some of my own code, let's try to get it from the source (pun intended (video unrelated)). If we ask cmark for sourcepos of some text with minimally non-byte-oriented text …
$ echo 'テスト test' | build/src/cmark --sourcepos
<p data-sourcepos="1:1-1:14">テスト test</p>
… we see it doesn't make any attempt to interpret UTF-8 whatsoever. This is a bit ¯\_(ツ)_/¯ But it accords with our current behaviour, which makes sense given we modelled on it totally:
$ echo 'テスト test' | cargo run -- --sourcepos
<p data-sourcepos="1:1-1:14">テスト test</p>
So, for consistency (and ease of your implementation), I'd say let's start with bytes, and I'll open an issue to think about improving this (#495). There are a few places in parser/mod.rs where we actually use self.column and treat it as a byte count, and while I'm not sure it'd come into play here exactly (they're mostly around leading indent), it's another vote for bytes.
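To make the byte/character difference concrete, here is a minimal sketch in plain Rust. The helper names `end_column_bytes` and `end_column_chars` are hypothetical and exist only for this illustration; they are not part of comrak or cmark. It only shows how `str::len()` and `chars().count()` diverge on the sample line used above.

```rust
/// Hypothetical helper: end column of a line when columns are counted in bytes.
fn end_column_bytes(line: &str) -> usize {
    // `str::len` returns the length in bytes of the UTF-8 encoding.
    line.len()
}

/// Hypothetical helper: end column of a line when columns are counted in
/// Unicode scalar values (what `chars()` iterates over).
fn end_column_chars(line: &str) -> usize {
    line.chars().count()
}

fn main() {
    let line = "テスト test";
    // Each katakana character here is 3 bytes in UTF-8: 3 * 3 + 1 + 4 = 14.
    println!("bytes: {}", end_column_bytes(line)); // prints "bytes: 14"
    // Three katakana, one space, four ASCII letters: 8.
    println!("chars: {}", end_column_chars(line)); // prints "chars: 8"
}
```

The byte count of 14 matches the `1:1-1:14` that both cmark and comrak report above, which is why `delimiter.len()` rather than `delimiter.chars().count()` is the consistent choice. Neither count handles ZWJ sequences or combining marks as single visual columns.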
Narrator: It was not sufficient.
1807051 to 73925f5
Thank you so much!
I'll apply the bytes change (and test adjustment) and merge. :)
See discussion at kivikakk#494 (comment).
I totally understand! This is just me dreading my next step and I absolutely don't expect any effort on your behalf, but it's going to be a pain to adapt this to work with
Yes, fair T_T I hope it's not too painful; let me know if I can (try to!) elucidate anything else for you about the base design. Also, if you ever want a release made with your changes in it, just give me a ping!
Appreciate it! I have a few more pull requests to put in, so no rush on a new release.
@kivikakk if this is working, my PR ryanpeach/mdlinker#58 would love to see a new release.
Sure thing!
Working for me :)
Hopefully this is sufficient to account for the front matter when calculating sourcepos for subsequent nodes.
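For readers following along, here is a rough sketch of the idea, not the code from this PR. It assumes the front matter block starts at line 1, column 1, and that columns are counted in bytes as agreed in the review. The `LineColumn` struct is a local stand-in for `nodes::LineColumn`, and `front_matter_span` and `body_start_line` are hypothetical names made up for this example.

```rust
/// Local stand-in for `nodes::LineColumn`; both fields are 1-based.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LineColumn {
    line: usize,
    column: usize,
}

/// Hypothetical helper: source span of a front matter block, assuming
/// `front_matter` is the exact text consumed from the start of the document
/// and its last non-blank line is the closing `delimiter`.
fn front_matter_span(front_matter: &str, delimiter: &str) -> (LineColumn, LineColumn) {
    // Ignore trailing blank lines so the span ends on the closing delimiter.
    let lines = front_matter.trim_end().lines().count();
    (
        LineColumn { line: 1, column: 1 },
        LineColumn {
            line: lines,
            // Byte length, matching how columns are counted elsewhere.
            column: delimiter.len(),
        },
    )
}

/// Hypothetical helper: 1-based line on which the rest of the document
/// starts, so that sourcepos for subsequent nodes can be offset.
fn body_start_line(front_matter: &str) -> usize {
    front_matter.bytes().filter(|b| *b == b'\n').count() + 1
}

fn main() {
    let front_matter = "---\ntitle: x\n---\n\n";
    let (start, end) = front_matter_span(front_matter, "---");
    println!("{:?} to {:?}", start, end);
    println!("body starts on line {}", body_start_line(front_matter));
}
```

With this input the block spans 1:1 to 3:3 and the body begins on line 5, which is the offset that subsequent nodes would need.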