fix(tsx): Extract positions based on utf-16 encoding #1037
🦋 Changeset detected. Latest commit: 2b2a8da. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
It does feel a bit non-performant, but if there isn't any noticeable slowdown in benchmarks, it's probably fine.
Also, this needs a changeset before merging.
Yeah, I was surprised not to find much of a perf difference. To be clear, there is one, but I could only measure real differences in files with 10k+ characters, multiple script and style tags late in the file, and an obscene number of emojis. I'll take a quick look to see if there's a way to reuse the line offset table from the sourcemapping logic, which would speed things up a bunch. Otherwise I'm not too bothered: this still ends up being faster for the language server, because the logic it previously used to get script and style tags was expensive.
The power of Go and native binaries 😄
Refactored to use the sourcemapping line offsets instead, and it's much faster! Sourcemapping is still the ultimate bottleneck, though.
// One of the more performance-sensitive parts of the compiler is the handling of multibyte characters; this benchmark is an extreme case of that.
func BenchmarkPrintToTSX(b *testing.B) {
source := `🌘🔅🔘🍮🔭🔁💝🐄👋 🍘👽💽🌉🌒🔝 📇🍨💿🎷🎯💅📱🎭👞🎫💝🍢🕡 augue tincidunt 👠💋🌵💌🍌🏄🕂🍹📣🍟 🐞🍲👷🗻🏢🐫👣🐹🔷🎢👭🍗👃 🐴🐈📐💄 et, 👽👽🔙🐒🔙🍀 🐞🍐🎵💕🍂🍭🎬🎅 ac 🏃👳📑🐶👝🔷 🔕👄🐾👡🍢💗 🌓🔙🌔🏈🔒🔄🎹🎐 🎾🐝🌁📃💞📔🔕🐕 vel 🌟🏁💴🎾🔷📪💼👣📚 👟💻🗾🎋💁🏬🐮💑 🍈🌸🍓🍥💦 et vivamus 🍑🍫🔉🔹💽🍙 rhoncus eu 📰💫💀🌺 🔋💾🔱🔷🐟 convallis facilisi vitae mollis 📨🔚💮🔃🎀🍶 🌠📑🎹🌑📫🗽🐊💁🔼🕓 ac 🌂🌳🌺🎭🍧🐑 🔭🔉💉🍗👠🍦 🔄👔🎭🍇 🏤🍂🌝👜🔺 ornare 🔽🌰🎃💝👩 🐬🌻🍩👺🎆📣 risus 🌼👧🌒🍄💄🌆🐖👐📠 🕙🍈👞🏢👅🎽🏫👃🌾🍘🕑🎼💆🎳 🐻📵🔂🍩🕦👕🐶 💚🏮📟📈🌄👱🔚 sem 🔵💱💭💫 libero bibendum 🌿🍮🎧🍴💉 🐵🔷🍒🍜 sed metus, aliquam 🐷🐬📇👔🎴🔻🏊📴🍂🎽 🏀💡🍺🌾 💣🍇🐼🌀🐟🍂🐰 sed luctus 👾🍠👻💬📋🐈👀🌕💥🐹 consectetur commodo at 📓🐣🔮🐍🔺🍐🗻🍃🍠🔬 🏁🎬🐔🌙 🎿🎊🍳💓👹🐏 👦🐩👶💻 🔉💏📧🔲🍈📹💫🎧 🌟🍯🔭🎿 🎹🗻💳🔄🕁💂 🐩👠🌗🍹 facilisis 👍🍑💤🐕🐞🐻🏩 posuere 🎑🍢🍑🏨📗👣 ultrices. Vestibulum 🌠👰💼📐🍺💫 👘🐖🔕🔤🐖💶🐢📵👍🍪 🔸🎿🔤🍡💡🌄📁👉🎎🎆🎢🔒 📴👉🌱🐍🌓🎆🏩🐀👓💹🎴 🔴🌒🔘👡🕜 🌝🐉🐑🏮🎸🐳💉🎄 🍳🐲🔭📆🐎🔼🎐🐩💾🍈🎶💅🐜🍀. 🍭🍄🌳🍏💍👈🌆 🔦🏦👵🌹🏊🎐📳🌃📘🌷💎📓🎼 🎳🔁🔂🗽 💅🏰🐊👂🌴 💍🕗🐘🔼🔰🏀 🎂📻🕛🍔🎄 vitae 📎🐆🍚🌲🐰 ipsum 👘👎📅🎈🌆💮🔁🎤 💺💉💉👐🔛🐛🐺🍸📭 💠👞🍨💖 integer 🐌📀🍂🍍🐼 volutpat, condimentum elit 🎌🌕🏊👼 🌛🕜🌚🕤🍎 🏢💨👓👤 duis amet 📹🕚🔏🏧🌷🍄👦 🔠🐬👬🍩📫🏧🎷🔢💆🎭🔳👈 🔇👿💈👧🐍🍱🔉 in 🌐🎲🌠🌟🐀👛🌞🎧 sed. Tristique malesuada id 🏆🍕🕚🎯🌶 📺🐘🏀🐮🔶🏀🍥👬💞🎃🌙🎥🔦🍗 👘👣🐂🎪🔂🎾 👎👮💈🗻🌿💰📩🔋💭💃 🍍🕑🔋👠🍆🍈👰🌅 orci nam 🔀🕠🏄🍣👘🐲💘🐥 🏥🎊🎯👾📅 👛🌽🏫🔃👋🐀👶💥🔳 🕥👪👑👯 🐓💡🔠💼🔳💲 👬🔁🍜💘🔌🌎🏃🌶🍏🍯🎺 🔟📨🔜🎱💓🔛 accumsan 🎢🐭🍉🍳🕜🏆🔗📝👿 🍗👑📜🎁🍇🍕🌛 🎓🌰👣📆🌋 👌🍱💏🔇🐆🌞📝💻🌺 dictumst 📡🍝🌐💤🐅🔔🐟📥🌓🌒🍅📨🏡👺🍬🐟 🔹💳🍨💡🎋💕🕜🏦👟🕒🕣💅💗 🏤🔈📀📜📙🍌 📫🕘👍💾📱 purus 🕒🐕👵🏄💗🐤🕝 pharetra adipiscing elit, non 🏬👸🍉👑 🎠🌇👰💄 🌕🗼🏮💇🐗 📌🏤🌲👹📌 🎐🕓📎🏤💾 tellus 💆🍨👂🕟💚🎋🌠🐳 🍘🍫🎵📚📟🔛🐑🏭🔄👌📏👚🏄🐞 💏🍊👾🍘👰📄💯🔑 proin 🐡🔝🐳🎻🐶🍜 🍕🌵📯🔖💅🕔💘🏁🌾💨🔲🔼🍜🍘🕂 in suscipit 🍎🎄🎬🔙👪💣🍣💯🕧 lectus 🌾🍚🍺🐆💉🎡👷 📖🔅👼🕞 🌄🌃📦🔲💘🌶💁🍯💿 senectus 👻💈🗽🍈 📜🍉🍒🍐🔖👙🔀🐅👙 🐬📷🎨👹🎬 🐷🌐🔽💨🎿🌌🌒 🎎🍊🔕🍁. 🍈🌄📧🏊📛👗🕟 🍋💌🔁🐛💫🔰👃🕑 ridiculus mattis 🕓🍌👘🌹 👅💏🍨🔯🍂🕃🌝👠🏁🔔 pellentesque elit eu, 💝🍧💯🔄📈📛🐻📆🔱📴🔸🍻🐘🎀 viverra 🌆🍉🍉🌆 📪🍕🏊🌜📺🔆🐢🎃 mi sed tellus luctus 🍜🍃🔶🗽 laoreet dui tristique 🔆🌸🐛🔣🏤📘🕜🍃🐋 pretium ultrices 🔳🏇🍏🎭🍀 🍱🐠👬📮🌗🌆💺🔆👨🌱 et 👸📁📩🔌👯👏👫👳💸 🐎💉🔱💦👴 🏪🍶🐸🔤 💔🏤👺👻🏤🎥🐽🔦👌🔡📚💡🔁🎻 🎵👡🕀🎩 🍸🐒🍭🕠🔢🎧🍚💪 🌸🍰💏🏁🎥🐕🔬💶 💃🌽💕🐭 💯🍁📭🐚📛📓🌏🔯📦 🕝👾📒📳💵 🏠💺🕔🕁📃 adipiscing nulla congue 🔖💲🕚🎰🔋👬 sem 👽👵👜👇 vehicula 📑🌎🍁💈 consectetur nulla ullamcorper enim, 🏫🍃🐢🔄🍝🕢👖💨👺💰👎💳🏠🏤 vel fermentum porttitor lacus 📱💧🔜🔰👱🎰🐉🔓🕧🍌🕙 🎴🍫🐱🐟🌖🕞🏥🍝📜🔃🏈🔡🐁🔗 💊🍐📝👭🎢🐳🐯📷🐀🔃. 
🐦🍚🔵🕜🏩 id 🌔🔤💿🌻🍹 🌎🎈📢🔬🐖💢👸🎭 🍤🔵🔓🕕📪💢🔁👽💴 🐛👶🔢🎹🏄👜📌🍭🔼🎫🍯🎦💎🐦 🐨💧🍴🌊🔁 consequat pretium 👍🏬🏰👖🍥📺👛🌕 🏄🔱🔩🐏🌞🌺🐌👑 🏰🕢💱👖🐶🐥📯🎶💧🍭🔀🎾🏩🍑 massa, est 👶🍭💅🕔🍳💗🍞💚🍩🐷🍘🌄🐇 🐼👀👀🎅 📄🎁🐞💼 placerat erat 💧🌉💻🔣🐥📠🍪📻🐃🔻👠🕘🍞🍈🍔📴 dolor 📁💹🎐🍂🍉 neque, 👗🍁🍪🌵🕛🍄 🕝👔🔼🎡🔺📬 sed 💨💧🏢🐼👘🌳🎼👹🐉🕕 enim 🔏🏁👊💀🔓 🏫🐈💀🌳🔒👵🔘🎺💴🍑 👭🔲🍰💿🔪🌊 🌅🕔🎑📛🔶🔘🎑🍯 🐲📺👬📊🍒 elit, 🌽🎱💇🎥 ultricies 📡💲🐦🌁🍚🌵🍵 🎮📧🔟🍴👕🔏🌴🔊👳🗼💒🌴👍📞🔳👜🔥🐝🐾🐧🍊 🌰🗾🌂🐄 🔛🐀🍏🍞🔔📖💉📼🐥🍱 🐜🎺🍵💿👂🌋🌸 🔞🐤🔟🍀📙🏩👑 porta id 🎄💺🍙👶👪🔪👪🐭💚📗🎅🍐👡🎭 🐦🎌🐆💫 quis nulla dictumst non 📫🍵🔫🕠 🔯🔃🐷📖💙👓💍 🎬🕃🐥🏫🍛🌆🐗👅📷 📚🍭🌞🌎🕐👜👳 🏧💬💴🎆🐄🔠📄🐰📝🌈 🎩🌇🌟📙 suspendisse 🔡👫🍐🏩🌉🕚📘🐮 👐🏁👮🎭🏣 👰📤🍙🏈👓🐰🍦 lacinia diam eu vestibulum donec faucibus 🔝🎴🌷💩🍡🍜💙 🎐🔉🌛📠🏢📄🍆🎆. |
I tried this file in my editor and, well, performance was disastrous. It always has been, though, even before this PR, so it's not too bad. (It was also always mapped incorrectly before this PR.)
Still looks good to me 👍
Changes
Previously, when extracting script and style tags, we tried to account for multibyte characters by counting them and skipping ahead. Not only was this cumbersome, it also didn't really work, because our loop would run multiple times over roughly the same characters. In short, it was annoying.
In this PR, I changed it so that we use the line offsets table from the sourcemapping logic to get the offsets. This is somewhat slower, especially in some extreme cases, but in most cases there's no measurable difference, and at least it's now correct.
I also updated the frontmatter and body range extraction to use this method, as they suffered from the same problem.
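For readers unfamiliar with the idea, here is a minimal sketch of how a line offsets table can be used to turn a byte offset into a UTF-16 offset. It is not the compiler's actual code: `lineOffset`, `buildLineOffsets`, `utf16Width` and `toUTF16Offset` are hypothetical names made up for illustration, and the real sourcemapping table may store different fields.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// lineOffset is a hypothetical table entry: where a line starts,
// measured both in bytes and in UTF-16 code units.
type lineOffset struct {
	byteStart  int
	utf16Start int
}

// utf16Width returns how many UTF-16 code units a rune occupies:
// 2 for runes outside the Basic Multilingual Plane, 1 otherwise.
func utf16Width(r rune) int {
	if r >= 0x10000 {
		return 2
	}
	return 1
}

// buildLineOffsets scans the source once and records the start of every line.
func buildLineOffsets(src string) []lineOffset {
	table := []lineOffset{{0, 0}}
	units := 0
	for i, r := range src {
		units += utf16Width(r)
		if r == '\n' {
			table = append(table, lineOffset{i + 1, units})
		}
	}
	return table
}

// toUTF16Offset converts a byte offset (assumed to sit on a rune boundary)
// into a UTF-16 offset. It binary-searches for the containing line, then
// only walks the characters between the line start and the offset.
func toUTF16Offset(src string, table []lineOffset, byteOffset int) int {
	line := sort.Search(len(table), func(i int) bool {
		return table[i].byteStart > byteOffset
	}) - 1
	units := table[line].utf16Start
	for _, r := range src[table[line].byteStart:byteOffset] {
		units += utf16Width(r)
	}
	return units
}

func main() {
	src := "🌘a\n<b>🔅</b>"
	table := buildLineOffsets(src)
	byteOffset := strings.Index(src, "</b>")
	fmt.Println(byteOffset, toUTF16Offset(src, table, byteOffset)) // prints "13 9"
}
```

The table is built in a single pass, so each lookup only re-scans one line rather than the whole file, which is roughly why reusing an existing table is cheaper than recounting multibyte characters for every tag.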
In theory, this is a breaking change, but in practice the numbers it used to produce were unusable in JS unless you did a lot of conversion yourself; now they can be used as-is. Also, I doubt anyone other than me is using them...
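To illustrate why byte-based positions were awkward on the JS side, here is a small standalone sketch (the `utf16Offset` helper is hypothetical, not part of the compiler). Go strings are indexed by byte, while JavaScript strings are indexed by UTF-16 code unit, so the two disagree as soon as a multibyte character precedes the tag.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf16"
)

// utf16Offset converts a byte offset into src to the equivalent offset
// in UTF-16 code units. byteOffset is assumed to sit on a rune boundary.
func utf16Offset(src string, byteOffset int) int {
	return len(utf16.Encode([]rune(src[:byteOffset])))
}

func main() {
	src := "👋<style>h1{}</style>"
	byteStart := strings.Index(src, "<style>")

	fmt.Println(byteStart)                   // prints 4: the emoji is 4 bytes in UTF-8
	fmt.Println(utf16Offset(src, byteStart)) // prints 2: the emoji is 2 UTF-16 code units
}
```

In JavaScript, `"👋<style>h1{}</style>".indexOf("<style>")` is also 2, so a UTF-16 based position can be passed straight to `slice` or similar without conversion.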
Fixes withastro/language-tools#921
Testing
Tests should pass. I updated some existing tests and added more.
Docs
N/A, though I added a JSDoc comment on the type to say that it's UTF-16 based.