Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Refactor) Use CommonMark for markdown parsing #4305

Merged
merged 5 commits into from
Nov 12, 2024

Conversation

IonBazan
Copy link
Contributor

@IonBazan IonBazan commented Nov 2, 2024

Adds some tests to cover most of Markdown and MarkdownExtra services. This should be helpful when migrating to another engine in the future.

Fixes #4283 I'm not sure if using htmlentities() is correct here though - I think it should only be used when safe mode is on (see tests).

Helpers coverage looks as follows now:
image

@Roardom
Copy link
Collaborator

Roardom commented Nov 2, 2024

Thank you!

I'm not entirely sure about the mb_convert_encoding replacement. I found this: https://php.watch/versions/8.2/mbstring-qprint-base64-uuencode-html-entities-deprecated#html which lists some proposed replacements. However, it should be noted that markdown is currently a staff-only feature and anyone with a staff rank can already change user passwords and such so them being allowed to XSS as well isn't a huge worry.

@IonBazan
Copy link
Contributor Author

IonBazan commented Nov 2, 2024

I was more thinking about not breaking existing pages created which are already using custom HTML tags. Looks like htmlentities() escapes them so that we are not able to preserve them. Unfortunately I don't have access to any production/real-life data to test this out so maybe I will just revert it to original and add a test containing some mixed HTML and Markdown tags to ensure we won't break anything when refactoring.

@IonBazan
Copy link
Contributor Author

IonBazan commented Nov 3, 2024

OK looks like I got it using some ugly hack but it produces the same results as the original version now. Let me know if this breaks anything for you.

@IonBazan IonBazan force-pushed the feature/markdown-entities branch from c584019 to bf43aa2 Compare November 3, 2024 10:10
Copy link
Collaborator

@Roardom Roardom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed your code is very similar to what they used in the yet-to-be-merged upstream code: https://www.github.com/erusev/parsedown-extra/pull/176.

So in that case, looks good!

Thank you!

@IonBazan
Copy link
Contributor Author

IonBazan commented Nov 3, 2024

One more thing before we merge - I added a handy https://github.com/GrahamCampbell/Laravel-Markdown wrapper for CommonMark and added tests comparing both engines capabilities. You can see that the results are pretty much the same (most failures are due to different &nbsp; encoding, slightly different final HTML, some class/id names inconsistency and additional encoding of <code> blocks which is actually safer if you are pasting HTML there.
The only feature that CommonMark is not supporting is Abbreviations but it can be easily added by installing https://github.com/8fold/commonmark-abbreviations or implementing it on our own.

If you think this approach is good enough, I can remove our current implementation and use CommonMark instead, after updating these failing tests. Let me know if you like this idea.

Please note that CommonMark is already required by Laravel itself (it's used in email views) so the only package I'm adding is a well-maintained wrapper:

Prod Packages Operation Base Target Link
graham-campbell/markdown New - v15.2.0 Compare

@Roardom
Copy link
Collaborator

Roardom commented Nov 3, 2024

I haven't tested this yet, but in my own wiki articles, I use the feature for adding {#id} after headings to make links in my table of contents. Is this supported by commonmark? (I'm fine if there's a different syntax I need to edit, as long as there's an alternative).

If you like, I can temporarily add you to a private repo where I store a backup of my own personal wikis if you want some real data to test on. However, I do use a very tiny amount of bbcode and sometimes html in some of them which isn't handled here.

@IonBazan
Copy link
Contributor Author

IonBazan commented Nov 3, 2024

That would be very helpful - feel free to use my username or the email I used for the commits to send the invitation to the repo.

Regarding BBCode, it's actually applied separately by another parser, you can check Page or Wiki class to see how it works. Let me check if we can get the linked ids to work, but there is a way to generate a table of contents and also auto-generate the ids for each header just like GitHub does.

@IonBazan
Copy link
Contributor Author

IonBazan commented Nov 3, 2024

Just checked that the feature you need is already enabled thanks to AttributesExtension and it was also included in the tests I wrote. You can find more info here: https://commonmark.thephpleague.com/2.5/extensions/attributes/

Seems that the Abbreviations syntax supported by the extension I mentioned earlier is different from previously used standard. Therefore I suggest users use <abbr> HTML tags instead, as it's already supported.

Since there are no massive breaking changes besides of this feature, I suggest we go with replacing the MD library and updating the tests. I can try keeping both implementations green with some conditional expectations or duplicating these tests so that we don't lose coverage.

@Roardom
Copy link
Collaborator

Roardom commented Nov 4, 2024

Ah, you're correct. I think the wrapper would be a good idea as well. The only downside I can see happening is that it usually takes a few weeks/months after a new laravel version is released before that package is updated.

I've also sent you that invitation.

Thanks again!

@IonBazan IonBazan changed the title (Tests) Cover Markdown with tests (Refactor) Use CommonMark for markdown parsing Nov 4, 2024
@HDVinnie
Copy link
Collaborator

HDVinnie commented Nov 6, 2024

@IonBazan thanks for yet another contribution. Reviewed it code wise and looks good to me. I'll have time Friday to test locally and if all goes well merge.

@IonBazan
Copy link
Contributor Author

IonBazan commented Nov 7, 2024

Thanks, I think we should be good to remove the previous engine once we confirm everything is working as expected too in the next release.

@HDVinnie HDVinnie merged commit 8b481d2 into HDInnovations:8.x.x Nov 12, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants