-
-
Notifications
You must be signed in to change notification settings - Fork 4.7k
feat: Support URL extraction from markdown link syntax #55850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
5d3cc65 to
a6198fe
Compare
4c6b8e7 to
0d71347
Compare
Remove temporary debug logging that was added during investigation of URL matching issues with square brackets in query parameters. The root cause was identified in Nextcloud core: the URL_REGEX pattern did not include square brackets in the allowed character class. This has been fixed in nextcloud/server#55850
|
cc @nickvergessen is this relevant for Talk? |
|
Since LexioJ I maintaining a bot, I assume it does :P |
| @@ -30,8 +30,9 @@ interface IURLGenerator { | |||
| * | |||
| * @since 25.0.0 | |||
| * @since 29.0.0 changed to match localhost and hostnames with ports | |||
| * @since 33.0.0 changed to match URLs in markdown link syntax and square brackets in query parameters | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since line also needs to be on URL_REGEX to reflect the change of the value there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should also add that those are now part of the match and therefor regex group 2+3 are the actual link going forward and the full match in itself does not have to be a full valid link anymore.
|
Checking with mobile app devs if the regex changes anything in clients |
| @@ -51,10 +51,20 @@ public function __construct( | |||
| */ | |||
| public function extractReferences(string $text): array { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should definitely add tests for this method, are you able to do that, or should someone assist/take over?
0d71347 to
10bcb89
Compare
|
I removed the dist/ from the diff for now, so the code page loads and allows leaving comments |
10bcb89 to
6ae37ea
Compare
Fixes nextcloud#55849 Update URL_REGEX to match URLs embedded in markdown link syntax [label](url). Previously, the regex required whitespace before/after URLs, which didn't match markdown links where URLs are preceded by ]( and followed by ). Also support square brackets in URL query parameters (e.g., ?foo[bar]=baz). Changes: - Add \]\( as valid URL prefix (markdown link start) - Add \) as valid URL suffix (markdown link end) - Add support for [ ] characters in URL paths - Update both backend (IURLGenerator.php) and frontend (comments.js) regex - Fix ReferenceManager to extract clean URLs using capture groups - Maintain backward compatibility with plain URL extraction Signed-off-by: Alexander Askin <[email protected]>
6ae37ea to
5793ad5
Compare
|
Hello there, We hope that the review process is going smooth and is helpful for you. We want to ensure your pull request is reviewed to your satisfaction. If you have a moment, our community management team would very much appreciate your feedback on your experience with this PR review process. Your feedback is valuable to us as we continuously strive to improve our community developer experience. Please take a moment to complete our short survey by clicking on the following link: https://cloud.nextcloud.com/apps/forms/s/i9Ago4EQRZ7TWxjfmeEpPkf6 Thank you for contributing to Nextcloud and we hope to hear from you soon! (If you believe you should not receive this message, you can add yourself to the blocklist.) |
|
We'll discuss this and make the final call for it in the week of 17-21. November in our Talk Team week, as we have other related topics on our list. |
This PR does not primary address Talk - it would allow generally usage like in Notes, Collectives, Deck, etc. Smart-Picker Providers (like integration_github) can then generate Links to be more compact and „meaningful“ without sacrificing functionality like picking an element and put the link this way: For Talk it would be great if you could discuss/decide on nextcloud/spreed#16114 as well 😉 |
Totally aware, but there is more to it and as mentioned before, the current change breaks existing clients. So we will have a look if we can incorporate it somehow in a backwards compatible manner or we need to add a new constant. |
Summary
Fixes #55849
This PR updates the
URL_REGEXpattern to support URL extraction from markdown link syntax[label](url), while maintaining backward compatibility with plain URL extraction.Problem
The current
URL_REGEXpattern requires whitespace or newlines before and after URLs:'(\s|\n|^)(https?:\/\/)...*(\s|\n|$)'This pattern fails to match URLs embedded in markdown link syntax where URLs are preceded by
](and followed by), preventing Reference Providers (GitHub, GitLab, Jira, etc.) from generating rich previews for markdown-formatted links.Examples of affected use cases:
[GH #55845](https://github.com/nextcloud/server/issues/55845)- GitHub issues[Ticket #12345](https://support.example.com/#ticket/zoom/12345)- Zammad Support tickets[PROJ-123](https://jira.example.com/browse/PROJ-123)- JIRA issuesSolution
Updated the regex pattern to also accept markdown link syntax:
Before:
'(\s|\n|^)(https?:\/\/)([-A-Z0-9+_.]+...)(\s|\n|$)'After:
Changes
URL_REGEX_NO_MODIFIERSconstant to match markdown syntax@since 31.0.0version tag documenting the changeBackward Compatibility
✅ Existing plain URL extraction continues to work
✅ New: Markdown links now also extracted
✅ No breaking changes for reference providers
✅ No API changes - only regex pattern enhancement
Testing
Before this fix:
After this fix:
Impact
ReferenceManager::extractReferences()Related Code
The regex is used primarily in:
lib/private/Collaboration/Reference/ReferenceManager.phpline 53:extractReferences()methodNextcloud Version: Targets v31+
Type: Bug fix / Enhancement