@LexioJ LexioJ commented Oct 18, 2025

Summary

Fixes #55849

This PR updates the URL_REGEX pattern to support URL extraction from markdown link syntax [label](url), while maintaining backward compatibility with plain URL extraction.

Problem

The current URL_REGEX pattern requires whitespace, a newline, or the start or end of the text before and after URLs:

'(\s|\n|^)(https?:\/\/)...*(\s|\n|$)'

This pattern fails to match URLs embedded in markdown link syntax where URLs are preceded by ]( and followed by ), preventing Reference Providers (GitHub, GitLab, Jira, etc.) from generating rich previews for markdown-formatted links.

Examples of affected use cases:

  • [GH #55845](https://github.com/nextcloud/server/issues/55845) - GitHub issues
  • [Ticket #12345](https://support.example.com/#ticket/zoom/12345) - Zammad Support tickets
  • [PROJ-123](https://jira.example.com/browse/PROJ-123) - JIRA issues
  • Any markdown link in Text app, Talk, or Comments
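
A minimal sketch of the failure described above. It uses a simplified stand-in pattern, not the real URL_REGEX (whose character classes are elided in this description); the names `plainOnly` and `extractPlain` are hypothetical and exist only for illustration.

```javascript
// Simplified stand-in for the original pattern (NOT the real URL_REGEX):
// the URL must be preceded by whitespace or the start of the text and
// followed by whitespace or the end of the text.
const plainOnly = /(\s|^)(https?:\/\/[^\s()]+)(\s|$)/i;

// Hypothetical helper: returns the first URL found, or null.
function extractPlain(text) {
  const match = plainOnly.exec(text);
  return match ? match[2] : null;
}

console.log(extractPlain('Check https://github.com/nextcloud/server/issues/55845'));
// → https://github.com/nextcloud/server/issues/55845

console.log(extractPlain('Check [GH #55845](https://github.com/nextcloud/server/issues/55845)'));
// → null, because the URL is preceded by "](" rather than whitespace
```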

Solution

Updated the regex pattern to also accept markdown link syntax:

Before:

'(\s|\n|^)(https?:\/\/)([-A-Z0-9+_.]+...)(\s|\n|$)'

After:

'(\s|\n|^|\]\()(https?:\/\/)([-A-Z0-9+_.]+...)(\s|\n|$|\))'
//        ^^^^ added markdown start                  ^^^ added closing paren
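
The effect of the two added alternatives can be sketched with a simplified pattern. This is an illustration under assumptions, not the actual constant: the URL body is reduced to "anything except whitespace and parentheses", and `withMarkdown` and `firstUrl` are hypothetical names.

```javascript
// Simplified stand-in for the updated pattern (NOT the real URL_REGEX).
// Group 1: leading delimiter, now also "](".
// Group 2: the URL itself (simplified body).
// Group 3: trailing delimiter, now also ")".
const withMarkdown = /(\s|^|\]\()(https?:\/\/[^\s()]+)(\s|$|\))/i;

// Hypothetical helper: returns the first URL found, or null.
function firstUrl(text) {
  const match = withMarkdown.exec(text);
  return match ? match[2] : null;
}

console.log(firstUrl('Check https://github.com/nextcloud/server/issues/55845'));
// → https://github.com/nextcloud/server/issues/55845

console.log(firstUrl('Check [GH #55845](https://github.com/nextcloud/server/issues/55845)'));
// → https://github.com/nextcloud/server/issues/55845
```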

Changes

  • lib/public/IURLGenerator.php: Updated URL_REGEX_NO_MODIFIERS constant to match markdown syntax
  • core/src/OCP/comments.js: Updated frontend regex to match backend changes (as per comment requirement)
  • Added @since 31.0.0 version tag documenting the change

Backward Compatibility

✅ Existing plain URL extraction continues to work
✅ New: Markdown links now also extracted
✅ No breaking changes for reference providers
✅ No API changes - only regex pattern enhancement

Testing

Before this fix:

Plain URL: "Check https://github.com/nextcloud/server/issues/55845"
✅ Extracted successfully

Markdown: "Check [GH #55845](https://github.com/nextcloud/server/issues/55845)"  
❌ Not extracted

After this fix:

Plain URL: "Check https://github.com/nextcloud/server/issues/55845"
✅ Still works

Markdown: "Check [GH #55845](https://github.com/nextcloud/server/issues/55845)"
✅ Now extracted

Impact

  • Users: Can now use more readable markdown-formatted links while still getting rich reference previews
  • Reference Providers: Will automatically work with markdown links without any code changes
  • Affected components: Text app, Talk, Comments, any component using ReferenceManager::extractReferences()

Related Code

The regex is used primarily in:

  • lib/private/Collaboration/Reference/ReferenceManager.php line 53: extractReferences() method
  • All registered Reference Providers that rely on URL extraction

Nextcloud Version: Targets v31+
Type: Bug fix / Enhancement

@LexioJ LexioJ requested review from a team as code owners October 18, 2025 20:49
@LexioJ LexioJ requested review from CarlSchwan, leftybournes, nfebe, susnux and szaimen and removed request for a team October 18, 2025 20:49
@LexioJ LexioJ force-pushed the fix/markdown-url-regex branch 2 times, most recently from 5d3cc65 to a6198fe Compare October 18, 2025 22:06
@szaimen szaimen removed their request for review October 18, 2025 22:43
@LexioJ LexioJ force-pushed the fix/markdown-url-regex branch 2 times, most recently from 4c6b8e7 to 0d71347 Compare October 21, 2025 05:27
LexioJ added a commit to LexioJ/integration_itop that referenced this pull request Oct 21, 2025
Remove temporary debug logging that was added during investigation of URL
matching issues with square brackets in query parameters.

The root cause was identified in Nextcloud core: the URL_REGEX pattern
did not include square brackets in the allowed character class. This has
been fixed in nextcloud/server#55850
@susnux susnux added the "enhancement" and "3. to review" (Waiting for reviews) labels Oct 21, 2025
@susnux susnux added this to the Nextcloud 33 milestone Oct 21, 2025
@susnux susnux changed the title from "fix: Support URL extraction from markdown link syntax" to "feat: Support URL extraction from markdown link syntax" Oct 21, 2025
@susnux
Contributor

susnux commented Oct 21, 2025

cc @nickvergessen is this relevant for Talk?

@nickvergessen
Member

Since LexioJ is maintaining a bot, I assume it is :P

@@ -30,8 +30,9 @@ interface IURLGenerator {
*
* @since 25.0.0
* @since 29.0.0 changed to match localhost and hostnames with ports
* @since 33.0.0 changed to match URLs in markdown link syntax and square brackets in query parameters
Member

The @since line also needs to be added to URL_REGEX to reflect the change of the value there.

Member

Should also add that those are now part of the match, and therefore regex groups 2+3 are the actual link going forward; the full match itself no longer has to be a fully valid link.
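
To illustrate the point about groups, here is a small sketch that mirrors the four-group layout shown in the PR description (delimiter, scheme, rest of the URL, delimiter). The URL body is simplified and the name `sketch` is hypothetical; this is not the real constant.

```javascript
// Simplified four-group stand-in (NOT the real URL_REGEX):
// 1 = leading delimiter, 2 = scheme, 3 = rest of the URL, 4 = trailing delimiter.
const sketch = /(\s|^|\]\()(https?:\/\/)([^\s()]+)(\s|$|\))/i;

const match = sketch.exec('See [PROJ-123](https://jira.example.com/browse/PROJ-123) today');

// The full match now carries the markdown delimiters, so it is not a valid URL.
const fullMatch = match[0];
// Groups 2 and 3 together form the clean link.
const link = match[2] + match[3];

console.log(fullMatch); // → ](https://jira.example.com/browse/PROJ-123)
console.log(link);      // → https://jira.example.com/browse/PROJ-123
```

A caller that previously trimmed the full match would therefore need to switch to the capture groups to keep getting a usable link.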

@nickvergessen
Member

Checking with mobile app devs if the regex changes anything in clients

@@ -51,10 +51,20 @@ public function __construct(
*/
public function extractReferences(string $text): array {
Member

Should definitely add tests for this method, are you able to do that, or should someone assist/take over?

@nickvergessen nickvergessen force-pushed the fix/markdown-url-regex branch from 0d71347 to 10bcb89 Compare October 23, 2025 10:24
@nickvergessen
Member

I removed the dist/ from the diff for now, so the code page loads and allows leaving comments

@LexioJ LexioJ force-pushed the fix/markdown-url-regex branch from 10bcb89 to 6ae37ea Compare October 27, 2025 05:53
Fixes nextcloud#55849

Update URL_REGEX to match URLs embedded in markdown link syntax [label](url).
Previously, the regex required whitespace before/after URLs, which didn't
match markdown links where URLs are preceded by ]( and followed by ).

Also support square brackets in URL query parameters (e.g., ?foo[bar]=baz).

Changes:
- Add \]\( as valid URL prefix (markdown link start)
- Add \) as valid URL suffix (markdown link end)
- Add support for [ ] characters in URL paths
- Update both backend (IURLGenerator.php) and frontend (comments.js) regex
- Fix ReferenceManager to extract clean URLs using capture groups
- Maintain backward compatibility with plain URL extraction

Signed-off-by: Alexander Askin <[email protected]>
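
The square-bracket part of the commit message above can be sketched as follows. Both character classes are hypothetical and simplified; they only show how adding `[` and `]` to the allowed set changes the result for a URL such as `?foo[bar]=baz`.

```javascript
// Hypothetical, simplified character classes (NOT the real URL_REGEX).
const withoutBrackets = /^https?:\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]+$/i;
const withBrackets = /^https?:\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;\[\]]+$/i;

const bracketUrl = 'https://example.com/list?foo[bar]=baz';

console.log(withoutBrackets.test(bracketUrl)); // → false, "[" is not an allowed character
console.log(withBrackets.test(bracketUrl));    // → true
```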
@LexioJ LexioJ force-pushed the fix/markdown-url-regex branch from 6ae37ea to 5793ad5 Compare November 1, 2025 06:52
@nickvergessen
Member

We'll discuss this and make the final call on it in the week of 17-21 November, during our Talk Team week, as we have other related topics on our list.

@LexioJ
Author

LexioJ commented Nov 4, 2025

We'll discuss this and make the final call on it in the week of 17-21 November, during our Talk Team week, as we have other related topics on our list.

This PR does not primarily address Talk; it would allow general usage, e.g. in Notes, Collectives, Deck, etc. Smart Picker providers (like integration_github) can then generate links that are more compact and "meaningful" without sacrificing functionality, for example by picking an element and inserting the link this way:
Check GH #xyz

For Talk it would be great if you could discuss/decide on nextcloud/spreed#16114 as well 😉

@nickvergessen
Member

This PR does not primarily address Talk; it would allow general usage, e.g. in Notes, Collectives, Deck, etc. Smart Picker providers (like integration_github) can then generate links that are more compact

Totally aware, but there is more to it and, as mentioned before, the current change breaks existing clients. So we will check whether we can incorporate it in a backwards-compatible manner or whether we need to add a new constant.
At the same time, there are other related changes in progress that affect the same code and are more important to address.
