Removes unnecessary newlines from rich text as JSON delivery API output #19391

Merged: AndyButland merged 2 commits into v13/dev from v13/bugfix/remove-unnecessary-newlines-from-delivery-api-rich-text-output on May 23, 2025

Conversation

@AndyButland
Contributor

Prerequisites

  • I have added steps to test this contribution in the description below

Resolves #19388

Description

This PR removes unnecessary newlines between HTML elements in the JSON output for rich text in the delivery API, retaining the spaces between inline elements applied via #17983.

Testing

Add markup containing line breaks to a rich text editor.

Request the content item containing the rich text property and verify that unnecessary line breaks are removed from the output, while spaces between inline elements are retained. See the unit tests for samples.

The delivery API needs to be enabled with the following configuration:

  "DeliveryApi": {
    "Enabled": true,
    "RichTextOutputAsJson": true,

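One way to run the verification step above against a response is to walk the rich text tree and list any `#text` nodes that hold only line-break characters. The sketch below is a hypothetical client-side check, not part of this PR: the `RichTextNode` shape is inferred from the JSON samples in this conversation, and `findNewlineOnlyTextNodes` is a name made up for illustration.

```typescript
// Node shape inferred from the JSON samples in this PR (an assumption, not an official type).
interface RichTextNode {
  tag: string;
  text?: string;
  attributes?: Record<string, unknown>;
  elements?: RichTextNode[];
}

// Hypothetical helper: returns a path for every "#text" node whose text
// consists only of carriage returns and/or line feeds.
function findNewlineOnlyTextNodes(node: RichTextNode, path: string = node.tag): string[] {
  const hits: string[] = [];
  if (node.tag === "#text" && /^[\r\n]+$/.test(node.text ?? "")) {
    hits.push(path);
  }
  // Recurse into child nodes, extending the path with the child's index and tag.
  (node.elements ?? []).forEach((child, index) => {
    hits.push(...findNewlineOnlyTextNodes(child, `${path}/${index}:${child.tag}`));
  });
  return hits;
}
```

An empty result for the `bodyText` value of a response would indicate that no newline-only text nodes remain. Text nodes containing spaces are not reported, since those are expected to be retained.
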
Copilot AI review requested due to automatic review settings May 22, 2025 14:23
Contributor

Copilot AI left a comment

Pull Request Overview

This PR refines the rich text JSON output by stripping out newline-only text nodes while preserving spaces, and adds tests to verify this behavior.

  • Updated the HTML node filtering logic in the rich text parser to remove newline-only #text nodes.
  • Introduced IsNonEmptyElement helper to distinguish significant text from pure newlines.
  • Added and refactored unit tests to cover whitespace handling around inline and block elements.
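
The filtering rule summarised in the bullets above can be modelled on the JSON tree as follows. This is a TypeScript illustration under stated assumptions, not the C# code in `ApiRichTextElementParser.cs`: the body of `IsNonEmptyElement` is not shown in this conversation, so `isSignificantNode` and `stripNewlineOnlyText` are hypothetical names, and the exact rule (drop text that is empty once `\r` and `\n` are removed) is an assumption based on the description.

```typescript
// Node shape inferred from the JSON samples in this PR (an assumption, not an official type).
interface RichTextNode {
  tag: string;
  text?: string;
  attributes?: Record<string, unknown>;
  elements?: RichTextNode[];
}

// Hypothetical predicate: element nodes are always kept; text nodes are kept
// only if something remains after removing line-break characters.
function isSignificantNode(node: RichTextNode): boolean {
  if (node.tag !== "#text") {
    return true;
  }
  return (node.text ?? "").replace(/[\r\n]/g, "").length > 0;
}

// Returns a copy of the tree without newline-only text nodes.
// Whitespace made of spaces (for example between inline elements) is preserved.
function stripNewlineOnlyText(node: RichTextNode): RichTextNode {
  if (node.elements === undefined) {
    return { ...node };
  }
  return {
    ...node,
    elements: node.elements.filter(isSignificantNode).map((child) => stripNewlineOnlyText(child)),
  };
}
```

Under this model a `"\n"` node between two `p` elements is dropped, while a `" "` node between `strong` and `em` inside a paragraph survives.
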

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File: Description
  • tests/.../RichTextParserTests.cs: Refactored test to use TestParagraph constant and added a new test for newline removal around HTML elements
  • src/Umbraco.Infrastructure/DeliveryApi/ApiRichTextElementParser.cs: Enhanced ParseElement filtering logic and added IsNonEmptyElement helper method

Comments suppressed due to low confidence (1)

src/Umbraco.Infrastructure/DeliveryApi/ApiRichTextElementParser.cs:127

  • [nitpick] Rename IsNonEmptyElement to something more specific like IsSignificantTextNode or ContainsNonNewlineText to clarify that it targets text nodes with actual content beyond newlines.
private static bool IsNonEmptyElement(HtmlNode htmlNode) =>

@nikolajlauridsen
Contributor

nikolajlauridsen commented May 23, 2025

Maybe I'm misunderstanding something, but I still seem to get unnecessary newlines? I also seem to get a lot of junk non-breaking space characters.

With this rich text:

[image: rich text editor content]

I still get newlines in my output:

    "bodyText": {
      "tag": "#root",
      "attributes": {},
      "elements": [
        {
          "tag": "p",
          "attributes": {},
          "elements": [
            {
              "text": "Hello world!",
              "tag": "#text"
            }
          ]
        },
        {
          "text": "\n",
          "tag": "#text"
        },
        {
          "tag": "p",
          "attributes": {},
          "elements": [
            {
              "text": " ",
              "tag": "#text"
            }
          ]
        },
        {
          "text": "\n",
          "tag": "#text"
        },
        {
          "tag": "p",
          "attributes": {},
          "elements": [
            {
              "tag": "strong",
              "attributes": {},
              "elements": [
                {
                  "text": "Blah blah",
                  "tag": "#text"
                }
              ]
            }
          ]
        },
        {
          "text": "\n",
          "tag": "#text"
        },
        {
          "tag": "p",
          "attributes": {},
          "elements": [
            {
              "text": " ",
              "tag": "#text"
            }
          ]
        },
        {
          "text": "\n",
          "tag": "#text"
        },
        {
          "tag": "h2",
          "attributes": {},
          "elements": [
            {
              "text": "Hi",
              "tag": "#text"
            }
          ]
        },
        {
          "text": "\n",
          "tag": "#text"
        },
        {
          "tag": "p",
          "attributes": {},
          "elements": [
            {
              "text": " ",
              "tag": "#text"
            }
          ]
        },
        {
          "text": "\n",
          "tag": "#text"
        },
        {
          "tag": "p",
          "attributes": {},
          "elements": [
            {
              "tag": "img",
              "attributes": {
                "src": "/media/okfarann/images.jpg",
                "alt": "",
                "width": "230",
                "height": "219"
              },
              "elements": []
            }
          ]
        },
        {
          "text": "\n",
          "tag": "#text"
        },
        {
          "tag": "p",
          "attributes": {},
          "elements": [
            {
              "text": " ",
              "tag": "#text"
            }
          ]
        }
      ],
      "blocks": []
    }

@AndyButland
Contributor Author

Thanks @nikolajlauridsen - I think I've tightened it up with the latest update. Would you mind running your test case again please? If you still see issues, could you share the source HTML from your rich text editor?

@nikolajlauridsen
Contributor

Looks better now 😄

Still has the &nbsp;, but the RTE HTML also has one for each empty line, so maybe that's okay, since it's actually part of the output.

<p>Hello world!</p>
<p> </p> <--- Results in &nbsp;
<p><strong>Blah blah</strong></p>
<p> </p>
<h2>Hi</h2>
<p> </p>
<p><img src="/media/okfarann/images.jpg?rmode=max&amp;width=230&amp;height=219" alt="" width="230" height="219" data-udi="umb://media/81088a74b2ad4aeb8ba0a4f742a580f1"></p>
<p> </p>
"bodyText": {
      "tag": "#root",
      "attributes": {},
      "elements": [
        {
          "tag": "p",
          "attributes": {},
          "elements": [
            {
              "text": "Hello world!",
              "tag": "#text"
            }
          ]
        },
        {
          "tag": "p",
          "attributes": {},
          "elements": [
            {
              "text": "&nbsp;",
              "tag": "#text"
            }
          ]
        },
        {
          "tag": "p",
          "attributes": {},
          "elements": [
            {
              "tag": "strong",
              "attributes": {},
              "elements": [
                {
                  "text": "Blah blah",
                  "tag": "#text"
                }
              ]
            }
          ]
        },
        {
          "tag": "p",
          "attributes": {},
          "elements": [
            {
              "text": "&nbsp;",
              "tag": "#text"
            }
          ]
        },
        {
          "tag": "h2",
          "attributes": {},
          "elements": [
            {
              "text": "Hi",
              "tag": "#text"
            }
          ]
        },
        {
          "tag": "p",
          "attributes": {},
          "elements": [
            {
              "text": "&nbsp;",
              "tag": "#text"
            }
          ]
        },
        {
          "tag": "p",
          "attributes": {},
          "elements": [
            {
              "tag": "img",
              "attributes": {
                "src": "/media/okfarann/images.jpg",
                "alt": "",
                "width": "230",
                "height": "219"
              },
              "elements": []
            }
          ]
        },
        {
          "tag": "p",
          "attributes": {},
          "elements": [
            {
              "text": "&nbsp;",
              "tag": "#text"
            }
          ]
        }
      ],
      "blocks": []
    }

@AndyButland
Contributor Author

Thanks - and yes, I think that's OK, as you would expect to retain that if you were rendering this HTML in a client application.
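
To illustrate why the retained `&nbsp;` matters on the consuming side, here is a minimal sketch of a client rendering the JSON tree back to an HTML string. It is not an Umbraco API: `renderToHtml`, `escapeAttribute` and the `VOID_TAGS` list are hypothetical, the node shape is inferred from the samples above, and text values are assumed to be trusted, already-encoded markup (as the `&nbsp;` sample suggests), so they are emitted unchanged.

```typescript
// Node shape inferred from the JSON samples in this PR (an assumption, not an official type).
interface RichTextNode {
  tag: string;
  text?: string;
  attributes?: Record<string, unknown>;
  elements?: RichTextNode[];
}

// Tags rendered without a closing tag; a deliberately short, illustrative list.
const VOID_TAGS: string[] = ["img", "br", "hr"];

function escapeAttribute(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Hypothetical renderer: converts the tree into an HTML string.
function renderToHtml(node: RichTextNode): string {
  if (node.tag === "#text") {
    // Assumption: text is already markup-safe, so "&nbsp;" passes through as an entity.
    return node.text ?? "";
  }
  const children = (node.elements ?? []).map((child) => renderToHtml(child)).join("");
  if (node.tag === "#root") {
    return children; // the synthetic root has no markup of its own
  }
  const attributes = node.attributes ?? {};
  const attrs = Object.keys(attributes)
    .map((name) => ` ${name}="${escapeAttribute(String(attributes[name]))}"`)
    .join("");
  if (VOID_TAGS.indexOf(node.tag) !== -1) {
    return `<${node.tag}${attrs}>`;
  }
  return `<${node.tag}${attrs}>${children}</${node.tag}>`;
}
```

With this approach a paragraph whose only child is the text `&nbsp;` renders as `<p>&nbsp;</p>`, which keeps the visible empty line from the editor. If the text values could contain untrusted input, they would need escaping or sanitising before being written into a page.
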

@AndyButland AndyButland merged commit 4d8ca45 into v13/dev May 23, 2025
19 checks passed
@AndyButland AndyButland deleted the v13/bugfix/remove-unnecessary-newlines-from-delivery-api-rich-text-output branch May 23, 2025 10:19
AndyButland added a commit that referenced this pull request May 23, 2025
…ut (#19391)

* Removes unnecessary newlines from rich text as JSON delivery API output.

* Fix case from PR feedback.
# Conflicts:
#	src/Umbraco.Infrastructure/DeliveryApi/ApiRichTextElementParser.cs
#	tests/Umbraco.Tests.UnitTests/Umbraco.Core/DeliveryApi/RichTextParserTests.cs