Removes unnecessary newlines from rich text as JSON delivery API output #19391
Pull Request Overview
This PR refines the rich text JSON output by stripping out newline-only text nodes while preserving spaces, and adds tests to verify this behavior.
- Updated the HTML node filtering logic in the rich text parser to remove newline-only `#text` nodes.
- Introduced `IsNonEmptyElement` helper to distinguish significant text from pure newlines.
- Added and refactored unit tests to cover whitespace handling around inline and block elements.
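To make the intended behaviour concrete, here is a minimal TypeScript sketch of the filtering rule applied to the JSON shape shown later in this thread. It is not the C# implementation from this PR: the `RichTextNode` type and the `isNewlineOnlyText` and `stripNewlineOnlyText` helpers are hypothetical names invented for illustration, and the sketch assumes that "newline-only" means text made up entirely of `\r` and `\n` characters.

```typescript
// Shape inferred from the JSON samples in this thread; not an official type definition.
interface RichTextNode {
  tag: string;
  text?: string;
  attributes?: Record<string, unknown>;
  elements?: RichTextNode[];
}

// Hypothetical helper: true for a #text node whose text consists only of line breaks.
// Text made of spaces (or empty text) does not match and is therefore kept.
function isNewlineOnlyText(node: RichTextNode): boolean {
  return node.tag === "#text" && /^[\r\n]+$/.test(node.text ?? "");
}

// Returns a copy of the tree with newline-only text nodes removed at every level.
function stripNewlineOnlyText(node: RichTextNode): RichTextNode {
  if (node.elements === undefined) {
    return { ...node };
  }
  return {
    ...node,
    elements: node.elements
      .filter((child) => !isNewlineOnlyText(child))
      .map(stripNewlineOnlyText),
  };
}

// Usage: the "\n" node between the paragraphs is dropped, the space-only text is kept.
const root: RichTextNode = {
  tag: "#root",
  elements: [
    { tag: "p", elements: [{ tag: "#text", text: "Hello world!" }] },
    { tag: "#text", text: "\n" },
    { tag: "p", elements: [{ tag: "#text", text: " " }] },
  ],
};
const cleaned = stripNewlineOnlyText(root);
console.log(JSON.stringify(cleaned, null, 2));
```

The sketch returns a new tree rather than mutating the input, which keeps the before and after output easy to compare in a test.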
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/.../RichTextParserTests.cs | Refactored test to use TestParagraph constant and added a new test for newline removal around HTML elements |
| src/Umbraco.Infrastructure/DeliveryApi/ApiRichTextElementParser.cs | Enhanced ParseElement filtering logic and added IsNonEmptyElement helper method |
Comments suppressed due to low confidence (1)
src/Umbraco.Infrastructure/DeliveryApi/ApiRichTextElementParser.cs:127
- [nitpick] Rename `IsNonEmptyElement` to something more specific like `IsSignificantTextNode` or `ContainsNonNewlineText` to clarify that it targets text nodes with actual content beyond newlines.
private static bool IsNonEmptyElement(HtmlNode htmlNode) =>
Maybe I'm misunderstanding something, but I still seem to get unnecessary newlines? I also seem to get a lot of junk non-breaking space characters. With this rich text, I still get newlines in my output:
"bodyText": {
"tag": "#root",
"attributes": {},
"elements": [
{
"tag": "p",
"attributes": {},
"elements": [
{
"text": "Hello world!",
"tag": "#text"
}
]
},
{
"text": "\n",
"tag": "#text"
},
{
"tag": "p",
"attributes": {},
"elements": [
{
"text": " ",
"tag": "#text"
}
]
},
{
"text": "\n",
"tag": "#text"
},
{
"tag": "p",
"attributes": {},
"elements": [
{
"tag": "strong",
"attributes": {},
"elements": [
{
"text": "Blah blah",
"tag": "#text"
}
]
}
]
},
{
"text": "\n",
"tag": "#text"
},
{
"tag": "p",
"attributes": {},
"elements": [
{
"text": " ",
"tag": "#text"
}
]
},
{
"text": "\n",
"tag": "#text"
},
{
"tag": "h2",
"attributes": {},
"elements": [
{
"text": "Hi",
"tag": "#text"
}
]
},
{
"text": "\n",
"tag": "#text"
},
{
"tag": "p",
"attributes": {},
"elements": [
{
"text": " ",
"tag": "#text"
}
]
},
{
"text": "\n",
"tag": "#text"
},
{
"tag": "p",
"attributes": {},
"elements": [
{
"tag": "img",
"attributes": {
"src": "/media/okfarann/images.jpg",
"alt": "",
"width": "230",
"height": "219"
},
"elements": []
}
]
},
{
"text": "\n",
"tag": "#text"
},
{
"tag": "p",
"attributes": {},
"elements": [
{
"text": " ",
"tag": "#text"
}
]
}
],
"blocks": []
}
Thanks @nikolajlauridsen - I think I've tightened it up with the latest update. Would you mind running your test case again please? If you still see issues, could you share the source HTML from your rich text editor?
Looks better now 😄 Still has the non-breaking spaces, though:
<p>Hello world!</p>
<p> </p> <--- Results in
<p><strong>Blah blah</strong></p>
<p> </p>
<h2>Hi</h2>
<p> </p>
<p><img src="/media/okfarann/images.jpg?rmode=max&width=230&height=219" alt="" width="230" height="219" data-udi="umb://media/81088a74b2ad4aeb8ba0a4f742a580f1"></p>
<p> </p>
"bodyText": {
"tag": "#root",
"attributes": {},
"elements": [
{
"tag": "p",
"attributes": {},
"elements": [
{
"text": "Hello world!",
"tag": "#text"
}
]
},
{
"tag": "p",
"attributes": {},
"elements": [
{
"text": " ",
"tag": "#text"
}
]
},
{
"tag": "p",
"attributes": {},
"elements": [
{
"tag": "strong",
"attributes": {},
"elements": [
{
"text": "Blah blah",
"tag": "#text"
}
]
}
]
},
{
"tag": "p",
"attributes": {},
"elements": [
{
"text": " ",
"tag": "#text"
}
]
},
{
"tag": "h2",
"attributes": {},
"elements": [
{
"text": "Hi",
"tag": "#text"
}
]
},
{
"tag": "p",
"attributes": {},
"elements": [
{
"text": " ",
"tag": "#text"
}
]
},
{
"tag": "p",
"attributes": {},
"elements": [
{
"tag": "img",
"attributes": {
"src": "/media/okfarann/images.jpg",
"alt": "",
"width": "230",
"height": "219"
},
"elements": []
}
]
},
{
"tag": "p",
"attributes": {},
"elements": [
{
"text": " ",
"tag": "#text"
}
]
}
],
"blocks": []
}
Thanks - and yes, I think that's OK, as you would expect to retain that if you were rendering this HTML in a client application.
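As a rough illustration of why a client would want those nodes retained, the following TypeScript sketch renders the JSON element tree back to HTML. Everything here is an assumption made for the example: `RichTextNode`, `escapeHtml`, `renderNode` and the `VOID_TAGS` list are hypothetical and not part of Umbraco, and the sketch assumes the space-only text nodes carry a non-breaking space (U+00A0).

```typescript
// Shape inferred from the JSON samples in this thread; not an official type definition.
interface RichTextNode {
  tag: string;
  text?: string;
  attributes?: Record<string, unknown>;
  elements?: RichTextNode[];
}

// Assumed, deliberately short list of tags rendered without a closing tag.
const VOID_TAGS = new Set(["img", "br", "hr"]);

// Escapes the characters that are significant in HTML text and attribute values.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical renderer: walks the tree and emits an HTML string.
function renderNode(node: RichTextNode): string {
  if (node.tag === "#text") {
    // Escape first, then write non-breaking spaces as entities so they stay visible.
    return escapeHtml(node.text ?? "").replace(/\u00a0/g, "&nbsp;");
  }
  const children = (node.elements ?? []).map(renderNode).join("");
  if (node.tag === "#root") {
    return children; // the root is only a container
  }
  const attrs = Object.entries(node.attributes ?? {})
    .map(([name, value]) => ` ${name}="${escapeHtml(String(value))}"`)
    .join("");
  return VOID_TAGS.has(node.tag)
    ? `<${node.tag}${attrs}>`
    : `<${node.tag}${attrs}>${children}</${node.tag}>`;
}

// Usage: the spacer paragraph survives the round trip to HTML.
const sample: RichTextNode = {
  tag: "#root",
  attributes: {},
  elements: [
    { tag: "p", attributes: {}, elements: [{ tag: "#text", text: "Hello world!" }] },
    { tag: "p", attributes: {}, elements: [{ tag: "#text", text: "\u00a0" }] },
    { tag: "h2", attributes: {}, elements: [{ tag: "#text", text: "Hi" }] },
  ],
};
const html = renderNode(sample);
console.log(html); // <p>Hello world!</p><p>&nbsp;</p><h2>Hi</h2>
```

If the spacer paragraphs were stripped from the JSON, the rendered page would lose the vertical gaps the editor intentionally added.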
…ut (#19391)
* Removes unnecessary newlines from rich text as JSON delivery API output.
* Fix case from PR feedback.

Conflicts:
* src/Umbraco.Infrastructure/DeliveryApi/ApiRichTextElementParser.cs
* tests/Umbraco.Tests.UnitTests/Umbraco.Core/DeliveryApi/RichTextParserTests.cs

Prerequisites
Resolves #19388
Description
This PR removes unnecessary newlines between HTML elements in the JSON output for rich text in the delivery API, retaining the spaces between inline elements applied via #17983.
Testing
Add mark-up to a rich text editor with line breaks.
When requesting the content item with the rich text property, verify that unnecessary line breaks are removed from the output but spaces are retained between inline elements. See the unit tests for samples.
The delivery API needs to be enabled with the following configuration: