Skip to content

Conversation

@langleyd
Copy link
Member

@langleyd langleyd commented Jul 16, 2025

What's in this PR?

  • Adds support for pasting rich content from external sources. I've tried to keep it readable/sensible to follow commit-by-commit.
  • The main function added to the api(replace_html) has an argument of type HTMLSource that can be Matrix(i.e. what s generated by the RTE), GoogleDoc (there is a bit of special casing to clean it's content up) or UnknownExternal(all other external sources).
  • For HTMLSource.Matrix we throw errors if we see unexpectedly formatted html(as it was produced by RTE we don't expect to see it malformed.)
  • For external sources we will skip over nodes we do not expect/support. Generally we will continue to process the unknown nodes children(Otherwise you could have a div surrounding the pasted content and the whole thing gets skipped) except for some specific cases(if we see tag other than a <li> within a <ol> or <ul> it's invalid and should be ignored).
  • Previously to this PR, I don't think we were enforcing assert-invariants from the dom output when parsing html. We do now and I've fixed up those tests. I've also added some more validation to the parsing and some more post processing to try and ensure we have as valid a dom we get from parsing external html as we do from creating the content within the RTE (Namely, containers don't a mix of inline and block nodes, adjacent text nodes get joined, adjacent blocks of the same type get joined).

What does it look like?

Screen.Recording.2025-07-23.at.15.33.41.mov

langleyd added 15 commits July 16, 2025 12:05
and bring it's error handling in line with sys parser so that it returns an error if external_html_source is false and it encounters tags it doesn't support.
…ren, not the li/ol's).

Also adds post processing to cleanup sibling text nodes.
… where there are inline + block containers in the same parent.
Namely, fixing the cursor position after insert, and adding more parsing post processing to keep the dom in an expected state with wrap_inline_nodes_into_paragraphs_if_needed and join_nodes_in_container.
Also fixing some tests.
As we are now parsing from non-matrix sources, adding some rigour around validation of a valid dom. E.g. making sure nodes other than list items are not added to lists or that list items are not added to containers other than lists.
… == "bold".

This fixes the pasting bold content from google docs(and possibly other sources).
@codecov-commenter
Copy link

codecov-commenter commented Jul 23, 2025

Codecov Report

Attention: Patch coverage is 91.87592% with 55 lines in your changes missing coverage. Please review.

Project coverage is 89.69%. Comparing base (7ceaef6) to head (e118081).

Files with missing lines Patch % Lines
crates/wysiwyg/src/dom/parser/parse.rs 94.07% 25 Missing ⚠️
bindings/wysiwyg-wasm/src/lib.rs 0.00% 16 Missing ⚠️
crates/wysiwyg/src/dom/range.rs 0.00% 12 Missing ⚠️
platforms/web/lib/composer.ts 88.88% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #71      +/-   ##
==========================================
+ Coverage   89.66%   89.69%   +0.02%     
==========================================
  Files         170      171       +1     
  Lines       20848    21390     +542     
  Branches      291      294       +3     
==========================================
+ Hits        18693    19185     +492     
- Misses       2152     2202      +50     
  Partials        3        3              
Flag Coverage Δ
uitests 84.49% <ø> (ø)
uitests-ios 84.49% <ø> (ø)
unittests 88.51% <91.87%> (+0.06%) ⬆️
unittests-ios 87.54% <ø> (ø)
unittests-react 83.14% <88.88%> (+0.04%) ⬆️
unittests-rust 89.26% <91.95%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@langleyd langleyd marked this pull request as ready for review July 23, 2025 15:16
@langleyd langleyd changed the title Add support for pasting rich text content Add support for pasting rich text content (Rust + Web) Jul 23, 2025
@jmartinesp jmartinesp self-requested a review July 23, 2025 15:23
Copy link
Member

@jmartinesp jmartinesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the Android and iOS checks, but then I remembered they won't really matter much since replace_html isn't used there 😅 . Since the code is quite large and complex I've taken a quick look and I trust the tests will check the right cases 🤞 .

}
}

#[cfg(test)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have some comments stating which structure these huge HTML chunks should represent? It's quite difficult to understand if the tests are doing the right thing since we can't compare the tree structures.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea most of this chunk is actually the same as the one in parse.rs so I'll use format here to make things clearer and avoid the copy paste and add some comments.

indoc! {
r#"
├>ol
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're merging the several ordered list nodes in the MS docs HTML?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, this was happening an offshot of the join_nodes_in_container being called in in a number of places like here. And the behaviour wasn't totally unwanted, as it's kind of weird that ms docs was splitting these into separate lists anyway(in the UI I had created 1 list, I think in their html they have all items as separate lists use margin to do indenting).

Do you think this is ok? Or should we avoid calling join_nodes_in_container over zealously?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably ok, as you say the resulting structure makes more sense.

@sonarqubecloud
Copy link

@langleyd langleyd merged commit f822b68 into main Jul 23, 2025
9 checks passed
@langleyd langleyd deleted the langleyd/copy-paste-richtext branch July 23, 2025 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants