[PRO]: Conversion API - Preserve whitespaces on Export #5804

Open
1 task done
mbukovy opened this issue Nov 5, 2024 · 2 comments
Labels
Category: Pro (the issue or pull request is related to the pro packages of Tiptap). Type: Bug (the issue or pull request is related to a bug).

Comments


mbukovy commented Nov 5, 2024

Affected Packages

Conversion API

Version(s)

current

Description of the Bug

Hello, we're using your Conversion API to import and export docx documents, and there's a problem with exporting newlines and spaces. This is the example from your demo on the Export page:

image

And this is how it looks when exported to Word:

image

Browser Used

Chrome

Code Example (Preferred)

No response

Expected Behavior

Preserve white spaces on export so it looks the same as in the editor

Additional Context (Optional)

No response

Dependency Updates

  • Yes, I've updated all my dependencies.

StephanMeijer commented Nov 7, 2024

I am not affiliated with Tiptap, but have done some work on Pandoc.

This is expected behaviour of Pandoc. Basically, it is a limitation of the underlying software Tiptap uses.


input.html

<p>there</p>
<p></p>
<p></p>
<p></p>
<p></p>
<p>are newlines</p>
<p></p>
<p></p>
<p>and sppaces         and spaces     and spaces</p>
<p></p>
<p></p>
<p></p>
<p></p>
<p>everywhere</p>

Command

$ pandoc input.html
<p>there</p>
<p>are newlines</p>
<p>and sppaces and spaces and spaces</p>
<p>everywhere</p>
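
The behaviour shown above can be imitated in a few lines of Python. This is only a rough model of the observed output, not Pandoc's actual implementation: the function name `collapse_like_html_reader` is made up for illustration, and the regex-based paragraph matching only works for flat input like `input.html`.

```python
import re

# HTML treats these characters as collapsible whitespace.
# U+00A0 (non-breaking space) is deliberately not in this set.
HTML_WS = " \t\n\r\f"


def collapse_like_html_reader(html: str) -> str:
    """Approximate the whitespace handling seen in the pandoc output above."""
    paragraphs = re.findall(r"<p>(.*?)</p>", html, flags=re.DOTALL)
    kept = []
    for text in paragraphs:
        # Every run of collapsible whitespace becomes a single space.
        normalized = re.sub(f"[{HTML_WS}]+", " ", text).strip(HTML_WS)
        # Paragraphs with no text left are dropped entirely.
        if normalized:
            kept.append(f"<p>{normalized}</p>")
    return "\n".join(kept)


SAMPLE = """<p>there</p>
<p></p>
<p></p>
<p>and sppaces         and spaces     and spaces</p>
<p></p>
<p>everywhere</p>"""

if __name__ == "__main__":
    print(collapse_like_html_reader(SAMPLE))
```

Once the text has passed through a step like this, nothing downstream can tell how many spaces or empty paragraphs the input originally had.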


StephanMeijer commented Nov 7, 2024

Additional info: in HTML, runs of whitespace collapse to a single space, so <p>and sppaces         and spaces     and spaces</p> is equivalent to <p>and sppaces and spaces and spaces</p>. The same basically holds for Markdown. In OOXML, by contrast, whitespace inside a <w:t> element is respected. So when Tiptap converts from Tiptap's AST to HTML or Markdown, Pandoc's HTML reader will just filter out the meaningless whitespace (and probably also the empty paragraphs). The Docx writer then no longer has any of the context it would need for whitespace preservation, as all that information is (for good reason) lost in the transformation from HTML/Markdown to Pandoc's AST, which they call "native".

You can reproduce this with:

$ pandoc input.html -o out.native   
$ cat out.native 
[ Para [ Str "there" ]
, Para [ Str "are" , Space , Str "newlines" ]
, Para
    [ Str "and"
    , Space
    , Str "sppaces"
    , Space
    , Str "and"
    , Space
    , Str "spaces"
    , Space
    , Str "and"
    , Space
    , Str "spaces"
    ]
, Para [ Str "everywhere" ]
]
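
Since the loss happens while the HTML is being read, one possible workaround is to rewrite the HTML before it is converted, so that the whitespace no longer counts as collapsible. The sketch below swaps extra spaces for non-breaking spaces (U+00A0) and puts one into each empty paragraph. It rests on an unverified assumption: that the converter treats U+00A0 as ordinary text and carries it through to the docx output. The helper name `protect_whitespace` is hypothetical, and the regexes only handle simple, flat HTML like the sample in this thread (they would also touch runs of spaces inside tags or attributes).

```python
import re

NBSP = "\u00a0"  # non-breaking space, not collapsed by HTML whitespace rules


def protect_whitespace(html: str) -> str:
    """Make empty paragraphs and repeated spaces survive whitespace collapsing.

    Assumption: the downstream converter keeps U+00A0 as literal text.
    """
    # Give each empty paragraph a non-breaking space so it is no longer empty.
    html = re.sub(r"<p>[ \t\r\n]*</p>", f"<p>{NBSP}</p>", html)

    def widen(match: re.Match) -> str:
        # Keep one ordinary space (so lines can still wrap there) and turn
        # the rest of the run into non-breaking spaces.
        return " " + NBSP * (len(match.group(0)) - 1)

    return re.sub(r" {2,}", widen, html)


if __name__ == "__main__":
    print(repr(protect_whitespace("<p>and sppaces   and spaces</p>\n<p></p>")))
```

Whether the exported document then looks the same as the editor would still need to be checked in Word, because non-breaking spaces also change where lines are allowed to wrap.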
