How do I remove whitespace between tags? #138

Open
davesag opened this issue Dec 2, 2020 · 5 comments

davesag commented Dec 2, 2020

I am using the Markup component as follows:

const NewsItem = ({ title, subtitle, content }) => (
  <Card key={title} title={title} subtitle={subtitle}>
    <Markup content={content.trim()} />
  </Card>
)

The content I get from an API call can contain whitespace, for example:

<table>
  <tr><th>Company</th><th>Limit (M)</th><th>Risk Appetite (M)</th><th>Risk Grade</th></tr>
  <tr><td>A</td><td>500</td><td>700</td><td>3+</td></tr>
  <tr><td>B</td><td>150</td><td>140</td><td>3-</td></tr>
  <tr><td>C</td><td>437</td><td>500</td><td>4-</td></tr>
</table>

In my console I see the following warning:

console.error
  Warning: validateDOMNesting(...): Whitespace text nodes cannot appear as a child of <table>. Make sure you don't have any extra whitespace between tags on each line of your source code.
      at table
      at Element (/Users/davesag/src/my-project/node_modules/interweave/lib/index.js:53:30)

Is there a simple way I can tell Markup to strip out the incoming whitespace between the <table> tag and the <tr> tags?

Owner

milesj commented Dec 2, 2020

This is an interesting error that I haven't seen before. I'm not sure newlines are the problem since I test it here: https://github.com/milesj/interweave/blob/master/packages/core/tests/HTML.test.tsx#L759

Are there perhaps special hidden whitespace characters, like &nbsp; or something?

Author

davesag commented Dec 3, 2020

> This is an interesting error that I haven't seen before. I'm not sure newlines are the problem since I test it here: https://github.com/milesj/interweave/blob/master/packages/core/tests/HTML.test.tsx#L759
>
> Are there perhaps special hidden whitespace characters, like &nbsp; or something?

Interesting. I found the error went away if I manually removed the whitespace, and I double-checked the data to ensure that the whitespace was simply spaces and newlines, and not &nbsp; or other hidden whitespace.

Maybe the issue is the fact that the table rows are not nested within a <thead> or <tbody> tag like in your test?

Owner

milesj commented Dec 3, 2020

@davesag I've tested without thead/tbody also and it passed. That warning also comes from React itself, not Interweave, so I'd need to dig further into why this happens.

Author

davesag commented Dec 3, 2020

> @davesag I've tested without thead/tbody also and it passed. That warning also comes from React itself, not Interweave, so I'd need to dig further into why this happens.

Okay, thanks. Yes, my tests actually do pass, but I was getting the warning in the test output. I've simply removed all the errant whitespace from my test data for now to prevent that, but it looks like something I'll need to deal with, given the unpredictability of the HTML I could get from the news API I am calling (the content is not under my control). I appreciate your looking into it, but perhaps I can just suppress the warning instead :-)
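For anyone who does want to suppress the warning in test output, here is a minimal sketch. It assumes the warning reaches `console.error` with the text `validateDOMNesting` somewhere in its arguments, as in the log above. `filterWarnings` is a hypothetical helper, not part of React or Interweave.

```typescript
type ErrorLogger = (...args: unknown[]) => void;

// Hypothetical helper: wraps a logger and drops any call whose arguments,
// joined into one string, contain one of the ignored substrings.
function filterWarnings(original: ErrorLogger, ignored: string[]): ErrorLogger {
  return (...args: unknown[]): void => {
    const text = args.map(String).join(' ');
    if (ignored.some((needle) => text.includes(needle))) {
      return; // swallow the matching warning
    }
    original(...args); // forward everything else untouched
  };
}

// Assumed usage in a test setup file:
// console.error = filterWarnings(console.error.bind(console), ['validateDOMNesting']);
```

Note that this hides every DOM-nesting warning, including ones that point at real markup problems, so cleaning the incoming HTML is likely the safer long-term fix.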

@devinhalladay

I have been having similar issues: my markup may contain arbitrary whitespace (hard-coded tabs or spaces) as well as errant whitespace (stray spaces between elements, \n, etc.). Here's the solution I used in my small readability API, in case it's helpful for anyone:

import htmlclean from 'htmlclean';
import { JSDOM } from 'jsdom';
import { NextApiRequest, NextApiResponse } from 'next';

import { Readability } from '@mozilla/readability';

export type ReaderResponse = {
  markup: string;
  text: string;
  source: string;
};

export default async (
  request: NextApiRequest,
  response: NextApiResponse<ReaderResponse>
) => {
  /**
   * Fetch the article HTML as text from source URL
   */
  const url = request.query.url as string;
  const article = await fetch(url);
  const markup = await article.text();

  /**
   * Create a JSDOM object, which provides a virtual DOM interface to the HTML.
   * The document object will be accessible via `doc.window.document`.
   */
  const doc = new JSDOM(markup);

  /**
   * Create a Readability object, which will parse the HTML and extract the
   * most likely article content.
   * `reader.content` will return stringified HTML, which can be used to
   * render the article content as markup.
   * `reader.textContent` will return the text itself, with all HTML stripped.
   */
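  const reader = new Readability(doc.window.document).parse();

  // `parse()` returns null when no article content could be extracted.
  if (!reader) {
    response.status(422).end();
    return;
  }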

  /**
   * Convert the Readability HTML back to a JSDOM object for manipulation.
   */
  const readerDoc = new JSDOM(reader.content);

  /**
   * Passing the document fragment which contains our article content,
   * clean it in order to strip errant whitespace, such as newlines,
   * hard-coded indents and spaces, and space between HTML tags.
   */
  const minifiedMarkup: string = htmlclean(
    readerDoc.window.document.body.innerHTML
  );

  response.status(200).send({
    markup: minifiedMarkup,
    text: reader.textContent,
    source: reader.content,
  });
};

I then render the minified and whitespace-cleaned markup with a pretty barebones <Interweave /> component.
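For the narrower case in the original question (whitespace-only text between tags inside a `<table>`), a dependency-free alternative might look like the sketch below. `stripInterTagWhitespace` is a hypothetical helper, not an Interweave API, and the sample `content` string is an abbreviated version of the table from the first comment.

```typescript
// Hypothetical helper, not part of Interweave: removes whitespace-only runs
// that sit between a closing ">" and the next opening "<".
function stripInterTagWhitespace(html: string): string {
  return html.replace(/>\s+</g, '><').trim();
}

// Abbreviated sample based on the table in the original question.
const content = `
<table>
  <tr><th>Company</th><th>Risk Grade</th></tr>
  <tr><td>A</td><td>3+</td></tr>
</table>
`;

const cleaned = stripInterTagWhitespace(content);
console.log(cleaned);
// → <table><tr><th>Company</th><th>Risk Grade</th></tr><tr><td>A</td><td>3+</td></tr></table>
```

The result could then be passed along as `<Markup content={stripInterTagWhitespace(content)} />`. Be aware that this regex is blunt. It also removes the space between adjacent inline elements (for example `<b>a</b> <i>b</i>`) and touches whitespace inside `<pre>`, so a real HTML minifier such as the `htmlclean` approach above is probably the better choice for arbitrary content.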
