Commit da9606a

toufic-madampash authored and committed
docs: Add parsing custom HTML to README.md (#326)
1 parent b3e2a0f commit da9606a

1 file changed, +15 -0 lines changed

Diff for: README.md

@@ -64,6 +64,8 @@ If Mercury is unable to find a field, that field will return `null`.
#### `parse()` Options

##### Content Formats

By default, Mercury Parser returns the `content` field as HTML. However, you can override this behavior by passing in options to the `parse` function, specifying whether or not to scrape all pages of an article, and what type of output to return (valid values are `'html'`, `'markdown'`, and `'text'`). For example:

```javascript
@@ -78,6 +80,19 @@ This returns the the page's `content` as GitHub-flavored Markdown:
"content": "...**Thunder** is the [stage name](https://en.wikipedia.org/wiki/Stage_name) for the..."
```
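
The content-format choice described above could be validated before it reaches `parse`. This is a minimal sketch, not part of the commit: the helper name `buildParseOptions` is hypothetical, and the option key `contentType` is an assumption, since this excerpt lists the accepted values but not the key that carries them.

```javascript
// Sketch only: `buildParseOptions` is a hypothetical helper, and the
// `contentType` key is an assumption (the excerpt names the values, not the key).
const VALID_CONTENT_TYPES = ['html', 'markdown', 'text'];

function buildParseOptions(contentType = 'html') {
  // Reject anything outside the three formats listed in the README text.
  if (!VALID_CONTENT_TYPES.includes(contentType)) {
    throw new RangeError(
      `contentType must be one of: ${VALID_CONTENT_TYPES.join(', ')}`
    );
  }
  return { contentType };
}

console.log(buildParseOptions('markdown')); // { contentType: 'markdown' }
```

Under those assumptions, the returned object would be passed as the second argument, for example `Mercury.parse(url, buildParseOptions('markdown'))`.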

##### Pre-fetched HTML

You can use Mercury Parser to parse custom or pre-fetched HTML by passing an HTML string to the `parse` function as follows:

```javascript
Mercury.parse(url, {
  html:
    '<html><body><article><h1>Thunder (mascot)</h1><p>Thunder is the stage name for the horse who is the official live animal mascot for the Denver Broncos</p></article></body></html>',
}).then(result => console.log(result));
```
Note that the URL argument is still supplied, in order to identify the web site and use its custom parser, if it has any, though it will not be used for fetching content.
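
The division of labor in the note above (the URL selects the site's custom parser, the `html` string supplies the markup) can be sketched with a small wrapper. This is an illustration under stated assumptions, not the library's own code: `parseFetchedPage`, `fakeParser`, and the `https://example.com/article` URL are hypothetical, and the stand-in parser exists only so the sketch runs without network access. In real use, the `Mercury` object from the README would be passed instead.

```javascript
// Sketch only: `parseFetchedPage` and `fakeParser` are hypothetical names.
// `parser` is expected to expose parse(url, options), as `Mercury` does above.
function parseFetchedPage(parser, url, html) {
  if (typeof html !== 'string' || html.trim() === '') {
    throw new TypeError('html must be a non-empty string');
  }
  // The URL still goes along so a site-specific parser can be chosen;
  // the markup itself comes from `html` rather than from a fetch.
  return parser.parse(url, { html });
}

// Stand-in parser that records its arguments and echoes the HTML back.
const fakeParser = {
  calls: [],
  parse(url, options) {
    this.calls.push({ url, options });
    return Promise.resolve({ url, content: options.html });
  },
};

parseFetchedPage(
  fakeParser,
  'https://example.com/article', // placeholder URL, not from the README
  '<html><body><article><h1>Thunder (mascot)</h1></article></body></html>'
).then(result => console.log(result.url));
```

Rejecting an empty string up front is a design choice of this sketch, so that a failed download is caught before parsing is attempted.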

#### The command-line parser

Mercury Parser also ships with a CLI, meaning you can use the Mercury Parser
