feat: ability to add custom extractors via api (#484)
* feat: ability to add custom extractors via api
* docs: updating readme
* fix: example.com was being used in another test
* fix: timezone was messing up date_published test
* fix: using a unique site for testing
* fix: updated custom extractor api
* docs: updating readme
* fix: removing unused fixture
* fix: updating test description
* feat: ability to add custom extractors via cli
src/extractors/custom/README.md (+59)
@@ -349,3 +349,62 @@ This script will open both an `html` and `json` file allowing you to preview you
If you've written a custom extractor, please send us a pull request! Passing tests that demonstrate your parser in action will help us evaluate the parser.

Sometimes you may find that the site you're parsing doesn't provide certain information. For example, some sites don't have deks, and in those instances, you don't need to write a selector for that field. If there's a test for a selector you don't need, you can just remove that test and make note of it in your pull request.

---

## Adding Custom Extractor via API

As of **version 2.1.1**, you can also add private custom extractors via the API. Make sure that your custom extractor includes a domain name. Note that extractors added via the API take precedence over the packaged custom extractors.

```javascript
const customExtractor = {
  domain: 'www.sandiegouniontribune.com',
  title: {
    selectors: ['h1', '.ArticlePage-headline'],
  },
  author: {
    selectors: ['.ArticlePage-authorInfo-bio-name'],
  },
  content: {
    selectors: ['article'],
  },
};

Mercury.addExtractor(customExtractor);
```
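
To make the domain requirement and the precedence rule concrete, here is a minimal, self-contained sketch. It is not Mercury's actual implementation: the `packagedExtractors` and `apiExtractors` maps, the `getExtractor` helper, the choice to throw on a missing domain, and the example domains are all assumptions made for illustration.

```javascript
// Hypothetical sketch of the lookup order described above.
// None of these names come from Mercury's source.
const packagedExtractors = {
  'www.example.org': {
    domain: 'www.example.org',
    title: { selectors: ['h1.packaged'] },
  },
  'packaged.example.com': {
    domain: 'packaged.example.com',
    title: { selectors: ['h1'] },
  },
};

// Extractors registered at runtime are kept in a separate map.
const apiExtractors = {};

// Register an extractor, rejecting any object without a domain name.
function addExtractor(extractor) {
  if (!extractor || typeof extractor.domain !== 'string' || extractor.domain === '') {
    throw new Error('A custom extractor must include a domain name.');
  }
  apiExtractors[extractor.domain] = extractor;
  return apiExtractors;
}

// Look up an extractor by hostname: runtime additions win over packaged ones.
function getExtractor(url) {
  const { hostname } = new URL(url);
  return apiExtractors[hostname] || packagedExtractors[hostname] || null;
}

// Override the packaged extractor for www.example.org.
addExtractor({
  domain: 'www.example.org',
  title: { selectors: ['h1.custom'] },
});

console.log(getExtractor('https://www.example.org/story').title.selectors);
```

In this sketch, a domain that only has a packaged extractor still resolves to it, and an unknown domain resolves to `null`.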

---

## Passing custom extractor to addExtractor via CLI

It's also possible to add a custom parser at runtime via the CLI.

### 1. Create your custom extractor in a standalone file.

```javascript
var customExtractor = {
  domain: 'postlight.com',
  title: {
    selectors: ['h1'],
  },
  author: {
    selectors: ['.byline-name'],
  },
  content: {
    selectors: ['article'],
  },
  extend: {
    uniqueKeyFromFixture: {
      selectors: ['.single__hero-category'],
    },
  },
};

module.exports = customExtractor;
```
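
Because the file exports a plain object through `module.exports`, any CommonJS script can load it with `require`. The sketch below shows one way such a file might be loaded and sanity-checked before it is registered. The `loadExtractorFromFile` helper and the temporary file are hypothetical and exist only so the example runs on its own; this is not a description of how the CLI loads the file internally.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: resolve a path and load the extractor it exports.
function loadExtractorFromFile(filePath) {
  const extractor = require(path.resolve(filePath));
  if (!extractor || typeof extractor.domain !== 'string') {
    throw new Error(`${filePath} does not export an extractor with a domain.`);
  }
  return extractor;
}

// Write a throwaway extractor file so the sketch is self-contained.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'extractor-'));
const tmpFile = path.join(tmpDir, 'postlight.com.js');
fs.writeFileSync(
  tmpFile,
  "module.exports = { domain: 'postlight.com', title: { selectors: ['h1'] } };\n"
);

const loaded = loadExtractorFromFile(tmpFile);
console.log(loaded.domain);
```

The loaded object has the same shape as the one passed to `Mercury.addExtractor` in the API section above.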

### 2. From the CLI, add the `--add-extractor` param: