Releases: microlinkhq/metascraper

v5.5.4

04 Jul 10:16
cc6eac9

5.5.4 (2019-07-04)

Note: Version bump only for package metascraper

v5.5.3

29 Jun 12:32

5.5.3 (2019-06-29)

Note: Version bump only for package metascraper

v5.5.2

20 Jun 17:38

5.5.2 (2019-06-20)

Note: Version bump only for package metascraper

v5.5.1

20 Jun 16:43

5.5.1 (2019-06-20)

Note: Version bump only for package metascraper

v5.5.0

20 Jun 16:23

5.5.0 (2019-06-20)

Features

v5.4.7

20 Jun 09:42

5.4.7 (2019-06-20)

Note: Version bump only for package metascraper

v5.4.6

19 Jun 21:54

5.4.6 (2019-06-19)

Note: Version bump only for package metascraper

v5.0.0

17 Mar 15:46
2b453b2

Breaking Changes

Rules Bundles processed in parallel

Until now, rules bundles were processed in series, which made it possible to pass meta between rules:

({ htmlDom: $, meta, url: baseUrl }) => wrap($ => $('meta[property="og:logo"]').attr('content')),

Now, rules bundles are processed in parallel, so rules can no longer share information and meta is no longer passed.

The only official rules bundle affected by this change is metascraper-lang-detector.
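
As a rough illustration of what parallel processing implies for rule authors, here is a minimal sketch. It is not metascraper's actual implementation: `runBundles`, `logoBundle` and `hostBundle` are hypothetical names, `htmlDom` is a plain object standing in for the parsed document, and each property has a single rule for brevity.

```javascript
// Hypothetical rules bundles: each maps a property name to a rule function.
// A rule only receives { htmlDom, url }; there is no shared `meta` argument.
const logoBundle = {
  logo: async ({ htmlDom }) => htmlDom['og:logo']
}
const hostBundle = {
  host: async ({ url }) => new URL(url).hostname
}

// Start every rule of every bundle at once and merge the results.
// Because nothing runs "before" anything else, no rule can read
// a value produced by another rule.
async function runBundles (bundles, context) {
  const entries = bundles.flatMap(bundle => Object.entries(bundle))
  const values = await Promise.all(
    entries.map(([, rule]) => rule(context))
  )
  return Object.fromEntries(
    entries.map(([name], index) => [name, values[index]])
  )
}
```

A rule that previously depended on meta would need to compute whatever it needs from htmlDom and url directly.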

Improvements

Add metascraper-readability

metascraper-readability (http://npm.im/metascraper-readability) is based on https://github.com/mozilla/readability.

v4.9.0

10 Jan 18:32
ca32573

Remove sanitize-html

The dependency introduced a bug related to malformed URLs: apostrophecms/sanitize-html#274

In fact, I found that it is no longer necessary, since htmlparser2 is already used by cheerio's load method.

Result: Smaller bundle, less parsing time.

Set Up Case-Insensitive CSS Rules

One of the things sanitize-html did was normalize some common aspects of the HTML markup.

Since this dependency has been removed, and after discovering that CSS rules can be case-insensitive, I enabled case-insensitive matching wherever possible.

Result: Better data detection, less initial parsing time.
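
For context, CSS attribute selectors accept an `i` flag, as in `meta[property="og:title" i]`, which makes the value comparison case-insensitive. The sketch below only illustrates that difference without a parser; `attributeMatches` is a hypothetical helper, not part of metascraper or cheerio.

```javascript
// Compare an attribute value the way `[name="value"]` (strict)
// and `[name="value" i]` (case-insensitive) would.
function attributeMatches (attributes, name, expected, { insensitive = false } = {}) {
  const actual = attributes[name]
  if (actual === undefined) return false
  return insensitive
    ? actual.toLowerCase() === expected.toLowerCase()
    : actual === expected
}

// Markup in the wild does not always use the canonical casing.
const attrs = { property: 'OG:Title', content: 'Hello' }

attributeMatches(attrs, 'property', 'og:title') // false
attributeMatches(attrs, 'property', 'og:title', { insensitive: true }) // true
```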

Improve Date Rules

Building on the case-insensitive CSS rules improvement, I re-checked the rule set of metascraper-date.

I found some interesting opportunities for improvement: some rules can be merged into one, and others can be made more generic, improving data accuracy.

I also tried to prioritize the update date over the creation date, so the output reflects the last modification rather than the original publication.

Result: Better date accuracy, more values detected.
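
The "update over create" preference can be pictured as trying the modification date first and falling back to the creation date. This is only a sketch under that assumption; `pickDate` and its `modified` / `published` inputs are hypothetical and do not mirror the actual metascraper-date rules.

```javascript
// Return the first candidate that parses as a valid date, as an ISO string.
// The modification date is tried before the creation date.
function pickDate ({ modified, published }) {
  const candidates = [modified, published]
  for (const value of candidates) {
    if (!value) continue
    const date = new Date(value)
    if (!Number.isNaN(date.getTime())) return date.toISOString()
  }
  return undefined
}

pickDate({
  modified: '2019-01-10T18:32:00Z',
  published: '2018-12-01T00:00:00Z'
}) // '2019-01-10T18:32:00.000Z'
```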

Improve URL detection

URL detection has been improved so that more kinds of URLs can be detected.

A URL is a subtype of URI. My goal is to detect as much data as possible.

The URL-related helpers in metascraper-helpers can now detect URIs as well, such as base64-encoded image data URIs or magnet URIs.

The challenge is doing this while still supporting the original functionality. I added a lot of tests to ensure that.

Result: Better URL detection, with support for URIs.
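
To show what accepting URIs beyond http(s) URLs means in practice, here is a small sketch that relies on the standard WHATWG `URL` constructor available in Node.js and browsers. `isAbsoluteUri` is a hypothetical helper and is not how metascraper-helpers implements its checks.

```javascript
// Treat a value as a URI if the WHATWG URL parser accepts it.
// Anything without a scheme (for example a relative path) is rejected.
function isAbsoluteUri (value) {
  try {
    new URL(value)
    return true
  } catch (err) {
    return false
  }
}

isAbsoluteUri('https://example.com/post') // true
isAbsoluteUri('data:image/png;base64,iVBORw0KGgo=') // true
isAbsoluteUri('magnet:?xt=urn:btih:abc123') // true
isAbsoluteUri('/relative/path') // false
```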

v4.6.0

26 Oct 18:02
0ef7ad5

Features