feat(date): add datePublished and datemodified support #374
Signed-off-by: Stiliyan Ivanov <[email protected]>
Thanks for this! My only concern is adding too many date fields to the payload. I understand `require('metascraper-date')({ datePublished: true, dateModified: true })` By doing that we can keep the original date behavior, but also make it possible to be more specific when you want to. It also lets us introduce this progressively and see the adoption. I'm thinking of two use cases for that:
Waiting for your opinion 🙂
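The opt-in behavior proposed in this comment could look roughly like the sketch below. It is not the `metascraper-date` source: `createDateRules`, `fromKey`, and the metadata keys are hypothetical placeholders, and only the `datePublished` and `dateModified` option names come from the comment.

```javascript
// Hypothetical sketch of opt-in date fields; not the actual metascraper-date code.
// Each placeholder rule simply reads one key from a plain metadata object.
const fromKey = key => meta => meta[key]

const createDateRules = ({ datePublished = false, dateModified = false } = {}) => {
  // `date` is always present, so the default behavior stays unchanged.
  const rules = { date: [fromKey('date')] }
  // The more specific fields are only added when explicitly requested.
  if (datePublished) rules.datePublished = [fromKey('article:published_time')]
  if (dateModified) rules.dateModified = [fromKey('article:modified_time')]
  return rules
}

// Without options only `date` is exposed; with both flags all three fields are.
console.log(Object.keys(createDateRules()))
console.log(Object.keys(createDateRules({ datePublished: true, dateModified: true })))
```

Keeping the extra fields behind flags means existing consumers see the same payload unless they ask for more.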
Yeah, that's a good idea. Adding these dates as options is a good approach and won't change the current behavior.
LGTM! I think it's worth sorting the rules to avoid repetition and make them easy to read. Doing that, the default behavior could be as easy as adding the rules as fallbacks: `const date = dateRules() || dateModifiedRules() || datePublishedRules()`
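The fallback idea in this comment can be sketched as follows. The group names `dateRules`, `dateModifiedRules`, and `datePublishedRules` come from the comment; `firstMatch`, `getDate`, and the shape of the metadata object are assumptions made only for illustration.

```javascript
// Hypothetical sketch: rule groups modeled as functions over a plain metadata object.
// Returns the first truthy value produced by a list of rules, or undefined.
const firstMatch = rules => meta => {
  for (const rule of rules) {
    const value = rule(meta)
    if (value) return value
  }
  return undefined
}

const dateRules = firstMatch([meta => meta.date])
const dateModifiedRules = firstMatch([meta => meta.dateModified])
const datePublishedRules = firstMatch([meta => meta.datePublished])

// Default `date`: generic rules first, then the specific groups as fallbacks.
const getDate = meta =>
  dateRules(meta) || dateModifiedRules(meta) || datePublishedRules(meta)

console.log(getDate({ datePublished: '2019-01-01', dateModified: '2020-06-01' }))
// → 2020-06-01
```

With this ordering a page that only exposes a modified or published date still yields a `date`, without repeating the same rules in several groups.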
Yep, it should be cleaner this way. I will work on it when I have more time and update the pull request.
5e7014f to ad9351e
76ec51a to e71c2b0
adab710 to 2b7048e
Is this still going to be merged? I need this feature, so if there is still work to be done, let me know so I can do it.
@ghmendonca do you want to lead this PR? I will be happy to merge it after it's updated with the current codebase.
@Kikobeats yes, leave it to me, I will work on this later today.
@ghmendonca feel free to open a new PR picking the changes from here 🙂
Hey guys, @Kikobeats @ghmendonca This was totally forgotten. I have merged the latest changes I made back in the day. The rules are reordered as we agreed, and the config options are added. This will lead to breaking changes (the date will be different in certain test cases), as the rules are ordered differently than in the previous version. If you want 1:1 backward compatibility we should keep the order for the Probably the closest behaviour to the old order will be:
I think this pretty much wraps up the pull request. The snapshot still needs to be updated, as I have problems with my setup and can't do it.
@madwings Thanks for taking care of this; I can finish it 🙂
@madwings @Heheehd @ghmendonca Do you know of any site we can use to create an integration test for these new rules?
The rule ``` toDate($ => $filter($, $('[class*="publish" i]'))) ``` takes priority over ``` toDate($ => $('[property*="dc:date" i]').attr('content')) ```. That's okay in this case since it's a fuzzy rule, so it isn't too deterministic.
The rule ``` toDate($ => $('[itemprop="datepublished" i]').attr('content')) ``` takes priority over ``` toDate($ => $('meta[name="date" i]').attr('content')) ```. That's okay in this case since it's a fuzzy rule, so it isn't too deterministic.
Some of the domains we are scraping data from (date) too. If you want, I can provide you with a lot more or, even better, specific articles.
Thanks a lot! Working to add some sites as part of the integration tests 🙂
@Kikobeats I was testing with this article here -> https://www.mentalfloss.com/article/90967/14-facts-about-wheres-waldo
@ghmendonca that's actually expected, since the new rule set returns the latest timestamp rather than the first timestamp. For example, in a markup like this:
The
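The difference between "first timestamp" and "latest timestamp" can be illustrated with a small sketch. This is one reading of the comment, not the package's implementation: the `candidates` values are made up, and `firstTimestamp` and `latestTimestamp` are hypothetical helpers.

```javascript
// Hypothetical sketch: candidate timestamps in the order they appear in a page.
// The values are invented for illustration only.
const candidates = ['2016-12-29T12:00:00Z', '2021-03-04T09:30:00Z']

// "First timestamp": whichever candidate is found first wins.
const firstTimestamp = dates => dates[0]

// "Latest timestamp": the most recent candidate wins, regardless of position.
const latestTimestamp = dates =>
  dates.reduce((a, b) => (new Date(a) >= new Date(b) ? a : b))

console.log(firstTimestamp(candidates))  // → 2016-12-29T12:00:00Z
console.log(latestTimestamp(candidates)) // → 2021-03-04T09:30:00Z
```

Under this reading, an article that was published once and edited later would report the edit time as its `date`, which would explain a changed result for the same page.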
0f0bcec to 764b024
764b024 to af1691e
datePublished and datemodified support shipped! 🎉
With this pull request I propose adding two additional dates to the date package: `datePublished` and `dateModified`. Some websites use both dates for their articles. There are use cases when you want to know both if they exist, or exactly these two and not just a date found in the HTML.