This repository was archived by the owner on Nov 25, 2023. It is now read-only.

Repeating images #2

Open
ducu opened this issue Oct 15, 2014 · 6 comments

Comments

@ducu
Member

ducu commented Oct 15, 2014

The following repeating images should be listed in extralist.txt:

fcw.com -> http://fcw.com/design/gig/fcw/2012/img/fcw-logo.png
github.com -> https://github.com/apple-touch-icon-144.png
www.forbes.com -> http://images.forbes.com/images/channels/headercoverstory*
a16z.com -> http://a16z.files.wordpress.com/2014/01/7cb56ea5114a9f0e92d53bf0e171d15d.png
www.gv.com -> http://img.gv.com/wp-content/uploads*
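A minimal sketch of how entries like these could be matched against a candidate image URL, assuming the trailing `*` is a glob-style wildcard. The real format of extralist.txt is not shown in this thread (a later comment mentions regex filters), so the `EXCLUDED` list and the `is_excluded` helper below are hypothetical:

```python
from fnmatch import fnmatchcase

# Patterns copied from the list above. Treating "*" as a glob wildcard
# is an assumption; the project may interpret these as regexes instead.
EXCLUDED = [
    "http://fcw.com/design/gig/fcw/2012/img/fcw-logo.png",
    "https://github.com/apple-touch-icon-144.png",
    "http://images.forbes.com/images/channels/headercoverstory*",
    "http://img.gv.com/wp-content/uploads*",
]


def is_excluded(image_url, patterns=EXCLUDED):
    """Return True if image_url matches any exclusion pattern (hypothetical helper)."""
    # fnmatchcase is case-sensitive on every platform and lets "*" span "/".
    return any(fnmatchcase(image_url, pattern) for pattern in patterns)
```

With this approach an exact URL only matches itself, while a pattern ending in `*` excludes every image under that prefix.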

@jonw2000
Contributor

jonw2000 commented Feb 1, 2015

This GitHub article is a problem if apple-touch-icon-144.png is excluded:
https://github.com/blog/1862-introducing-a-simpler-faster-github-for-mac

Either the icon is not excluded (but this may ruin other GitHub pages), or image resizing is used to resize https://cloud.githubusercontent.com/assets/22635/3517580/2399aa10-06f1-11e4-8671-0923504c594a.png.

@jonw2000
Contributor

jonw2000 commented Feb 1, 2015

Added these to extralist, but the image it should pick is too big:

images.forbes.com/images/channels/headercoverstory*
i.forbesimg.com/assets/img/anon_avatar.png
blogs-images.forbes.com//files///-300x300.png
images.forbes.com/images/channels/headerfeature*_panel*_original-.

@jonw2000
Contributor

jonw2000 commented Feb 1, 2015

Forbes images: I have included some regex filters in extralist, but the main image is excluded due to its size. Perhaps consider image resampling/resizing using Pillow?
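To make the resizing idea concrete, here is a rough sketch of the size arithmetic only, assuming the limit is on pixel dimensions rather than file size. The `fit_within` name and the box limits are hypothetical; the actual resampling would be left to Pillow, whose `Image.thumbnail` performs this kind of aspect-preserving fit.

```python
def fit_within(width, height, max_width, max_height):
    """Return (w, h) scaled down to fit inside the box, keeping the aspect ratio.

    Images that already fit are returned unchanged (never scaled up).
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Use the tighter of the two ratios, capped at 1.0 to avoid upscaling.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 2000x1000 image fitted into a 500x500 box comes out as 500x250.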

@jonw2000
Contributor

jonw2000 commented Feb 1, 2015

a16z.com: the map and office photo are excluded, but the article image is not found because it's embedded in the following:

Perhaps a custom extractor can be used here.

@ducu
Member Author

ducu commented Feb 1, 2015

OK, so here's one option:
Let summary collect up to 3 valid (filtered) images and choose among them based on some measurement, such as picking the largest one. This is probably a good choice most of the time (e.g. for www.forbes.com images, where the logos are smaller than the article image).
This should be easy to do; I remember experimenting with this option at the time.
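A minimal sketch of that first option, not the project's actual code: it assumes each candidate is a `(url, width, height)` tuple in page order and that "bigger" means pixel area. `pick_image` and `is_valid` are hypothetical names.

```python
from itertools import islice


def pick_image(candidates, is_valid, limit=3):
    """Pick the largest of the first `limit` valid images, or None.

    candidates: iterable of (url, width, height) tuples in page order.
    is_valid:   callable returning True for images that pass the filters.
    """
    valid = (c for c in candidates if is_valid(c))
    shortlist = list(islice(valid, limit))  # stop after `limit` valid images
    if not shortlist:
        return None
    # Compare by pixel area; on ties max() keeps the earliest candidate.
    return max(shortlist, key=lambda c: c[1] * c[2])
```

Capping the shortlist keeps the number of images that need to be measured small, at the cost of missing a larger image further down the page.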

Second, we could have custom extraction handlers based on the link URL, as you said.
This would be useful if we want to extend summary to get video urls for example (e.g. have special handlers for youtube.com, vimeo.com and so on). This would be the most flexible option but also the most difficult to maintain.
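One way such handlers could be wired up is a small registry keyed by hostname. This is only a sketch under assumptions: `HANDLERS`, `handler`, `extract`, and the returned dictionaries are hypothetical and not part of summary, and the handler bodies are placeholders.

```python
from urllib.parse import urlparse

HANDLERS = {}  # hostname -> extraction function (hypothetical registry)


def handler(*hosts):
    """Decorator that registers a function for one or more hostnames."""
    def register(func):
        for host in hosts:
            HANDLERS[host] = func
        return func
    return register


def default_extract(url, html):
    # Placeholder for the generic extraction path.
    return {"url": url, "kind": "generic"}


@handler("youtube.com", "www.youtube.com")
def extract_youtube(url, html):
    # Placeholder: a real handler would locate the video URL here.
    return {"url": url, "kind": "video"}


def extract(url, html=""):
    """Dispatch to a site-specific handler, falling back to the default."""
    host = (urlparse(url).hostname or "").lower()
    return HANDLERS.get(host, default_extract)(url, html)
```

Each new site adds one decorated function, which is where the maintenance cost mentioned above comes from.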

Regarding image manipulation (resizing etc.), we could try http://thumbor.org/ in the future; I think I mentioned it before. But let's leave this for later.

@ducu
Member Author

ducu commented Feb 1, 2015

Looks like a candidate for PhantomJS: googleandyourbusiness.blogspot.com
