
No handler for incorrect URLs during Bulk Media Scraping #12

Open
mrcomicon opened this issue Apr 24, 2021 · 1 comment

Comments

@mrcomicon

In the bulk scraping function, when scraping a scene's URL returns null, your script assumes the cause is a missing scraper, when in fact it could be due to several reasons, such as an incorrect link. Because the subroutine adds the URL's netloc to a blacklist, the script effectively assumes that if one URL for a site doesn't work, no URL for that site will work. For example, if you have 20 scenes tagged for scraping, all from site abc.abc, and the 5th scene has an incorrect link, the first 4 scenes will be scraped successfully, but the script will skip scenes 6-20 because they share the same netloc as the incorrect link.

You could fix this by removing the code that adds the netloc to the blacklist and instead adding the whole URL to missing_scrapers. It could also be helpful to write this list out to a file for informational purposes. However, this change would remove the protection against genuinely missing scrapers; perhaps there is a way to use the stash interface to see which scrapers are loaded and add them to a whitelist at the beginning of the script?
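A minimal sketch of what that might look like, assuming Stash's GraphQL API exposes the listSceneScrapers query (whose scene.urls field lists the URL fragments each scraper supports), and where call_graphql() and scrape_scene_url() are hypothetical stand-ins for the script's own helpers:

```python
SCRAPER_QUERY = """
query {
  listSceneScrapers {
    name
    scene { urls }
  }
}
"""

def build_scraper_whitelist(call_graphql):
    # Ask Stash which scene scrapers are loaded and collect the URL
    # fragments each one claims to support.
    result = call_graphql(SCRAPER_QUERY)
    fragments = []
    for scraper in result["listSceneScrapers"]:
        spec = scraper.get("scene") or {}
        fragments.extend(spec.get("urls") or [])
    return fragments

def bulk_scrape(scenes, call_graphql, scrape_scene_url):
    whitelist = build_scraper_whitelist(call_graphql)
    failed_urls = []  # whole URLs, not netlocs
    for scene in scenes:
        url = scene["url"]
        # Skip only URLs that no loaded scraper claims to handle
        # (a naive substring match, for illustration).
        if not any(fragment in url for fragment in whitelist):
            failed_urls.append(url)
            continue
        data = scrape_scene_url(url)
        if data is None:
            # A null result for a whitelisted URL means this particular
            # link is bad; record it and move on instead of blacklisting
            # the whole netloc.
            failed_urls.append(url)
            continue
        # ... update the scene with the scraped data ...
    # Write the failures out to a file for inspection.
    with open("failed_urls.txt", "w") as f:
        f.write("\n".join(failed_urls))
```

With the whitelist built once up front, a null scrape result can safely be treated as a bad link rather than a missing scraper.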

@niemands
Owner

Thank you for pointing this out, I will look into it
