
No handler for incorrect URLs during Bulk Media Scraping #12

Open
mrcomicon opened this issue Apr 24, 2021 · 1 comment

Comments

@mrcomicon

In the bulk scraping function, when scraping a scene's URL returns null, your script assumes the cause is a missing scraper, when in fact it could be due to several reasons, such as an incorrect link. Because the subroutine adds the URL's netloc to a blacklist, the script effectively assumes that if one URL for a site doesn't work, no URL for that site will work. For example, if you have 20 scenes tagged for scraping, all from site abc.abc, and the 5th scene has an incorrect link, the first 4 scenes will be scraped successfully, but the script will skip scenes 6-20 because they share the same netloc as the incorrect link.

You could fix this by removing the code that adds the netloc to the blacklist and instead adding the whole URL to missing_scrapers. It could also be helpful to write this list out to a file for informational purposes. However, this change would remove the protection against genuinely missing scrapers; perhaps there is a way to use the stash interface to see which scrapers are loaded and add them to a whitelist at the beginning of the script?
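A minimal sketch of what that might look like, assuming Stash's GraphQL API exposes the listSceneScrapers query (whose scene.urls field lists the URL fragments each scraper supports), and where call_graphql() and scrape_scene_url() are hypothetical stand-ins for the script's own helpers:

```python
SCRAPER_QUERY = """
query {
  listSceneScrapers {
    name
    scene { urls }
  }
}
"""

def build_scraper_whitelist(call_graphql):
    # Ask Stash which scene scrapers are loaded and collect the URL
    # fragments each one claims to support.
    result = call_graphql(SCRAPER_QUERY)
    fragments = []
    for scraper in result["listSceneScrapers"]:
        spec = scraper.get("scene") or {}
        fragments.extend(spec.get("urls") or [])
    return fragments

def bulk_scrape(scenes, call_graphql, scrape_scene_url):
    whitelist = build_scraper_whitelist(call_graphql)
    failed_urls = []  # whole URLs, not netlocs
    for scene in scenes:
        url = scene["url"]
        # Skip only URLs that no loaded scraper claims to handle
        # (a naive substring match, for illustration).
        if not any(fragment in url for fragment in whitelist):
            failed_urls.append(url)
            continue
        data = scrape_scene_url(url)
        if data is None:
            # A null result for a whitelisted URL means this particular
            # link is bad; record it and move on instead of blacklisting
            # the whole netloc.
            failed_urls.append(url)
            continue
        # ... update the scene with the scraped data ...
    # Write the failures out to a file for inspection.
    with open("failed_urls.txt", "w") as f:
        f.write("\n".join(failed_urls))
```

With the whitelist built once up front, a null scrape result can safely be treated as a bad link rather than a missing scraper.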

@niemands
Owner

Thank you for pointing this out, I will look into it
