In the bulk scraping function, when the check on a scene's URL returns null, the script assumes the cause is a missing scraper, when in fact it could be due to several things, such as an incorrect link. The result is that if one URL for a site fails, the script assumes all URLs for that site will fail, because the subroutine adds the URL's netloc to a blacklist. For example, if 20 scenes are tagged for a bulk scrape from site abc.abc but the 5th scene has an incorrect link, the first 4 scenes are scraped successfully, yet the script simply skips scenes 6-20 because they share the same netloc as the bad link.

You could fix this by removing the code that adds the netloc to the blacklist and instead adding the whole URL to missing_scrapers. It might also be helpful to write that list out to a file for informational purposes. However, with that change you would no longer have any protection against genuinely missing scrapers; perhaps there is a way to ask the Stash interface which scrapers are loaded and build a whitelist from them at the start of the script? A rough sketch of what I mean is below.
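Roughly what I have in mind, as a sketch only: the endpoint URL, the `listSceneScrapers`/`scrapeSceneURL` query names, the field names, and the helper functions are illustrative guesses and may not match the actual script or the Stash version in use.

```python
import json
from urllib.parse import urlparse

import requests

STASH_GRAPHQL = "http://localhost:9999/graphql"  # hypothetical local Stash endpoint


def graphql(query, variables=None):
    """POST a GraphQL query to Stash and return the `data` payload."""
    resp = requests.post(STASH_GRAPHQL, json={"query": query, "variables": variables or {}})
    resp.raise_for_status()
    return resp.json()["data"]


def load_supported_netlocs():
    """Build a whitelist of netlocs the installed scene scrapers claim to support.

    Assumes a scraper-listing query that returns example URLs per scraper;
    the query and field names here are guesses and may differ by version.
    """
    data = graphql("{ listSceneScrapers { scene { urls } } }")
    netlocs = set()
    for scraper in data["listSceneScrapers"]:
        for url in (scraper.get("scene") or {}).get("urls") or []:
            netlocs.add(urlparse(url).netloc)
    return netlocs


def bulk_scrape(scenes, whitelist):
    """Scrape each scene URL, recording individual failures instead of
    blacklisting a whole site after one bad link."""
    missing_scrapers = []  # now holds full URLs that failed, not netlocs
    for scene in scenes:
        url = scene["url"]
        if whitelist and urlparse(url).netloc not in whitelist:
            # No installed scraper advertises this site at all.
            missing_scrapers.append(url)
            continue
        result = graphql(
            "query ($url: String!) { scrapeSceneURL(url: $url) { title } }",
            {"url": url},
        )
        if result["scrapeSceneURL"] is None:
            # Could be a bad link rather than a missing scraper, so skip
            # only this URL and keep going with the rest of the site.
            missing_scrapers.append(url)
            continue
        # ... apply the scraped data to the scene here ...

    # Dump the failed URLs for later review instead of silently skipping them.
    with open("missing_scrapers.json", "w") as fh:
        json.dump(missing_scrapers, fh, indent=2)


if __name__ == "__main__":
    whitelist = load_supported_netlocs()
    scenes = [{"id": 1, "url": "https://abc.abc/scene/5"}]  # placeholder input
    bulk_scrape(scenes, whitelist)
```

This keeps the per-site protection (via the whitelist built once at startup) while making a single bad link cost only that one scene instead of the rest of the site.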