Site support request: 8muses #305

Closed
ShyWest opened this issue Jun 9, 2019 · 6 comments


ShyWest commented Jun 9, 2019

Would it be possible to add support for 8muses?

It's a pretty standard comic/hentai site, but it uses a rather uncommon category system. Instead of tagging things, the comics and galleries follow a tree-based hierarchy with arbitrary nodes.

For example, the homepage (https://www.8muses.com/) shows several categories. One of them is Comics (https://www.8muses.com/comics)

The comics section has several links again, where we can find this one: https://www.8muses.com/comics/album/JohnPersons_com-Comics

There we can again find several links to several galleries, like this one: https://www.8muses.com/comics/album/JohnPersons_com-Comics/Pegasus

Which in turn leads to a link to a comic: https://www.8muses.com/comics/album/JohnPersons_com-Comics/Pegasus/2-Hot-Blondes-Submit-to-Big-Black-Cock

Which in turn links to two new sub-categories, to separate two issues. Following them (https://www.8muses.com/comics/album/JohnPersons_com-Comics/Pegasus/2-Hot-Blondes-Submit-to-Big-Black-Cock/Issue-1) we can finally get a page (https://www.8muses.com/comics/picture/JohnPersons_com-Comics/Pegasus/2-Hot-Blondes-Submit-to-Big-Black-Cock/Issue-1/1).
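The URL shapes above can be split into tree nodes with a few lines of Python. This is only a sketch: the layout `/<section>/<album|picture>/<node>/...` is inferred from the example URLs in this comment, and `split_8muses_path` is a hypothetical helper name, not part of any existing tool.

```python
from urllib.parse import urlsplit, unquote


def split_8muses_path(url):
    """Return (kind, nodes) for a URL shaped like the examples above.

    kind is "album" or "picture"; nodes is the list of tree nodes after it.
    The layout is an assumption based on the example URLs only.
    """
    parts = [unquote(p) for p in urlsplit(url).path.split("/") if p]
    if len(parts) < 3 or parts[1] not in ("album", "picture"):
        raise ValueError("unrecognized URL layout: " + url)
    return parts[1], parts[2:]


kind, nodes = split_8muses_path(
    "https://www.8muses.com/comics/album/JohnPersons_com-Comics/Pegasus"
)
print(kind, nodes)
```

The depth of `nodes` varies from URL to URL, which is the "arbitrary nodes" problem described above.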

At least the pages and the thumbnails can be easily identified, since they use different classes in the HTML. Thumbnails that lead to a new list of subcategories seem to always have a title in a div.image-title, while thumbnails that lead to actual pages don't.
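A rough sketch of that check using only the standard library. The sample markup is made up for illustration; the only detail taken from the site is the `div.image-title` class, and the assumption that each thumbnail is wrapped in a single `<a>` element may not hold for the real pages.

```python
from html.parser import HTMLParser


class ThumbClassifier(HTMLParser):
    """Sort links by whether they contain a div with class image-title."""

    def __init__(self):
        super().__init__()
        self.albums = []      # links whose thumbnail carries a title
        self.pages = []       # links whose thumbnail has no title
        self._href = None     # href of the <a> currently open, if any
        self._titled = False  # saw div.image-title inside the open <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self._href = attrs["href"]
            self._titled = False
        elif tag == "div" and self._href is not None:
            if "image-title" in (attrs.get("class") or "").split():
                self._titled = True

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            target = self.albums if self._titled else self.pages
            target.append(self._href)
            self._href = None


# Hypothetical markup, not copied from the site.
SAMPLE = """
<a href="/comics/album/Example/Issue-1"><div class="image"></div><div class="image-title">Issue 1</div></a>
<a href="/comics/picture/Example/Issue-1/1"><div class="image"></div></a>
"""

parser = ThumbClassifier()
parser.feed(SAMPLE)
print("albums:", parser.albums)
print("pages:", parser.pages)
```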

The depth level and organization criteria are inconsistent, and metadata seems non-existent except for a breadcrumb at the top showing the full list of nodes. Usually the final nodes have actual names rather than just "Issue 1", but it may be a good idea to be able to use the full path as a name rather than the final node name alone.

The site allows searches. Example: https://www.8muses.com/search?q=foo

I understand if it's not possible or worthwhile to add support for this site considering its oddities. Thank you for your time.

Author

ShyWest commented Jun 9, 2019

OK, I wrote a prototype, which I should have done before sending the ticket. This site limits connections hard. Even with pauses of five seconds between requests, I run into trouble when trying to download more than 20-30 images. Doesn't seem worth it after all. Sorry for the inconvenience.
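For reference, one common way to cope with dropped connections is to retry with a growing pause. The sketch below is not the prototype mentioned above; `fetch_with_backoff` and its parameters are hypothetical, and the five-second base delay simply mirrors the pause described in this comment.

```python
import time


def fetch_with_backoff(fetch, url, retries=4, base_delay=5.0, sleep=time.sleep):
    """Call fetch(url); on OSError wait base_delay * 2**attempt, then retry.

    fetch is any callable that downloads a URL and raises OSError
    (for example ConnectionError or urllib.error.URLError) on failure.
    """
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except OSError:
            if attempt == retries:
                raise  # out of retries, let the caller see the error
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the function testable without real waiting.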

Contributor

wankio commented Jun 9, 2019

I think ripme works well for your purpose.

Owner

mikf commented Jun 9, 2019

> This site limits connections hard

I've been browsing this site in my browser and didn't experience anything like that, even if moving at 1-2 pictures per second. Maybe the throttling is location-dependent?

Author

ShyWest commented Jun 10, 2019

Yes, the site works perfectly in a browser. I tried again to see if I could better disguise my bot, and now it works without a hitch, even without pauses between downloads. It may have been a fluke, albeit a long one.

Here's the script and a couple of examples if you want to check it:

Not sure how useful this is in your case, but as a workaround to bypass the JavaScript hijinks I took the path of the thumbnails and modified it. Thumbnails are under image/th/, while full pages are under image/fl/. Both share the same filename, which lets you avoid extra requests. Example:
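A minimal sketch of that substitution. Only the `image/th/` and `image/fl/` path segments come from the description above; the host and filename in the usage line are placeholders, and `thumb_to_full` is a hypothetical helper name.

```python
def thumb_to_full(url):
    """Swap the first /image/th/ path segment for /image/fl/."""
    marker = "/image/th/"
    if marker not in url:
        raise ValueError("not a thumbnail URL: " + url)
    return url.replace(marker, "/image/fl/", 1)


# Placeholder URL for illustration only.
print(thumb_to_full("https://example.invalid/image/th/some-file.jpg"))
```

Since the filename is unchanged, the full-size URL can be derived from the album listing without requesting each picture page.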

mikf added a commit that referenced this issue Jun 10, 2019
Owner

mikf commented Jun 10, 2019

With the help of your code and ripme's, I managed to put an "album" extractor together. Album in a broad sense, since you can input basically any 8muses URL and it will recursively download any albums and subalbums it finds. For example, gallery-dl https://www.8muses.com/comics/album/Fakku-Comics would download the whole collection of Fakku h-manga.

One problem is the inability of gallery-dl to create a dynamic amount of sub-directories for nested albums, so it uses the whole album path as name for one directory level. It's a bit inconvenient, but it kind of works.
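To illustrate the idea of folding a nested album path into one directory level, here is a small sketch. It is not gallery-dl's actual code; `album_directory`, the separator, and the character replacement rule are all assumptions made for the example.

```python
import re


def album_directory(nodes, sep=" - "):
    """Join album path nodes into a single directory name.

    Characters that are awkward in file names are replaced with "_",
    and empty nodes are skipped.
    """
    cleaned = []
    for node in nodes:
        node = re.sub(r'[\\/:*?"<>|]', "_", node).strip()
        if node:
            cleaned.append(node)
    return sep.join(cleaned)


print(album_directory(["Example-Comics", "Some Series", "Issue 1"]))
```

Keeping every ancestor in the name also keeps albums whose last node is a generic "Issue 1" distinct from each other.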

I couldn't really find too many connection problems while testing it. Sometimes you'd get a dropped connection or a rather slow download speed, but only for some albums.

Author

ShyWest commented Jun 10, 2019

> One problem is the inability of gallery-dl to create a dynamic amount of sub-directories for nested albums, so it uses the whole album path as name for one directory level. It's a bit inconvenient, but it kind of works.

Seems like the perfect solution to me, at least for my workflow. There are several comics/galleries whose individual name is "Issue 1", so it's better this way. Thanks for your time and I'm glad I was helpful.

mikf closed this as completed Jun 14, 2019
mikf added the nsfw label Jul 17, 2019