
Update Parser For Site https://exiledrebelsscanlations.com/ #1040

Closed
bonnetchuu opened this issue Aug 13, 2023 · 6 comments


bonnetchuu commented Aug 13, 2023

Hello, I would like to request a specific update to the parser for the site "https://exiledrebelsscanlations.com/".

Even with the Advanced Option "Add Page with Chapters to Chapters List" selected in the WebToEpub extension, WebToEpub can no longer include the Table of Contents page of any novel on this site in the epub. I recall that WebToEpub used to be able to do this, but one of the recent updates to the extension removed this capability from this specific parser.


Also, one more thing about this site: when generating the epub, is it possible to skip the "dir" attribute on all paragraph and heading tags, and also skip the embedded font attributes in the span tags? Basically, omit the dir and embedded font attributes from the p and span tags, and just grab the text within the paragraph tags (plus the < p > tags themselves). These attributes are extremely lengthy and make the epub look very cluttered, especially since they appear on almost every sentence.
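
To make the cleanup being asked for concrete, here is a minimal Python sketch using only the standard library. It is not how WebToEpub or EpubEditor do it; the `Scrubber` class and `scrub` function are hypothetical names, and the sketch assumes the "embedded font attributes" live in the `style` attribute of the span tags. It drops `dir` from paragraph and heading tags, drops `style` from spans, and unwraps any span that is left with no attributes. Comments and doctype declarations are not preserved, so it only suits HTML fragments.

```python
from html import escape
from html.parser import HTMLParser

# Tags whose "dir" attribute should be dropped (paragraphs and headings).
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6"}


class Scrubber(HTMLParser):
    """Hypothetical helper: re-emits HTML with the unwanted attributes removed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        # One flag per currently open <span>: True if its tag was emitted.
        self.span_kept = []

    @staticmethod
    def _render(tag, attrs, self_closing=False):
        parts = []
        for k, v in attrs:
            parts.append(f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"')
        return f"<{tag}{''.join(parts)}{'/' if self_closing else ''}>"

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            attrs = [(k, v) for k, v in attrs if k != "dir"]
        elif tag == "span":
            # Assumption: the font settings are carried by the style attribute.
            attrs = [(k, v) for k, v in attrs if k != "style"]
            self.span_kept.append(bool(attrs))
            if not attrs:
                return  # unwrap: keep the text, drop the empty <span>
        self.out.append(self._render(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through unchanged.
        self.out.append(self._render(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        if tag == "span" and self.span_kept:
            if not self.span_kept.pop():
                return  # matching start tag was dropped, so drop the end tag too
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives with entities decoded, so escape it again on output.
        self.out.append(escape(data, quote=False))


def scrub(html_text):
    parser = Scrubber()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, `scrub('<p dir="ltr"><span style="font-family: Arial;">Hello</span> <em>world</em></p>')` returns `<p>Hello <em>world</em></p>`. Note that removing `style` wholesale would also remove any italics or bold that the site expresses through span styles, so a real script might need to be more selective.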


Thanks for your time.

@yuzucitrusx

@dteviot

hi hiiii, i was looking through the open issues before opening a new request and saw that someone else had already requested this fix about the source i was looking to ask about! i didn't wanna open a duplicate request, so is it okay to also make this request via a comment?

Owner

dteviot commented Sep 19, 2023

@yuzucitrusx

so is it okay to also make this request via a comment?

Yes. I'd point out, though, that bonnetchuu has made at least 3 requests above, so you might like to be specific about exactly what it is you want. I assume you're asking to have "Add Page with Chapters to Chapters List" fixed.

Owner

dteviot commented Sep 19, 2023

@bonnetchuu @yuzucitrusx

Re: Add Page with Chapters to Chapters List not working.
The situation is, the layout of the Table of Contents page is different to the layout of a typical chapter.
In particular, the HTML node that holds the wanted content on a chapter page isn't there on the ToC page.
So, I need to add code to find the content node for a ToC page, and have the logic to do this not mess up when the page isn't a ToC page. Then, if there's additional processing needed for chapters, I need to make sure that doesn't mess up either.
So, supporting a ToC page that has a different layout to a chapter means (worst case) I have to write two parsers for the site, plus enough logic to distinguish between the two cases.
In general, I'd rather spend my time writing a parser for a completely new site.
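
The "two parsers plus logic to tell them apart" problem described above can be sketched roughly as follows. This is an illustrative Python sketch, not WebToEpub's actual code: the class names in the XPath expressions are made up (the real site markup is not shown in this thread), `find_content` and `extract_text` are hypothetical helpers, and the input is assumed to be well-formed XHTML so the standard library's `xml.etree.ElementTree` can parse it.

```python
import xml.etree.ElementTree as ET

# Hypothetical selectors; the real class names on the site are not known here.
CHAPTER_XPATH = ".//div[@class='chapter-content']"
TOC_XPATH = ".//div[@class='toc-content']"


def find_content(root):
    """Return (kind, node) for the first layout that matches, else (None, None)."""
    for kind, xpath in (("chapter", CHAPTER_XPATH), ("toc", TOC_XPATH)):
        node = root.find(xpath)
        if node is not None:
            return kind, node
    return None, None


def extract_text(root):
    """Extract the wanted text, applying chapter-only cleanup to chapters only."""
    kind, node = find_content(root)
    if node is None:
        return None  # neither layout matched
    if kind == "chapter":
        # Chapter-specific processing (here: dropping a navigation block)
        # must not run on a ToC page, which has a different structure.
        for nav in node.findall("div[@class='nav']"):
            node.remove(nav)
    return "".join(node.itertext()).strip()


chapter_page = ET.fromstring(
    "<html><body><div class='chapter-content'>"
    "<p>Chapter text</p><div class='nav'>Next</div>"
    "</div></body></html>"
)
toc_page = ET.fromstring(
    "<html><body><div class='toc-content'>"
    "<a href='c1'>Chapter 1</a>"
    "</div></body></html>"
)
```

Here `extract_text(chapter_page)` gives `"Chapter text"` and `extract_text(toc_page)` gives `"Chapter 1"`. Every extra layout adds another selector and another branch of layout-specific processing, which is the maintenance cost being described.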

That said, it looks like this one's an easy fix.

Test versions for Firefox and Chrome have been uploaded to https://drive.google.com/drive/folders/1B_X2WcsaI_eg9yA-5bHJb8VeTZGKExl8?usp=sharing. Pick the one suitable for you, follow the "How to install from Source (for people who are not developers)" instructions at https://github.com/dteviot/WebToEpub/tree/ExperimentalTabMode#user-content-how-to-install-from-source-for-people-who-are-not-developers and let me know how it goes.
Tested with:

For my notes: 10 minutes work (Or 40 if I include writing this rant.)

@bonnetchuu, as regards scrubbing the HTML, that looks like a job for my epub editor tool. Please go to https://github.com/dteviot/EpubEditor and raise the issue there. And I'll see about giving you a script to do the cleanup.

FYI. In general, to make it easier for me to track what I've done, and what still needs doing, please create an issue for each separate thing you ask for. Putting multiple requests in a single issue does NOT make my life easier.

Owner

dteviot commented Sep 19, 2023

@bonnetchuu
Actually, looks like I've already done something close to what you want. Have a look at: #964

@yuzucitrusx

@dteviot

hi hii, thank you, it fixed the toc issue! i also looked at that other issue you referenced for the span tags, but um i got kind of confused when i tried it out with the script in epubeditor... should i open the issue for epubeditor there and maybe ask for a script specifically for these "span style" tags instead?

Owner

dteviot commented Sep 19, 2023

@yuzucitrusx

should i open the issue for epubeditor there and maybe ask for a script specifically for these "span style" tags instead?

Yes, please do so.
