[Feature Request] Differentiate actual paragraphs and LUTE's page splitting #475
Comments
Lute splits by sentence to keep the token count reasonable: some paragraphs can get really long, and the formatting can get weird when people copy/paste text from different places. It's hard to say what the best solution is here. I still feel that splitting paragraphs up is necessary, and I don't have a great answer for this ... unless I first try to group by paragraphs, then check the page size, and only split a paragraph if the page would be, say, 50% longer than the maximum size. It's a bit convoluted and tricky, but maybe that would suffice. Does that seem reasonable?
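A rough sketch of that two-step idea, in Python. This is not Lute's code: `count_tokens`, `split_sentences`, and `paginate` are hypothetical names, tokens are approximated as whitespace-separated words, and sentence splitting is a naive regex. Whole paragraphs are kept as long as the page stays within `overflow * max_tokens` (1.5 here, i.e. "50% longer than the maximum"); only a paragraph that would blow past that limit is broken up by sentence.

```python
import re
from typing import List


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for Lute's tokenizer: count whitespace-separated words.
    return len(text.split())


def split_sentences(paragraph: str) -> List[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def paginate(paragraphs: List[str], max_tokens: int, overflow: float = 1.5) -> List[List[str]]:
    """Group whole paragraphs into pages; split a paragraph by sentence only
    when keeping it whole would push the page past overflow * max_tokens."""
    limit = max_tokens * overflow
    pages: List[List[str]] = []
    current: List[str] = []  # paragraphs (or paragraph pieces) on the page being built
    size = 0  # tokens on the current page, including any pending sentence chunk

    for para in paragraphs:
        n = count_tokens(para)
        if size + n <= limit:
            # Within the tolerance: keep the author's paragraph intact.
            current.append(para)
            size += n
        else:
            # Too long even with the tolerance: fall back to sentence splitting,
            # rejoining sentences that land on the same page into one piece.
            chunk: List[str] = []
            for sentence in split_sentences(para):
                s = count_tokens(sentence)
                if size + s > max_tokens and (current or chunk):
                    if chunk:
                        current.append(" ".join(chunk))
                    pages.append(current)
                    current, chunk, size = [], [], 0
                chunk.append(sentence)
                size += s
            if chunk:
                current.append(" ".join(chunk))
        # Start a new page once the soft maximum is reached.
        if size >= max_tokens:
            pages.append(current)
            current, size = [], 0

    if current:
        pages.append(current)
    return pages
```

With `max_tokens=10`, two paragraphs of 6 and 7 words stay together on one 13-word page, because 13 is within the 15-word tolerance; a 30-word paragraph, by contrast, would be cut into sentence groups of about 10 words.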
Yes, this sounds reasonable, and I'm not sure why I did this in the first place. :-)
Yeah, that's so tough! It would be nice, though, I agree. There's an issue to allow Markdown on import of text files (though tables still wouldn't be possible, b/c it's bananas). Let me know re the paragraph grouping and page size threshold idea ... it could be tricky-ish.
Although the question was not addressed to me: I personally like this idea of a two-step process (first group paragraphs, and only split further if a page exceeds a certain threshold). At the moment I still do quite a lot of work pre-processing books, often setting manual separators (---) to avoid some of the problems mentioned above.
I've had annoying paragraph splits as well, in places that didn't make sense and complicated reading a tough book.
I've been thinking about this one on and off for a few days, and I don't have a perfect solution yet, but probably a good-enough one: I'm considering grouping based on paragraphs, and just not splitting paragraphs at all, even if a paragraph is huge. For the most part, that should be fine ... if someone is reading James Joyce or Faulkner with Lute, they're probably outside the target user base anyway. Such users could just add a new page and split a long paragraph manually if they wanted to. With this change, the "max words" setting would become a "threshold": paragraphs would be added to the current page until the threshold is crossed, at which point a new page would be started. A page could exceed the threshold by any number of tokens, but that's probably good enough.
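The threshold approach is simpler than the two-step one, since paragraphs are never split. A minimal sketch in Python, assuming hypothetical helper names (`count_tokens`, `paginate_by_threshold`, not Lute's API) and approximating tokens as whitespace-separated words:

```python
from typing import List


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for Lute's tokenizer: count whitespace-separated words.
    return len(text.split())


def paginate_by_threshold(paragraphs: List[str], threshold: int) -> List[List[str]]:
    """Add whole paragraphs to the current page until its token count reaches
    the threshold, then start a new page. Paragraphs are never split, so a page
    can overshoot the threshold by up to one paragraph's length."""
    pages: List[List[str]] = []
    current: List[str] = []
    size = 0
    for para in paragraphs:
        current.append(para)
        size += count_tokens(para)
        if size >= threshold:
            pages.append(current)
            current, size = [], 0
    if current:
        pages.append(current)  # trailing, shorter page
    return pages
```

For example, with a threshold of 5, paragraphs of 3, 13, and 1 words produce two pages: the first holds the 3- and 13-word paragraphs (16 words, well past the threshold), and the second holds the last paragraph alone. That overshoot is the trade-off accepted above.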
Is your feature request related to a problem? Please describe.
Where to break a paragraph is a stylistic choice by the author and may aid the storytelling. This information is lost because LUTE splits paragraphs to meet the token count per page.
Describe the solution you'd like
Indent the original paragraphs, while leaving the "new paragraphs" that result from LUTE's splitting unindented, displayed the same way they are now.
If possible, I'd also like LUTE not to remove empty lines (or whitespace in general), for the same reason. Many authors use 2 or 3 blank lines for sub-chapter breaks, and some even use 2 lines and 3 lines to mean different things.
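One way to keep that information would be to record, for each paragraph, how many empty lines preceded it, so a 2-line break and a 3-line break stay distinguishable when the page is rendered. A hypothetical Python sketch (`Paragraph`, `parse_paragraphs`, and `render` are made-up names, not Lute code), assuming one paragraph per line of input:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Paragraph:
    text: str
    blank_lines_before: int  # empty lines separating it from the previous paragraph


def parse_paragraphs(text: str) -> List[Paragraph]:
    # Assumes each non-empty line is one paragraph; whitespace-only lines count as empty.
    paragraphs: List[Paragraph] = []
    blanks = 0
    for line in text.splitlines():
        if line.strip() == "":
            blanks += 1
        else:
            paragraphs.append(Paragraph(line, blanks))
            blanks = 0
    return paragraphs


def render(paragraphs: List[Paragraph]) -> str:
    # Re-emit the original number of empty lines before each paragraph.
    out: List[str] = []
    for p in paragraphs:
        out.extend([""] * p.blank_lines_before)
        out.append(p.text)
    return "\n".join(out)
```

In this sketch, trailing blank lines at the very end of the text are dropped and whitespace-only lines come back as truly empty lines; everything else round-trips.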
Describe alternatives you've considered
During book creation, give users a "don't split paragraphs" option (unchecked by default). Not the best solution performance-wise, and it probably takes more work.
Additional context
With LUTE we already lose a lot of information, such as bold, italic, underline, images, and tables. I understand the difficulty of incorporating them, which would entail fundamental changes. But paragraph breaks and empty lines are probably the easiest to handle and deserve a chance.