v0.13 enhancement list #139

Open
eshellman opened this issue Oct 21, 2022 · 30 comments

Comments

@eshellman
Collaborator

eshellman commented Oct 21, 2022

Leave comments on this issue for enhancements desired for the next major version of Ebookmaker.

Possibilities (subject to feasibility and community support)

@gbnewby
Collaborator

gbnewby commented Oct 23, 2022

Cover page handling: #132

@eshellman
Collaborator Author

We may be able to address the problem in #132 with a CSS tweak; the difficulty is in testing rather than in the Ebookmaker code.

@asylumcs
Contributor

As far as testing for #132, the suggested code is the default cover image handling from Sigil (https://sigil-ebook.com/sigil/). We can benefit from the implied testing by Sigil users.

@gbnewby
Collaborator

gbnewby commented Oct 25, 2022

Generate errors for use of blocklisted items. If someone uses <wbr>, for example, this should be an ERROR in ebm, so submitters can see in output.txt (via https://ebookmaker.pglaf.org) that they cannot use it.

Over time the blocklist will shrink. The blocklist is in the DP Wiki, but ebm code is the canonical source of truth for what HTML or CSS elements/constructs/syntax/variations/etc. are not allowed.

If there are things that are truly harmless (maybe like <br /> instead of <br>), perhaps that should be a WARNING rather than an ERROR. But anything on the blocklist should be an error.
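A minimal sketch of how such a check could look, using only Python's standard `html.parser`. The `BLOCKLISTED_TAGS` set, the class name, and the message wording are assumptions made for illustration; the real blocklist and logging live in the ebm code and are not shown in this thread.

```python
from html.parser import HTMLParser

# Hypothetical stand-in: the real blocklist is defined by the ebm code.
BLOCKLISTED_TAGS = {"wbr"}


class BlocklistChecker(HTMLParser):
    """Collect (level, line, message) tuples while parsing."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCKLISTED_TAGS:
            line, _ = self.getpos()
            self.messages.append(("ERROR", line, f"<{tag}> is on the blocklist"))

    def handle_startendtag(self, tag, attrs):
        # XHTML-style self-closing syntax (<br />) is treated as harmless here.
        line, _ = self.getpos()
        self.messages.append(("WARNING", line, f"<{tag} /> uses self-closing syntax"))
        # Still run the blocklist check on the tag itself.
        self.handle_starttag(tag, attrs)


def check_html(text):
    """Return every ERROR and WARNING found in the given HTML text."""
    checker = BlocklistChecker()
    checker.feed(text)
    checker.close()
    return checker.messages


sample = "<p>one<wbr>two</p>\n<p>three<br />four</p>\n"
for level, line, message in check_html(sample):
    print(f"{level}: line {line}: {message}")
```

Keeping the level as data rather than printing immediately would let the caller decide whether a WARNING should block a submission.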

@eshellman
Collaborator Author

It looks like the open source paged.js (https://pagedjs.org/) will be a nice path to generating high-quality PDF from our HTML5 files. I had dinner with the developers on Friday - they use PG files for their demos!

@gbnewby
Collaborator

gbnewby commented Oct 31, 2022 via email

@eshellman
Collaborator Author

I also talked to the folks from Benetech (Bookshare) about accessibility - they were happy to hear about EPUB3. I was unable to say how well PG is doing with accessible alt attributes, so I'm adding some logging to help us understand how well we're complying with guidelines.

@charliehoward4dp

Now that DP has made audio files mandatory for all ebooks containing musical scores, please consider adding corresponding support for those files to EBM. All smartphones and many tablets support audio, and the music files typically play for only a few seconds, so they aren't particularly large.

The files will be in a "music" subfolder. In the examples I've seen so far, the links to them are simply <a href="music/xxx.mp3">Listen</a>.

Sometimes in the same folder there's also a corresponding .mxl (compressed MusicXML) file, which is the editable source used to compose the .mp3. When a similar link <a href="music/xxx.mxl">Download MusicXML</a> is clicked, a "Save as" dialog appears in a Browser. I don't know whether or not that should or could be supported by EBM as well.

@eshellman
Collaborator Author

@charliehoward4dp take a look at the HTML5 audio element (https://developer.mozilla.org/en-US/docs/Web/HTML/Element/audio). I can imagine DP developing guidelines for use of this element - that would be a prerequisite for support in Ebookmaker. We would want to forbid autoplay, come up with size limits, etc.
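To make the two constraints concrete, here is a rough sketch of a policy check for `<audio>` elements. No guidelines exist yet, so everything here is an assumption: the `MAX_AUDIO_BYTES` value, the class and function names, and the idea of passing file sizes in as a plain mapping instead of reading them from disk.

```python
from html.parser import HTMLParser

# Hypothetical limit for illustration; not a DP or PG guideline.
MAX_AUDIO_BYTES = 1_000_000


class AudioPolicyChecker(HTMLParser):
    """Flag autoplay and oversized or missing audio files."""

    def __init__(self, sizes):
        super().__init__()
        self.sizes = sizes  # mapping of relative path -> size in bytes
        self.problems = []
        self._in_audio = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "audio":
            self._in_audio = True
            # Boolean attributes such as autoplay appear as keys with value None.
            if "autoplay" in attrs:
                self.problems.append("autoplay is not allowed on <audio>")
            self._check_src(attrs.get("src"))
        elif tag == "source" and self._in_audio:
            self._check_src(attrs.get("src"))

    def handle_endtag(self, tag):
        if tag == "audio":
            self._in_audio = False

    def _check_src(self, src):
        if src is None:
            return
        size = self.sizes.get(src)
        if size is None:
            self.problems.append(f"{src}: file not found")
        elif size > MAX_AUDIO_BYTES:
            self.problems.append(f"{src}: {size} bytes exceeds {MAX_AUDIO_BYTES}")


def check_audio(html_text, sizes):
    """Return a list of policy problems for the audio in html_text."""
    checker = AudioPolicyChecker(sizes)
    checker.feed(html_text)
    checker.close()
    return checker.problems
```

For example, `check_audio('<audio controls autoplay src="music/a.mp3"></audio>', {"music/a.mp3": 400_000})` reports only the autoplay problem, since the file is under the assumed limit.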

@LMCantoni

@eshellman Hi Eric, I'm the DP Music Coordinator. Thanks so much for considering Charlie's request regarding audio & MusicXML files. I've been following the Slack discussions, so I know that this is for the future, but it would be so great to have this capability. I certainly agree that autoplay should be forbidden. As for size limits, to give you an idea of the typical mp3 size, I'm post-processing a history of chamber music with many music snippets of a few bars each (mainly string quartets), and the average size is around 300-500K.

@eshellman
Collaborator Author

@LMCantoni Super! Where is the best place to have the (somewhat technical, somewhat musical) discussions with people who can help?

@charliehoward4dp

charliehoward4dp commented Dec 4, 2022

HERE at Dropbox https://www.dropbox.com/s/oldwqs5fcbsxtxy/audiotest.epub?dl=0 is a hand-modified epub3 containing a playable audio file via the <audio> tag. It works with ADE, Calibre, and Nook. It does not work with Kindle for iOS, iBooks, or Google Play Books. The underlying epub3 was generated by eBookMaker. I added the audio file and enabled the necessary HTML.

@LMCantoni

> @LMCantoni Super! Where is the best place to have the (somewhat technical, somewhat musical) discussions with people who can help?

Would it be worthwhile to have a Slack channel dedicated to the music issue?

@LMCantoni

> HERE at Dropbox https://www.dropbox.com/s/oldwqs5fcbsxtxy/audiotest.epub?dl=0 is a hand-modified epub3 containing a playable audio file via the <audio> tag. It works with ADE, Calibre, and Nook. It does not work with Kindle for iOS, iBooks, or Google Play Books. The underlying epub3 was generated by eBookMaker. I added the audio file and enabled the necessary HTML.

Thanks, Charlie. Not surprisingly, the audio didn't work with Kindle for Android or Kindle for PC.

@gbnewby
Collaborator

gbnewby commented Jan 4, 2023

I'd like to see better presentation of multiple creators (authors etc.) in the head section.

Currently, labels are repeated. For example:

The non-generated file in https://www.gutenberg.org/files/69679/69679-h/69679-h.htm has:

Authors: Charles Francis Adams
Gilbert Nash
Charles Francis Adams III

The generated file in https://www.gutenberg.org/cache/epub/69679/pg69679-images.html has:
Author: Charles Francis Adams
Author: Charles Francis Adams III
Author: Gilbert Nash

The first way is much more visually appealing. Even better would be:
Authors: Charles Francis Adams, Gilbert Nash and Charles Francis Adams III

@gbnewby
Collaborator

gbnewby commented Jan 4, 2023

Related to my previous comment: I think it's reasonable to only list the title on the first line of the HTML. I think we will do this for the workflow-provided items as well. This is because handling multiple authors and variants is often challenging.

So, instead of:
The Project Gutenberg eBook of Wessagusset and Weymouth, by Charles Francis Adams et al.

I'd be pleased with:
The Project Gutenberg eBook of Wessagusset and Weymouth

@gbnewby
Collaborator

gbnewby commented Jan 4, 2023

I'd like the cover to be the very first thing people see, in all formats. (This is something I'm also working with the production team on, via the Workflow system). This is already done for the ereader formats. For example, see https://www.gutenberg.org/ebooks/69703 where the epub starts with the cover image, but the HTML (both native & generated) doesn't show the cover at all.

I'd prefer the cover to be the first thing people see. Even if it's a generated cover or boring cover. The existing header (metadata) & license blurb can appear afterwards.

@eshellman
Collaborator Author

eshellman commented Jan 4, 2023

With regard to the repeated authors, what we actually have in the database is a list of creators, which includes illustrators, translators, editors, etc. Since the order of authors is usually presumed to be significant, we shouldn't change it in the generated list. But it seems the order may not be preserved in the db - will need to look at that.

Provided we can reconstitute the author order, it's probably easy to do "author1, author2, illustrator1 (ill), illustrator2 (ill), translator1 (trl)".

Except... the Creator names in our database are stored in the form "Lastname, Firstname, Suffix, (Parenthetical)". So there will inevitably be some reconstituted author lists which are either mangled or ugly. The advantage of putting each creator on a separate line is that it will accurately reflect the contents of the database, which can be updated as need be.
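As an illustration of why reconstitution gets ugly, here is a deliberately naive sketch that flips the first two comma-separated fields of a stored name. The function name is hypothetical and this is not how Ebookmaker formats names; the inputs are names quoted elsewhere in this thread.

```python
def naive_display_name(db_name):
    """Flip 'Lastname, Firstname, rest...' into 'Firstname Lastname, rest...'."""
    parts = [part.strip() for part in db_name.split(",")]
    if len(parts) < 2:
        return db_name  # no comma: nothing to flip
    last, first, *rest = parts
    return ", ".join([f"{first} {last}"] + rest)


samples = [
    "Caine, Hall, Sir",
    "Du Bois, W. E. B. (William Edward Burghardt)",
    "Quintus, Smyrnaeus, active 4th century",
    "John Murray (Firm)",
]
for name in samples:
    print(f"{name!r} -> {naive_display_name(name)!r}")
```

The first sample comes out as "Hall Caine, Sir" and the third as "Smyrnaeus Quintus, active 4th century", neither of which is what a reader would expect, while the corporate name passes through untouched. Handling these cases would take per-field knowledge that a simple comma split does not have.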

@eshellman
Collaborator Author

Incorporating the cover into the HTML5 presentation is a good idea. One thing to consider is that the cover is often used as a representation of the book on other websites so clicking the cover to see the... cover again might not be the best UI, especially on small screens. Of course, seeing the license blurb first is not optimal, either.

The difficult thing will be figuring out whether the cover is already there first - we definitely don't want duplicate covers in the html - that already occurs too often in the epub files.
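One possible starting point for the "is the cover already there" question is a simple heuristic over `<img>` elements. This is only a sketch: the rule used here (the word "cover" appearing in an image's `id`, `class`, or `src`) is an assumption, not Ebookmaker's actual cover detection, and it would both miss covers with other names and match unrelated images.

```python
from html.parser import HTMLParser


class CoverFinder(HTMLParser):
    """Collect the src of every <img> that looks like a cover (heuristic)."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        # Assumed heuristic: 'cover' appears in the id, class, or src.
        fields = (attrs.get("id"), attrs.get("class"), attrs.get("src"))
        haystack = " ".join(field for field in fields if field)
        if "cover" in haystack.lower():
            self.candidates.append(attrs.get("src"))


def find_cover_candidates(html_text):
    """Return the src values of images that match the cover heuristic."""
    finder = CoverFinder()
    finder.feed(html_text)
    finder.close()
    return finder.candidates
```

A generator could insert a cover only when `find_cover_candidates` returns an empty list, and log a warning when it returns more than one entry, which would also surface the duplicate-cover case.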

@eshellman
Collaborator Author

sample authors:
"Quintus, Smyrnaeus, active 4th century"
"Du Bois, W. E. B. (William Edward Burghardt)"
"Library of Congress. Copyright Office"
"John of Damascus, Saint"
"John Murray (Firm)"
"Caine, Hall, Sir"
"Plato (spurious and doubtful works)"

@gbnewby
Collaborator

gbnewby commented Jan 4, 2023 via email

@eshellman
Collaborator Author

Luckily, there are no author names that include ';'.

@gbnewby
Collaborator

gbnewby commented Apr 9, 2023

On the creator list (the few comments above this one): I looked in the code, and it seems we are only using the dc.authors field. I.e., generated HTML only includes the Authors, not the Illustrators, Editors or Translators.

I would like those other types of creators to be presented when they exist. (There are some other creator types that could be of interest but seem less important; it might be just as easy to list all creator types when they exist.)

Simulated example building on the prior comments above:

Authors: John Murray (Firm); Du Bois, W. E. B. (William Edward Burghardt)
Illustrator: Caine, Hall, Sir
Editors: Plato; Socrates

... you get the idea? When there is more than one, use "Authors" instead of "Author."
When there are multiple, separate them with a semicolon.
Only list Author/Illustrator/Editor/Translator when those fields exist.

Thanks for considering. I was going to make a PR myself but the data structure for the creators (like dc.authors), and the pstyle() function for formatting, made it more challenging than just updating the iteration loops.

@eshellman
Collaborator Author

dc.authors is a list of author objects, each of which has a name, a creator role, a birthdate and a deathdate. So editors, translators, etc. are in fact listed. The list is ordered, so separating out the creators by role may re-order the list of creators. We use the MARC list for roles. The importance of a role may depend on the type of work.

@gbnewby
Collaborator

gbnewby commented Apr 10, 2023

Ok, then it sounds like this will require first aggregating each type of role then emitting them in groups like my example.
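The aggregate-then-emit approach could be sketched roughly as follows. The `Creator` tuple is a hypothetical stand-in for the author objects in dc.authors (only name and role are modelled), and plain "append s" pluralization is an assumption with an `IRREGULAR_PLURALS` hook left empty for any role names where it would not work.

```python
from collections import namedtuple

# Hypothetical stand-in for the author objects in dc.authors.
Creator = namedtuple("Creator", "name role")

# Assumed hook for role names that do not pluralize by appending "s".
IRREGULAR_PLURALS = {}


def consolidate(creators):
    """Group names by role and emit one 'Role(s): a; b' line per role.

    Roles appear in order of first occurrence; names keep their
    relative order within each role.
    """
    groups = {}  # dicts preserve insertion order
    for creator in creators:
        groups.setdefault(creator.role, []).append(creator.name)

    lines = []
    for role, names in groups.items():
        if len(names) == 1:
            label = role
        else:
            label = IRREGULAR_PLURALS.get(role, role + "s")
        lines.append(f"{label}: {'; '.join(names)}")
    return lines


creators = [
    Creator("John Murray (Firm)", "Author"),
    Creator("Caine, Hall, Sir", "Illustrator"),
    Creator("Du Bois, W. E. B. (William Edward Burghardt)", "Author"),
    Creator("Plato", "Editor"),
    Creator("Socrates", "Editor"),
]
print("\n".join(consolidate(creators)))
```

Note that the second author is pulled up ahead of the illustrator, so the original interleaved order of the creator list is not recoverable from the output.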

@eshellman
Collaborator Author

It looks like this is already done - look at https://gutenberg.org/ebooks/830

@gbnewby
Collaborator

gbnewby commented Apr 10, 2023

Indeed, the multiple roles seem to be listed. But they are not consolidated as in my example above.

See for example: https://gutenberg.org/cache/epub/27991/pg27991-images.html

The creator listing in the HTML5:
Author: Georgette Leblanc
Author: Maurice Maeterlinck
Editor: Frederick Orville Perkins
Translator: Alexander Teixeira de Mattos

But should be:
Authors: Georgette Leblanc; Maurice Maeterlinck
Editor: Frederick Orville Perkins
Translator: Alexander Teixeira de Mattos

@eshellman
Collaborator Author

How important is consolidation? Consolidation adds considerable complexity to the code and makes the output harder for downstream users to parse, or to translate, when that is desired. Currently we only need the singular form of the role name in English. Adding 's' fails for 2 of the roles in our db. We do not consolidate roles in the bibrec page, which would not be able to use the same consolidation code. Issues related to line wrapping are exacerbated.

@gbnewby
Collaborator

gbnewby commented Apr 10, 2023 via email

@eshellman
Collaborator Author

In particular, our own author parsing code (used by e.g. online ebookmaker) doesn't handle the suggested consolidated format correctly and would need to be revised. The format that ebookmaker expects (not my code) is comma delimited, not semicolon delimited. Yes, we need to be able to scrape our own files.

It's ironic that you use 27991 as an example, because the book has only one author, Georgette Leblanc (a.k.a. Madame Maurice Maeterlinck) https://en.wikipedia.org/wiki/Georgette_Leblanc

5 participants