v0.13 enhancement list #139

eshellman · 2022-10-21T21:22:25Z

Leave comments on this issue for enhancements desired in the next major version of Ebookmaker.

Possibilities (subject to feasibility and community support):

support for inline SVG in submissions (#135 "attributes appear to be downcased, causing validation errors")
log use of unsupported elements/attributes as error/warning/critical
generate PDF as standard output (#215 "implement pdf output with pagedjs")
revive LaTeX as a source format
EPUB3 audio (#214 "Add HTML5/EPUB3 audio in 0.13")
better flexbox support
improve author list in generated header
add kepub output (#169 "Consider kepub")
reflow credits in text header (#220 "reflow credits for .txt")
make chunker more graceful (#224 "large child elements of body cause chunker to emit empty chunks")

gbnewby · 2022-10-23T16:33:53Z

Cover page handling: #132

eshellman · 2022-10-23T17:51:03Z

We may be able to address the problem in #132 with a CSS tweak; the difficulty is in testing rather than in the Ebookmaker code.

asylumcs · 2022-10-23T18:20:54Z

As far as testing for #132, the suggested code is the default cover image handling from Sigil (https://sigil-ebook.com/sigil/). We can benefit from the implied testing by Sigil users.

gbnewby · 2022-10-25T03:45:23Z

Generate errors for use of blocklisted items. If someone uses <wbr> for example, this should be an ERROR in ebm, so submitters can see in output.txt (via https://ebookmaker.pglaf.org) they cannot use that.

Over time the blocklist will shrink. The blocklist is in the DP Wiki, but ebm code is the canonical source of truth for what HTML or CSS elements/constructs/syntax/variations/etc. are not allowed.

If there are things that are truly harmless (maybe like <br /> instead of <br>), perhaps that should be a WARNING rather than an ERROR. But anything on the blocklist should be an error.
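A minimal sketch of the ERROR/WARNING split described above, using Python's standard-library `html.parser`. The list contents (`<wbr>` as an error; XHTML-style self-closing `<br />`, `<hr />`, `<img />` as warnings) are illustrative assumptions drawn from the examples in this comment, not Ebookmaker's actual blocklist or logging code, and `BlocklistChecker` / `check_blocklist` are hypothetical names.

```python
from html.parser import HTMLParser

# Illustrative lists only; the real blocklist is defined by DP/Ebookmaker.
BLOCKLISTED_ELEMENTS = {"wbr"}               # always an ERROR
WARN_IF_SELF_CLOSING = {"br", "hr", "img"}   # <br /> style: only a WARNING


class BlocklistChecker(HTMLParser):
    """Collects (level, message) pairs for disallowed or questionable markup."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def _report(self, tag, self_closing):
        line, _ = self.getpos()  # position of the tag currently being handled
        if tag in BLOCKLISTED_ELEMENTS:
            self.messages.append(("ERROR", f"line {line}: <{tag}> is not allowed"))
        elif self_closing and tag in WARN_IF_SELF_CLOSING:
            self.messages.append(
                ("WARNING", f"line {line}: <{tag} /> uses XHTML self-closing syntax"))

    def handle_starttag(self, tag, attrs):      # e.g. <wbr>, <br>
        self._report(tag, self_closing=False)

    def handle_startendtag(self, tag, attrs):   # e.g. <wbr/>, <br />
        self._report(tag, self_closing=True)


def check_blocklist(html):
    checker = BlocklistChecker()
    checker.feed(html)
    checker.close()
    return checker.messages
```

For example, `check_blocklist("<p>a<wbr>b<br />c</p>")` reports one ERROR for `<wbr>` and one WARNING for `<br />`, while a plain `<br>` produces nothing.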

eshellman · 2022-10-31T16:12:46Z

It looks like the open-source paged.js (https://pagedjs.org/) will be a nice path to generating high-quality PDF from our HTML5 files. I had dinner with the developers on Friday, and they use PG files for their demos!

gbnewby · 2022-10-31T16:27:13Z

That does look promising. I didn't see the PG examples on their Examples page. It looks like they support use of some directives in HTML to influence page layout. That could be of interest to DPers. I like what I saw about image resizing.

eshellman · 2022-10-31T16:28:15Z

I also talked to the folks from Benetech (Bookshare) about accessibility - they were happy to hear about EPUB3. I was unable to say how well PG is doing with accessible alt attributes so I'm adding some logging to help us understand how much we're complying with guidelines.

charliehoward4dp · 2022-11-07T20:02:45Z

Now that DP has made audio files mandatory for all ebooks containing musical scores, please consider adding corresponding support for those files to EBM. All smartphones and many tablets support audio, and the music files typically play for only a few seconds, so they aren't particularly large.

The files will be in a "music" subfolder. In the examples I've seen so far, the links to them are simply <a href="music/xxx.mp3">Listen</a>.

Sometimes in the same folder there's also a corresponding .mxl (compressed MusicXML) file, which is the editable source used to compose the .mp3. When a similar link <a href="music/xxx.mxl">Download MusicXML</a> is clicked, a "Save as" dialog appears in a browser. I don't know whether that should or could be supported by EBM as well.

eshellman · 2022-11-08T14:55:35Z

@charliehoward4dp take a look at the HTML5 audio element: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/audio I can imagine DP developing guidelines for use of this element - that would be a prerequisite for support in Ebookmaker. We would want to forbid autoplay, come up with size limits, etc.
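A rough sketch of how such guidelines might be checked once they exist: flag `autoplay` on `<audio>` and compare each referenced file (on `<audio src>` or a nested `<source src>`) against a size limit. The 1 MB `MAX_AUDIO_BYTES` value, the `file_sizes` mapping, and the `check_audio` helper are hypothetical placeholders, not agreed DP or Ebookmaker policy.

```python
from html.parser import HTMLParser

MAX_AUDIO_BYTES = 1_000_000  # placeholder limit, not an agreed guideline


class AudioPolicyChecker(HTMLParser):
    """Flags autoplay and oversized or missing files in <audio> elements."""

    def __init__(self, file_sizes):
        super().__init__()
        self.file_sizes = file_sizes  # maps src path -> size in bytes
        self.problems = []
        self._in_audio = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # boolean attributes such as autoplay map to None
        if tag == "audio":
            self._in_audio = True
            if "autoplay" in attrs:
                self.problems.append("autoplay is not allowed on <audio>")
            self._check_src(attrs.get("src"))
        elif tag == "source" and self._in_audio:
            self._check_src(attrs.get("src"))

    def handle_endtag(self, tag):
        if tag == "audio":
            self._in_audio = False

    def _check_src(self, src):
        if src is None:
            return
        size = self.file_sizes.get(src)
        if size is None:
            self.problems.append(f"{src}: file not found")
        elif size > MAX_AUDIO_BYTES:
            self.problems.append(f"{src}: {size} bytes exceeds limit")


def check_audio(html, file_sizes):
    checker = AudioPolicyChecker(file_sizes)
    checker.feed(html)
    checker.close()
    return checker.problems
```

Passing sizes in as a mapping keeps the sketch independent of how Ebookmaker actually locates files; a real check would read them from the submission directory.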

LMCantoni · 2022-12-02T16:30:40Z

@eshellman Hi Eric, I'm the DP Music Coordinator. Thanks so much for considering Charlie's request regarding audio & MusicXML files. I've been following the Slack discussions, so I know that this is for the future, but it would be so great to have this capability. I certainly agree that autoplay should be forbidden. As for size limits, to give you an idea of the typical mp3 size, I'm post-processing a history of chamber music with many music snippets of a few bars each (mainly string quartets), and the average size is around 300-500 KB.

eshellman · 2022-12-03T23:57:46Z

@LMCantoni Super! Where is the best place to have the (somewhat technical, somewhat musical) discussions with people who can help?

charliehoward4dp · 2022-12-04T01:08:23Z

Here at Dropbox (https://www.dropbox.com/s/oldwqs5fcbsxtxy/audiotest.epub?dl=0) is a hand-modified EPUB3 containing a playable audio file via the <audio> tag. It works with ADE, Calibre, and Nook. It does not work with Kindle for iOS, iBooks, or Google Play Books. The underlying EPUB3 was generated by Ebookmaker. I added the audio file and enabled the necessary HTML.

LMCantoni · 2022-12-04T12:42:30Z

Would it be worthwhile to have a Slack channel dedicated to the music issue?

LMCantoni · 2022-12-04T13:03:14Z

Thanks, Charlie. Not surprisingly, the audio didn't work with Kindle for Android or Kindle for PC.

gbnewby · 2023-01-04T16:08:23Z

I'd like to see better presentation of multiple creators (authors etc.) in the head section.

Currently, labels are repeated. For example:

The non-generated file in https://www.gutenberg.org/files/69679/69679-h/69679-h.htm has:

Authors: Charles Francis Adams
Gilbert Nash
Charles Francis Adams III

The generated file in https://www.gutenberg.org/cache/epub/69679/pg69679-images.html has:
Author: Charles Francis Adams
Author: Charles Francis Adams III
Author: Gilbert Nash

The first way is much more visually appealing. Even better would be:
Authors: Charles Francis Adams, Gilbert Nash and Charles Francis Adams III

gbnewby · 2023-01-04T16:09:36Z

Related to my previous comment: I think it's reasonable to only list the title on the first line of the HTML. I think we will do this for the workflow-provided items as well. This is because handling multiple authors and variants is often challenging.

So, instead of:
The Project Gutenberg eBook of Wessagusset and Weymouth, by Charles Francis Adams et al.

I'd be pleased with:
The Project Gutenberg eBook of Wessagusset and Weymouth

gbnewby · 2023-01-04T16:12:58Z

I'd like the cover to be the very first thing people see, in all formats. (This is something I'm also working with the production team on, via the Workflow system). This is already done for the ereader formats. For example, see https://www.gutenberg.org/ebooks/69703 where the epub starts with the cover image, but the HTML (both native & generated) doesn't show the cover at all.

I'd prefer the cover to be the first thing people see. Even if it's a generated cover or boring cover. The existing header (metadata) & license blurb can appear afterwards.

eshellman · 2023-01-04T17:32:26Z

With regard to the repeated authors, what we actually have in the database is a list of creators, which includes illustrators, translators, editors, etc. Since the order of authors is usually presumed to be significant, we shouldn't change that order in the generated list. But it seems the order may not be preserved in the db; I will need to look at that.

Provided we can reconstitute the author order, it's probably easy to do "author1, author2, illustrator1 (ill), illustrator2 (ill), translator1 (trl)".
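A sketch of that single-line format, assuming each creator arrives as a `(display_name, marc_code)` pair in database order, using MARC relator codes such as `aut`, `ill`, and `trl`. `ROLE_PRIORITY` and `inline_creators` are hypothetical; the priority order simply mirrors the example above, and unlisted codes sort last.

```python
# Hypothetical display priority keyed by MARC relator code; other codes sort last.
ROLE_PRIORITY = {"aut": 0, "ill": 1, "trl": 2}


def inline_creators(creators):
    """creators: list of (display_name, marc_code) pairs in database order."""
    # sorted() is stable, so creators sharing a role keep their database order.
    ordered = sorted(creators, key=lambda c: ROLE_PRIORITY.get(c[1], len(ROLE_PRIORITY)))
    # Authors are shown bare; every other role gets its relator code in parentheses.
    return ", ".join(
        name if code == "aut" else f"{name} ({code})" for name, code in ordered
    )
```

Because the sort is stable, only creators with different roles move relative to one another; the order among authors is left exactly as stored.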

Except... the creator names in our database are stored in the form "Lastname, Firstname, Suffix, (Parenthetical)", so there will inevitably be some reconstituted author lists which are either mangled or ugly. The advantage of putting each creator on a separate line is that it accurately reflects the contents of the database, which can be updated as needed.
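A naive reconstitution that just swaps the first two comma-separated parts shows how some stored names come out mangled or ugly. `naive_display_name` is a hypothetical helper for illustration, not Ebookmaker code.

```python
def naive_display_name(stored):
    """Swap 'Lastname, Firstname, ...' into 'Firstname Lastname, ...' (naively)."""
    parts = [p.strip() for p in stored.split(",")]
    if len(parts) < 2:
        return stored  # no comma: corporate names and the like pass through unchanged
    last, first, *rest = parts
    return ", ".join([f"{first} {last}", *rest])


if __name__ == "__main__":
    samples = [
        "Quintus, Smyrnaeus, active 4th century",
        "Du Bois, W. E. B. (William Edward Burghardt)",
        "Library of Congress. Copyright Office",
        "Caine, Hall, Sir",
    ]
    for s in samples:
        print(f"{s!r} -> {naive_display_name(s)!r}")
```

With these inputs, "Quintus, Smyrnaeus, active 4th century" becomes "Smyrnaeus Quintus, active 4th century" and the Du Bois entry becomes "W. E. B. (William Edward Burghardt) Du Bois", while "Library of Congress. Copyright Office" passes through unchanged.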

eshellman · 2023-01-04T17:43:46Z

Incorporating the cover into the HTML5 presentation is a good idea. One thing to consider is that the cover is often used as a representation of the book on other websites so clicking the cover to see the... cover again might not be the best UI, especially on small screens. Of course, seeing the license blurb first is not optimal, either.

The difficult thing will be figuring out whether the cover is already there first - we definitely don't want duplicate covers in the html - that already occurs too often in the epub files.
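One possible heuristic for the duplicate-cover check, sketched with the standard-library `html.parser`: treat the cover as already present if the first `<img>` in `<body>` appears before any non-whitespace text and has the same file name as the known cover. `LeadingImageFinder`, `cover_already_first`, and the file-name comparison are assumptions for illustration; real files (covers wrapped in links, SVG covers, CSS backgrounds) would need more cases.

```python
import posixpath
from html.parser import HTMLParser


class LeadingImageFinder(HTMLParser):
    """Records the src of the first <img> in <body> if no text precedes it."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.done = False          # stop looking once text or an image is found
        self.first_img_src = None

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag == "img" and self.in_body and not self.done:
            self.first_img_src = dict(attrs).get("src")
            self.done = True

    def handle_data(self, data):
        if self.in_body and data.strip():
            self.done = True       # non-whitespace text came before any image


def cover_already_first(html, cover_path):
    finder = LeadingImageFinder()
    finder.feed(html)
    finder.close()
    src = finder.first_img_src
    return src is not None and posixpath.basename(src) == posixpath.basename(cover_path)
```

Comparing only base names is a deliberate simplification so that `images/cover.jpg` and `cover.jpg` count as the same file; it would give a false positive if two different directories held images with the same name.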

eshellman · 2023-01-04T18:01:58Z

Sample authors:
"Quintus, Smyrnaeus, active 4th century"
"Du Bois, W. E. B. (William Edward Burghardt)"
"Library of Congress. Copyright Office"
"John of Damascus, Saint"
"John Murray (Firm)"
"Caine, Hall, Sir"
"Plato (spurious and doubtful works)"

gbnewby · 2023-01-04T18:16:27Z

My suggestion was not to fix the weirdness in how authors are presented. The idea is just to not repeat the Author: tag. Separating by semicolon could work, for example: Authors: Du Bois, W. E. B. (William Edward Burghardt); John Murray (Firm); Library of Congress. Copyright Office

eshellman · 2023-01-04T19:32:05Z

Luckily, there are no author names that include ';'.

gbnewby · 2023-04-09T22:05:45Z

On the creator list (the few comments above this one): I looked in the code, and it seems we are only using the dc.authors field. I.e., generated HTML only includes the Authors, not the Illustrators, Editors or Translators.

I would like those other types of creators to be presented when they exist. (There are some other creator types that could be of interest but seem less important; it might be just as easy to list all creator types when they exist.)

Simulated example building on the prior comments above:

Authors: John Murray (Firm); Du Bois, W. E. B. (William Edward Burghardt)
Illustrator: Caine, Hall, Sir
Editors: Plato; Socrates

... you get the idea? When there is more than one, use "Authors" instead of "Author."
When there are multiple, separate with a semicolon.
Only list Author/Illustrator/Editor/Translator when those fields exist.
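A minimal sketch of the grouping rules above, assuming creators arrive as `(name, role)` pairs in database order. Roles are emitted in the order they first appear, and plurals come from an explicit table rather than by appending "s". `ROLE_PLURALS` and `consolidate_creators` are hypothetical names, and the table covers only the four roles mentioned here.

```python
# Hypothetical plural table; an explicit mapping avoids naive "add an s" plurals.
ROLE_PLURALS = {
    "Author": "Authors",
    "Editor": "Editors",
    "Illustrator": "Illustrators",
    "Translator": "Translators",
}


def consolidate_creators(creators):
    """creators: list of (name, role) pairs in database order."""
    groups = {}  # dicts keep insertion order, so roles appear in first-seen order
    for name, role in creators:
        groups.setdefault(role, []).append(name)
    lines = []
    for role, names in groups.items():
        # Unknown roles fall back to the singular label rather than guessing a plural.
        label = role if len(names) == 1 else ROLE_PLURALS.get(role, role)
        lines.append(f"{label}: {'; '.join(names)}")
    return lines
```

With the simulated input above (two authors, one illustrator, two editors) this yields exactly the three example lines. One trade-off: if roles are interleaved in the database (author, editor, author), grouping moves the second author ahead of the editor, so the stored order is not fully preserved.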

Thanks for considering. I was going to make a PR myself but the data structure for the creators (like dc.authors), and the pstyle() function for formatting, made it more challenging than just updating the iteration loops.

eshellman · 2023-04-10T15:04:33Z

dc.authors is a list of author objects, each of which has a name, a creator role, a birthdate and a deathdate. So editors, translators, etc. are in fact listed. The list is ordered, so separating out the creators by role may reorder the list of creators. We use the MARC list for roles. The importance of a role may depend on the type of work.

gbnewby · 2023-04-10T15:08:38Z

Ok, then it sounds like this will require first aggregating each type of role then emitting them in groups like my example.

eshellman · 2023-04-10T15:26:15Z

It looks like this is already done - look at https://gutenberg.org/ebooks/830

gbnewby · 2023-04-10T15:38:39Z

Indeed, the multiple roles seem to be listed. But they are not consolidated as in my example above.

See for example: https://gutenberg.org/cache/epub/27991/pg27991-images.html

The creator listing in the HTML5:
Author: Georgette Leblanc
Author: Maurice Maeterlinck
Editor: Frederick Orville Perkins
Translator: Alexander Teixeira de Mattos

But should be:
Authors: Georgette Leblanc; Maurice Maeterlinck
Editor: Frederick Orville Perkins
Translator: Alexander Teixeira de Mattos

eshellman · 2023-04-10T16:05:59Z

How important is consolidation? Consolidation adds considerable complexity to the code and makes it harder for downstream users to parse, or to translate, when that is desired. Currently we only need the singular form of the role name in english. Adding 's' fails for 2 of the roles in our db. We do not consolidate roles in the bibrec page, which would not be able to use the same consolidation code. Issues related to line wrapping are exacerbated.

gbnewby · 2023-04-10T23:12:13Z

I think consistency is extremely important, and that already has been part of our discussion. The format I suggested is what's been used for a very long time. The main variation is the representation of names in the database isn't always great for presenting as-is, but this issue exists regardless of consolidation. My suggested format should be straightforward algorithmically. We can discuss as desired to ensure challenges can be addressed. The bibrec section is a table, and the creators are presented as a hyperlink in that table. I don't mind consolidating those records, but don't see it as needed for consistency. Since it's tabular data, a single key-value pair makes sense to me - i.e., status quo. The book itself is targeted at human readers, and it seems obvious to me that consolidation is a friendlier way of presenting the creator roles. Anyone trying to scrape the books for metadata is going down a pathway that we don't support. But even if they did, I don't see how the recommended format is less amenable to automation than the current non-consolidated format. ~ Greg

eshellman · 2023-04-11T02:10:28Z

In particular, our own author-parsing code (used by, e.g., online Ebookmaker) doesn't handle the suggested consolidated format correctly and would need to be revised. The format that Ebookmaker expects (not my code) is comma-delimited, not semicolon-delimited. Yes, we need to be able to scrape our own files.
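For comparison, a hypothetical revised parser for the semicolon format could be small, relying on the earlier observation that no stored name contains ';'. This is not Ebookmaker's existing comma-delimited parser; `parse_creator_lines` and `ROLE_SINGULARS` are illustrative names, and the code assumes each line has the form `Label: name; name`, with no colon inside the label.

```python
# Hypothetical reverse mapping from plural labels back to singular role names.
ROLE_SINGULARS = {
    "Authors": "Author",
    "Editors": "Editor",
    "Illustrators": "Illustrator",
    "Translators": "Translator",
}


def parse_creator_lines(lines):
    """Turn 'Label: name; name' lines back into (name, role) pairs."""
    creators = []
    for line in lines:
        label, sep, rest = line.partition(":")  # split at the first colon only
        if not sep:
            continue  # not a "Label: value" line
        label = label.strip()
        role = ROLE_SINGULARS.get(label, label)  # singular labels map to themselves
        for name in rest.split(";"):
            name = name.strip()
            if name:
                creators.append((name, role))
    return creators
```

Because singular labels pass through unchanged, the same function also reads the current one-creator-per-line output (`Author: ...`), and names containing commas survive intact.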

It's ironic that you use 27991 as an example, because the book has only one author, Georgette Leblanc (a.k.a. Madame Maurice Maeterlinck) https://en.wikipedia.org/wiki/Georgette_Leblanc

eshellman mentioned this issue in #136, "EPUB content.opf file not reflecting presence of an <svg>" (open) · Oct 21, 2022

eshellman added the discussion label · Dec 4, 2022

eshellman mentioned this issue in #150, "Generated HTML5 inadequately protects h2 in header" (closed) · Jan 6, 2023

eshellman mentioned this issue in #181, "Treatment of multiple authors" (closed) · Apr 11, 2023
