Originally Posted by mtravellerh
As you're aware yourself, there are still some problems with illustration sizes in ePub. For a first version, this is superb work!
Thanks! You are right that there could be better support for cover image extraction and/or automatic cover page generation. I do plan on incorporating these features in new releases (they're already reserved on the GUI options screen). For most cases, an option similar to calibre's "Remove first image from source file" should suffice, but I might also want to detect <img> tags whose src file name or alt text contains "cover".
The Perl script could also allow one to use a generic cover image placed in, say, the install directory. While this cover image would be external to the source .htm for .mobi ebooks, the other ebook formats would probably use an additional cover.htm page placed before the source .htm. This way it would be more compatible across all formats.
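The <img> detection idea mentioned above could look roughly like the following. This is only a sketch in Python for illustration, not GuteBook's Perl code; the names CoverImageFinder and find_cover_images are made up for this example, and the rule "the word cover appears in src or alt" is the only heuristic it applies.

```python
from html.parser import HTMLParser


class CoverImageFinder(HTMLParser):
    """Collects the src of every <img> whose src or alt mentions 'cover'.

    Hypothetical helper for illustration; not part of GuteBook.
    """

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "img":
            return
        # Attributes without a value arrive as None; treat them as empty.
        attr = {name: (value or "") for name, value in attrs}
        src = attr.get("src", "")
        alt = attr.get("alt", "")
        if "cover" in src.lower() or "cover" in alt.lower():
            self.candidates.append(src)


def find_cover_images(html_text):
    """Return the src values of likely cover images, in document order."""
    finder = CoverImageFinder()
    finder.feed(html_text)
    return finder.candidates
```

A plain substring test like this would also match names such as "discovery.png", so a real implementation would probably want a stricter pattern or a fallback to the first image.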
As an aside for iPhone/iPod users: Use this to convert your Gutenberg books to ePub for Stanza. They will be perfectly formatted and you can autogenerate a cover. Best solution so far!
If I impose a maximum width and height of 66% of a 600x800 screen, any large image would be reduced to fit within roughly 400x530 (396x528 exactly), which would be acceptable to most ebook readers. I could use "width=66%", or just use "width=400" and adjust the height after examining the actual image to determine its dimensions and aspect ratio. I'll experiment some more here...
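As a rough illustration of that arithmetic, here is a small Python sketch (again, not the Perl script itself). The function name fit_within and its parameters are assumptions made for this example; it takes the real pixel dimensions of an image, caps them at a fraction of the screen size, and preserves the aspect ratio. Images that already fit are left alone.

```python
def fit_within(width, height, screen=(600, 800), fraction=0.66):
    """Scale (width, height) down to fit inside fraction * screen.

    Hypothetical helper: keeps the aspect ratio and never enlarges
    an image that is already small enough.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    max_width = screen[0] * fraction
    max_height = screen[1] * fraction
    # Use the tighter of the two constraints; cap at 1.0 so we never upscale.
    scale = min(max_width / width, max_height / height, 1.0)
    # Round to whole pixels, but never collapse a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The resulting pair could then be written into the width and height attributes of the <img> tag, instead of relying on a percentage that each reader may interpret differently.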
Also missing, but a worthy addition, is the ability to autogenerate a Table of Contents ("TOC") and place it at the end. However, most PG HTML versions already have a "Contents" section, so I'll wait and see if there is demand for such a TOC feature.
Obviously, working with HTML as a starting point makes it easier to get all the "bells and whistles" we are used to seeing in hand-crafted ebooks created by those who, like yourself, do a marvellous job! In future, once PG's .mobi and .epub offerings are no longer experimental and become standard, I can switch to using those as input instead of the HTML versions.
Working with .txt may require much more "polishing" by hand. Currently, GutenMark transforms .txt-only ebooks into acceptable .htm ebooks. I may incorporate this ability within the Perl script using gut.pl
While GuteBook cannot be expected to properly detect and handle ALL PG quirks and idiosyncrasies, it makes a valiant attempt. I can improve the Perl script to "accommodate" any easily fixed quirk once I know which PG ebook (by EText-No.) displays it. If you experience any formatting glitches, you can post your findings/fixes here and discuss/support their inclusion in future versions of GuteBook.
The squeaky wheel...