12-30-2008, 05:02 PM | #1 |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Creating multiple ebook formats from same source files!
I want to improve the way I make .imp ebooks from (rich) .html. Up to now my MO has been to perfect what I would like displayed on my hardware reader, ebook-wise (pardon the pun), and then make other ebook formats for uploading here. Other formats include .prc/.mobi (Mobipocket) and .lrf/.epub (Sony).
Currently, if I want to produce a .imp, .prc and .lrf (and in the future .epub) ebook for uploading here, I will "nail it" using eBook Publisher for the .imp ebook, then use a copy of the .opf with Mobipocket Creator, and finally a command-line Calibre program, either html2lrf.exe or opf2html.exe. However, I see from using the (software) Mobipocket Reader and Calibre lrfviewer/ebook-viewer that my other ebook formats suffer from shortcomings in the .html source files I used. As an example, I converted a PG offering entitled Little Stories for Little Children by Anonymous using the HTML .zip (22896-h.zip). I produced a .imp/.epub (using a beta eBook Publisher), .prc (using Mobipocket Creator) and .lrf/.epub (using Calibre v0.4.121). I attach as Little Stories for Little Children.zip my source files as revised by me, including the original 22896-h.htm, as well as a simple "diffs" .txt file to see what I changed. Now to the problems:
I would appreciate any comments from those that prepare .prc/.mobi ebooks from .html on how they would change my source .html files to better make a Mobipocket ebook. The same goes for those who make Sony ebooks from .html. In the end, I hope we can standardize the creation of ebooks from .html so that the best possible ebooks can be created from a single (multi-purpose) source. Please upload any better Mobipocket or Sony (or eBookwise) ebook you can make using my source below (please upload source/diffs as well). Any thoughts in this regard? Last edited by nrapallo; 12-30-2008 at 06:18 PM. Reason: typo |
01-04-2009, 12:25 AM | #2 |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Perhaps not too many people use .html as a source; BD seems so popular amongst the veteran uploaders here.
For those that do work with .html as a source, be sure to read the Project Gutenberg guidelines on producing their .html ebooks here. Especially useful for those old (the format, not the ebook creators) .txt die-hards is Section H.13, "How can I make a HTML version from my plain text file?", therein! |
01-04-2009, 09:21 AM | #3 | |
The Grand Mouse 高貴的老鼠
Posts: 72,159
Karma: 308792702
Join Date: Jul 2007
Location: Norfolk, England
Device: Kindle Voyage
I've taken a quick look at your source and compared it to the original to see what you've changed, and what you've left.
For the few Mobipocket ebooks I've created, I tend to work from HTML too. I'd advise simplifying the HTML even more, for example getting rid of the page numbers, as Mobipocket doesn't seem to obey the invisible attribute. More specifically, in the CSS I'd eliminate any body text font size specification, any justification of the body text, and any general margins on the body text. Those choices ought to be left up to the reader through the software.
For identifying links in the document, Mobipocket prefers the use of id rather than name. I always use page-break-before because I specify it for certain headings (e.g. chapter headings), which seems to make more sense than adding it to the last paragraph of a chapter. On a stylistic note, I prefer to specify no space between paragraphs (by setting margin-top and margin-bottom to 0em), but to have a text indent on the first line of each paragraph, except the first paragraph of each chapter. But that is a personal preference.
I do like the idea of coming up with a 'generic' HTML format that works with the software to create multiple formats. I've avoided creating LRF or other files, because I can't test how they'll look.
For Mobipocket, you need an extra item in the metadata section of the .opf file: <x-metadata><EmbeddedCover>images\illus-0001-1.jpg</EmbeddedCover></x-metadata> and a guide item pointing to the table of contents is also a good idea, just before the end of the package: <guide><reference type="toc" title="Table of Contents" href="Little%20Stories%20for%20Little%20Children.htm#contents"></reference></guide>
Anyway, I attach a zip of my html, opf & images (for which I corrected the white point), along with my prc. Paul
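If you generate the .opf with a script rather than by hand, the two Mobipocket-specific additions Paul mentions can be built with Python's standard xml.etree.ElementTree. This is only a sketch: the function name is hypothetical, the cover path uses a forward slash rather than the backslash in Paul's example, I've assumed the TOC href needs a literal # before the anchor name, and splicing the fragments into an existing .opf is left to you.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def mobi_opf_fragments(cover_path: str, toc_href: str) -> Tuple[str, str]:
    """Build the <x-metadata> and <guide> fragments (hypothetical helper).

    Both arguments are caller-supplied; toc_href should be the URL-escaped
    content file name plus a literal '#anchor' for the TOC.
    """
    # Intended to go inside the OPF <metadata> element.
    x_meta = ET.Element("x-metadata")
    ET.SubElement(x_meta, "EmbeddedCover").text = cover_path

    # Intended to go just before </package>.
    guide = ET.Element("guide")
    ET.SubElement(guide, "reference",
                  {"type": "toc", "title": "Table of Contents", "href": toc_href})

    return (ET.tostring(x_meta, encoding="unicode"),
            ET.tostring(guide, encoding="unicode"))


if __name__ == "__main__":
    meta, guide = mobi_opf_fragments(
        "images/illus-0001-1.jpg",
        "Little%20Stories%20for%20Little%20Children.htm#contents")
    print(meta)
    print(guide)
```

ElementTree writes the empty reference as <reference ... />, which is equivalent XML to Paul's explicit open/close pair.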
Last edited by pdurrant; 07-20-2009 at 12:29 PM. Reason: yes, id not is |
01-04-2009, 09:51 AM | #4 |
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
I generally work with MS Word, and save as a .doc file, or occasionally as HTML from there.
This is good enough for standard books. Books with a lot of references (like the Bible or an encyclopedia) are better edited in HTML: HTML takes longer to edit and finalize, but you're better able to find formatting errors and to custom-tailor the markup. BD generally strips all excess info from the HTML apart from the body text, and puts it into 'header1', 'header2', or paragraph text. So there's not really a reason not to use a .doc instead of HTML (for a normal book); there's really not that much HTML in a normal book. BD is not capable of treating a "header2" (<H2>) as a subtitle, but generally Header1 will become a chapter title. Finally, I've learned that BD also recognizes references and bookmarks ("a href" and "a name").
MS Word creates a lot of overhead in HTML files, in case you plan on creating the HTML from there, and it often takes a lot of pruning. Often you can save about 15-25% of space just by removing unnecessary data from MS Word HTML files. I prefer creating them in OpenOffice Writer, since it tends to leave less of a mess behind.
I've also been thinking of publishing the HTML sources, since apart from a Sony Reader I have no device to compare my ebooks on, and I generally only release the LRF file. Besides, probably like others on the forums, the (hand) creation of an LRF file already takes enough time, and there are probably many out there who wouldn't mind sharing the original sources for others to convert. If it were as simple as just running it through a converter it would be OK; however, I see many books posted by people on the forum with very lousy formatting! I'm not talking about people just starting out posting books who don't have it 100% together yet, like those with some border issues or font size issues. I mean those uploading files almost as if a text file with a few added pictures was put into an LRF jacket and published. Sadly, some of the best uploaders also have some of the worst formatting in their books.
It may be because of automatic conversions. I mean: titles are not aligned and start mid-page, page breaks are missing, text has lousy formatting (the last word of a line always appears on the next line), the font is just TOO BIG to read comfortably in Medium or Large (on a Sony PRS-505, e.g., only 10-15 words fit on a screen in Large view), etc. Maybe conversion tools have improved over the last months and can faithfully convert one format into another without loss of formatting quality. After all, most of the formats are for 800x600 screen resolutions, so formatting, fonts and sizes should not differ much from one ebook to another. I'd honestly prefer a lot of the files uploaded to this site to be removed and reformatted by hand, because apart from the text, which you can read, the formatting as well as the covers of many books are just done horribly. One of the posting guidelines is not to post a book if it only took you a few minutes of work to create it. In that case there would be no benefit in uploading the files, and you might as well just read the txt or HTML file, downloaded directly from the Gutenberg (or similar) website(s), on your reader. Last edited by ProDigit; 01-04-2009 at 10:18 AM. |
01-04-2009, 11:05 AM | #5 |
Technogeezer
Posts: 7,233
Karma: 1601464
Join Date: Nov 2006
Location: Virginia, USA
Device: Sony PRS-500
I seldom use HTML to produce ebooks. I have used BD for a long time, and I have had more problems with HTML sources than with any other format. DOC or RTF seem to work better, and even a well-prepared TXT file seems better suited than HTML. I use the BD TOC functions rather than importing any TOC from outside.
That said, I reviewed the LRF output, and there are several things I would have liked to see that I did not, and several things I saw that I would have liked not to see. While I did see page breaks for the main stories, I did not see page breaks for the TOC, title, and other front matter. This produced a run-on situation, with the title of the book split between the bottom of one page and the top of the next. I saw all of the original page references. Many of the PG HTML sources place these in the margin, away from the body of the text, and that is fine in a browser; given the narrow column width of most readers, though, this is not viable. Some PG HTML sources put them inline (as this LRF output did); for long pages of text that is not too bad, but here, with very short pages, it is a major intrusion. Some PG texts even put the page number within a word when it is split over two pages. I wish you all possible success with the project, Nick. If anyone can pull it off, I believe you can. |
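If you decide to drop the inline page references rather than reposition them, a regular-expression pass over the PG source is usually enough. This sketch assumes the page numbers are wrapped in <span class="pagenum">...</span> with no nested <span> inside, which is common in PG HTML but not universal, so check your source first. Removing the span also removes any page anchors inside it, so any links pointing at those anchors would break.

```python
import re

# Assumption: each page number looks like <span class="pagenum">...</span>
# and contains no nested <span>; the non-greedy .*? stops at the first </span>.
PAGENUM = re.compile(r'<span\s+class="pagenum"[^>]*>.*?</span>',
                     re.IGNORECASE | re.DOTALL)


def strip_page_numbers(html: str) -> str:
    """Remove inline page-number spans; a word split around one is rejoined."""
    return PAGENUM.sub("", html)


if __name__ == "__main__":
    sample = ('<p>Once upon a time a lit<span class="pagenum">'
              '<a name="Page_5" id="Page_5">[Pg 5]</a></span>tle mouse</p>')
    print(strip_page_numbers(sample))  # <p>Once upon a time a little mouse</p>
```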
01-04-2009, 11:13 AM | #6 |
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
@pdurrant
Thanks for the feedback on creating Mobipocket ebooks from HTML. I gather there are two areas of concern: the underlying .html and the .opf coding. I will look at your changes and see what I can use/compromise/discard when I re-make a .imp ebook from it. That will help find that "common ground".
@ProDigit HTML (even filtered HTML) from Word can be a "nightmare". I usually run the Word HTML through TidyGUI, selecting the Configuration setting that tells it the source file is from Word 2000+. I then search and replace large spans of my default-formatted paragraphs to strip the inline style <p style="">. It does a decent job, but not as good as starting from .html in the first place.
One trick I've used in the past with great success, when starting with a webpage or similar .html, is to copy the displayed text and paste it into a blank email created by MS Outlook Express 6. Then click the Source tab at the bottom of the email message, Select All, and copy that HTML 4.0 code into a new text file opened in your favorite text editor. Then save that file as your starting HTML base for the ebook. It avoids using Word, but requires some quick search-and-replaces to get rid of http:// references to images, and you have to copy the images to the source directory manually. Just food for thought.
Oh, by the way, any chance of showing me what you would/could change to make my .lrf/.epub version above better suited to your own preferences? Can you post an .lrf, with source, so I can see the results? |
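The two search-and-replace steps described above (stripping the inline <p style=""> and removing http:// image references) can be scripted. A rough Python sketch, assuming the images have already been copied into a local folder called images (a hypothetical name) and that style attributes are double-quoted; it does not replace TidyGUI's Word 2000+ cleanup, and regular expressions are only reasonable here because the pasted markup is fairly regular.

```python
import re

# A <p ...> tag carrying a double-quoted style attribute; other attributes
# are kept. Single-quoted or unquoted styles are not handled.
P_STYLE = re.compile(r'<p\b([^>]*?)\s+style="[^"]*"([^>]*)>', re.IGNORECASE)

# An <img ... src="http(s)://host/path/name.ext"; captures the prefix and name.ext.
IMG_URL = re.compile(r'(<img\b[^>]*?\bsrc=")https?://[^"]*/([^"/]+)"',
                     re.IGNORECASE)


def clean_pasted_html(html: str, image_dir: str = "images") -> str:
    """Strip inline paragraph styles and point remote images at a local folder."""
    html = P_STYLE.sub(r"<p\1\2>", html)
    # A function replacement avoids backslash surprises in image_dir.
    html = IMG_URL.sub(lambda m: f'{m.group(1)}{image_dir}/{m.group(2)}"', html)
    return html


if __name__ == "__main__":
    print(clean_pasted_html('<p class="x" style="margin:0">Hi</p>'))
    print(clean_pasted_html('<img src="http://example.com/pics/fig1.jpg" alt="x">'))
```

You would still need to copy the image files themselves into that folder by hand.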
01-04-2009, 11:24 AM | #7 | ||||
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
In the end, I do agree that they do not belong in the ebook version, though.
01-04-2009, 11:46 AM | #8 | |
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
Even if I remove the lines (<HR> in HTML), the TOC and title/author always appear on a separate page regardless. So I generally remove them, hoping to save some space. |
01-04-2009, 11:47 AM | #9 |
book creator
Posts: 9,656
Karma: 3856660
Join Date: Oct 2008
Location: Luxembourg
Device: Kindle Scribe
I always use HTML files as source files myself. But I am coming from a different point of view, as I do (or rather did) first and foremost Mobi books. And you are perfectly right: <pre> is not very well supported at all by Mobi. I make my HTML as simple as possible, using mostly just headers and paragraph text. If I need special formatting, I use breaks, aligns, and bigger and smaller fonts. These three work fine with all formats and keep it simple.
I avoid BD for the same reason Nick does. BD tends to play around with my perfectly good HTML code. I used to create a mobi file first and then LRF and IMP by importing that PRC file into Calibre and Mobi2IMP. I have changed that method somewhat now by using my HTML source file with Calibre and thus creating LRF and Epub, because epubs are easier and better to create that way. |
01-04-2009, 12:48 PM | #10 | |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
01-04-2009, 12:54 PM | #11 | |
zeldinha zippy zeldissima
Posts: 27,827
Karma: 921169
Join Date: Dec 2007
Location: Paris, France
Device: eb1150 & is that a nook in her pocket, or she just happy to see you?
As for IMP support, it would be nice, but as long as we can use Nick's Mobi2IMP, it is a secondary priority to me.
01-04-2009, 01:31 PM | #12 | |
Reticulator of Tharn
Posts: 618
Karma: 400000
Join Date: Jan 2007
Location: EST
Device: Sony PRS-505
The Mobipocket "container" support is pretty much all done, including a few features I don't think other open source Mobipocket generators have (UTF-8 encoded content and "uncrossable" boundaries for "non-linear" content like footnotes and the table of contents). Rendering HTML+CSS into Mobipocket markup is... going. Not to rag on Mobipocket unnecessarily, but the Mobipocket HTML rendering engine is more limited and quirkier than I would have thought possible.
My basic strategy is to emulate full CSS-based rendering for what Mobipocket can support, and emulate CSS-less rendering for what it can't. The idea is that authors would just write markup which degrades cleanly and not worry about which features are or are not supported. For example, Mobipocket doesn't support floated blocks, so for any floats I also ignore explicit CSS 'display's.
The tricky bits are tables and lists. I was considering ignoring Mobipocket's built-in list support and just rendering lists explicitly, generating sequences based on 'list-style-type' etc. But for that I was trying to use Mobipocket's table support, which is reeeeally quirky. For example, if the specified width of a cell is too small to contain its content, then it just disappears (poof!) and isn't rendered at all. So anyway, it's getting there. |
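Rendering lists explicitly, as described above, comes down to turning each item's index into a label according to 'list-style-type'. Here is a hypothetical Python sketch covering a handful of CSS values (decimal, lower/upper-alpha, lower/upper-roman, disc), with the separator after the label made configurable; it is not the converter's actual code. The labelled items would then be emitted as ordinary paragraphs (or table cells) instead of <ol>/<li>.

```python
def to_roman(n: int) -> str:
    """Lowercase Roman numerals: 1 -> 'i', 4 -> 'iv', 1999 -> 'mcmxcix'."""
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
             (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
             (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, numeral in pairs:
        while n >= value:
            out.append(numeral)
            n -= value
    return "".join(out)


def to_alpha(n: int) -> str:
    """Bijective base 26, as CSS alpha lists count: 1 -> 'a', 26 -> 'z', 27 -> 'aa'."""
    s = ""
    while n > 0:
        n -= 1
        s = chr(ord("a") + n % 26) + s
        n //= 26
    return s


def marker(index: int, style: str = "decimal", suffix: str = ". ") -> str:
    """Label for one list item; suffix controls the punctuation after it."""
    if style == "disc":
        return "\u2022 "  # bullet character; suffix does not apply
    labels = {
        "decimal": str,
        "lower-alpha": to_alpha,
        "upper-alpha": lambda i: to_alpha(i).upper(),
        "lower-roman": to_roman,
        "upper-roman": lambda i: to_roman(i).upper(),
    }
    if style not in labels:
        raise ValueError(f"unsupported list-style-type: {style}")
    return labels[style](index) + suffix


def render_list(items, style="decimal", start=1, suffix=". "):
    """Prefix each item's text with its explicit label."""
    return [marker(start + i, style, suffix) + text
            for i, text in enumerate(items)]


if __name__ == "__main__":
    print(render_list(["Cat", "Dog"], "lower-roman"))
```

With suffix=": " the labels come out as "1:", "2:" and so on, which is one way to get non-default punctuation after the label once the list is rendered explicitly.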
01-04-2009, 01:34 PM | #13 |
creator of calibre
Posts: 44,327
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
An alternative is to render complex markup as images (see for example the --render-tables-as-images option in html2lrf)
01-04-2009, 01:37 PM | #14 |
frumious Bandersnatch
Posts: 7,531
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
The biggest problem with Mobipocket, format-wise, is that (as far as I know) it only lets you set vertical spacing above elements, so if you want some space below a chapter heading, a piece of poetry, or an indented block, you have to set that space (with "height") on the element below it (a paragraph, a div, or whatever).
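That workaround can be automated if your converter keeps a simple list of blocks with their wanted spacing. A minimal Python sketch under stated assumptions: the (tag, inner_html, space_above, space_below) tuple form is hypothetical, the em unit is an assumption about what the height attribute accepts, and taking the larger of two adjoining spaces (roughly what CSS margin collapsing does) is a design choice; summing them would be equally defensible.

```python
from typing import List, Tuple

# Hypothetical intermediate form: (tag, inner_html, space_above_em, space_below_em)
Block = Tuple[str, str, float, float]


def fold_space_below(blocks: List[Block]) -> List[str]:
    """Move each block's space-below onto the next block as a height attribute.

    Where a block asks for space above and its predecessor asks for space
    below, the larger value wins. Space below the final block is dropped.
    """
    out = []
    carried = 0.0  # space-below owed by the previous block
    for tag, inner, above, below in blocks:
        height = max(above, carried)
        attr = f' height="{height:g}em"' if height else ""
        out.append(f"<{tag}{attr}>{inner}</{tag}>")
        carried = below
    return out


if __name__ == "__main__":
    for line in fold_space_below([("h2", "Chapter I", 0, 2),
                                  ("p", "First paragraph.", 0, 0)]):
        print(line)
```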
I create Mobipocket books with html2mobi, from very simple HTML. Basically, I use only <P>, <DIV>, <I>, <B>, <Hx>, <A>, <IMG>... (sometimes <BR>, <SUP>, <HR>... if needed), and the only properties are "align", "height" (for vertical space) and "width" (for first-line paragraph indent), plus "href" for <A> and "src" for <IMG>. I add the page breaks with <mbp:pagebreak/>, and the guide items are defined in the <head>. If someone is interested in seeing any of my source files, just ask by PM. EDIT: Oh, and I encode everything in ASCII, so I write characters such as the em dash and é as character references (or rather let a program write them). Last edited by Jellby; 01-04-2009 at 01:40 PM.
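Letting a program write the character references can be as simple as Python's built-in xmlcharrefreplace error handler. A minimal sketch, not necessarily the program Jellby uses: it emits numeric references such as &#8212; rather than named ones like &mdash;, and it assumes the input is already valid HTML, so existing tags and entities are left alone.

```python
def to_ascii_html(html: str) -> str:
    """Replace every non-ASCII character with a decimal character reference.

    Markup characters (<, >, &, quotes) are ASCII, so existing tags and
    entities pass through unchanged.
    """
    return html.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_html("<p>Caf\u00e9 \u2014 fin</p>"))
    # <p>Caf&#233; &#8212; fin</p>
```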
01-04-2009, 01:45 PM | #15 | ||
frumious Bandersnatch
Posts: 7,531
Karma: 19000001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Something a bit unrelated, is there a way (in CSS) to set the punctuation after the labels in a list, i.e., having "1:" or "1.-" instead of "1."? |