Old 09-20-2009, 07:21 PM   #1
jackie_w
Wizard
Posts: 2,880
Karma: 4200035
Join Date: Sep 2009
Location: UK
Device: Sony PRS-350, PB360, Kobo Glo/AuraHD/Aura6"/AuraH2O
Calibre PDF conversions - LRF/EPUB vs RTF

I have been experimenting with Calibre's PDF conversions with varying results.

The thing that struck me most is the difference in the success of the PDF conversion depending on the choice of output format. LRF, EPUB, LIT are much better than RTF. -- See my 2nd post for details.

For many books you would just read the LRF/EPUB once and move onto the next book. However, sometimes you have a favourite book you will want to read many times and you are prepared to put some effort into making it look as good as possible. In these cases being able to convert the PDF to something easily editable, like RTF, is a real bonus. You can then re-upload your finished labour-of-love to Calibre for posterity.

My question is this. Are the problems converting to RTF a Calibre defect or some limitation of the RTF format? Or could it be some conversion parameter I haven't set correctly?

Whilst I'm on the subject, it would also be nice to have HTML as an output format which is easily editable. I know you can convert LIT to HTML with 3rd party products (with variable results) but doing everything in Calibre would be great. Are there any plans for this?

I'd also like to say that, despite anything I may have said in this post, Calibre's conversion of PDF (for the novels I've tried) is still way better than most of the other PDF-to-Word converter programs I've been able to try.

-- Jackie
Old 09-20-2009, 07:29 PM   #2
kovidgoyal
creator of calibre
Posts: 26,329
Karma: 5382313
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
calibre's RTF output is rather new and could use improvement. Note that you can edit EPUB files using, for example, Sigil, a free EPUB editor.

You can also get the HTML from calibre's PDF conversion by using the debug settings; it will output the HTML to the specified directory.

Oh and calibre's PDF conversion is going to get even better
Old 09-20-2009, 07:29 PM   #3
jackie_w
Wizard
(... continued )

For anyone who's interested in the detailed findings…

PDF source was created from MSWord using CutePDF, one of those pseudo-print utilities that direct your print file to a PDF.

I got the following results when converting to LRF, EPUB or LIT using Calibre v0.6.13
(Look&Feel fields, Input-char-encoding and Transliterate-unicode-to-ASCII, both set to blank):-

Very good:
smart quotes (double and single)
italics, bold, bold italics
m-dash, ellipsis
currency symbols (dollar, cents, GB Pound, Euro),
Western European accented chars (e-acute, cedilla etc) Sorry, no experience with Eastern European languages so didn't try them.
fractions

OK:
Small Caps (converted to standard caps)
Ordinals (converted to standard lowercase)
Subscript, superscript (converted to standard size in EPUB/LRF – sort-of-OK, looked better in LIT)

Not so good:
Strikethrough and underline (just the underlying standard text)
Graphics (appears in converted file but not in the right place)


Using exactly the same PDF input and Calibre conversion parameters, these were the results when converting to RTF:-

All of the following special chars were replaced with a question-mark (?)
smart quotes (double and single)
m-dash, ellipsis
currency symbols (except the dollar),
fractions
accented chars (e-acute, cedilla etc)

All the other things gave the same result as for LRF, EPUB, LIT.
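
A question mark is what a plain ASCII encoder produces for any character it cannot represent. Whether that is what happens inside calibre's RTF output is an assumption, not something confirmed in this thread. RTF itself can carry such characters through `\uN` control words followed by a fallback character. A minimal sketch of both behaviours, where `rtf_escape` is a hypothetical helper and not a calibre function:

```python
import struct


def rtf_escape(text):
    """Escape text for RTF, writing non-ASCII characters as \\uN? control words."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax and need a backslash prefix.
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # RTF wants one signed 16-bit decimal per UTF-16 code unit,
            # followed by a fallback character for readers without Unicode support.
            for (unit,) in struct.iter_unpack("<h", ch.encode("utf-16-le")):
                out.append("\\u%d?" % unit)
    return "".join(out)


# Lossy ASCII encoding: the accented character is replaced by "?".
lossy = "caf\u00e9".encode("ascii", "replace")

# RTF escaping: the smart quotes survive as control words.
escaped = rtf_escape("\u201cHi\u201d")
```

The fallback character after each control word is `?` here, so an old reader would still show question marks, while a Unicode-aware one would show the real characters.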

I tried each of the following in the Look&Feel Input-char-encoding field but none of them made any improvements:-
UTF8, UTF-8, latin1, iso-8859-1, windows-1252, cp1252

Switching on the Look&Feel Transliterate-unicode-to-ASCII before converting to RTF, did improve things a lot:-

Smart quotes, double & single, became standard quotes, double & single.
- easy enough to edit back to smart quotes in MSWord if you want to put in the effort

m-dash became -- (again, easy to edit)

Currency symbols were converted to acceptable text equivalents, e.g. euro became EU (although I would have thought that GBP might be better than PS for British pound sterling)

Fractions were converted to standard size chars e.g. 1/2 and 3/4

Accented chars were converted to their standard base chars, e.g. e-acute became e
- not really a problem for me, as a Brit, but not so good for French, German etc
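
For illustration, the kind of transliteration described above can be imitated with Python's standard library. This is only a sketch, not calibre's code: the `REPLACEMENTS` table and the `to_ascii` name are assumptions, with entries chosen to mirror the results listed above.

```python
import unicodedata

# Characters with no ASCII decomposition need explicit replacements.
# The values follow the results reported in this post; the table is an assumption.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # single smart quotes
    "\u201c": '"', "\u201d": '"',   # double smart quotes
    "\u2014": "--",                 # em dash
    "\u2026": "...",                # ellipsis
    "\u20ac": "EU",                 # euro sign
    "\u00a3": "PS",                 # pound sign
    "\u00bd": "1/2",
    "\u00be": "3/4",
}


def to_ascii(text):
    """Return an ASCII-only approximation of text."""
    out = []
    for ch in text:
        if ch in REPLACEMENTS:
            out.append(REPLACEMENTS[ch])
        else:
            # NFKD splits e-acute into "e" plus a combining accent;
            # encoding with "ignore" then drops the non-ASCII accent.
            decomposed = unicodedata.normalize("NFKD", ch)
            out.append(decomposed.encode("ascii", "ignore").decode("ascii"))
    return "".join(out)
```

Characters that are neither in the table nor decomposable are silently dropped in this sketch, so the table would need extending for other symbols.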

Regards, Jackie
Old 09-20-2009, 07:51 PM   #4
jackie_w
Wizard
Quote:
Originally Posted by kovidgoyal View Post
You can also get the HTML from calibre's PDF conversion by using the debug settings; it will output the HTML to the specified directory.

Oh and calibre's PDF conversion is going to get even better
Kovid you always respond so unbelievably fast! Thank you.

I didn't know about the HTML output via debug. I shall have to give that a try.

Re: Sigil - I don't know anything about EPUB other than reading them, so I might need to psych myself up before looking at that.

I'm glad to hear you've got long-term plans for PDF conversions. In my brief experience, the paragraph reconstruction could use some refinement. I found some paragraphs were being combined for no apparent reason - although it wouldn't seriously affect the reading experience.

On a plus note, I'd just like to say that Calibre's recognition of italics was better than anything else I tried.
Old 09-20-2009, 08:00 PM   #5
kovidgoyal
creator of calibre
Just look at the debug section of the conversion options; it's pretty self-explanatory.
Old 09-20-2009, 08:36 PM   #6
jackie_w
Wizard
Debug

Thanks, Kovid.

I've now tried it. It looks very promising. I need to give it a go with a 'proper' PDF and try to decide which of the 4 different versions of HTML (input, parsed, processed, structure) will be the best candidate for beautifying.
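
For anyone who prefers the command line, the same debug output can be requested from calibre's `ebook-convert` tool with its `--debug-pipeline` option. A minimal sketch, assuming `ebook-convert` is on the PATH; the function names are made up for illustration and any file or folder names passed in are placeholders.

```python
import os
import subprocess


def debug_convert_command(src, dst, debug_dir):
    """Build the ebook-convert command that also saves the intermediate stages."""
    return ["ebook-convert", src, dst, "--debug-pipeline", debug_dir]


def run_debug_convert(src, dst, debug_dir):
    """Run the conversion and return the names of the saved stage folders."""
    subprocess.run(debug_convert_command(src, dst, debug_dir), check=True)
    # Each pipeline stage is written to its own sub-folder of debug_dir.
    return sorted(os.listdir(debug_dir))
```

Calling `run_debug_convert("book.pdf", "book.epub", "debug_out")` would then list the stage folders to pick from.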



Do you think it's just me, or do you think that perhaps others may not know that DEBUG produces useable HTML? Perhaps an explicit reference in your Help Text in the DEBUG box may make it more obvious. After all there's plenty of room in there.

--Jackie
Old 09-20-2009, 09:10 PM   #7
kovidgoyal
creator of calibre
Sure, will be in next release.
Old 09-21-2009, 08:36 PM   #8
jackie_w
Wizard
Follow-up

Kovid,

Following your advice yesterday, I have been experimenting with the various stages of HTML output from the convert-ebook debug option. I have tried 2 different PDFs and found something strange with both of them.

The Parsed, Processed and Structure HTML versions all had far too much italic and bold when viewed in my browser (Firefox).

I finally tracked the problem down to a few strange HTML tags, namely <b/> and/or <i/> which appeared at the Parsed stage and still remained at the Structure-stage. They were not present in the Input-stage HTML.

Both these tags caused problems with the text following. If viewed in the browser all the remaining text following an <i/> was italic. Similarly, all the text following a <b/> was bold. So by the time there had been one of each, the remaining text to the end of file was bold and italic.

The good news is that when I manually deleted the strange tags, all the text became correct again in the browser, i.e. no intended bold and italic were lost, so I was able to carry on experimenting.

Are these strange tags meant to be there?
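
The manual clean-up described above could also be scripted. A minimal sketch, assuming the only tags to remove are self-closed `<i/>` and `<b/>` (with or without attributes); the function name is hypothetical and this is not part of calibre.

```python
import re

# Matches self-closed <i/> or <b/>, optionally with attributes,
# e.g. <i/> or <b class="x"/>. It does not match <br/>, <img .../>,
# or normal <i>...</i> pairs.
EMPTY_INLINE = re.compile(r"<(?:i|b)(?:\s[^<>]*)?/>")


def strip_empty_inline_tags(html):
    """Remove empty self-closed italic and bold tags from an HTML string."""
    return EMPTY_INLINE.sub("", html)
```

Because the tags are empty, removing them cannot lose any intended bold or italic text.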
Old 09-21-2009, 11:02 PM   #9
kovidgoyal
creator of calibre
Yeah, they are removed in the final stage, when the EPUB is created. Your browser is interpreting the files as HTML when they are really XHTML.

A <i/> tag is a self-closed italic tag which is the same as

<i></i>
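
The XML reading of such a tag can be checked with any XML parser. A small sketch using Python's standard library (the snippet and the function name are made up for illustration): the element comes out empty, and the text that follows it sits outside the element.

```python
import xml.etree.ElementTree as ET


def inspect_self_closed(snippet="<p>before <i/>after</p>"):
    """Parse the snippet as XML and report how the <i/> element is seen."""
    p = ET.fromstring(snippet)
    i = p.find("i")
    # text is what is inside the element, tail is what follows it.
    inside, following = i.text, i.tail
    # Writing the tree back with explicit end tags gives the long form.
    long_form = ET.tostring(p, encoding="unicode", short_empty_elements=False)
    return inside, following, long_form
```

An HTML parser in a browser, by contrast, ignores the trailing slash on `<i/>` and treats it as an opening tag, which is why the italics run on.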
Old 09-22-2009, 08:25 AM   #10
jackie_w
Wizard
Quote:
Originally Posted by kovidgoyal View Post
yeah they are removed in the final stage creation of the epub.
Oops! Sorry for my ignorance of XHTML.

However, are you sure that they are removed? I've just had a closer look at the resulting EPUB.

An <i/> tag is located at page 21 (Chapter 1) - EPUB text stays italic until page 105 (beg Chapter 4).

Similarly a <b/> tag at page 1016 (Chapter 26) - EPUB text stays bold until page 1095 (Chapter 28)

For completeness, I created an LRF with all the same settings. The LRF has bold and italics in all the right places. The <i/> and <b/> tags do still exist in the debug HTML. Think I'll stick with reading LRFs for the time being.
Old 09-22-2009, 12:27 PM   #11
jackie_w
Wizard
... and there's more ...

Since I wrote the above I've found out how to look at the HTML inside the EPUB and can see that it's split into pieces.

The <i/> tags have become <i class="calibre4"/> and
the <b/> tags have become <b class="calibre5"/>

So the formatting is correcting itself at the HTML split points.
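
An EPUB is a ZIP archive, so the HTML pieces can also be checked with a short script rather than by hand. A minimal sketch, assuming the pieces have .html, .xhtml or .htm extensions and are UTF-8 encoded; the function name is hypothetical.

```python
import re
import zipfile

# Self-closed <i/> or <b/>, with or without attributes such as class="calibre4".
EMPTY_INLINE = re.compile(r"<(?:i|b)(?:\s[^<>]*)?/>")


def find_empty_inline_tags(epub):
    """Return {file name: [matching tags]} for each HTML file inside the EPUB."""
    hits = {}
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                text = zf.read(name).decode("utf-8", errors="replace")
                found = EMPTY_INLINE.findall(text)
                if found:
                    hits[name] = found
    return hits
```

Passing the path of the converted EPUB returns only the pieces that contain the problem tags.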
Old 09-22-2009, 02:17 PM   #12
kovidgoyal
creator of calibre
I don't see this with my PDF files. Open a ticket and attach one of your PDF files to it.
Old 09-22-2009, 02:49 PM   #13
jackie_w
Wizard
Kovid, I'm happy to open a ticket but I can't really attach the PDF as it's a commercial one with the DRM removed (by someone who knew what they were doing). It's also large - about 7 MB.
Could you do your tests with a small selection from one or more of the debug stages and/or one of the epub sections? There were about 10 of these - only 3 with the problem tags.
If so perhaps you could specify a combo which would suffice and I'll do my best.

In the meantime I'll see if I can find a way to extract one of the PDF's problem pages without destroying the problem. Any suggestions for free software which will do this gratefully accepted.

-- jackie
Old 09-22-2009, 03:12 PM   #14
jackie_w
Wizard
I've found a PDF Splitter so ignore the above. I should be able to add something useful to the ticket.
Old 09-22-2009, 04:06 PM   #15
jackie_w
Wizard
Ticket #3564 created