06-08-2010, 06:03 AM #21
orwell2k
Addict
Posts: 357
Karma: 1112
Join Date: Oct 2008
Location: Euroland
Device: PocketBook 360°, BeBook (Hanlin V3), iRex DR1000S, iPad
Quote:
Originally posted by crangirl:
And if you want to convert a PDF... ePub or FB2? Or leave it as a PDF?
Depends on the PDF:

(1) If it is formatted for an eReader (e.g. a smaller page size, such as 15 x 10 cm pages rather than A4), it may be fine on the reader as is, provided the pages were actually laid out at that size and not just shrunk down from a larger page (shrinking the page shrinks the font too, which makes it difficult to read).

(2) With a 'normal' PDF with A4 size pages it is often better to convert, but conversion is not always easy.
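To put a rough number on the "shrunk font" problem in case (1): a small Python sketch, assuming a portrait reader page 10 cm wide by 15 cm high and 12 pt body text (both just example values, not from any particular device or book), works out how much an A4 page has to be scaled down to fit.

```python
# Rough estimate of how much text shrinks when a whole A4 page is scaled
# down to fit a small eReader page. Sizes are (width, height) in cm, portrait.
# The 10 x 15 cm target and 12 pt font are example values only.
A4_CM = (21.0, 29.7)

def fit_scale(page, screen):
    """Uniform scale factor that makes `page` fit entirely inside `screen`."""
    return min(screen[0] / page[0], screen[1] / page[1])

def shrunk_font_pt(font_pt, page=A4_CM, screen=(10.0, 15.0)):
    """Apparent font size after shrinking the whole page to fit `screen`."""
    return font_pt * fit_scale(page, screen)

if __name__ == "__main__":
    print(f"scale: {fit_scale(A4_CM, (10.0, 15.0)):.3f}")  # scale: 0.476
    print(f"12 pt -> {shrunk_font_pt(12):.1f} pt")         # 12 pt -> 5.7 pt
```

The width is the limiting side here (10 / 21 is smaller than 15 / 29.7), so ordinary 12 pt text ends up at under 6 pt, which is why a PDF that has merely been shrunk is so hard to read.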

Many PDFs have header and footer text (page numbers, book title, chapter, etc.). When you export from Adobe Acrobat, these header/footer pieces of text often get interspersed among the normal text, requiring a lot of clean-up.

What I have done in the past is to 'trim' the PDF: in Acrobat I set page cropping to remove the header/footer space, save the result as a new PDF, and then export that new PDF to HTML in Acrobat. This often solves the header/footer problem. You need Acrobat Pro, and I think it is under 'Tools > Print Production > Crop Pages'.

I did notice recently that when I exported a PDF to HTML, the header/footer text seemed to be dropped automatically, so maybe later versions of Acrobat do a better job?
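If you don't have Acrobat Pro for cropping, or some header/footer text still slips through, another option is to clean up after the export. This is only a sketch of that idea in Python, not something Acrobat does: it assumes the exported text has one header, footer or page number per line, treats a bare number (or "Page 12") as a page number, and treats any short line that repeats at least three times as a running header/footer. `strip_running_lines` and its thresholds are made-up for illustration.

```python
import re
from collections import Counter

# Assumed page-number pattern: "12" or "Page 12" alone on a line.
PAGE_NUMBER = re.compile(r"^(page\s+)?\d+$", re.IGNORECASE)

def strip_running_lines(lines, min_repeats=3, max_len=60):
    """Drop lone page numbers and short lines that repeat many times
    (running titles, chapter names) from text exported out of a PDF.
    Hypothetical helper; thresholds are guesses to tune per book."""
    stripped = [ln.strip() for ln in lines]
    counts = Counter(s for s in stripped if s)
    kept = []
    for raw, s in zip(lines, stripped):
        if PAGE_NUMBER.match(s):
            continue  # a page number on its own line
        if s and counts[s] >= min_repeats and len(s) <= max_len:
            continue  # repeated header/footer text, e.g. the book title
        kept.append(raw)
    return kept

if __name__ == "__main__":
    exported = [
        "My Book Title", "It was a dark night.", "12",
        "My Book Title", "The rain fell.", "13",
        "My Book Title", "Then it stopped.", "14",
    ]
    for line in strip_running_lines(exported):
        print(line)  # only the three body-text lines are printed
```

Be careful with it: genuine text that repeats (e.g. a '* * *' scene break) or a number standing alone in the body would be removed too, so check the result rather than trusting it blindly.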

Even once you have solved the header/footer issue (if needed), the export often causes other problems, mainly with character recognition and styles. Sometimes formatting such as italics is lost; perhaps not a big deal, but I like to keep the intended styles.

Of more concern is character recognition - there are some common errors that seem to crop up in Adobe's export process (again, maybe later Acrobat versions do a better job?).

Often I have seen 'cl' interpreted as 'd' (so 'closed' becomes 'dosed', which passes spell-checking), and 'fl', 'if' or 'fi' often get turned into a single character representing that letter pair. There must be some valid pairing in the character set or something (typographic ligatures such as 'ﬁ' and 'ﬂ' do exist as single Unicode characters). Anyway, a global search/replace can fix it, but you have to know about it!

There are probably others, and some PDFs have more errors than others: I have had some clean PDFs come through the export to HTML from Acrobat almost error-free. Others are strewn with errors.
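For the two errors described above, here is a minimal Python sketch of the search/replace step. It assumes the merged letters come out as the standard Unicode ligature characters U+FB00 to U+FB04 (an exporter might produce something else, and an 'if' pair is not one of these, so that case is not caught), and that you supply your own word list as a set of lowercase words. `expand_ligatures` and `flag_cl_misreads` are hypothetical helper names, not part of Acrobat or any conversion tool.

```python
import re

# Standard Unicode ligature characters and the letters they stand for.
# Assumption: the exported text uses these code points, not private ones.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

def expand_ligatures(text):
    """Replace single ligature characters with the plain letters."""
    for lig, letters in LIGATURES.items():
        text = text.replace(lig, letters)
    return text

def flag_cl_misreads(text, dictionary):
    """List words that would also be valid words if one 'd' were really
    'cl' (e.g. 'dosed' vs 'closed'). Such words pass a spell-check, so
    they are only flagged for a human to review, never auto-replaced.
    `dictionary` is an assumed set of lowercase words you provide."""
    suspects = []
    for word in re.findall(r"[A-Za-z]+", text):
        lower = word.lower()
        for i, ch in enumerate(lower):
            if ch == "d" and lower[:i] + "cl" + lower[i + 1:] in dictionary:
                suspects.append(word)
                break  # one hit is enough to flag the word
    return suspects

if __name__ == "__main__":
    print(expand_ligatures("\ufb01nal \ufb02ow"))  # final flow
    words = {"closed", "clear", "the", "door", "was"}
    print(flag_cl_misreads("The door was dosed.", words))  # ['dosed']
```

The ligatures can safely be replaced everywhere, but the 'cl'/'d' check can only point at candidates: a correct word like 'dear' would also be flagged because 'clear' is a word.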

Once you have a clean source file (HTML, DOC, whatever), it's your choice whether you go ePub (using something like Sigil) or FB2 (using BookDesigner, OOoFBTools, Any2FB2, or another FB2 conversion tool).