MobileRead Forums

chyron8472 · 02-12-2011, 01:22 AM

Quote:

Originally Posted by cheektocheek

What exactly do I need to clean up from the file once it's HTML?

It sounds like too much hassle just to fix one file and make it display properly.

First, I suggest Sigil for editing. You can use it to edit HTML and non-DRM ePub books.

Second, it's not the Kindle's fault that some PDFs don't look quite right. The fault lies with the PDF file format itself. As the Calibre user manual says about converting files:

Quote:

from http://calibre-ebook.com/user_manual...-pdf-documents
PDF documents are one of the worst formats to convert from. They are a fixed page size and text placement format. Meaning, it is very difficult to determine where one paragraph ends and another begins. Calibre will try to unwrap paragraphs using a configurable, Line Un-Wrapping Factor. [...]

Also, they often have headers and footers as part of the document that will become included with the text. Use the Search and Replace panel to remove headers and footers to mitigate this issue. If the headers and footers are not removed from the text it can throw off the paragraph unwrapping. [...]
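(An aside of my own, not from the manual.) To show why leftover headers and footers throw off unwrapping, here is a rough Python sketch. It assumes the text has already been extracted into a list of lines. The `HEADER_FOOTER` pattern and the `factor` threshold are made-up illustrations, and the short-line heuristic is a simplification, not Calibre's actual un-wrapping algorithm.

```python
import re

# Hypothetical pattern: lines that hold only "Page N", "Page N of M", or a bare
# page number. Adjust it to whatever your PDF repeats on every page.
HEADER_FOOTER = re.compile(r"^\s*(Page \d+( of \d+)?|\d+)\s*$")


def strip_headers_footers(lines, pattern=HEADER_FOOTER):
    """Drop lines that consist only of a running header or footer."""
    return [ln for ln in lines if not pattern.match(ln)]


def unwrap(lines, factor=0.5):
    """Join hard-wrapped lines into paragraphs.

    Simplified heuristic: a blank line, or a line shorter than
    factor * (length of the longest line), ends the current paragraph.
    """
    longest = max((len(ln.strip()) for ln in lines), default=0)
    threshold = factor * longest
    paragraphs, current = [], []
    for ln in lines:
        text = ln.strip()
        if not text:
            # A blank line always closes the paragraph in progress.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(text)
        if len(text) < threshold:
            # A short line is assumed to be the last line of a paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


page = [
    "The road wound on through the hills and the",
    "Page 12",
    "travellers kept walking until dusk.",
    "Then they rested.",
]

broken = unwrap(page)                         # "Page 12" splits the paragraph
fixed = unwrap(strip_headers_footers(page))   # one clean paragraph
```

With the footer left in, the short "Page 12" line looks like the end of a paragraph and the sentence gets cut in two. Once it is stripped, the same lines join into a single paragraph.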

Some limitations of PDF input (when converting in Calibre) are:

* Complex, multi-column, and image based documents are not supported.
* Extraction of vector images and tables from within the document is also not supported.
* Some PDFs use special glyphs to represent ll or ff or fi, etc. Conversion of these may or may not work depending on just how they are represented internally in the PDF.
* Some PDFs store their images upside down with a rotation instruction, calibre currently doesn’t support that instruction, so the images will be rotated in the output as well.
* Links and Tables of Contents are not supported
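
(Another aside of my own, not from the manual.) For the ligature glyphs, one possible cleanup once you have the text as HTML or plain text is to expand them back to ordinary letters. This is a minimal Python sketch that assumes the converter emitted the standard Unicode ligature code points (U+FB00 to U+FB04). If the PDF uses a custom font encoding instead, the characters will come out as something else and this will not help. The name `expand_ligatures` is just for illustration.

```python
# Standard Unicode Latin ligatures and their plain-letter expansions.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

# str.maketrans accepts a dict of single characters mapped to replacement strings.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text):
    """Replace ligature code points with the letters they stand for."""
    return text.translate(_TABLE)


cleaned = expand_ligatures("An e\ufb03cient \ufb01le work\ufb02ow")
```

I used an explicit table rather than a blanket Unicode normalization so that nothing else in the text gets changed.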

To re-iterate PDF is a really, really bad format to use as input. If you absolutely must use PDF, then be prepared for an output ranging anywhere from decent to unusable, depending on the input PDF.