MobileRead Forums - View Single Post - pdf to epub

ldolse · 04-25-2011, 12:47 AM

Ah, well, if that's the direction, open up the epub with a zip utility and look inside. The most likely reason is also in the FAQ; it just doesn't describe the symptoms the same way you are:

Quote:

My pdf converted, but it doesn't contain any text, or the text is all garbled

Many pdfs are actually made up of many images of scanned books, one image for each page. Many of these types of pdfs use hidden OCR (optical character recognition - i.e. machine reading) text underneath the images, but not all of them do. When there is no OCR text at all, you will often get a conversion that has no text, or is made up only of images. If the pdf uses hidden OCR text, in most cases no editing was done to the OCR, and depending on the text quality and OCR engine the resulting text can be quite awful. There isn't anything you can do with a pdf like this in Calibre. Your best bet is to use real OCR software like ABBYY FineReader or Acrobat Professional to convert the document. There are also open source OCR projects such as Tesseract and OCRopus.

The epub would be larger because the images probably get converted to a format with less compression/greater bit depth.
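
If you'd rather not click through a zip utility, here is a rough Python sketch of the same check. An epub is a zip archive, so the standard zipfile module can read it. The function name summarize_epub and the extension lists are my own assumptions for illustration, not anything from Calibre. It just counts files and uncompressed bytes per category, so you can see whether the book is basically all page images with little or no text.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed extension lists; adjust if your epub uses other formats.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".svg", ".bmp"}
TEXT_EXTS = {".html", ".xhtml", ".htm"}


def summarize_epub(path):
    """Return {category: [file_count, uncompressed_bytes]} for an epub.

    `path` can be a filename or a file-like object, since an epub is
    just a zip archive.
    """
    totals = defaultdict(lambda: [0, 0])
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries
            ext = PurePosixPath(info.filename).suffix.lower()
            if ext in IMAGE_EXTS:
                kind = "images"
            elif ext in TEXT_EXTS:
                kind = "text"
            else:
                kind = "other"
            totals[kind][0] += 1
            totals[kind][1] += info.file_size  # uncompressed size
    return dict(totals)


# Example (hypothetical filename):
# for kind, (count, size) in summarize_epub("book.epub").items():
#     print(f"{kind}: {count} files, {size} bytes")
```

If nearly all the bytes land under "images" and the "text" files are tiny, the pdf was most likely a scan without usable OCR text, which matches the FAQ entry above.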