Convert PDF to EPUB in Text not pictures.


looloo
09-26-2012, 06:20 PM
How come when I convert my PDFs into EPUBs, each page turns into a full-page picture? How do I convert them so that the words are text? I want to highlight the text, and if it's converted into pictures I can't highlight anything. Anybody know? Thanks

Toxaris
09-27-2012, 03:34 AM
You need to perform OCR on the PDF first. This PDF only contains images and cannot be converted in the normal way. Actually, PDF conversion is almost never a good idea.

Ti-Ron
09-27-2012, 08:42 AM
From my (little) experience, converting from PDF to ePub (and vice versa) is a PITA. Even with Calibre, stuff gets funky, in a bad way.

Maybe you could copy/paste all of your PDF into Word and make an ePub from there?

Building it by hand via Sigil would take longer but be more efficient, I think!
Good luck! :)

looloo
09-27-2012, 10:16 AM
Yea, but I can only find this file in PDF format. What's this OCR thing you mentioned and how do I apply it to my PDF file? Thanks

Man Eating Duck
09-27-2012, 10:42 AM
Yea, but I can only find this file in PDF format. What's this OCR thing you mentioned and how do I apply it to my PDF file? Thanks

OCR stands for Optical Character Recognition, and is basically performed by a program which "reads" the image and translates it to editable text. You need specialised software for this. The full version of Acrobat can do basic OCR, which it then overlays on your document, and specialised tools like FineReader are more advanced and will give you some approximation of layout such as tables, indents and the like. I'm not aware of any good free tools; my limited experience with them indicates that they have a way to go yet.

No matter which software you use, you will have to correct a great many misreads. Depending on the quality of your images, recognition accuracy will usually vary from about 80% to 98%, so even in a good case roughly one character in fifty is wrong. The software will also have to guess semantic layout, like which text is a heading and whether a paragraph crosses a page boundary or is really *two* paragraphs (this is not always clear even to a human reader).
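
To make the page-boundary guess concrete, here is a minimal Python sketch of one possible heuristic. It is only an illustration, not how any particular OCR package does it: the function names `continues_across_pages` and `join_pages` are made up for this example, and it assumes you already have the recognised text of each page as a plain string.

```python
import re

# Sentence-ending punctuation, optionally followed by a closing quote or bracket.
_SENTENCE_END = re.compile(r"""[.!?:]["')\]]?$""")


def continues_across_pages(prev_text: str, next_text: str) -> bool:
    """Guess whether the paragraph ending prev_text continues in next_text.

    Heuristic only (hypothetical helper): a trailing hyphen, or a missing
    sentence end followed by a lowercase start, suggests a split paragraph.
    """
    prev = prev_text.rstrip()
    nxt = next_text.lstrip()
    if not prev or not nxt:
        return False
    if prev.endswith("-"):
        return True
    return not _SENTENCE_END.search(prev) and nxt[0].islower()


def join_pages(pages: list[str]) -> str:
    """Join per-page OCR text, merging paragraphs that look split."""
    out = ""
    for page in pages:
        page = page.strip()
        if not page:
            continue  # skip blank pages
        if not out:
            out = page
        elif continues_across_pages(out, page):
            if out.endswith("-"):
                # Re-join a word hyphenated at the page break. This also
                # (wrongly) removes genuine hyphens, e.g. "well-" + "known".
                out = out[:-1] + page
            else:
                out = out + " " + page
        else:
            out = out + "\n\n" + page  # treat as a new paragraph
    return out
```

For example, `join_pages(["The quick brown fox jumps over the", "lazy dog."])` merges the two pages into one paragraph, while `join_pages(["End.", "Start."])` keeps them apart. A heuristic like this will still guess wrong on headings, captions and lists, so the result needs proofreading either way.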

As Ti-Ron states, it is generally a lost cause to convert PDFs to a meaningful format. Read them as is, or prepare to do a significant amount of work after conversion if quality is important to you.

Good luck!

Toxaris
09-27-2012, 11:54 AM
From my (little) experience, converting from PDF to ePub (and vice versa) is a PITA. Even with Calibre, stuff gets funky, in a bad way.

Maybe you could copy/paste all of your PDF into Word and make an ePub from there?

Building it by hand via Sigil would take longer but be more efficient, I think!
Good luck! :)

Doing it like that will not work, since the PDF is basically a collection of images in this case.

curiousgeorge
10-03-2012, 10:58 AM
Plus it's a bad idea to OCR, because if the PDF uses an unusual serif font the software will read some characters as other characters and then spit out crap. OCR turns what is an image into searchable text. Lately I have seen people take an image-based PDF, export all the pages to images, and write one HTML file containing all the images. Yes, this works, BUT it's a crap way of making an EPUB and I hate it.
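
One way to see how badly a given font trips up OCR is to retype a short passage by hand and compare it with what the software produced. The following is a rough Python sketch using only the standard library's `difflib`; the helper names `char_accuracy` and `misreads` are invented for this example, and the score is a simple matched-character ratio rather than any official OCR metric.

```python
from difflib import SequenceMatcher


def char_accuracy(ocr_text: str, reference: str) -> float:
    """Fraction of reference characters that the OCR text reproduces in order."""
    if not reference:
        return 1.0 if not ocr_text else 0.0
    matcher = SequenceMatcher(None, reference, ocr_text, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(reference)


def misreads(ocr_text: str, reference: str) -> list[tuple[str, str]]:
    """List (expected, got) pairs where the OCR text differs from the reference."""
    matcher = SequenceMatcher(None, reference, ocr_text, autojunk=False)
    return [
        (reference[i1:i2], ocr_text[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Calling `misreads("modem", "modern")` reports the classic "rn" read as "m" confusion. If `char_accuracy` on a sample page comes out near the bottom of the 80%-98% range mentioned earlier in the thread, expect to spend a lot of time on corrections.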