09-26-2012, 06:20 PM | #1 |
Junior Member
Posts: 2
Karma: 10
Join Date: Sep 2012
Device: iPad
|
Convert PDF to EPUB in Text not pictures.
How come when I convert my PDFs into EPUBs the pages turn into full-page pictures? How do I convert them so that the words are text? I want to highlight the text, and if the pages are converted into pictures I can't highlight anything. Anybody know? Thanks
|
09-27-2012, 03:34 AM | #2 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
You need to perform OCR on the PDF first. This PDF only contains images and cannot be converted in the normal way. Actually, PDF conversion is almost never a good idea.
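
A quick way to confirm that a PDF is image-only is to check whether any page has an extractable text layer. The sketch below is only an illustration: `looks_image_only`, `extract_page_texts`, and the `min_chars` threshold are hypothetical names and values, and the extraction step assumes the third-party `pypdf` package is installed.

```python
def looks_image_only(page_texts, min_chars=20):
    """Return True when no page carries a meaningful amount of extractable text.

    min_chars is an arbitrary threshold (an assumption) so that stray
    page numbers or watermarks do not count as a real text layer.
    """
    return all(len((text or "").strip()) < min_chars for text in page_texts)


def extract_page_texts(pdf_path):
    """Return the text layer of every page as a list of strings.

    Assumes the third-party pypdf package; it is imported lazily so the
    check above can be used without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

If `looks_image_only(extract_page_texts("book.pdf"))` is true, a normal conversion has no text to work with and OCR is the only route to selectable text.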
|
09-27-2012, 08:42 AM | #3 |
Connoisseur
Posts: 80
Karma: 1023042
Join Date: Nov 2011
Device: Kobo Touch, iPad
|
From my (little) experience, converting from PDF to ePub (and vice versa) is a PITA. Even with calibre, things get funky, in a bad way.
Maybe you could copy/paste all of your PDF into Word and make an ePub from there? Building it by hand in Sigil would take longer but give a better result, I think! Good luck!
|
09-27-2012, 10:16 AM | #4 |
Junior Member
Posts: 2
Karma: 10
Join Date: Sep 2012
Device: iPad
|
Yeah, but I can only find this file in PDF format. What's this OCR thing you mentioned, and how do I apply it to my PDF file? Thanks
|
09-27-2012, 10:42 AM | #5 | |
Addict
Posts: 254
Karma: 69786
Join Date: May 2006
Location: Oslo, Norway
Device: Kobo Aura, Sony PRS-650
|
Quote:
No matter which software you use, you will have to correct a great many misreads. Depending on the quality of your images, accuracy will usually vary from about 80% to 98%. The software will also have to guess the semantic layout, such as which text is a heading and whether a paragraph crosses a page boundary or is really *two* paragraphs (this is not always clear even to a human reader). As Ti-Ron states, it is generally a lost cause to convert PDFs to a meaningful format. Read them as is, or prepare to do a significant amount of work after conversion if quality is important to you. Good luck!
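
To make the page-boundary guess concrete, here is a minimal sketch of one possible heuristic, not what any particular OCR package does: treat the first paragraph of a page as a continuation when the previous page did not end with sentence-final punctuation. The function name `join_pages` and the punctuation rule are assumptions for illustration.

```python
import re

# Sentence-final punctuation, optionally followed by a closing quote or bracket.
SENTENCE_END = re.compile(r"""[.!?:]["')\]]?$""")


def join_pages(pages):
    """Join per-page text into paragraphs, merging across page breaks.

    pages is a list of strings, one per page, with blank lines between
    paragraphs. The merge rule is a guess and will get some cases wrong.
    """
    paragraphs = []
    for page in pages:
        page_paras = [p for p in re.split(r"\n\s*\n", page) if p.strip()]
        for i, para in enumerate(page_paras):
            para = " ".join(para.split())  # collapse line breaks inside a paragraph
            continues = (
                i == 0
                and bool(paragraphs)
                and not SENTENCE_END.search(paragraphs[-1])
            )
            if not continues:
                paragraphs.append(para)
            elif paragraphs[-1].endswith("-"):
                # Assume a trailing hyphen is a word split across the page break.
                paragraphs[-1] = paragraphs[-1][:-1] + para
            else:
                paragraphs[-1] = paragraphs[-1] + " " + para
    return paragraphs
```

A heading at the bottom of a page has no final punctuation either, so this rule would wrongly glue it to the next page's first paragraph. That is exactly the kind of case that has to be fixed by hand afterwards.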
|
09-27-2012, 11:54 AM | #6 | |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
Quote:
|
|
10-03-2012, 10:58 AM | #7 |
Connoisseur
Posts: 53
Karma: 10
Join Date: Aug 2012
Location: Nashville, Tn
Device: ipad, Kindle Fire
|
Plus, it's a bad idea to OCR, because if the book uses an unusual serif font the software will read some characters as other characters and then spit out garbage. OCR turns what is an image into searchable text. Lately I have seen people take an image-based PDF, export every page as an image, and write one HTML file containing all the images. Yes, this works, BUT it's a crap way of making an EPUB and I hate it.
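
One cheap way to find some of those misread characters is to flag words that mix letters and digits, since OCR often confuses pairs like l/1 and O/0. This is only a rough sketch for building a review list; `suspicious_tokens` is a hypothetical helper, and it will also flag legitimate tokens such as "3rd" or model numbers.

```python
import re

# A word containing at least one ASCII letter and at least one digit.
MIXED_TOKEN = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")


def suspicious_tokens(text):
    """Return words that mix letters and digits, in order of appearance."""
    return MIXED_TOKEN.findall(text)
```

For example, `suspicious_tokens("He c1osed the d00r at 10 pm.")` picks out `c1osed` and `d00r` but leaves the plain number `10` alone. It cannot catch misreads that produce another valid-looking word, so proofreading is still needed.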
|
10-27-2014, 05:52 AM | #8 | |
Junior Member
Posts: 1
Karma: 10
Join Date: Oct 2014
Device: none
|
OCR technology
Quote:
|
10-27-2014, 11:08 AM | #9 |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
No, almost all the 'free' or open source OCR programs use Tesseract. The quality is mediocre and not really useful for OCR of books. Most people I know use ABBYY. Around 6 years ago I used OmniPage a lot.
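
For anyone who wants to judge Tesseract on their own scans, the sketch below runs its command-line tool on a single page image. It assumes the `tesseract` executable is installed and on the PATH and that the PDF pages have already been exported as images; `tesseract_command` and `ocr_page` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def tesseract_command(image_path, lang="eng"):
    """Build the argument list for: tesseract <image> <output base> -l <lang>."""
    image = Path(image_path)
    output_base = image.with_suffix("")  # Tesseract appends ".txt" itself
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_page(image_path, lang="eng"):
    """Run Tesseract on one page image and return the recognized text."""
    cmd = tesseract_command(image_path, lang)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if Tesseract fails
    return Path(cmd[2] + ".txt").read_text(encoding="utf-8")
```

Running `ocr_page("page_001.png")` on a few representative pages gives a quick feel for how much correction the whole book would need before committing to it.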
|
|