Go Back   MobileRead Forums > E-Book Formats > ePub

Old 09-26-2012, 06:20 PM   #1
looloo
Junior Member
looloo began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Sep 2012
Device: iPad
Convert PDF to EPUB as text, not pictures.

How come when I convert my PDFs into EPUBs, each page turns into a full-page picture? How do I convert them so that the words are text? I want to highlight the text, and if the pages are converted into pictures I can't highlight anything. Does anybody know? Thanks
Old 09-27-2012, 03:34 AM   #2
Toxaris
Wizard
Toxaris ought to be getting tired of karma fortunes by now.
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
You need to perform OCR on the PDF first. This PDF contains only images and cannot be converted in the normal way. Actually, PDF conversion is almost never a good idea.
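
A quick way to confirm that a PDF really is image-only is to run any text extractor over it and see whether anything comes back. The sketch below only covers the decision step; it assumes you already have one extracted string per page (the `page_texts` input, the 20-character threshold and the 10% cut-off are illustrative assumptions, not part of any tool).

```python
def looks_image_only(page_texts, min_chars_per_page=20):
    """Guess whether a PDF is a scan, given the text extracted from each page.

    page_texts: one string per page from whatever extractor you use
    (hypothetical input; the extraction step itself is not shown here).
    """
    if not page_texts:
        return True
    pages_with_text = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars_per_page
    )
    # Assumed rule of thumb: fewer than 10% of pages with real text means "scanned".
    return pages_with_text / len(page_texts) < 0.10


scanned = ["", "  ", "\n", ""]
digital = ["Chapter 1. It was a dark and stormy night.", "The rain fell in torrents."]
print(looks_image_only(scanned))
print(looks_image_only(digital))
```

If this reports an image-only file, no converter can produce selectable text from it without an OCR pass first.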
Old 09-27-2012, 08:42 AM   #3
Ti-Ron
Connoisseur
Ti-Ron ought to be getting tired of karma fortunes by now.
 
Posts: 80
Karma: 1023042
Join Date: Nov 2011
Device: Kobo Touch, iPad
From my (limited) experience, converting from PDF to ePub (and vice versa) is a PITA. Even with calibre, things get funky, in a bad way.

Maybe you could copy/paste your whole PDF into Word and make an ePub from there?

Building it by hand in Sigil would take longer but be more effective, I think!
Good luck!
Old 09-27-2012, 10:16 AM   #4
looloo
Junior Member
looloo began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Sep 2012
Device: iPad
Yeah, but I can only find this file in PDF format. What's this OCR thing you mentioned, and how do I apply it to my PDF file? Thanks
Old 09-27-2012, 10:42 AM   #5
Man Eating Duck
Addict
Man Eating Duck juggles neatly with hedgehogs.
 
Posts: 254
Karma: 69786
Join Date: May 2006
Location: Oslo, Norway
Device: Kobo Aura, Sony PRS-650
Quote:
Originally Posted by looloo
Yeah, but I can only find this file in PDF format. What's this OCR thing you mentioned, and how do I apply it to my PDF file? Thanks
OCR stands for Optical Character Recognition. It is performed by a program which "reads" the image and translates it into editable text. You need specialised software for this. The full version of Acrobat can do basic OCR, which it then overlays on your document, while specialised tools like FineReader are more advanced and will give you an approximation of the layout, such as tables and indents. I'm not aware of any good free tools; my limited experience with them indicates that they have a way to go yet.

No matter which software you use, you will have to correct a great many misreads. Depending on the quality of your images, recognition accuracy will usually vary from about 80% to 98%. The software will also have to guess the semantic layout, such as which text is a heading and whether a paragraph crosses a page boundary or is really *two* paragraphs (this is not always clear even to a human reader).
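
To put the 80%-98% range in perspective, reading it as per-character recognition accuracy, here is a rough back-of-the-envelope calculation. The 500,000-character book length is an assumption picked purely for illustration.

```python
def expected_misreads(total_chars, accuracy):
    """Estimate how many characters come out wrong at a given per-character accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round(total_chars * (1.0 - accuracy))


BOOK_CHARS = 500_000  # assumed book length, illustrative only

for accuracy in (0.80, 0.98):
    misreads = expected_misreads(BOOK_CHARS, accuracy)
    print(f"{accuracy:.0%} accurate: about {misreads:,} misreads")
```

Under those assumptions this prints about 100,000 misreads at 80% and about 10,000 at 98%, so even a good scan leaves thousands of characters to proofread.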

As Ti-Ron states, it is generally a lost cause to convert PDFs to a meaningful format. Read them as is, or prepare to do a significant amount of work after conversion if quality is important to you.
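
As one example of that clean-up work, the page-boundary guess mentioned above can be roughed out in a few lines. This is only a heuristic sketch, not what any OCR package actually does: it assumes the OCR output is available as a list of pages, each a list of paragraph strings (a hypothetical structure), and it treats a page that ends without sentence punctuation, followed by a page starting in lowercase, as one continued paragraph.

```python
SENTENCE_ENDINGS = (".", "!", "?", ":", '"', "\u201d")


def join_pages(pages):
    """Merge per-page paragraph lists, gluing paragraphs split by a page break.

    pages: list of pages, each a list of paragraph strings (assumed OCR output).
    """
    merged = []
    for paragraphs in pages:
        first_on_page = True
        for paragraph in paragraphs:
            text = paragraph.strip()
            if not text:
                continue
            # Guess: the previous page stopped mid-sentence and this page carries on.
            continues_previous = (
                first_on_page
                and bool(merged)
                and not merged[-1].endswith(SENTENCE_ENDINGS)
                and text[0].islower()
            )
            if continues_previous and merged[-1].endswith("-"):
                # Assume a word hyphenated across the break (wrong for real compounds).
                merged[-1] = merged[-1][:-1] + text
            elif continues_previous:
                merged[-1] = merged[-1] + " " + text
            else:
                merged.append(text)
            first_on_page = False
    return merged


pages = [["It was a dark and"], ["stormy night.", "The end."]]
print(join_pages(pages))
```

It will still get some cases wrong (dialogue, headings, genuine hyphenated compounds), which is exactly why the result needs a human pass.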

Good luck!
Old 09-27-2012, 11:54 AM   #6
Toxaris
Wizard
Toxaris ought to be getting tired of karma fortunes by now.
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Quote:
Originally Posted by Ti-Ron
From my (limited) experience, converting from PDF to ePub (and vice versa) is a PITA. Even with calibre, things get funky, in a bad way.

Maybe you could copy/paste your whole PDF into Word and make an ePub from there?

Building it by hand in Sigil would take longer but be more effective, I think!
Good luck!
Doing it like that will not work, since the PDF is basically a collection of images in this case.
Old 10-03-2012, 10:58 AM   #7
curiousgeorge
Connoisseur
curiousgeorge began at the beginning.
 
Posts: 53
Karma: 10
Join Date: Aug 2012
Location: Nashville, Tn
Device: ipad, Kindle Fire
Plus, it's a bad idea to OCR, because if it's an unusual serif font the software will read some characters as other characters and then spit out garbage. OCR turns what is an image into searchable text. Lately I have seen people take an image-based PDF, export all the pages as images, and write one HTML file containing all the images. Yes, this works, BUT it's a crap way of making an EPUB and I hate it.
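
The kind of misread described above, one glyph taken for another, can at least be flagged automatically for proofreading. The sketch below is an illustration only: the confusion pairs are a small hand-picked sample, and `known_words` stands in for a real dictionary you would have to supply.

```python
# (what the OCR produced, what was probably printed); a small assumed sample
CONFUSIONS = [("rn", "m"), ("cl", "d"), ("1", "l"), ("0", "o"), ("vv", "w")]


def suggest_fix(word, known_words):
    """Return a known word reachable by a single confusion swap, or None."""
    lower = word.lower()
    if lower in known_words:
        return None
    for wrong, right in CONFUSIONS:
        start = lower.find(wrong)
        while start != -1:
            candidate = lower[:start] + right + lower[start + len(wrong):]
            if candidate in known_words:
                return candidate
            start = lower.find(wrong, start + 1)
    return None


def flag_suspects(text, known_words):
    """Map each suspicious word in text to its suggested correction."""
    suspects = {}
    for token in text.split():
        word = token.strip('.,;:!?"()')
        fix = suggest_fix(word, known_words) if word else None
        if fix:
            suspects[word] = fix
    return suspects


known = {"the", "modern", "house", "has", "an", "old", "door"}
print(flag_suspects("The rnodern house has an o1d door.", known))
```

This only catches words that become valid after one swap, so it reduces the proofreading load rather than replacing it.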
Old 10-27-2014, 05:52 AM   #8
mindylynn0
Junior Member
mindylynn0 began at the beginning.
 
Posts: 1
Karma: 10
Join Date: Oct 2014
Device: none
OCR technology

Quote:
Originally Posted by Toxaris
You need to perform OCR on the PDF first. This PDF contains only images and cannot be converted in the normal way. Actually, PDF conversion is almost never a good idea.
Hi, as for OCR tech, do you have any suggestions? Is there any open-source option besides Google's Tesseract engine?
Old 10-27-2014, 11:08 AM   #9
Toxaris
Wizard
Toxaris ought to be getting tired of karma fortunes by now.
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
No, almost all the 'free' or open-source OCR programs use Tesseract. The quality is mediocre and not really good enough for OCR of books. Most people I know use ABBYY. Around 6 years ago I used OmniPage a lot.
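
For anyone who wants to judge Tesseract on their own scans anyway, a minimal sketch of driving its command-line tool from Python follows. It assumes the `tesseract` executable is installed and on the PATH, and that the PDF pages have already been exported as images; the file name `page_001.png` is a made-up example.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_command(image_path, output_base, lang="eng"):
    """Build the command line: tesseract <image> <output base> -l <language>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_page(image_path, output_base, lang="eng"):
    """Run Tesseract on one page image and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    subprocess.run(tesseract_command(image_path, output_base, lang), check=True)
    # By default Tesseract writes plain text to <output base>.txt
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")


# Example with a hypothetical file name; only the command is built here.
print(tesseract_command("page_001.png", "page_001"))
```

Running `ocr_page` on a handful of representative pages is a cheap way to see whether the output quality is acceptable before committing to a whole book.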
All times are GMT -4.