Old 11-04-2010, 10:23 AM   #1
jcleaver
Member
Posts: 11
Karma: 10
Join Date: Nov 2010
Device: Nook
PDF to epub advice needed.

I know that PDF is not a good source for epub; however, I only have access to the PDF. On my Nook, the PDF text is very small and I have to strain my eyes to read it. So I wanted to convert it to epub, hoping to be able to use different fonts. When I try that, my viewer states that there are only 3 pages, when in fact there are 424 pages. I can view the whole thing on my PC, but my Nook will only show the cover page.

So, I had an idea that I thought would work, but it doesn't. I thought maybe I could convert the PDF to a Word doc first and then convert that to epub. When I try that, the only thing I get in the Word doc is a graphical representation of the PDF, instead of the editable text I want.

Any recommendations? If there are any good pdf to word convertors, please let me know. I tried 3 or 4 of them with the same result.
Old 11-04-2010, 10:42 AM   #2
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by jcleaver
Any recommendations? If there are any good pdf to word convertors, please let me know. I tried 3 or 4 of them with the same result.
You have images of pages, and no text. To get text you need to OCR the images. Adobe Acrobat can do it. Nothing can do it perfectly. You have a lot of work ahead of you to get a good result.
Old 11-04-2010, 10:50 AM   #3
JMikeD
Evangelist
Posts: 452
Karma: 15000
Join Date: Jul 2008
Device: Various and sundry
Quote:
Originally Posted by jcleaver
Any recommendations? If there are any good pdf to word convertors, please let me know. I tried 3 or 4 of them with the same result.
I've tried many converters over the last several years. They all give very similar output. The problem seems to be inherent in the way info is stored in a PDF file, not with the quality of the converter. There is some info that is just not there. And it depends on whether or not the PDF was generated from a text file or contains images of the text (scanned).
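One rough way to tell which kind of PDF you have is to check how much text can be extracted from each page. The sketch below is only an illustration, not something from this thread: the function names and thresholds are made up, and the extraction step assumes the third-party pypdf package is installed.

```python
def fraction_without_text(page_texts, min_chars=20):
    """Return the share of pages whose extracted text is shorter than min_chars."""
    if not page_texts:
        return 1.0
    empty = sum(1 for text in page_texts if len(text.strip()) < min_chars)
    return empty / len(page_texts)


def looks_scanned(page_texts, threshold=0.9):
    """Guess that a PDF is page images when nearly all pages yield no text."""
    return fraction_without_text(page_texts) >= threshold


def extract_page_texts(pdf_path):
    """Extract text per page with pypdf (third-party; assumed to be installed)."""
    # Imported lazily so the helpers above can be used without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

Something like looks_scanned(extract_page_texts("book.pdf")) (with a hypothetical file name) returning True would suggest the file is page images and needs OCR before any conversion. The thresholds are guesses and may need adjusting for books with many blank or picture-only pages.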
Old 11-04-2010, 12:11 PM   #4
jcleaver
Member
Thanks for the replies. It seems it may be faster for me to type it from scratch. I did play with an OCR solution, and it wasn't good. It got most words correct, but it took forever to find all the little mistakes. And that was just 1 page. I literally could have typed the page faster than proofreading the converted page.
Old 11-04-2010, 07:52 PM   #5
thrawn_aj
quantum mechanic
Posts: 705
Karma: 483827
Join Date: Aug 2010
Location: NorCal
Device: Nook1, Samsung Transform, Nook2
Quote:
Originally Posted by jcleaver
Thanks for the replies. It seems it may be faster for me to type it from scratch. I did play with an OCR solution, and it wasn't good. It got most words correct, but it took forever to find all the little mistakes. And that was just 1 page. I literally could have typed the page faster than proofreading the converted page.
That's strange. I haven't done any scanning or OCR myself but I have seen files (usually PDF) that other people have OCR'd. Unless the source paper book was really crappy, most of the words come through alright and should require only minor editing/proofing subsequently.

Anyway, I have one thing that may help you. I noticed (based on a suggestion by another MR member - I forget who) that Mobipocket Creator is MUCH more intelligent at processing PDF files into HTML (when it creates a publication). It removes headers and footers and even hardcoded page numbers that are scanned in and appear as floating numbers. You can then use its raw HTML file (which, again, is extraordinarily well-formatted considering it's generated by a program) as the input for Calibre AFTER editing the HTML in a plain text editor, using regular expressions and the like on it directly. I cleaned up several old PDFs I had this way into remarkably clean ePUBs.

Of course, the input PDF to Mobipocket Creator should be an OCR'd PDF (not page images).
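As an illustration of the kind of regular-expression cleanup meant here, a minimal Python sketch follows. It assumes that leftover page numbers and running headers show up as their own <p> paragraphs; the actual HTML from Mobipocket Creator may be structured differently, and all function names are made up for this example.

```python
import re
from collections import Counter

# A paragraph whose only content is a short number, e.g. <p>12</p>
PAGE_NUMBER_P = re.compile(r"<p\b[^>]*>\s*\d{1,4}\s*</p>\s*", re.IGNORECASE)
# Any paragraph; group 1 is its inner content
PARAGRAPH = re.compile(r"<p\b[^>]*>(.*?)</p>", re.IGNORECASE | re.DOTALL)


def strip_page_numbers(html):
    """Remove paragraphs that contain nothing but a 1-4 digit number."""
    return PAGE_NUMBER_P.sub("", html)


def repeated_paragraphs(html, min_count=5):
    """Return paragraph contents that occur at least min_count times
    (candidates for running headers or footers)."""
    counts = Counter(m.group(1).strip() for m in PARAGRAPH.finditer(html))
    return [text for text, n in counts.items() if text and n >= min_count]


def strip_paragraphs(html, texts):
    """Remove every paragraph whose content is exactly one of texts."""
    for text in texts:
        pattern = re.compile(
            r"<p\b[^>]*>\s*" + re.escape(text) + r"\s*</p>\s*", re.IGNORECASE
        )
        html = pattern.sub("", html)
    return html
```

It is worth looking over what repeated_paragraphs returns before deleting anything, since a real book can legitimately repeat a short line. strip_page_numbers will also remove genuine number-only paragraphs such as a bare chapter number, so work on a copy of the file.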
Old 11-05-2010, 05:59 AM   #6
mrmikel
Book Twiddler
Posts: 1,909
Karma: 1405001
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
If it gets most of the words correct, then it is possible that the errors left may be repetitive. You may be able to take it into Word or OpenOffice to correct some types of errors, save it out as HTML (filtered, in the case of Word), then use a text editor to search and replace for others.

All that said, it can be a fair amount of work to clean up an OCRed document. In one I am working on now, many a lower and upper case R has become an e. It is hard to search for these directly, but not so hard with a spell check or grammar check.
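To find out whether the leftover errors really are repetitive, one option is to count the words a spell checker would not recognize and then fix the most frequent ones in bulk. The Python sketch below is only an illustration: the word list, function names, and sample sentence are invented, and known_words would have to come from whatever dictionary file you have at hand.

```python
import re
from collections import Counter

WORD = re.compile(r"[A-Za-z']+")


def suspicious_words(text, known_words, top=20):
    """Count words that are not in known_words, most frequent first."""
    counts = Counter(
        w.lower() for w in WORD.findall(text) if w.lower() not in known_words
    )
    return counts.most_common(top)


def apply_fixes(text, fixes):
    """Replace whole words according to fixes (wrong -> right, case-sensitive)."""
    if not fixes:
        return text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, fixes)) + r")\b")
    return pattern.sub(lambda m: fixes[m.group(1)], text)


# Made-up sample in which "r" was misread as "e"
known = {"the", "river", "ran", "over", "rocks"}
text = "The eiver ran ovee the eocks. The eiver ran."
print(suspicious_words(text, known))
print(apply_fixes(text, {"eiver": "river", "ovee": "over", "eocks": "rocks"}))
```

The replacement only touches whole words, so fixing "eiver" will not alter longer words that happen to contain it. Capitalized variants need their own entries in fixes because the match is case-sensitive.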