11-04-2010, 10:23 AM | #1 |
Member
Posts: 11
Karma: 10
Join Date: Nov 2010
Device: Nook
|
PDF to epub advice needed.
I know that PDF is not a good source for epub, but the PDF is all I have access to. On my Nook, the PDF text is very small and I have to strain my eyes to read it. So I wanted to convert it to epub, hoping to be able to use different fonts. When I try that, my viewer says there are only 3 pages, when in fact there are 424. I can view the whole thing on my PC, but my Nook will only show the cover page.
So, I had an idea that I thought would work, but it doesn't. I thought maybe I could convert the PDF to a Word doc first, and then convert that to epub. When I try that, the only thing I get in the Word doc is a graphical representation of the PDF, instead of the editable text I want. Any recommendations? If there are any good PDF to Word converters, please let me know. I tried 3 or 4 of them with the same result. |
11-04-2010, 10:42 AM | #2 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
You have images of pages, and no text. To get text you need to OCR the images. Adobe Acrobat can do it. Nothing can do it perfectly. You have a lot of work ahead of you to get a good result.
|
11-04-2010, 10:50 AM | #3 |
Evangelist
Posts: 473
Karma: 15000
Join Date: Jul 2008
Device: Various and sundry
|
I've tried many converters over the last several years. They all give very similar output. The problem seems to be inherent in the way info is stored in a PDF file, not with the quality of the converter. There is some info that is just not there. And it depends on whether or not the PDF was generated from a text file or contains images of the text (scanned).
|
11-04-2010, 12:11 PM | #4 |
Member
Posts: 11
Karma: 10
Join Date: Nov 2010
Device: Nook
|
Thanks for the replies. It seems it may be faster for me to type it from scratch. I did play with an OCR solution, and it wasn't good. It got most words correct, but it took forever to find all the little mistakes. And that was just 1 page. I literally could have typed the page faster than I could proofread the converted one.
|
11-04-2010, 07:52 PM | #5 | |
quantum mechanic
Posts: 705
Karma: 483827
Join Date: Aug 2010
Location: NorCal
Device: Nook1, Samsung Transform, Nook2
|
Anyway, I have one thing that may help you. I noticed (based on a suggestion by another MR member - I forget who) that Mobipocket Creator is MUCH more intelligent at processing PDF files into HTML (when it creates a publication). It removes headers and footers and even hardcoded page numbers that are scanned in and appear as floating numbers. You can then use its raw HTML file (which, again, is extraordinarily well-formatted considering it's generated by a program) as the input for Calibre AFTER editing the HTML in a plain text editor, using regular expressions and the like on it directly. I cleaned up several old PDFs I had this way into remarkably clean ePUBs. Of course, the input PDF to Mobipocket Creator should be an OCR'd PDF (not page images). |
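For the regular-expression step, here is a minimal sketch in Python of the kind of cleanup I mean. It assumes any leftover page numbers show up as paragraphs containing nothing but a short number; that pattern and the name strip_page_numbers are my own illustration, not anything Mobipocket Creator or Calibre defines, so check your own HTML before trusting it.

```python
import re

# Assumption: a stray page number appears as a paragraph holding only 1-4 digits.
# Adjust the pattern to whatever your exported HTML actually looks like.
PAGE_NUMBER_PARA = re.compile(r"<p[^>]*>\s*\d{1,4}\s*</p>\s*", re.IGNORECASE)


def strip_page_numbers(html: str) -> str:
    """Remove paragraphs that contain nothing but a short number."""
    return PAGE_NUMBER_PARA.sub("", html)


sample = (
    "<p>End of a page.</p>\n"
    "<p>17</p>\n"
    "<p>Start of the next page.</p>\n"
)
cleaned = strip_page_numbers(sample)
print(cleaned)
```

Run it on a copy of the file and compare before and after, since a genuine one-number paragraph (a chapter number, say) would be removed too.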
|
11-05-2010, 05:59 AM | #6 |
Color me gone
Posts: 2,089
Karma: 1445295
Join Date: Apr 2008
Location: Central Oregon Coast
Device: PRS-300
|
If it gets most of the words correct, then it is possible that the errors left may be repetitive. You may be able to take it into Word or OpenOffice to correct some types of error, save it out as HTML (filtered, in the case of Word), then use a text editor to search and replace for others.
All that said, it can be a fair amount of work to clean up an OCRed document. In one I am working on now, many a lower and upper case R has become an e. Hard to search for these, but not so hard as a spell check or grammar check. |
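If the same misreadings keep coming back, the search and replace can be scripted. A rough sketch in Python, assuming you have already collected a list of recurring wrong words from proofreading a few pages; the entries in CORRECTIONS below are made-up examples, not taken from any real document.

```python
import re

# Hypothetical table of recurring OCR misreadings found while proofreading.
CORRECTIONS = {
    "tbe": "the",
    "aud": "and",
    "wbich": "which",
}


def fix_repeated_errors(text: str, corrections: dict) -> str:
    """Replace whole-word occurrences of known misreadings (case-sensitive)."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in corrections) + r")\b"
    )
    return pattern.sub(lambda match: corrections[match.group(1)], text)


sample = "He took tbe book aud left, wbich surprised the audience."
fixed = fix_repeated_errors(sample, CORRECTIONS)
print(fixed)
```

The word boundaries keep it from touching longer words that merely contain a misreading ("audience" is left alone). It only helps with errors that produce a non-word; a substitution that yields another real word still needs a read-through.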
|