03-05-2009, 01:22 PM | #1 |
Junior Member
Posts: 6
Karma: 10
Join Date: Mar 2009
Device: kindle
|
Free/Shareware PDF converters with OCR capability?
I'm trying to convert a set of classic, illustrated children's books (Howard Pyle's books of Robin Hood, King Arthur, etc.: http://www.archive.org/details/merryadventureso00pylerich) from public-domain PDFs to ebooks I can read on my Kindle.
Problem is, they're image-based PDFs, heavy with illustration. Some PDF converters can't process them at all; some strip out all the illustrations and convert only the text; some convert every page into an image, which leaves the images excellent (well, apart from the "digitized by" watermarks on every page, which I'd like to crop out) but makes the text too small to read easily. The only converter I've found that seemed able to process them the way I'd like is ABBYY, but the trial version has a fifty-page limit, which isn't enough for even one book, much less Pyle's collected works.
So as best I can figure out, I need a PDF converter that can OCR the text and also keep the various images. Anyone have any pointers? Thanks! |
03-11-2009, 04:28 PM | #2 |
Grand Sorcerer
Posts: 5,185
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
|
I don't think there are any shareware or free converters that will do the careful inclusion of both text & graphics. A few of them try, but tend to botch it. (And I'm not sure what those are; I remember trying to work with them and giving up and going back to FineReader.)
Adobe's Capture Reviewer was another versatile OCR program--but it was also expensive, and FR is better in many ways. (Not all ways. Capture Reviewer lets you set fonts and kerning; FineReader is atrocious at that.) |
03-17-2009, 11:53 AM | #3 |
Technogeezer
Posts: 7,233
Karma: 1601464
Join Date: Nov 2006
Location: Virginia, USA
Device: Sony PRS-500
|
A lot of what you are doing has already been done by the folks at Project Gutenberg. Try here for a listing of the books they have already converted. The zipped HTML files contain the images if available. I have used PG as the basis for many books I have posted.
Flogiston has already converted and posted several of Pyle's books to LRF format for the Sony. Robin is posted here. Sadly, no PRC version for the Kindle. |
03-20-2009, 09:27 AM | #4 | |
Junior Member
Posts: 6
Karma: 10
Join Date: Mar 2009
Device: kindle
|
The only problem I see with the Gutenberg versions is that they don't include the little in-line text blurbs Pyle put on either side of the page describing the action ("Robin meets a stranger on the bridge" or whatever), and only one of the three King Arthur books has been uploaded. Still, I should be able to convert those HTML pages to Kindle format fairly easily, so thanks, that is a lot of the work already done for me. |
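For anyone following the same route, here is a rough sketch of the first step only: unpacking one of the zipped HTML downloads and checking that the illustrations actually came along with the text. It uses nothing but the Python standard library. The function name `unpack_html_zip` and the file name in the usage example are made up for illustration, and the set of image extensions is a guess at what such an archive might contain. It does not do the Kindle conversion itself.

```python
import zipfile
from pathlib import Path

# Assumed image types; adjust if the archive uses something else.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}
HTML_SUFFIXES = {".htm", ".html"}


def unpack_html_zip(zip_path, out_dir):
    """Extract a zipped HTML book and list its pages and images.

    Returns (html_files, image_files) as sorted lists of paths
    relative to the archive root.
    """
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(out_dir)
        # Skip directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]

    html_files = sorted(n for n in names if Path(n).suffix.lower() in HTML_SUFFIXES)
    image_files = sorted(n for n in names if Path(n).suffix.lower() in IMAGE_SUFFIXES)
    return html_files, image_files


if __name__ == "__main__":
    # Hypothetical file name; substitute whatever you downloaded.
    pages, pictures = unpack_html_zip("robin-hood-h.zip", "robin-hood")
    print(f"{len(pages)} HTML file(s), {len(pictures)} image(s)")
```

If the image count comes back as zero, that particular download is text-only and the illustrations would have to come from the scanned PDF instead.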