#1 |
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
|
Quick help needed!! Which PDF creator for scanning books?
Hi!
I need help fairly quickly, actually. I'm most likely moving next month, and since I'm not working I wanted to spend whole days scanning old books and converting them to PDF, so I can throw away some books I no longer need. I have a working scanner; now I need a tool that can work on the scans and convert them to text. Any software suggestions are very much appreciated! I have until December, by which time I hope to have scanned at least 20 books (that should be about 1 per day on average). Anyway, thanks in advance, and keep the ideas coming! |
#2 | |
Holy S**T!!!
Posts: 5,213
Karma: 108401
Join Date: Jun 2008
Location: San Diego, California!!
Device: Kindle and iPad
|
Quote:
My favorite program for scan-to-text is Adobe Acrobat. That said, a copy of Acrobat 8.0 comes with the Fujitsu ScanSnap scanner. I know you said you already have a working scanner, but you should take the time to look at the ScanSnap. First, because it comes bundled with Acrobat, and second, because it scans both sides of the page at one time. I've scanned several hundred pages with it in just an hour or two. It also automatically corrects any page tilt and eliminates blank pages. Granted, it's about $400, and I note that you said you were not working at present, so I understand that could be an issue. But even without the scanner, I would still recommend Acrobat (Standard only; you wouldn't need Pro, and version 8.0 would be fine for what you are doing, so don't bother to get 9.0). |
|
#3 |
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
|
Yeah, I searched, and so far Adobe Acrobat seems a bit cheaper than $400.
Would you recommend the older version (ver. 6)? They sell it for $60 online! |
#4 |
frumious Bandersnatch
Posts: 7,546
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
My advice is: by all means keep the scans as images, even if you also convert them to text. OCR is not perfect and will make many errors; saving the images will allow you (or anyone else) to check the actual content and formatting of the original book, at least until the text is proofread and converted to a well-formatted ebook.
|
#5 | |
Holy S**T!!!
Posts: 5,213
Karma: 108401
Join Date: Jun 2008
Location: San Diego, California!!
Device: Kindle and iPad
|
Quote:
I recommend the scanner simply because it is fast, excellent, and leaves you with a tilt-corrected PDF that is immensely legible. I don't think I would go back as far as version 6.0. I wish I had seen your post before I gave away my copy of 8.0 Standard. I'll give the guy I gave it to a call. If he's not going to use it, I could mail it to you. |
|
#6 |
Guru
Posts: 988
Karma: 12653
Join Date: Apr 2008
Device: None of your business
|
If you do scan the books and trash the originals, make sure you keep the covers as proof of ownership at the very least.
-MJ |
#7 |
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
|
Thanks for all suggestions!
Scanning goes fine, and I've tested 2 programs so far: Adobe Acrobat and ScanSoft OmniPage 15. Both work well, but Acrobat works faster with my scanner (I don't always need to reselect the color and resolution). A 100-page book takes about 30 minutes to scan, then about 5 minutes to convert. The only bad thing is that in Acrobat there's no way to check for spelling and scan errors other than copying and pasting the text into Word. Also, I noticed Acrobat is OK for English but poor at any other language. Still, English books seem to go better in Acrobat than in OmniPage, so out of the 2 programs I'd recommend Acrobat so far. OmniPage also crashed a few times, and if you make a mistake you need to start all over again; in Acrobat you can just continue scanning, and reorder or delete pages much more easily. It's just a pity my Dutch books don't get converted well at all! |
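For anyone planning a similar project, here is a rough sketch of the time budget implied by those numbers (about 30 minutes of scanning plus 5 minutes of conversion per 100 pages, and the goal of 20 books from the first post). It assumes the timings scale linearly with page count and ignores proofreading; the function name and the 300-page example are made up for illustration.

```python
def hours_for_books(num_books, pages_per_book,
                    scan_min_per_100=30, ocr_min_per_100=5):
    """Estimate total hours, scaling the per-100-page timings linearly."""
    minutes_per_book = (scan_min_per_100 + ocr_min_per_100) * pages_per_book / 100
    return num_books * minutes_per_book / 60


# 20 books at the reported 100 pages each, and at a hypothetical 300 pages each.
print(f"{hours_for_books(20, 100):.1f} hours for 20 books of 100 pages")
print(f"{hours_for_books(20, 300):.1f} hours for 20 books of 300 pages")
```

Cleaning up OCR errors afterwards is not included and will likely take longer than the scanning itself.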
#8 |
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
|
ABBYY Finereader is the best OCR program (supports many languages too).
|
#9 |
Guru
Posts: 988
Karma: 12653
Join Date: Apr 2008
Device: None of your business
|
Methinks you'd need a Dutch edition of the OCR software to do a good job on those... Just a thought...
-MJ |
#10 |
Wizard
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
|
Finereader includes Dutch language rules and spellcheck dictionaries.
http://finereader.abbyy.com/?param=137542 |
#11 |
Resident Curmudgeon
Posts: 79,679
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I would not go PDF at all. Once you do, getting the text back out of PDF is going to be a real hassle. The best bet is to scan it and, once it's OCRed, load it into your word processor of choice and clean up/format from there. Then you can convert to whatever format you want, and you'll still have a good copy to work from should you someday want to change the format.
|
#12 |
Holy S**T!!!
Posts: 5,213
Karma: 108401
Join Date: Jun 2008
Location: San Diego, California!!
Device: Kindle and iPad
|
|
#13 | |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Quote:
Dale |
|
#14 | |
Karmaniac
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
|
Quote:
Just copy and paste the text. Acrobat can also save as HTML, but if you save as PDF, it takes the scanned document and overlays it with a layer of invisible text. That way the page looks just like the original scan, and you can still copy and paste the text out of it. Images can easily be copied and saved as PNG or JPG files. I haven't tested HTML yet, but I think you'd be left with images plus the OCR'ed text, which can be quite hard (if not impossible) to read if you can't see the original scan.
I also found it a pity that OCR (no matter which program you're using) needs at least 200 DPI. Most of this software (I'm using a trial here) costs $400, yet it really needs about 300 DPI to convert text properly? I can read text scanned at 100 or even 75 DPI perfectly well, so I don't really think the software is worth $400. If it could convert text flawlessly from 75 DPI I could see paying a little more than $80 for it, but definitely not $400.
At 300 DPI, a scanned A4 document looks like 4 screens of 1280x800 and uses up quite some space on the hard drive. And needing 300 DPI just to convert text is not that impressive. It takes ages to scan a book at this resolution (the scanner scans slower at high (photo) resolutions).
Just to give you an idea, I scanned a 150-page book with next to no pictures. It took 12 MB as PDF. After conversion you can get that down to 3 MB, but the reader won't open those documents, only the PC does. The same book in text format takes up around 800 KB, and about the same as LRF with pictures and cover included!
Last edited by ProDigit; 12-02-2008 at 02:01 PM. |
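To put numbers on the DPI discussion, here is a minimal sketch of the pixel arithmetic, assuming an A4 page of 210 x 297 mm and the 1280x800 screen mentioned above. The function names are hypothetical and only there for illustration.

```python
MM_PER_INCH = 25.4
A4_MM = (210, 297)  # assumed page size: A4


def scan_pixels(dpi, page_mm=A4_MM):
    """Return (width, height) in pixels for a page scanned at the given DPI."""
    width_mm, height_mm = page_mm
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def screens_needed(dpi, screen=(1280, 800)):
    """Ratio of the scan's pixel count to the pixel count of one screen."""
    width, height = scan_pixels(dpi)
    return (width * height) / (screen[0] * screen[1])


for dpi in (75, 100, 200, 300):
    width, height = scan_pixels(dpi)
    print(f"{dpi:>3} DPI: {width} x {height} px, "
          f"about {screens_needed(dpi):.1f} screens of 1280x800")
```

By raw pixel count, an A4 page at 300 DPI works out to roughly eight and a half such screens and at 200 DPI to a little under four, which is why file size and scan time climb so quickly with resolution.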
|
#15 |
Evangelist
Posts: 416
Karma: 14682
Join Date: May 2008
Location: SF Bay Area
Device: Nook HD, Nook for Windows 8
|
My friends put significant work into image processing for scanning in Acrobat 7, so you'll get much better results (clearer scans with smaller file size) with Acrobat 7 or 8 than with Acrobat 6.
|