Old 11-28-2008, 08:22 PM   #1
ProDigit
Karmaniac
ProDigit ought to be getting tired of karma fortunes by now.
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
Quick help needed!! Which PDF creator for scanning books?

Hi!

I need help rather quickly, actually.

I'm most likely moving next month, and since I'm not working I wanted to spend the whole day scanning old books and converting them to PDF (so I can throw away some books I no longer need).

I have a working scanner; now I need a tool that can convert the scans to text.

Your help in suggesting software is very much appreciated!
I have until December, by which time I hope to have scanned at least 20 books (that should be about 1 per day on average).

Anyway, thanks in advance, and keep the ideas coming!
Old 11-28-2008, 08:38 PM   #2
RickyMaveety
Holy S**T!!!
RickyMaveety lived happily ever after.
Posts: 5,213
Karma: 108401
Join Date: Jun 2008
Location: San Diego, California!!
Device: Kindle and iPad
Quote:
Originally Posted by ProDigit
Hi!

I need help rather quickly, actually.

I'm most likely moving next month, and since I'm not working I wanted to spend the whole day scanning old books and converting them to PDF (so I can throw away some books I no longer need).

I have a working scanner; now I need a tool that can convert the scans to text.

Your help in suggesting software is very much appreciated!
I have until December, by which time I hope to have scanned at least 20 books (that should be about 1 per day on average).

Anyway, thanks in advance, and keep the ideas coming!
What I would recommend depends on whether you plan to cut the books up or not. Since you are talking about throwing them away, I'm thinking you do plan to cut them up, and that is probably the best way to get a good scan.

My favorite program for scan to text is Adobe Acrobat. That said, a copy of Acrobat 8.0 comes with the Fujitsu ScanSnap scanner. I know you said you already have a working scanner, but you should take the time to look at the ScanSnap. First, because it does come bundled with Acrobat, and second, because it scans both sides of the page at one time. I've scanned several hundred pages with it in just an hour or two.

It also automatically corrects any page tilt and eliminates blank pages. Granted, it's about $400, and I note that you said you were not working at present .... so I understand that could be an issue.

But, even without the scanner, I would still recommend Acrobat (but Standard only .... you wouldn't need pro .... and version 8.0 would be fine for what you are doing, don't bother to get 9.0).
Old 11-28-2008, 09:04 PM   #3
ProDigit
Karmaniac
ProDigit ought to be getting tired of karma fortunes by now.
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
Yeah, I searched, and so far Adobe Acrobat seems a bit cheaper than $400.
Would you recommend the older version (ver 6)?
They sell it for $60 online!
Old 11-29-2008, 03:50 AM   #4
Jellby
frumious Bandersnatch
Jellby ought to be getting tired of karma fortunes by now.
Posts: 7,546
Karma: 19001583
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
My advice is: by all means keep the scans as images, even if you also convert them to text. OCR is not perfect and will produce many errors; saving the images will allow you (or anyone else) to check the actual content and formatting in the original book, at least until the text is proofread and converted to a well-formatted ebook.
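
To make that concrete, here is a minimal sketch of one way to keep page images and OCR text side by side and spot pages that were never converted. The folder layout (`scans/page_001.png` next to `text/page_001.txt`) and the function name are assumptions for illustration only, not something any particular OCR program produces.

```python
from pathlib import Path

# Hypothetical layout: scans/page_001.png alongside text/page_001.txt.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def pages_missing_text(scan_dir, text_dir):
    """Return the stems of scanned pages that have no matching OCR text file."""
    scan_stems = {
        p.stem for p in Path(scan_dir).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    }
    text_stems = {p.stem for p in Path(text_dir).glob("*.txt")}
    return sorted(scan_stems - text_stems)
```

Calling `pages_missing_text("scans", "text")` would list the pages that still need OCR, while the images stay untouched for later proofreading.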
Old 11-29-2008, 12:37 PM   #5
RickyMaveety
Holy S**T!!!
RickyMaveety lived happily ever after.
Posts: 5,213
Karma: 108401
Join Date: Jun 2008
Location: San Diego, California!!
Device: Kindle and iPad
Quote:
Originally Posted by ProDigit
Yeah, I searched, and so far Adobe Acrobat seems a bit cheaper than $400.
Would you recommend the older version (ver 6)?
They sell it for $60 online!
I would not get the scanner simply to get the Acrobat that is bundled with it. Good grief no.

I recommend the scanner simply because it is fast, excellent, and leaves you with a tilt-corrected PDF that is immensely legible.

I don't think I would go back as far as version 6.0. I wish I had seen your post before I gave away my copy of 8.0 standard. I'll give the guy I gave it to a call. If he's not going to use it .... I could mail it to you.
Old 11-29-2008, 12:45 PM   #6
mjh215
Guru
mjh215 can solve quadratic equations while standing on his or her head reciting poetry in iambic pentameter
Posts: 988
Karma: 12653
Join Date: Apr 2008
Device: None of your business
If you do scan the books and trash the originals, make sure you at least keep the covers as proof of ownership.

-MJ
Old 11-29-2008, 06:02 PM   #7
ProDigit
Karmaniac
ProDigit ought to be getting tired of karma fortunes by now.
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
Thanks for all suggestions!

Scanning goes fine, and I've tested 2 programs so far:
Adobe Acrobat and Scansoft Omnipage 15.
Both work well, but Adobe works faster with my scanner (I don't always need to reselect the color and resolution).

A 100-page book takes about 30 minutes to scan,
then about 5 minutes until it's converted.
The only bad thing is that in Adobe there's no other way to check for spelling and scan errors than to copy and paste the text into Word.
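
For planning, those timings can be scaled up with a rough sketch like the one below. It assumes the time grows linearly with page count, and the 100 pages per book used in the example is an assumed average, not a measured one; the 20 books are the target from the first post.

```python
def scanning_hours(books, pages_per_book, scan_min_per_100=30, ocr_min_per_100=5):
    """Estimate total hours, scaling the quoted per-100-page timings linearly."""
    minutes_per_page = (scan_min_per_100 + ocr_min_per_100) / 100
    return books * pages_per_book * minutes_per_page / 60


# 20 books of 100 pages each (the page count is an assumption for illustration).
print(round(scanning_hours(20, 100), 1))  # about 11.7 hours
```

Proofreading time is not included, and it will probably be the larger part of the work.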

Also, I noticed Adobe is OK for English, but sucks at any other language.
But English books seem to go better in Adobe than in Omnipage.

So far, out of the 2 programs, I'd recommend Adobe to everyone.
Also, Omnipage crashed a few times, and if you make an error, you need to start all over again.
In Adobe you can just continue scanning, and switch pages or delete some much more easily!

It's just a pity my Dutch books don't get converted well at all!
Old 11-29-2008, 06:08 PM   #8
igorsk
Wizard
igorsk ought to be getting tired of karma fortunes by now.
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
ABBYY Finereader is the best OCR program (supports many languages too).
Old 11-29-2008, 06:09 PM   #9
mjh215
Guru
mjh215 can solve quadratic equations while standing on his or her head reciting poetry in iambic pentameter
Posts: 988
Karma: 12653
Join Date: Apr 2008
Device: None of your business
Methinks you'd need a Dutch edition of the OCR software to do a good job on those... Just a thought...


-MJ
Old 11-29-2008, 06:17 PM   #10
igorsk
Wizard
igorsk ought to be getting tired of karma fortunes by now.
Posts: 3,442
Karma: 300001
Join Date: Sep 2006
Location: Belgium
Device: PRS-500/505/700, Kindle, Cybook Gen3, Words Gear
Finereader includes Dutch language rules and spellcheck dictionaries.
http://finereader.abbyy.com/?param=137542
Old 11-29-2008, 06:47 PM   #11
JSWolf
Resident Curmudgeon
JSWolf ought to be getting tired of karma fortunes by now.
Posts: 79,679
Karma: 145864619
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
I would not go PDF at all. Once you do, getting it out of PDF is going to be a real hassle. Best bet is to scan it and, once it's OCRed, load it into your word processor of choice and clean up/format from there. Then you can convert to whatever format you want, and you'll still have a good copy to work from should you someday want to change the format.
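
As a rough illustration of the clean-up step, the sketch below flags tokens that mix letters and digits, a common sign of an OCR misread (a zero for an "o", a one for an "l"). The heuristic and the function name are assumptions made for this example; it will also flag legitimate tokens such as "1st", so treat the output as a list of places to check by eye, not as confirmed errors.

```python
import re

# Heuristic assumption: a token containing both letters and digits
# (e.g. "b0ok") is a likely OCR misread worth checking by hand.
MIXED = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")


def suspicious_tokens(text):
    """Return tokens that contain both letters and digits, in order of appearance."""
    return MIXED.findall(text)


print(suspicious_tokens("The b0ok has 150 pages and many l1nes."))
```

A word processor's spell checker will catch most of the rest; this only narrows down where to look first.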
Old 11-29-2008, 07:06 PM   #12
RickyMaveety
Holy S**T!!!
RickyMaveety lived happily ever after.
Posts: 5,213
Karma: 108401
Join Date: Jun 2008
Location: San Diego, California!!
Device: Kindle and iPad
Quote:
Originally Posted by igorsk
ABBYY Finereader is the best OCR program (supports many languages too).
And I believe that the ABBYY software comes with the ScanSnap as well .... although it's not in the running for this project.
Old 12-01-2008, 12:37 PM   #13
DaleDe
Grand Sorcerer
DaleDe ought to be getting tired of karma fortunes by now.
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
Quote:
Originally Posted by ProDigit
Thanks for all suggestions!

Scanning goes fine, and I've tested 2 programs so far:
Adobe Acrobat and Scansoft Omnipage 15.
Both work well, but Adobe works faster with my scanner (I don't always need to reselect the color and resolution).

A 100-page book takes about 30 minutes to scan,
then about 5 minutes until it's converted.
The only bad thing is that in Adobe there's no other way to check for spelling and scan errors than to copy and paste the text into Word.

Also, I noticed Adobe is OK for English, but sucks at any other language.
But English books seem to go better in Adobe than in Omnipage.

So far, out of the 2 programs, I'd recommend Adobe to everyone.
Also, Omnipage crashed a few times, and if you make an error, you need to start all over again.
In Adobe you can just continue scanning, and switch pages or delete some much more easily!

It's just a pity my Dutch books don't get converted well at all!
I believe Adobe also makes a special scanner version of Acrobat that has better foreign-language support. Check OCR in the wiki for a list of several programs that can be considered.

Dale
Old 12-02-2008, 01:57 PM   #14
ProDigit
Karmaniac
ProDigit ought to be getting tired of karma fortunes by now.
Posts: 2,553
Karma: 11499146
Join Date: Oct 2008
Location: Miami FL
Device: PRS-505, Jetbook, + Mini, +Color, Astak Ez Reader Pro, PPW1, Aura H2O
Quote:
Originally Posted by JSWolf
I would not go PDF at all. Once you do, getting it out of PDF is going to be a real hassle.
Not really,
just copy and paste the text. Adobe can also save as HTML, but if you save as PDF, it takes the scanned document and overlays it with a layer of invisible text.
That way your text looks just like the scanned document, and you are able to copy and paste the text out of there.

Images can easily be copied and saved as PNG or JPG files.

With HTML I haven't tested it yet, but I think you'll be left with images and the OCR'ed text, which can be quite hard (if not impossible) to read if you can't see the original scan.

I also found it a pity that OCR (no matter which program you're using) needs at least 200 DPI.
I mean, most software (I'm using a trial here) costs $400, but it really needs about 300 DPI to convert text properly?
I mean, I can read text scanned at 100 or even 75 DPI perfectly well.

So I don't really think the software is worth the $400.
If it were able to convert text flawlessly from 75 DPI, I could think of paying a little more than $80 for it, but definitely not $400.
At 300 DPI, a scanned A4 document looks like 4 screens of 1280x800 and uses up quite some space on the hard drive. And 300 DPI is not that impressive a resolution to need for converting text. It takes ages to scan a book at this resolution (the scanner scans slower at high (photo) resolutions).
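
For reference, the pixel arithmetic behind those resolutions can be sketched as below. It assumes a full A4 page (210 x 297 mm) with nothing cropped, and the function names are made up for this example. By this count a 300 DPI page is about 2480 x 3508 pixels, which is closer to eight and a half 1280x800 screens than four; four screens is roughly what 200 DPI gives.

```python
A4_MM = (210, 297)  # ISO 216 A4 size in millimetres
MM_PER_INCH = 25.4


def a4_pixels(dpi):
    """Return (width, height) in pixels for a full A4 page scanned at `dpi`."""
    return tuple(round(side / MM_PER_INCH * dpi) for side in A4_MM)


def screens_covered(dpi, screen=(1280, 800)):
    """Return how many screens' worth of pixels one A4 scan contains."""
    width, height = a4_pixels(dpi)
    return (width * height) / (screen[0] * screen[1])


for dpi in (75, 100, 200, 300):
    print(dpi, a4_pixels(dpi), round(screens_covered(dpi), 1))
```

Pixel count grows with the square of the DPI, which is why going from 100 to 300 DPI makes the scan about nine times larger before compression.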

Just to give you an idea, I scanned a 150-page book with next to no pictures.
It took 12 MB in PDF.
After conversion you can get that down to 3 MB, but the reader won't read those documents; only the PC does.
This book in text format takes up around 800 KB, and about the same for LRF with pictures and cover included!
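
Scaled up to the 20 books planned at the start of the thread, those sizes can be compared with a small sketch like this one. It assumes every book is about the same size as this 150-page example, which is only a rough guess.

```python
# Sizes quoted above for one 150-page book, in megabytes.
SIZES_MB = {"scanned PDF": 12.0, "reduced PDF": 3.0, "plain text / LRF": 0.8}


def library_size_mb(size_per_book_mb, books=20):
    """Scale one book's size to a whole library, assuming similar books."""
    return size_per_book_mb * books


for label, size in SIZES_MB.items():
    print(f"{label}: {library_size_mb(size):.0f} MB for {20} books")
```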

Last edited by ProDigit; 12-02-2008 at 02:01 PM.
Old 12-02-2008, 02:01 PM   #15
Jim Lester
Evangelist
Jim Lester is less competitive than you.
Posts: 416
Karma: 14682
Join Date: May 2008
Location: SF Bay Area
Device: Nook HD, Nook for Windows 8
Quote:
Originally Posted by ProDigit
Yeah, I searched, and so far Adobe Acrobat seems a bit cheaper than $400.
Would you recommend the older version (ver 6)?
They sell it for $60 online!
My friends put significant work into image processing for scanning in A7 (Acrobat 7), so you'll get much better results (clearer scans with smaller file size) with A7 or A8 than A6.
All times are GMT -4.