Quote:
Originally Posted by JSWolf
I would not go PDF at all. Once you do, getting it out of PDF is going to be a real hassle.
Not really, just copy and paste the text. Adobe PDF can also save as HTML, but if you save it as PDF, it takes the scanned document and overlays it with a layer of invisible text.
That way your document looks just like the scan, and you can still copy and paste the text out of it.
Images can easily be copied and saved as PNG or JPG files.
I haven't tested HTML yet, but I think you'll be left with the images and the OCR'ed text, which can be quite hard (if not impossible) to read when you can't see the original scan next to it.
I also find it a pity that OCR (no matter which program you're using) needs at least 200 DPI.
I mean, most of this software (I'm using a trial here) costs $400, yet it really needs about 300 DPI to convert text properly?
I can read text scanned at 100 or even 75 DPI perfectly well.
So I don't really think the software is worth $400.
If it could convert text flawlessly from 75 DPI, I might consider paying a little more than $80 for it, but definitely not $400.
At 300 DPI, a scanned A4 page is about as big as 4 screens at 1280x800, and it takes up quite a bit of hard drive space. And 300 DPI is not that impressive a resolution to convert text from. It takes ages to scan a book at this resolution (the scanner scans more slowly at high, photo-quality resolutions).
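For a rough sense of those numbers, here is a small Python sketch that works out the pixel size of an A4 page at a few scan resolutions. It assumes A4 is 210 x 297 mm and that the page is stored uncompressed as 8-bit grayscale (1 byte per pixel); real scanner output and PDF compression will differ. Counted by area, a 300 DPI A4 page comes to roughly 8.5 screens of 1280x800; counted by height alone it is about 4.4 screens tall, which is closer to the "4 screens" figure.

```python
# Rough size of an A4 page scanned at different resolutions.
# Assumptions: A4 = 210 x 297 mm, stored uncompressed as 8-bit grayscale (1 byte/pixel).

MM_PER_INCH = 25.4
A4_MM = (210, 297)
SCREEN = (1280, 800)


def a4_pixels(dpi):
    """Pixel (width, height) of an A4 page scanned at `dpi`."""
    w_mm, h_mm = A4_MM
    return round(w_mm / MM_PER_INCH * dpi), round(h_mm / MM_PER_INCH * dpi)


def screens_by_area(dpi):
    """How many 1280x800 screens the page would fill, by pixel count."""
    w, h = a4_pixels(dpi)
    return (w * h) / (SCREEN[0] * SCREEN[1])


def screens_by_height(dpi):
    """How many 800-pixel-tall screens the page spans vertically."""
    return a4_pixels(dpi)[1] / SCREEN[1]


def raw_megabytes(dpi, bytes_per_pixel=1):
    """Uncompressed size in MB (10**6 bytes)."""
    w, h = a4_pixels(dpi)
    return w * h * bytes_per_pixel / 1_000_000


if __name__ == "__main__":
    for dpi in (75, 100, 200, 300):
        w, h = a4_pixels(dpi)
        print(f"{dpi:>3} DPI: {w}x{h} px, "
              f"{screens_by_area(dpi):.1f} screens by area, "
              f"{screens_by_height(dpi):.1f} screens tall, "
              f"~{raw_megabytes(dpi):.1f} MB raw")
```

At 300 DPI this gives 2480x3508 pixels, about 8.7 MB per page before any compression, which is why high-resolution scans fill up a hard drive quickly.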
Just to give you an idea: I scanned a 150-page book with almost no pictures.
It took 12 MB as a PDF.
After conversion you can get that down to 3 MB, but the reader can't open those documents; only the PC can.
The same book in text format takes up around 800 KB, and about the same as LRF with pictures and cover included!
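To put those file sizes in perspective, here is a quick per-page breakdown in Python. It only divides the rounded figures above by 150 pages and assumes decimal units (1 MB = 1000 KB); it is arithmetic on the quoted numbers, not a new measurement.

```python
# Per-page size of the 150-page book from the post, in each format.
# Assumption: decimal units (1 MB = 1000 KB); sizes are the post's rounded figures.

PAGES = 150
SIZES_KB = {
    "scanned PDF": 12_000,
    "after conversion": 3_000,
    "text / LRF": 800,
}


def per_page_kb(total_kb, pages=PAGES):
    """Average size of one page in KB."""
    return total_kb / pages


def reduction_factor(original_kb, new_kb):
    """How many times smaller `new_kb` is than `original_kb`."""
    return original_kb / new_kb


for label, kb in SIZES_KB.items():
    print(f"{label:>16}: {per_page_kb(kb):5.1f} KB/page, "
          f"{reduction_factor(SIZES_KB['scanned PDF'], kb):4.1f}x smaller than the scan")
```

With these figures the scan works out to about 80 KB per page, versus roughly 5 KB per page as text, a factor of about 15.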