Quote:
Originally Posted by daredevil22r
The reason I ask about the pdf's is that I have quite a few books in that format that I would just hate having to pay for since I already own them. I don't mind converting them. It's just that you are correct: it takes too much time and it doesn't seem to be worth it. I don't get images, just the text.
I just feed the pdf file through an OCR program "Readiris Pro for Hewlett Packard" that came with a cheap printer/scanner/fax combo from HP.
It can process pdf files containing bitmaps AND pdf files containing text. It applies the same clever algorithms to identify columns of text, pictures, graphs, tables, paragraphs, ... to the "vector + font + text" pdf files as it does to bitmap files. The results are great.
Before I discovered that Readiris can process pdf files directly, I was creating bitmaps from the pdf file with the GsView program, and then I ran the resulting bitmaps through a VERY old version of the Recognita OCR program (version 1.something! (FIFTEEN years old!)) that came with a semi-professional UMAX scanner.
GsView is just a graphical front-end for Ghostscript.
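Since GsView only drives Ghostscript, the "pdf to bitmaps" step can probably be scripted without the GUI. Here is a rough sketch in Python, not what I actually ran. The executable name "gs" (on Windows it is usually gswin64c), the pnggray device, the 300 dpi resolution and the file names are all my assumptions; adjust them to whatever your OCR program likes.

```python
import subprocess


def build_gs_command(pdf_path, out_pattern="page-%03d.png", dpi=300, gs_exe="gs"):
    """Build a Ghostscript command line that renders each PDF page to a bitmap.

    gs_exe, the pnggray device and the 300 dpi default are assumptions,
    chosen because grayscale at roughly 300 dpi is a common OCR input.
    """
    return [
        gs_exe,
        "-dSAFER",                      # restrict file access while interpreting
        "-dBATCH",                      # exit when the input has been processed
        "-dNOPAUSE",                    # do not wait for a keypress between pages
        "-sDEVICE=pnggray",             # 8-bit grayscale PNG output
        f"-r{dpi}",                     # rendering resolution in dpi
        f"-sOutputFile={out_pattern}",  # %03d is replaced by the page number
        pdf_path,
    ]


def rasterize_pdf(pdf_path, out_pattern="page-%03d.png", dpi=300, gs_exe="gs"):
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_gs_command(pdf_path, out_pattern, dpi, gs_exe), check=True)
```

Calling rasterize_pdf("book.pdf") should leave one PNG per page in the current directory, and those files can then be fed to the OCR program. A bilevel TIFF device such as tiffg4 may suit older OCR software better than grayscale PNG.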