02-12-2019, 12:41 AM   #17
DNSB
Bibliophagist
Posts: 46,950
Karma: 169810634
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
Quote:
Originally Posted by Ghitulescu View Post
I am also curious why it works this way.
Yes, I have read the FAQ, the sticky and some of the related threads, yet found no clear answer why:

... I too have a PDF that came out of a scanner.
In Reader I can select the text, and it's fairly accurate. So the PDF file also contains text, not only images.
Yet calibre outputs a bunch of images, one per page.

So, again, why does calibre not use the "hidden text"?
Yes, I know it's not best to use PDFs... but what can you do when the only source is one of them?!
I have one commercially produced scan of a book from the 1870s.* The images are not the greatest, but they show what the original book looked like. The text layer is useful for searching, but on close inspection it has a multitude of OCR errors that make it painful to read.

* The book was originally a two-volume set that includes household tips (use sulphuric acid on your windows to prevent frost), photography, including creating your own wet plates, and much else. If I were stranded on a desert island, that is a set of books that would be handy, though by today's standards many of the items would be considered extreme safety hazards.