01-23-2011, 11:25 AM   #1
KevinH
Sigil Developer
Python-based PDF conversion tools

Hi Kovid, John

I noticed that calibre has quite a large set of Python-based PDF tools. Unfortunately, PDF is a format I know little about. A non-DRM Topaz book can be unpacked into an SVG image of each page plus an HTML version of that page (imperfect, since it is based on internal OCR). Given that, is it possible to use calibre's Python-based PDF tools to create an image-based PDF **with** text information to allow searching?

Right now, I can run calibre either on the imperfect HTML to get an ebook, or on the set of SVG images to convert them to PDF. Both approaches work, but each loses something. What I would like to do is merge the two to get a PDF that is image-based but searchable, and is effectively a perfect copy of what is in the Topaz book.
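
To make the idea concrete: as far as I understand it, searchable scanned PDFs usually paint the page image and then lay the OCR text over it in invisible text rendering mode (the `3 Tr` operator), so the text can be found and selected but is never drawn. Below is a rough sketch of what one page's content stream might look like. This is not calibre's API; `build_page_stream`, the `Im0` image name, and the `F1` font name are all hypothetical, and a real PDF would still need the image, font, and page objects written around this stream.

```python
def escape_pdf_string(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_page_stream(page_w, page_h, words, image_name="Im0", font_name="F1"):
    """Return a PDF content stream: full-page image plus invisible text.

    words is an iterable of (text, x, y, font_size) tuples in PDF points,
    with the origin at the bottom-left corner of the page.
    The image and font names are assumptions; they must match entries in
    the page's resource dictionary, which is not built here.
    """
    ops = [
        "q",                                   # save graphics state
        f"{page_w} 0 0 {page_h} 0 0 cm",       # scale unit square to the page
        f"/{image_name} Do",                   # paint the page image
        "Q",                                   # restore graphics state
        "BT",                                  # begin text object
        "3 Tr",                                # render mode 3: invisible text
    ]
    for text, x, y, size in words:
        ops.append(f"/{font_name} {size} Tf")  # select font and size
        ops.append(f"1 0 0 1 {x} {y} Tm")      # position the word
        ops.append(f"({escape_pdf_string(text)}) Tj")  # show (hidden) text
    ops.append("ET")                           # end text object
    return "\n".join(ops)


if __name__ == "__main__":
    # One US Letter page with a single OCR word placed near the top left.
    print(build_page_stream(612, 792, [("Topaz", 72, 700, 12)]))
```

The word positions would have to come from the coordinates in the unpacked Topaz page data, which is the part I do not know how to wire up.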

Any ideas about what code to look at, or even whether it is possible to merge image-based PDFs with text information for searching, would be greatly appreciated.

Thanks,

KevinH