MobileRead Forums - View Single Post

KevinH · 01-23-2011, 12:25 PM

Hi Kovid, John

I noticed that calibre has quite a large set of Python-based PDF tools. Unfortunately, PDF is a format I know little about. Non-DRM Topaz can be unpacked into an SVG image of each page plus an HTML version of the page (imperfect, since it is based on internal OCR). Given that, is it possible to use calibre's Python-based PDF tools to create an image-based PDF **with** text information to allow searching?

Right now, I can either run calibre on the imperfect HTML to get an ebook, or run it on the set of SVG images to convert them to PDF. Both work, but both lose something: the HTML is imperfect, and the image-only PDF cannot be searched. What I would like to do is merge the two to get a PDF that is image-based but searchable, effectively a perfect copy of what is in the Topaz book.

Any ideas on what code to look at, or even on whether it is possible to merge image-based PDFs with text information for searching, would be greatly appreciated.
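To make the question concrete: scanned-and-OCRed PDFs usually get their search capability by drawing the page image and then placing the recognized text on top of it in the invisible text rendering mode (`3 Tr` in the PDF content stream). Below is a minimal sketch of what such a page content stream might look like. It is not calibre code; the function names, the resource names `/Im0` and `/F1`, and the `(x, y, text)` word list are all assumptions made for illustration, and the word positions would have to come from the Topaz page data.

```python
def escape_pdf_string(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def page_content_stream(page_width, page_height, words, font_size=10):
    """Build a content stream: full-page image plus invisible text.

    words is a list of (x, y, text) tuples in PDF points, with the
    origin at the bottom-left corner of the page.
    /Im0 and /F1 are hypothetical resource names; the page's
    /Resources dictionary would have to define both.
    """
    ops = [
        "q",                                        # save graphics state
        f"{page_width} 0 0 {page_height} 0 0 cm",   # scale the unit image to the page
        "/Im0 Do",                                  # paint the page image
        "Q",                                        # restore graphics state
        "BT",                                       # begin text object
        "3 Tr",                                     # render mode 3: invisible text
        f"/F1 {font_size} Tf",                      # select font and size
    ]
    for x, y, text in words:
        ops.append(f"1 0 0 1 {x} {y} Tm")           # position this word absolutely
        ops.append(f"({escape_pdf_string(text)}) Tj")
    ops.append("ET")                                # end text object
    return "\n".join(ops)


if __name__ == "__main__":
    print(page_content_stream(612, 792, [(72, 700, "Hello (world)")]))
```

This only covers the content stream. A real file would still need the page, image, and font objects written around it, and the SVG pages would have to be rasterized or converted to PDF drawing operators first.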

Thanks,

KevinH
