|01-23-2011, 12:25 PM||#1|
Join Date: Nov 2009
Python-based PDF conversion tools
Hi Kovid, John
I noticed that calibre has quite a large set of Python-based PDF tools. Unfortunately, PDF is a format I know little about. Non-DRM Topaz can be unpacked into an SVG image of each page plus an HTML version of the page (imperfect, since it is based on internal OCR). Is it possible to use calibre's Python-based PDF tools to create an image-based PDF **with** text information to allow searching?
Right now, I can either run calibre on the imperfect HTML to get an ebook, OR run it on the set of SVG images to convert them to PDF. Both approaches work, but each loses something. What I would like to do is merge the two to get a PDF that is image-based but searchable, effectively a perfect copy of what is in the Topaz book.
Any ideas about what code to look at, or even whether it is possible to merge image-based PDFs with text-search information, would be greatly appreciated.
|01-23-2011, 12:39 PM||#2|
creator of calibre
Join Date: Oct 2006
Location: Mumbai, India
I'm afraid I know of no way to create an image-based PDF backed by text using the libraries calibre contains.
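For context, outside calibre's bundled libraries, the general PDF technique for this is to paint the page image and then place the text on top using text rendering mode 3 (invisible), so that viewers can search and select the text without displaying it. The following is only a minimal sketch using the Python standard library, not anything taken from calibre. The names `pdf_escape` and `build_searchable_page`, the uncompressed 8-bit grayscale image input, and the word positions are all assumptions made for illustration.

```python
def pdf_escape(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_searchable_page(page_w, page_h, img_w, img_h, gray_pixels, words):
    """Return the bytes of a one-page PDF: a full-page image plus invisible text.

    Hypothetical inputs:
      gray_pixels: img_w * img_h bytes, 8-bit grayscale, rows top to bottom
      words: iterable of (text, x, y, font_size) in PDF points, origin bottom-left
    """
    gray_pixels = bytes(gray_pixels)
    if len(gray_pixels) != img_w * img_h:
        raise ValueError("gray_pixels must contain img_w * img_h bytes")

    # Content stream: scale the unit-square image to the page and draw it,
    # then write each word with "3 Tr" (text rendering mode 3 = invisible).
    ops = ["q %g 0 0 %g 0 0 cm /Im1 Do Q" % (page_w, page_h), "BT 3 Tr"]
    for text, x, y, size in words:
        ops.append("/F1 %g Tf 1 0 0 1 %g %g Tm (%s) Tj"
                   % (size, x, y, pdf_escape(text)))
    ops.append("ET")
    content = "\n".join(ops).encode("latin-1", "replace")

    def stream(dict_body, data):
        # A PDF stream object: dictionary with /Length, then the raw bytes.
        header = b"<< " + dict_body + (b" /Length %d >>\nstream\n" % len(data))
        return header + data + b"\nendstream"

    image_dict = ("/Type /XObject /Subtype /Image /Width %d /Height %d "
                  "/ColorSpace /DeviceGray /BitsPerComponent 8"
                  % (img_w, img_h)).encode("ascii")
    page_dict = ("<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %g %g] "
                 "/Resources << /Font << /F1 4 0 R >> "
                 "/XObject << /Im1 5 0 R >> >> /Contents 6 0 R >>"
                 % (page_w, page_h)).encode("ascii")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",                       # object 1
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",               # object 2
        page_dict,                                                  # object 3
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",  # object 4
        stream(image_dict, gray_pixels),                            # object 5
        stream(b"", content),                                       # object 6
    ]

    # Write the objects, recording byte offsets for the cross-reference table.
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += (b"%d 0 obj\n" % number) + body + b"\nendobj\n"

    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (len(objects) + 1, xref_pos))
    return bytes(out)


if __name__ == "__main__":
    # Tiny 2x2 placeholder image and one word, written to a hypothetical file.
    pdf = build_searchable_page(612, 792, 2, 2, [0, 255, 255, 0],
                                [("Hello", 72, 700, 12)])
    with open("searchable_page.pdf", "wb") as fh:
        fh.write(pdf)
```

For a Topaz book, each SVG page would still have to be rasterized to pixel data first, and the word positions would have to come from the OCR information in the unpacked HTML. Both steps are outside this sketch. Any OCR errors would also carry over into the search text, even though the page image itself stays exact.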