|01-23-2011, 11:25 AM||#1|
Join Date: Nov 2009
python based pdf conversion tools
I noticed that calibre includes quite a large set of Python-based PDF tools. Unfortunately, PDF is a format I know little about. A non-DRM Topaz book can be unpacked into an SVG image of each page plus an HTML version of the page (imperfect, since it is based on internal OCR). Is it possible to use calibre's Python-based PDF tools to create an image-based PDF **with** text information to allow searching?
Right now, I can use calibre either on the imperfect HTML to get an ebook, or on the set of SVG images, which calibre can convert to PDF. Both approaches work, but both lose something. What I would like to do is merge the two into a PDF that is image-based but searchable, so that it is effectively a perfect copy of what is in the Topaz book.
Any ideas about what code to look at, or even whether it is possible to merge image-based PDFs with text information for searching, would be greatly appreciated.
|01-23-2011, 11:39 AM||#2|
Creator of calibre
Join Date: Oct 2006
Location: Mumbai, India
I'm afraid I know of no way to create an image-based PDF backed by text using the libraries calibre contains.
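Independent of calibre's libraries, the PDF format itself allows text to be drawn in text rendering mode 3 (invisible), which is how scanned-and-OCRed PDFs are typically made searchable. The following is only a minimal standard-library sketch of that idea, not anything calibre provides: `build_searchable_pdf` and its parameters are hypothetical names, the page image is assumed to be already rasterized to 8-bit grayscale bytes (SVG rendering is not covered), the word positions are assumed to be available in PDF points, and the text is assumed to fit in Latin-1.

```python
import zlib


def _stream(dict_entries, data):
    """Wrap raw bytes as a PDF stream object body."""
    head = f"<< {dict_entries} /Length {len(data)} >>\nstream\n".encode("ascii")
    return head + data + b"\nendstream"


def _escape(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_searchable_pdf(gray_pixels, width, height, words, page_w=612, page_h=792):
    """Return the bytes of a one-page PDF: a page image with invisible text on top.

    gray_pixels: width*height bytes of 8-bit grayscale (assumed pre-rasterized).
    words: list of (x, y, font_size, text), in points, origin at bottom-left.
    """
    if len(gray_pixels) != width * height:
        raise ValueError("gray_pixels must contain width*height bytes")

    # Paint the image scaled to the full page, then the text layer.
    # "3 Tr" selects text rendering mode 3: neither filled nor stroked.
    ops = [f"q {page_w} 0 0 {page_h} 0 0 cm /Im0 Do Q", "BT 3 Tr"]
    for x, y, size, text in words:
        ops.append(f"/F1 {size} Tf 1 0 0 1 {x} {y} Tm ({_escape(text)}) Tj")
    ops.append("ET")
    content = "\n".join(ops).encode("latin-1")

    page = (
        f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {page_w} {page_h}] "
        "/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> "
        "/XObject << /Im0 6 0 R >> >> >>"
    ).encode("ascii")
    image_dict = (
        f"/Type /XObject /Subtype /Image /Width {width} /Height {height} "
        "/ColorSpace /DeviceGray /BitsPerComponent 8 /Filter /FlateDecode"
    )
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        page,
        _stream("", content),  # left uncompressed so it is easy to inspect
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica "
        b"/Encoding /WinAnsiEncoding >>",
        _stream(image_dict, zlib.compress(gray_pixels)),
    ]

    # Write the objects, recording byte offsets for the cross-reference table.
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += f"{num} 0 obj\n".encode("ascii") + body + b"\nendobj\n"

    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode("ascii")
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_pos}\n%%EOF\n"
    ).encode("ascii")
    return bytes(out)
```

For example, `build_searchable_pdf(bytes([0, 255, 255, 0]), 2, 2, [(72, 700, 12, "Hello")])` returns bytes that can be written to a `.pdf` file. In a real conversion the word list would have to come from the OCR-derived HTML, and search results can only be as accurate as that text and its positions.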
Notice to all: I cannot provide assistance with DRM removal, for legal reasons, so please do not contact me about it.