Quote:
Originally Posted by dhdurgee
That should do the trick. As long as it is clear how this works it can be dealt with.
On a totally separate subject, I know that your tool does OCR for the purpose of enabling searches to work on a scanned document. Is it possible now, or would an enhancement make sense, to offer an option to create a mobi or epub document with the OCR text and the graphics/images from the source as opposed to a PDF document? I am aware that no OCR is perfect, so there will certainly be errors in the output, but the ebook format will be a much more flexible fit to a dedicated reader than even the current output of the package.
This might be beyond your design goals for this tool, but I think there might be others besides myself interested in a native format document for a dedicated reader from less than perfect sources. Your tool does well with adapting a scanned PDF document for a dedicated reader. This might be a logical further step along the path.
Dave
|
I have thought about other output formats off and on, but several other tools already do that kind of thing (calibre, for one), and probably better than I could, so I'm trying to keep k2pdfopt focused on its core functionality. You could use k2pdfopt (or any number of other tools) to OCR the text in a file and then do the conversion in calibre once the OCR text has been added. This is error-prone, though, and inevitably requires a lot of hand editing. See some of the threads listed in my pdf conversion tips.
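For anyone who wants to script the two-step route described above (OCR first, then convert in calibre), here is a rough sketch in Python. It only builds and runs two command lines. The k2pdfopt options shown (`-ui-`, `-x`, `-mode copy`, `-ocr t`, `-o`) are my assumption of a reasonable set and should be checked against the usage text of your k2pdfopt version; `ebook-convert` is calibre's command-line converter and picks the formats from the file extensions. The function names and the `_ocr.pdf` suffix are made up for illustration.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_ocr_command(src: Path, ocr_pdf: Path) -> list[str]:
    """Command line for the OCR pass (flags are assumptions; verify locally)."""
    return [
        "k2pdfopt",
        "-ui-",            # assumed: do not open the GUI / interactive menu
        "-x",              # assumed: exit when the conversion is finished
        "-mode", "copy",   # assumed: keep the page layout, do not reflow
        "-ocr", "t",       # assumed: add an OCR text layer using Tesseract
        "-o", str(ocr_pdf),
        str(src),
    ]


def build_convert_command(ocr_pdf: Path, ebook: Path) -> list[str]:
    """Command line for calibre's converter (formats come from extensions)."""
    return ["ebook-convert", str(ocr_pdf), str(ebook)]


def ocr_then_convert(src, ebook, run=subprocess.run) -> Path:
    """Run the OCR pass, then the calibre conversion, and return the ebook path.

    `run` is injectable so the pipeline can be dry-run without either tool.
    """
    src = Path(src)
    ebook = Path(ebook)
    # Hypothetical naming scheme for the intermediate OCR'd PDF.
    ocr_pdf = src.with_name(src.stem + "_ocr.pdf")

    run(build_ocr_command(src, ocr_pdf), check=True)
    run(build_convert_command(ocr_pdf, ebook), check=True)
    return ebook
```

Calling `ocr_then_convert("scan.pdf", "scan.epub")` would run both tools in sequence if they are on the PATH. The result will still carry whatever OCR errors the first step produced, so expect to hand edit the output afterwards.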