You can add duplicated caption text to the list of possible errors.
Some people use Mobipocket Creator and feed its HTML output into Calibre or Sigil.
Whatever you do, plan on some work.
BTW, these tools only work if the PDF contains actual text and is not just a container for scans of the pages. PDFs containing only images have to be OCRed first, and the result, often a mess, then has to be cleaned up. That is when you get to appreciate that even a 2% error rate means errors on every page, multiplied by the number of pages you have to correct.
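To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the 2% is a per-character error rate, and the page size and page count are made-up figures for illustration, so plug in your own. The function name is just something I picked.

```python
def expected_ocr_errors(error_rate, chars_per_page, pages):
    """Return (expected errors per page, expected errors in the whole book)."""
    per_page = error_rate * chars_per_page
    return per_page, per_page * pages


# Assumed figures: 2% character error rate, 1800 characters per page, 300 pages.
per_page, total = expected_ocr_errors(0.02, 1800, 300)
print(f"about {per_page:.0f} errors per page, about {total:.0f} in the book")
```

Even if your OCR does much better than that, the total scales with page count, so a long book still means a lot of proofreading.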