Old 08-24-2016, 07:20 PM   #5
BetterRed
null operator (he/him)
Posts: 21,825
Karma: 30277270
Join Date: Mar 2012
Location: Sydney Australia
Device: none
dwig and eschwartz

If you have ABBYY FineReader and Word 2007 or higher, you could use Toxaris' e-Book Tools add-in for Word, which he built specifically to handle the various glitches introduced by OCR conversion etc., and which he continues to enhance.

I convert dozens of public domain PDF documents (not books as such) a week. In most cases I can decide after looking at a PDF whether it's worth my time converting and editing it.

Because most of the PDFs are recently created, I've found ABBYY FineReader doesn't 'buy me much'. So my workflow for converting PDFs is:
  • convert the PDF into PRC using Mobi Creator;
  • convert the PRC to RTF with calibre;
  • apply one of three relatively simple Word Templates to the RTF;
  • use Epub-Tools functions, VBA macros etc. to knock the RTF into shape;
  • save the document as DOCX;
  • convert the DOCX to ePUB with calibre, or occasionally import it into the calibre editor.
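
For anyone who would rather script the two calibre steps in the list above, here is a rough sketch using calibre's command-line converter, ebook-convert. It is only an illustration, not necessarily how the conversions are run in this workflow: the file stem "report" and the helper names calibre_steps and run_step are made up for the example, and the Mobi Creator and Word steps are still done by hand in between.

```python
import shutil
import subprocess
from pathlib import Path


def calibre_steps(stem):
    """Build the two ebook-convert commands for the calibre steps.

    `stem` is a hypothetical file name without extension. The PRC is
    assumed to come from Mobi Creator and the DOCX from Word.
    """
    return [
        # PRC -> RTF, ready for the Word templates and macros
        ["ebook-convert", f"{stem}.prc", f"{stem}.rtf"],
        # cleaned-up DOCX -> ePub
        ["ebook-convert", f"{stem}.docx", f"{stem}.epub"],
    ]


def run_step(cmd):
    """Run one command; return False if the tool or input file is missing."""
    if shutil.which(cmd[0]) is None or not Path(cmd[1]).exists():
        return False
    subprocess.run(cmd, check=True)
    return True


if __name__ == "__main__":
    for cmd in calibre_steps("report"):
        print(" ".join(cmd), "->", "done" if run_step(cmd) else "skipped")
```

The second command will be skipped until the DOCX has been saved from Word, so in practice you would run the script once before the Word editing and once after.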
The workflow is optimised to be time efficient for me. I don't enjoy fiddling with dozens of obscure OCR and conversion settings, I don't seek perfect conversions (I doubt such things exist in any sphere), and I don't worship finely crafted optimal markup. That said, the resultant code is pretty clean; I avoid fancy typography, and I use Word styles almost exclusively. I rarely need to edit the final ePub code.

The ePubs I create are not published in the public domain; I make them available to a few colleagues, and they do the same for me and the others. They have their own workflows optimised to their peccadilloes.

BR

Last edited by BetterRed; 08-24-2016 at 07:22 PM.