Quote:
Originally Posted by exaltedwombat
@Tex2002ans, Thanks for the response.
Yes, I have Toxaris's EPUB Tools. I'm afraid the Postprocess OCR function adds a lot of spurious scenebreaks. (And then, with this 500 page book, attempting to Generate EPUB fails with an 'out of memory' error on this powerful PC with 24GB RAM, but that's another problem.) Calibre, with Heuristic Processing turned on, does a rather better job, but there will still be a dozen false paragraph breaks in each chapter needing manual intervention.
The point is, the export from PDF to docx retains ALL the paragraph indentations. If they could only be marked in some way...?
I am late to the party, sorry about that. If you run out of memory, either the Word document is not correct or you ran into the issue Word 365 had recently. The fact that the document is 500 pages long is not relevant; that would not be a problem even with 1 GB of memory (my development machine has 1 GB).
Also, I *never* use the export function for PDFs. I would rather re-OCR them; the results are much better than the export. As was also mentioned, you can fine-tune the scene break detection.