12-29-2018, 03:34 PM   #26
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Quote:
Originally Posted by exaltedwombat
@Tex2002ans, Thanks for the response.

Yes, I have Toxaris's EPUB Tools. I'm afraid the Postprocess OCR function adds a lot of spurious scenebreaks. (And then, with this 500 page book, attempting to Generate EPUB fails with an 'out of memory' error on this powerful PC with 24GB RAM, but that's another problem.) Calibre, with Heuristic Processing turned on, does a rather better job, but there will still be a dozen false paragraph breaks in each chapter needing manual intervention.

The point is, the export from PDF to docx retains ALL the paragraph indentations. If they could only be marked in some way...?
I am late to the party, sorry about that. If you run out of memory, either the Word document is not correct or you ran into the issue Word 365 had recently. The document being 500 pages is not relevant; that would not be a problem even with 1 GB of memory (my development machine has 1 GB).

Also, I *never* use the export function for PDFs. I would rather re-OCR them; the results are much better. As was also mentioned, you can fine-tune the scene detection.