MobileRead Forums - View Single Post

Hitch · 12-17-2018, 04:30 PM

Quote:

Originally Posted by elibrarian

I only work in clean text/XHTML - not Word. That's probably the only difference. Every other piece of software I've ever used to get text layers from PDFs has exported each and every space and line break for every (EVERY!) single line - except FlexiPDF.

That said, I just tried to export the text layer from an OCR'ed PDF I got from the Royal Library of Copenhagen to Word, and using the standard settings, I'll admit it stinks. But if you press "Format" at the bottom right of the export dialogue, you can alter the standard settings. I removed everything except "Text Output" and "De-hyphenate" for the Word export, and got a nice Word doc with none of the issues you mention. Not perfect (because the OCR from the Royal Library is not perfect), but very, very usable.

You'll probably have to fiddle with the settings to get exactly what you want, but I think you might be a little too quick to condemn FlexiPDF.

Regards,

Kim

Well, yes, exporting to plain text would certainly make things less complicated, but even the HTML export I tried was lame-ish. I shall try it again, to see what I get. I certainly don't want to send it to the Guillotine prematurely, but...we'll see!
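
For anyone stuck with an export that keeps every line break, the kind of cleanup a "De-hyphenate" option suggests can be roughed out in a few lines of Python. This is only a sketch under stated assumptions, not FlexiPDF's actual method: the function name `unwrap_text` is made up for illustration, paragraphs are assumed to be separated by blank lines, and any hyphen at the end of a line is assumed to be a soft (line-wrap) hyphen.

```python
import re


def unwrap_text(raw: str) -> str:
    """Join hard-wrapped lines into paragraphs and rejoin split words.

    Hypothetical helper for illustration only. Assumes blank lines mark
    paragraph breaks and that a hyphen at a line end is a wrap hyphen.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = raw.replace("\r\n", "\n").replace("\r", "\n")

    # Rejoin words split across a line break: "Li-\nbrary" -> "Library".
    # Caveat: a genuine compound broken at its hyphen ("well-\nknown")
    # also loses its hyphen under this simple rule.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    # Split into paragraphs on blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text)

    # Inside each paragraph, collapse line breaks and runs of spaces
    # into single spaces.
    cleaned = [" ".join(p.split()) for p in paragraphs]

    # Drop empty paragraphs and separate the rest with one blank line.
    return "\n\n".join(p for p in cleaned if p)


if __name__ == "__main__":
    sample = "The Royal Li-\nbrary of\nCopenhagen.\n\nSecond  para-\ngraph here."
    print(unwrap_text(sample))
```

The result still needs a proofread, since the rule cannot tell a wrap hyphen from a real one, and OCR errors pass straight through.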

Hitch