Best tools for editing PDFs prior to conversion?
I'm ridding myself of physical books. This has involved having a few books scanned that I'm unable to find in electronic form.
My scanned books are supplied as PDFs. The visible pages are images, but they have backing text. When I view the PDFs in Moon+ Reader Pro the Text to Speech works, though with page numbers and titles included. Visually, though, they're a mess, as PDFs always are.
If I use Calibre to convert to EPUBs, the result is visually much cleaner, but Text to Speech doesn't work at all. If I unzip the EPUB file to look inside, it's clear why: there is no text, only a collection of images.
Most EPUBs seem to contain a collection of HTML files, and I've used ordinary text editors to clean them up, on occasion.
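For anyone who wants to repeat the check without unzipping by hand, here is a rough sketch of how I'd look inside an EPUB for actual text. It relies only on an EPUB being a zip archive of HTML/XHTML files. The names `TextCounter` and `epub_text_report` are my own, made up for illustration, and the choice to ignore `title`, `style` and `script` content is an assumption about what counts as readable text.

```python
import zipfile
from html.parser import HTMLParser

# Tags whose contents are not body text a reader would hear (an assumption).
SKIPPED_TAGS = {"title", "style", "script"}


class TextCounter(HTMLParser):
    """Counts non-whitespace characters in text nodes outside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self.chars = 0
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            # Drop all whitespace so indentation does not count as text.
            self.chars += len("".join(data.split()))


def epub_text_report(epub_path):
    """Return {file name: text character count} for each HTML file in the EPUB."""
    report = {}
    with zipfile.ZipFile(epub_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                parser = TextCounter()
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                parser.close()
                report[name] = parser.chars
    return report
```

If every file in the report comes back with a count of zero, the EPUB is presumably image-only, which would match what I saw after the Calibre conversion.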
I'm wondering if there are any tools that would allow me to extract the text from a PDF file in a usable format. A plain text file that I could clean up prior to conversion would be ideal.
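To give an idea of the cleanup I have in mind, here is a minimal sketch, assuming the text has already been extracted to plain text by some means (the extraction itself is the part I'm asking about). It drops lines that are nothing but a page number, plus any running titles I list by hand. The function name `strip_page_furniture` and the page-number pattern are my own guesses at what the scans contain, not anything a particular tool produces.

```python
import re

# A line counts as a page number if it is only 1-4 digits, optionally
# preceded by the word "page" (an assumed pattern; adjust per book).
PAGE_NUMBER = re.compile(r"^\s*(?:page\s+)?\d{1,4}\s*$", re.IGNORECASE)


def strip_page_furniture(text, running_titles=()):
    """Remove page-number lines and lines equal to a known running title."""
    titles = {title.strip().lower() for title in running_titles}
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if PAGE_NUMBER.match(line):
            continue  # bare page number
        if stripped and stripped.lower() in titles:
            continue  # repeated header or footer title
        kept.append(line)
    return "\n".join(kept)
```

Usage would be something like `strip_page_furniture(raw_text, ["My Book Title"])`. One obvious weakness is that a genuine line consisting only of a number, such as a year on its own line, would also be removed.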
Thoughts?