Thanks a lot for the info about layers.
Quote:
Originally Posted by Tex2002ans
Didn't you already say in Post #28 that you ran this PDF through Finereader? Finereader should have carried over italics and other formatting for you.
Because FineReader did not carry over the formatting, I wanted to try other tools. Since the PDF contains two layers, it made sense to extract the "text" layer and see how it compares with running the PDF through FineReader.
Turns out it's still a bit of work to…
- Re-add formatting (bold, italics, etc.)
- Fix the hyphenated words that FineReader didn't correct (still much better than starting from the raw text from pdftotext, since FineReader uses a dictionary to fix most of those)
- Re-add footnotes
- Take pictures of tables and… pictures, and insert them
- Build a ToC
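
For the leftover hyphenated words, here is a rough sketch of the dictionary idea in Python. To be clear, this is not how FineReader does it internally (I don't know that); it's only an illustration. The function name `dehyphenate` and the `known_words` set are made up for the example, and in practice you would load a real word list. It joins a word split by a hyphen at a line break only when the joined form is in the word list, and leaves everything else alone:

```python
import re

# Matches "head-<newline>tail", i.e. a word split by a hyphen at a line break.
SPLIT_WORD = re.compile(r"(\w+)-\n(\w+)")


def dehyphenate(text, known_words):
    """Join line-break hyphenations whose joined form is in known_words.

    known_words is a hypothetical set of lowercase dictionary words.
    Splits that do not form a known word (e.g. real compounds such as
    "well-known") are left untouched.
    """

    def join(match):
        joined = match.group(1) + match.group(2)
        if joined.lower() in known_words:
            # Drop both the hyphen and the line break.
            return joined
        # Unknown word: keep the original text as-is.
        return match.group(0)

    return SPLIT_WORD.sub(join, text)


if __name__ == "__main__":
    sample = "The format-\nting was lost in a well-\nknown tool."
    print(dehyphenate(sample, {"formatting"}))
```

Anything the word list doesn't recognize still has to be checked by hand, so this only reduces the work rather than removing it.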