MobileRead Forums - View Single Post

lumpynose · 06-07-2019, 07:58 PM

Quote:

Originally Posted by exaltedwombat

I'm cleaning up a conversion from PDF. The code's pretty clean, but I'm using Ctrl-Space a lot to strip unwanted paragraph breaks, Ctrl-C/Ctrl-V to add missing ones, Clips to surround a paragraph with formatting code... Everything except actually inputting text really!

That's similar to what I'm doing. I'm working with PDF scans of old magazines from archive.org that include the OCR'd text (SumatraPDF will save just the text with Save As).

Sometimes there's an empty line between paragraphs, sometimes not. Either way, it starts out as one big hunk of text between the body and /body tags, with no HTML tags at all. I thought body didn't allow text directly and that the text needed to be inside a p, div, etc., although epubcheck doesn't complain.
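
For the files that do have an empty line between paragraphs, something like the following could wrap each paragraph in p tags before the text goes into the body. This is only a rough sketch: the function name text_to_paragraphs is made up for illustration, and it assumes blank lines are the only paragraph markers, so files without them would still need fixing by hand.

```python
import html
import re


def text_to_paragraphs(raw: str) -> str:
    """Wrap blank-line-separated chunks of plain text in <p> tags.

    Hypothetical helper: assumes paragraphs are separated by at least
    one blank line, which is not true of every OCR'd file.
    """
    # Split wherever one or more blank (whitespace-only) lines appear.
    chunks = re.split(r"\n\s*\n", raw.strip())
    paragraphs = []
    for chunk in chunks:
        # Rejoin hard-wrapped lines into a single line of text.
        text = " ".join(
            line.strip() for line in chunk.splitlines() if line.strip()
        )
        if text:
            # Escape &, < and > so stray characters don't break the XHTML.
            paragraphs.append(f"<p>{html.escape(text, quote=False)}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "First line\nwrapped.\n\nSecond & last.\n"
    print(text_to_paragraphs(sample))
```

The output can then be pasted between the body tags in place of the raw text.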