06-07-2019, 07:58 PM   #22
lumpynose
Wizard
lumpynose ought to be getting tired of karma fortunes by now.
 
Posts: 1,086
Karma: 6719822
Join Date: Jul 2012
Device: Palm Pilot M105
Quote:
Originally Posted by exaltedwombat
I'm cleaning up a conversion from PDF. The code's pretty clean, but I'm using Ctrl-Space a lot to strip unwanted paragraph breaks, Ctrl-C/Ctrl-V to add missing ones, Clips to surround a paragraph with formatting code... Everything except actually inputting text, really!
That's similar to what I'm doing. I'm working with PDF scans of old magazines from archive.org that include the OCR'd text (SumatraPDF will save just the text with Save As).

Sometimes they have an empty line between paragraphs, sometimes not. Either way, it starts as one big hunk of text between the <body> and </body> tags, with no HTML tags at all. I thought body didn't allow text directly, that it had to be inside a p, div, etc., although epubcheck doesn't complain.
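If you'd rather not add every break by hand, a small script can do a first pass on the saved text before it goes into the body. This is only a rough Python sketch, not anything Sigil or SumatraPDF provides; the function names are made up for illustration. It assumes blank lines mark paragraph breaks when they exist, and when the text has none it falls back to a guess of my own: a line ending in ., !, ? or a closing quote ends a paragraph. That guess will split some paragraphs wrongly, so expect to fix a few with Ctrl-Space afterwards.

```python
import html

# Assumption: characters that can end a paragraph when the OCR text
# has no blank lines between paragraphs.
PARA_END = (".", "!", "?", '"', "\u201d")


def split_paragraphs(text: str) -> list[str]:
    """Split plain OCR text into paragraphs (hypothetical helper).

    Blank lines are treated as paragraph breaks. If the text has no
    blank lines at all, a line ending in PARA_END closes a paragraph.
    Line breaks inside a paragraph are joined with a single space.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    has_blank = any(not line for line in lines)
    paragraphs, current = [], []
    for line in lines:
        if not line:
            # Blank line: close the paragraph in progress, if any.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
        # Punctuation heuristic only applies when there are no blank lines.
        if not has_blank and line.endswith(PARA_END):
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def to_xhtml_body(text: str) -> str:
    """Return <p> elements to paste between <body> and </body>."""
    # Escape &, < and > so OCR text can't break the XHTML.
    return "\n".join(
        f"<p>{html.escape(p, quote=False)}</p>" for p in split_paragraphs(text)
    )


if __name__ == "__main__":
    sample = "First line of a\nparagraph.\n\nSecond & last."
    print(to_xhtml_body(sample))
```

Running it on the sample prints `<p>First line of a paragraph.</p>` and `<p>Second &amp; last.</p>` on two lines. Words hyphenated across OCR line breaks come out as "magi- zine"; I left those alone since a real hyphen at the end of a line looks the same to a script.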