Old 02-21-2023, 04:15 PM   #2013
willus
Fuzzball, the purple cat
Posts: 1,303
Karma: 11087488
Join Date: Jun 2011
Location: California
Device: iPad
The information I get about a PDF file from the MuPDF library is simply a list of characters and their X,Y positions on the page. Nothing marks a new line or a new paragraph; you have to infer both solely from the character positions. So I'd have to deduce from the line spacing or indentation whether a paragraph break was intended, which would be quite error-prone. It's probably easier to hand-edit the Unicode text output by k2pdfopt (-ocrout option).
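
To give a feel for what that inference would involve, here is a rough Python sketch. It is not k2pdfopt's or MuPDF's actual code: the (char, x, y) tuples, the function names, and the thresholds (y_tol, gap_factor, indent_tol) are all made up for illustration, and it assumes y grows down the page and that space characters are present in the input.

```python
from statistics import median

def group_lines(chars, y_tol=2.0):
    """Group (char, x, y) tuples into text lines by baseline proximity."""
    lines = []
    for ch, x, y in sorted(chars, key=lambda c: (c[2], c[1])):
        if lines and abs(y - lines[-1]["y"]) <= y_tol:
            lines[-1]["chars"].append((x, ch))   # same baseline: same line
        else:
            lines.append({"y": y, "chars": [(x, ch)]})
    for line in lines:
        line["chars"].sort()                      # left-to-right order
        line["x0"] = line["chars"][0][0]          # x of first character
        line["text"] = "".join(ch for _, ch in line["chars"])
    return lines

def split_paragraphs(lines, gap_factor=1.5, indent_tol=5.0):
    """Guess paragraph breaks from line spacing and first-line indentation."""
    if not lines:
        return []
    gaps = [b["y"] - a["y"] for a, b in zip(lines, lines[1:])]
    typical_gap = median(gaps) if gaps else 0.0
    left_margin = min(line["x0"] for line in lines)
    paragraphs = [[lines[0]["text"]]]
    for prev, line in zip(lines, lines[1:]):
        big_gap = (line["y"] - prev["y"]) > gap_factor * typical_gap
        indented = (line["x0"] - left_margin) > indent_tol
        if big_gap or indented:
            paragraphs.append([line["text"]])     # heuristic: new paragraph
        else:
            paragraphs[-1].append(line["text"])
    return [" ".join(p) for p in paragraphs]

def layout(text, x0, y, advance=6.0):
    """Hypothetical helper: place text on one baseline at a fixed advance."""
    return [(ch, x0 + i * advance, y) for i, ch in enumerate(text)]

# Made-up page: an indented paragraph start and one extra-large line gap.
chars = (layout("First line", 50, 100) + layout("continues here.", 50, 112)
         + layout("Indented start", 65, 124) + layout("of second one.", 50, 136)
         + layout("After a gap.", 50, 170))

for paragraph in split_paragraphs(group_lines(chars)):
    print(paragraph)
```

Even on this tidy input the result hinges on hand-picked thresholds. Real pages with centered headings, block quotes, multiple columns, or varying font sizes would break these rules quickly, which is why this approach is error-prone.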