MobileRead Forums - View Single Post - k2pdfopt: optimizes PDFs for viewing on e-readers

willus · 02-21-2023, 05:15 PM

The information I get about a PDF file from the MuPDF library is simply a list of characters and their X,Y positions on the page. There is nothing to indicate a new line or a paragraph break--you have to infer that solely from the character positions. So I'd have to deduce from the line spacing or indentation whether a paragraph break was intended, which would be quite error prone. It's probably easier to hand-edit the Unicode text that k2pdfopt writes out (-ocrout option).
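To make the inference problem concrete, here is a minimal Python sketch of the kind of heuristic described above. It is not k2pdfopt's or MuPDF's code. It assumes the input is a plain list of `(char, x, y)` tuples with y increasing down the page and with space characters included. The function names and the thresholds (`y_tol`, `gap_factor`, `indent_tol`) are hypothetical values chosen for illustration.

```python
from statistics import median


def group_lines(chars, y_tol=2.0):
    """Group (char, x, y) tuples into text lines by similar y position.

    Assumes y grows downward and that spaces are present in the input.
    y_tol is an illustrative tolerance, not a value taken from any library.
    """
    lines = []
    for ch, x, y in sorted(chars, key=lambda c: (c[2], c[1])):
        if lines and abs(y - lines[-1]["y"]) <= y_tol:
            lines[-1]["chars"].append((x, ch))
        else:
            lines.append({"y": y, "chars": [(x, ch)]})
    for line in lines:
        line["chars"].sort()                      # left-to-right order
        line["x0"] = line["chars"][0][0]          # left edge of the line
        line["text"] = "".join(ch for _, ch in line["chars"])
    return lines


def split_paragraphs(lines, gap_factor=1.5, indent_tol=5.0):
    """Guess paragraph breaks from line spacing and first-line indentation."""
    if not lines:
        return []
    gaps = [b["y"] - a["y"] for a, b in zip(lines, lines[1:])]
    typical_gap = median(gaps) if gaps else 0.0
    left_margin = min(line["x0"] for line in lines)

    paragraphs = [[lines[0]["text"]]]
    for prev, cur in zip(lines, lines[1:]):
        big_gap = (cur["y"] - prev["y"]) > gap_factor * typical_gap
        indented = (cur["x0"] - left_margin) > indent_tol
        if big_gap or indented:
            paragraphs.append([cur["text"]])      # start a new paragraph
        else:
            paragraphs[-1].append(cur["text"])    # continue the current one
    return [" ".join(p) for p in paragraphs]
```

Even on clean input this only guesses: centered headings, hanging indents, block quotes, and multi-column pages would all likely trip the indentation and spacing tests, which is why the approach is error prone.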

#2013 · willus (Fuzzball, the purple cat) · Posts: 1,313 · Karma: 11087488 · Join Date: Jun 2011 · Location: California · Device: iPad