MobileRead Forums - View Single Post

compurandom · 11-02-2020, 07:16 AM

Quote:

Originally Posted by davidfor

Separating the page and word count for PDF might make sense.

I've been planning a PDF metadata plugin that would extract everything it can find. Page count is on the list, and is trivial. But I don't know when I'll get around to it -- maybe the end of next month.
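To illustrate why page count is the easy part, here is a rough sketch that reads it from the output of Poppler's `pdfinfo` command. This is only an assumption about one way to do it, not how the planned plugin would work; the function names `parse_page_count` and `pdf_page_count` are made up for this example, and it assumes `pdfinfo` is installed and on the PATH.

```python
import re
import subprocess
from typing import Optional


def parse_page_count(pdfinfo_output: str) -> Optional[int]:
    """Return the value of the 'Pages:' line in pdfinfo output, or None."""
    match = re.search(r"^Pages:\s+(\d+)\s*$", pdfinfo_output, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def pdf_page_count(path: str) -> Optional[int]:
    """Run pdfinfo on a file and parse the page count.

    Assumption: Poppler's pdfinfo is available on PATH.
    """
    result = subprocess.run(
        ["pdfinfo", path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdfinfo rejects the file
    )
    return parse_page_count(result.stdout)
```

Keeping the parsing separate from the subprocess call means the parser can be checked without a PDF on hand.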

Quote:

How often does it actually fail?

I've had two or three PDFs fail, a couple of Markdown files fail (!!), and one or two extremely large EPUBs with lots of pictures fail.

For the PDFs -- I wonder if it could be modified to extract the text directly with a PDF tool instead of going through pdf2html.

Edit: And yes, at least some of those really did fail. One created more than 5 GB of images from a PDF smaller than 20 MB and filled up /tmp.
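The direct-extraction idea could look something like the sketch below, which asks Poppler's `pdftotext` for plain text on stdout and counts words from that, so no images are ever written to /tmp. This is a guess at one possible approach, not the plugin's actual code; `count_words`, `pdf_text` and `pdf_word_count` are hypothetical names, the word-counting rule is a simple stand-in, and it assumes `pdftotext` is installed and on the PATH.

```python
import re
import subprocess


def count_words(text: str) -> int:
    """Count whitespace-separated tokens containing at least one word character."""
    return sum(1 for token in text.split() if re.search(r"\w", token))


def pdf_text(path: str, timeout: float = 120.0) -> str:
    """Extract plain text from a PDF via pdftotext.

    Assumption: Poppler's pdftotext is on PATH. Passing "-" as the output
    file sends the text to stdout, so nothing is written to disk.
    """
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", path, "-"],
        capture_output=True,
        check=True,       # raise CalledProcessError on a failed extraction
        timeout=timeout,  # give up on a pathological file instead of hanging
    )
    return result.stdout.decode("utf-8", errors="replace")


def pdf_word_count(path: str) -> int:
    """Word count for a PDF using direct text extraction."""
    return count_words(pdf_text(path))
```

Scanned PDFs with no text layer would come back with a count near zero, so a caller would probably still want a fallback or a warning for those.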