11-02-2020, 07:16 AM   #1396
compurandom
Guru
Posts: 919
Karma: 417282
Join Date: Jun 2015
Device: kobo aura h2o, kobo forma
Quote:
Originally Posted by davidfor
Separating the page and word count for PDF might make sense.
I've been planning a PDF metadata plugin that would extract everything it can find. Page count is on the list, and is trivial. But I don't know when I'll get around to it -- maybe end of next month.
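
As a rough illustration of why page count is the easy part: a minimal sketch, assuming the `pdfinfo` command-line tool from Poppler is installed. This is not the planned plugin; the function names `parse_page_count` and `pdf_page_count` are made up for the example.

```python
import re
import subprocess


def parse_page_count(pdfinfo_output):
    """Return the number on the 'Pages:' line of pdfinfo output, or None."""
    match = re.search(r"^Pages:\s+(\d+)\s*$", pdfinfo_output, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def pdf_page_count(path):
    """Run pdfinfo (assumed to be on PATH) and pull out the page count."""
    result = subprocess.run(
        ["pdfinfo", path], capture_output=True, text=True, check=True
    )
    return parse_page_count(result.stdout)
```

Keeping the parsing separate from the subprocess call means the parsing can be checked without a PDF on hand.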

Quote:
How often does it actually fail?
I've had two or three PDFs fail, a couple of Markdown files fail (!!), and one or two extremely large EPUBs with lots of pictures fail.

For the PDFs -- I wonder if it would be possible to modify it to extract the text directly with a PDF tool instead of using pdf2html.
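
Something along these lines is what I have in mind: a minimal sketch, assuming Poppler's `pdftotext` is available on PATH. It is not how the plugin currently works, and `extract_pdf_text`, `count_words`, and the 120-second timeout are all placeholders for illustration.

```python
import re
import subprocess


def extract_pdf_text(path, timeout=120):
    """Ask pdftotext to write the text of the PDF to stdout ('-')."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", path, "-"],
        capture_output=True,
        check=True,
        timeout=timeout,  # hypothetical limit so a bad file cannot hang forever
    )
    return result.stdout.decode("utf-8", errors="replace")


def count_words(text):
    """Count runs of word characters; a crude but cheap word count."""
    return len(re.findall(r"\w+", text))
```

Usage would be `count_words(extract_pdf_text("book.pdf"))`. Since only text goes to stdout, nothing has to be rendered as images along the way.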


edit: And yes, at least some of those really did fail. One created >5 GB of images from a <20 MB PDF and filled up /tmp.

Last edited by compurandom; 11-02-2020 at 07:19 AM.