My interest is just getting better reflowable paragraphs for fiction. I tried cxpdfhtml.py on a novel and was surprised at how well the "break on short lines" approach worked, although I haven't read closely enough yet to find the paragraph-ending lines that weren't short enough to trigger a break.
I was wondering if you are considering (or anyone else has implemented) detection of paragraphs based on indentation?
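To make the question concrete, here is a rough sketch of what I mean by indentation-based detection. It is not taken from cxpdfhtml.py; the function name `paragraphs_by_indent` and the `min_indent` threshold are made up for illustration, and it assumes the extracted text still carries the leading whitespace of each first line.

```python
def paragraphs_by_indent(text, min_indent=2):
    """Group lines into paragraphs; an indented line starts a new one.

    Hypothetical helper, assuming first-line indents survive text extraction.
    """
    paragraphs = []
    current = []
    for raw in text.splitlines():
        raw = raw.expandtabs(4)  # treat a tab as an indent of up to 4 columns
        stripped = raw.strip()
        if not stripped:
            # A blank line also closes the current paragraph, if any.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        indent = len(raw) - len(raw.lstrip())
        if indent >= min_indent and current:
            # Indented line: flush what we have and start a new paragraph.
            paragraphs.append(" ".join(current))
            current = []
        current.append(stripped)
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


sample = "    It was a dark night.\nThe rain fell\nhard.\n    She ran.\nFast."
for paragraph in paragraphs_by_indent(sample):
    print(paragraph)
```

Something like this could be combined with the short-line rule, so that a line which fills the full width but is followed by an indented line still ends its paragraph.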