That logic is basically what I'm referring to; it's just done with a single regex replacement in Python. The replacement is combined with a median line-length calculation so that the regex is applied only to lines whose length approaches the document median (an extra safeguard to prevent short lines without punctuation from being joined onto the next line). This is what we're already doing for PDF post-processing.
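A minimal sketch of the idea, assuming the goal is to unwrap hard-wrapped lines. The pattern `SOFT_BREAK`, the function name `unwrap_soft_breaks`, and the `ratio=0.8` cutoff are all hypothetical, not the actual post-processing code. A line is joined to the next one only if it ends without sentence-terminal punctuation, the next line is non-blank, and the line is at least `ratio` times the median non-blank line length:

```python
import re
import statistics

# Hypothetical pattern: a whole line ending in a character other than
# sentence-terminal punctuation (. ! ? : ;) or whitespace, followed by a
# newline and a next line that starts with a non-whitespace character.
SOFT_BREAK = re.compile(r"^(.*[^.!?:;\s])\n(?=\S)", re.MULTILINE)


def unwrap_soft_breaks(text: str, ratio: float = 0.8) -> str:
    """Join hard-wrapped lines, but only those near the median line length."""
    lengths = [len(line) for line in text.splitlines() if line.strip()]
    if not lengths:
        return text
    # Lines shorter than this are left alone (titles, labels, list items, ...).
    min_len = ratio * statistics.median(lengths)

    def join_if_long(match: re.Match) -> str:
        line = match.group(1)
        if len(line) >= min_len:
            return line + " "  # replace the newline with a space
        return match.group(0)  # keep the original line break

    # A single regex replacement; the callback applies the median gate per line.
    return SOFT_BREAK.sub(join_if_long, text)
```

With the sample `"This is a fairly long line that wraps\nonto the next line here.\nShort\nTitle line\n\nAnother para."`, only the first break is removed. `Short` lacks punctuation but falls below the median-based cutoff, and `Title line` is followed by a blank line, so both keep their line breaks.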
There will always be docs that this won't work for, but I think we can handle the majority of cases where a user would otherwise need to hand-edit the text to fix this sort of thing.